ruby html+css->pdf

Posted by wxianfeng Tue, 19 Oct 2010 21:48:00 GMT

Environment: CentOS 5.5 + Ruby 1.8.7 + pdfkit 0.4.6

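I recently generated PDF versions of my blog, one PDF for every half-year of posts. I tried quite a few open-source components and found that not many of them are pleasant to use; the two better ones are prawn and pdfkit.

In the end pdfkit turned out to work very well: it converts HTML + CSS into a PDF document. Under the hood it uses wkhtmltopdf, and wkhtmltopdf can also be called directly from the shell, which is very convenient.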
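
Because wkhtmltopdf is an ordinary command-line program, it can also be driven from Ruby without going through pdfkit. The following is only a minimal sketch, not how pdfkit itself is implemented: the helper name wkhtmltopdf_command and the output file name are made up for illustration, and the binary path is assumed to be /usr/local/bin/wkhtmltopdf.

```ruby
# Hypothetical helper: build the argument list for a wkhtmltopdf call.
# The basic invocation is: wkhtmltopdf [options] <input> <output>
def wkhtmltopdf_command(input, output, options = {}, binary = 'wkhtmltopdf')
  # Turn { 'page-size' => 'A4' } into ['--page-size', 'A4'], in a stable order.
  flags = options.sort_by { |name, _| name.to_s }
                 .map { |name, value| ["--#{name}", value.to_s] }
                 .flatten
  [binary] + flags + [input, output]
end

cmd = wkhtmltopdf_command('http://wxianfeng.com', 'wxianfeng.pdf',
                          { 'page-size' => 'A4' }, '/usr/local/bin/wkhtmltopdf')
puts cmd.join(' ')

# Passing the arguments as a list avoids shell quoting problems.
# system(*cmd)   # uncomment on a machine where wkhtmltopdf is installed
```

The input can be either a URL or a local HTML file, so the same helper covers both cases.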

1. Install pdfkit

>gem install pdfkit

>sudo pdfkit --install-wkhtmltopdf

This fails because lzcat cannot be found, so install it:

>yum update
>yum install lzma

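2. Script that generates the PDF documents

# Note: bare URLs such as http://wxianfeng.com must be turned into <a> links, because wkhtmltopdf can fetch a URL directly to generate the PDF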

require 'rubygems'
require 'pdfkit/source'  # require "pdfkit" fails with an error saying PDFKit cannot be found
require 'pdfkit/pdfkit'
require 'pdfkit/middleware'
require 'pdfkit/configuration'

PDFKit.configure do |config|  
  config.wkhtmltopdf = '/usr/local/bin/wkhtmltopdf'
end

# [start, end] of each half-year range
range_t = [
  ["2009-06-30","2009-12-12 23:59:59"],
  ["2009-12-12","2010-06-30 23:59:59"],  # June has only 30 days
  ["2010-06-30","2010-12-12 23:59:59"]
]

path = "/usr/local/system/src/wxianfeng_com_pdf/"
# files already present in the output directory, ignoring housekeeping entries
exist_files = Dir.entries(path).reject { |x| ['.', '..', '.svn', 'Thumbs.db'].include?(x) }

range_t.each do |ele|
  # e.g. wxianfeng.com_2009-06-30~2009-12-12.pdf
  filename = "wxianfeng.com_#{ele.first}~#{ele.last.slice(/\d+-\d+-\d+/)}.pdf"
  next if exist_files.include?(filename)  # skip ranges that were already generated
  posts = Content.all(:conditions=>["published_at BETWEEN ? AND ?",ele.first,ele.last])
  html = ''
  posts.each do |i|
    p i.published_at.to_s(:db) + " " +i.title
    # wrap bare URLs (preceded by whitespace or <br/>) in <a> tags, keeping the text before them
    body = i.html(:all).gsub(/(\s|<br\s*\/?>)([a-zA-Z]+:\/\/[^\s<>"]*)/,'\1<a href="\2">\2</a>')
    html << "<strong>" + i.title + "</strong><br/><br/>" + body + "<br/><br/><br/><br/>"
  end
  kit = PDFKit.new(html)
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/main.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/print.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/local.css"
  # to_file renders the PDF itself, so a separate kit.to_pdf call is not needed
  kit.to_file(File.join(path, filename))
end
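
The gsub call in the script is the step that turns bare URLs in a post body into <a> links. Here is a standalone sketch of one way to write that step, so the pattern can be tried outside of Rails. The method name linkify and the sample strings are hypothetical, and the pattern assumes that a bare URL always follows whitespace or a <br/> tag.

```ruby
# Hypothetical helper: wrap bare URLs in <a> tags.
# A URL only counts when it follows whitespace or a <br> tag, so URLs that
# already sit inside href="..." or inside a link's text are left alone.
URL_PATTERN = /(\s|<br\s*\/?>)([a-zA-Z]+:\/\/[^\s<>"]*)/

def linkify(html)
  # \1 puts back the whitespace or <br/> that preceded the URL.
  html.gsub(URL_PATTERN, '\1<a href="\2">\2</a>')
end

puts linkify('see http://wxianfeng.com<br/>http://example.com/a?b=1 end')
puts linkify('<a href="http://wxianfeng.com">my blog</a>')
```

Capturing the leading whitespace or <br/> and writing it back keeps the original spacing and line breaks of the post intact.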

3. Run

ruby script/runner script/tools/generate_pdf.rb

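Note that script/runner uses the database configured for the development environment by default; pass -e production to run against the production database.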


Fixing garbled Chinese text in MySQL

Posted by wxianfeng Tue, 19 Oct 2010 21:11:00 GMT

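Environment: CentOS 5.5 + MySQL 5.0

I recently generated PDF versions of my blog and found that, once deployed to the server, all the Chinese text in the generated PDFs was garbled, while everything was fine locally. I then noticed that Chinese text returned by a SELECT in the mysql console on the server was garbled as well. Here is the fix.

Check the character sets: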

mysql> SHOW VARIABLES LIKE 'charac%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | utf8                       | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

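These are the correct settings. Originally the values were latin1, MySQL's built-in default character set. It turns out that in Rails, setting encoding: utf8 in database.yml is enough: even when MySQL's character_set_* variables are latin1, Chinese text stored in the database is not garbled. So setting encoding in Rails is very important.

To fix the garbled output in the mysql console, edit the configuration file: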
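
Since the encoding entry in database.yml matters so much, it can be worth checking that every environment actually sets it. The following is a minimal sketch under stated assumptions: the method name environments_without_utf8 is hypothetical, and the sample configuration (adapter and database names) is made up for illustration rather than taken from this blog.

```ruby
require 'yaml'

# Hypothetical check: list the environments in a database.yml
# that do not set "encoding: utf8".
def environments_without_utf8(yaml_text)
  config = YAML.load(yaml_text)
  config.select { |_env, settings| settings['encoding'] != 'utf8' }
        .map { |env, _settings| env }
        .sort
end

# Made-up sample: production is missing the encoding line.
SAMPLE_DATABASE_YML = <<-CONFIG
development:
  adapter: mysql
  database: blog_development
  encoding: utf8
production:
  adapter: mysql
  database: blog_production
CONFIG

p environments_without_utf8(SAMPLE_DATABASE_YML)  # prints ["production"]
```

In a real application the text would come from File.read on config/database.yml instead of the inline sample.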

>vim /etc/my.cnf

[mysqld]
default-character-set = utf8

[client]
default-character-set = utf8

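Adding the line above to both the [mysqld] and [client] sections is all that is needed.

Then restart MySQL.

If you get unknown variable 'default-character-set=utf8', the MySQL version is probably 5.5 or later; in the [mysqld] section use this instead: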

character_set_server=utf8
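
After restarting MySQL, running SHOW VARIABLES LIKE 'charac%' again shows whether the change took effect. As a rough sketch, the same check can be written in Ruby once the rows are in a hash. The method name non_utf8_variables is hypothetical, the sample values are made up to illustrate a partly latin1 setup, and fetching the rows from MySQL is left out.

```ruby
# character_set_filesystem (binary) and character_sets_dir (a path)
# are not expected to be utf8, so they are skipped.
IGNORED_VARIABLES = %w[character_set_filesystem character_sets_dir]

# Hypothetical check: names of the character set variables that are not utf8.
def non_utf8_variables(variables)
  variables.reject { |name, value| IGNORED_VARIABLES.include?(name) || value == 'utf8' }
           .map { |name, _value| name }
           .sort
end

# Made-up sample of a server that still uses latin1 in two places.
sample = {
  'character_set_client'     => 'utf8',
  'character_set_connection' => 'utf8',
  'character_set_database'   => 'latin1',
  'character_set_filesystem' => 'binary',
  'character_set_results'    => 'utf8',
  'character_set_server'     => 'latin1',
  'character_set_system'     => 'utf8',
  'character_sets_dir'       => '/usr/share/mysql/charsets/'
}

p non_utf8_variables(sample)  # prints ["character_set_database", "character_set_server"]
```

An empty result corresponds to the correct output shown in the table earlier in this post.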