ruby html+css->pdf

Posted by wxianfeng Tue, 19 Oct 2010 21:48:00 GMT

Environment: CentOS 5.5 + Ruby 1.8.7 + pdfkit 0.4.6

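I recently generated PDF documents for my blog, one PDF per half year. I tried quite a few open-source components and found that few of them work well; the two better ones are prawn and pdfkit.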

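After experimenting, pdfkit turned out to be very easy to use. It converts HTML + CSS into a PDF document and uses wkhtmltopdf underneath. wkhtmltopdf can also be called directly from the shell, which is very convenient.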
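Since wkhtmltopdf takes an input (URL or file) and an output file on its command line, a direct call can be sketched from Ruby as below. This is only a rough illustration: the helper name wkhtmltopdf_args and the output file name are made up for the example, and the binary is assumed to be on the PATH.

```ruby
# Hypothetical helper: build the argument list for a direct wkhtmltopdf call.
# The command-line form is: wkhtmltopdf <input url/file> <output file>
def wkhtmltopdf_args(source, output, binary = 'wkhtmltopdf')
  [binary, source, output]
end

# The output file name here is an assumption chosen for the example.
args = wkhtmltopdf_args('http://wxianfeng.com', 'wxianfeng.com.pdf')
puts args.join(' ')

# To actually render, hand the list to Kernel#system (needs wkhtmltopdf installed):
# system(*args) or abort('wkhtmltopdf failed')
```

Passing the arguments as a list rather than one string keeps the shell from reinterpreting characters in the URL.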

1. Install pdfkit

>gem install pdfkit

>sudo pdfkit --install-wkhtmltopdf

The installer reports that lzcat cannot be found, so install it:

>yum update
>yum install lzma
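To check up front whether lzcat and wkhtmltopdf are available, something like the following could be used. It is a small sketch, not part of pdfkit; find_executable is a hypothetical helper that just walks the PATH.

```ruby
# Hypothetical helper: return the full path of an executable found on the
# given search path, or nil when it is missing.
def find_executable(name, search_path = ENV['PATH'])
  search_path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Report the two tools this post depends on.
%w[lzcat wkhtmltopdf].each do |tool|
  puts "#{tool}: #{find_executable(tool) || 'not found'}"
end
```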

2. Script that generates the PDF documents

# Note: a URL such as http://wxianfeng.com must be wrapped in an <a> link, because wkhtmltopdf can fetch a URL directly to generate a PDF

require 'rubygems'
require 'pdfkit/source'  # require "pdfkit" raises an error saying PDFKit cannot be found
require 'pdfkit/pdfkit'
require 'pdfkit/middleware'
require 'pdfkit/configuration'

PDFKit.configure do |config|  
  config.wkhtmltopdf = '/usr/local/bin/wkhtmltopdf'
end

range_t = [
  ["2009-06-30","2009-12-12 23:59:59"],
  ["2009-12-12","2010-06-30 23:59:59"],  # June has only 30 days
  ["2010-06-30","2010-12-12 23:59:59"]
]

path = "/usr/local/system/src/wxianfeng_com_pdf/"
exist_files = Dir.open(path).to_a.select{|x| x != '.' &&  x!= '..' && x != '.svn' && x != 'Thumbs.db'}

range_t.each do |ele|
  next if exist_files.include?("wxianfeng.com_#{ele.first}~#{ele.last.slice(/\d+-\d+-\d+/)}.pdf")
  posts = Content.all(:conditions=>["published_at BETWEEN ? AND ?",ele.first,ele.last])
  kit , html = nil , ''
  posts.each do |i|
    p i.published_at.to_s(:db) + " " +i.title
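    # Wrap bare URLs (preceded by whitespace or <br/>) in <a> tags; \1 keeps the text before the URL
    html << "<strong>" + i.title + "</strong><br/><br/>" + i.html(:all).gsub(/(\s|<br\s*\/?>)([a-zA-Z]+:\/\/[^\s<>"]*)/,'\1<a href="\2">\2</a>') + "<br/><br/><br/><br/>"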
  end  
  kit = PDFKit.new(html)
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/main.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/print.css"
  # kit.stylesheets << "#{RAILS_ROOT}/themes/lindholmen/stylesheets/local.css"
  # to_file renders the PDF itself, so a separate kit.to_pdf call is not needed
  kit.to_file "#{path}wxianfeng.com_#{ele.first}~#{ele.last.slice(/\d+-\d+-\d+/)}.pdf"
end
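The gsub inside the loop is the part that turns bare URLs into links. Pulled out on its own it might look like the sketch below. The helper name autolink is made up, and the rule "a URL counts as bare when it follows whitespace or a <br/>" is my reading of the intent, so adjust the pattern if your posts differ.

```ruby
# Hypothetical helper, not part of pdfkit.
# Group 1: the whitespace or <br/> in front of the URL (kept in the output).
# Group 2: the URL itself, ending at whitespace, <, > or a double quote.
BARE_URL = /(\s|<br\s*\/?>)([a-zA-Z]+:\/\/[^\s<>"]*)/

def autolink(html)
  html.gsub(BARE_URL) { "#{$1}<a href=\"#{$2}\">#{$2}</a>" }
end

puts autolink('my blog: http://wxianfeng.com is here')
puts autolink('<a href="http://wxianfeng.com">already linked</a>')
```

A URL that already sits inside href="..." follows a double quote, so the pattern leaves it alone.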

3. Run it

ruby script/runner script/tools/generate_pdf.rb

Note that script/runner uses the database configured for the development environment by default.
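Which database gets used comes down to the environment name, which falls back to development when RAILS_ENV is not set. The lookup can be pictured roughly like this. The YAML below is a made-up stand-in for config/database.yml, and database_for is a hypothetical helper, not a Rails method.

```ruby
require 'yaml'

# Hypothetical stand-in for config/database.yml; the database names are invented.
DATABASE_YML = <<-YAML
development:
  adapter: mysql
  database: blog_development
production:
  adapter: mysql
  database: blog_production
YAML

# Pick the database entry for an environment, falling back to "development"
# when no environment name is given.
def database_for(env = ENV['RAILS_ENV'])
  YAML.load(DATABASE_YML)[env || 'development']['database']
end

puts database_for(nil)           # no environment given
puts database_for('production')  # environment chosen explicitly
```

To run the script against production data, pass the environment to the runner, for example ruby script/runner -e production script/tools/generate_pdf.rb.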