Uses old Rust 2015
0.1.1 | April 12, 2018
0.1.0 | April 12, 2018
658 lines of code
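This crate uses wkhtmltopdf, so you need wkhtmltopdf installed on your machine to use it. Installing wkhtmltopdf on macOS with Homebrew is easy. Just type the following in a terminal:
brew install Caskroom/cask/wkhtmltopdf
On other systems, or if you don't have Homebrew, you will need to install wkhtmltopdf yourself. At some point I may look up installation instructions for other setups and include them here. As for versions, I have only tested with wkhtmltopdf 0.12.4.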
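Since the PDF conversion depends on an external binary, it can help to check that the binary is reachable before starting a batch. The following is only a sketch using the Rust standard library, not part of the urls2disk API: `tool_version` is a hypothetical helper, and it assumes the tool prints its version when given `--version` (which wkhtmltopdf does).

```rust
use std::process::Command;

/// Hypothetical helper (not part of urls2disk): returns the first line that
/// `<program> --version` prints, or `None` if the program cannot be started
/// or exits with a failure status.
fn tool_version(program: &str) -> Option<String> {
    // `output()` fails with an error if the program is not on the PATH.
    let output = Command::new(program).arg("--version").output().ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8_lossy(&output.stdout);
    stdout.lines().next().map(|line| line.trim().to_string())
}
```

Calling `tool_version("wkhtmltopdf")` before building the client gives either a version string to compare against 0.12.4 or `None`, in which case the tool still needs to be installed.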
extern crate reqwest;
extern crate urls2disk;
use std::fs;
use std::path::Path;
use urls2disk::{wkhtmltopdf, ClientBuilder, Result, SimpleDocument, Url};
// This function will download Apple, Inc.'s annual reports for the years 2010 to 2017
// from the SEC's website to your disk. It will download two copies of each annual
// report: one of just the raw html and another that has been converted to PDF.
fn run() -> Result<()> {
// Create an output directory.
let output_directory = Path::new("./data");
if !output_directory.exists() {
fs::create_dir_all(output_directory)?;
}
// Create a vector of urls we would like to download.
// These urls represent the annual reports for Apple, Inc. from 2010 to 2017.
let base = "https://www.sec.gov/Archives/edgar/data/";
let urls = vec![
.map(|stem| format!("{}{}", &base, stem))
// Turn the vector of urls into a vector of boxed Document trait objects (here we'll
// be using the SimpleDocument struct as one possible implementer of the Document trait).
// For this batch, we set the wkhtmltopdf option to false; so when we feed this list
// to the Client it will just download the raw webpages in html format instead of
// first converting them to PDF.
let html_documents = urls.iter()
.enumerate()
.map(|(i, url_string)| {
let filename = format!("Apple 10-K {}.html", i + 2010);
let path = output_directory.join(&filename);
let url = url_string.parse::<Url>()?;
let wkhtmltopdf = false;
let document = SimpleDocument::new(path, url, wkhtmltopdf);
Ok(Box::new(document))
})
.collect::<Result<Vec<Box<SimpleDocument>>>>()?;
// Turn the vector of urls into another vector of boxed Document trait objects
// (to show off additional functionality). This time we'll set the wkhtmltopdf
// option to true; so when we feed this list to the Client it will first convert
// the webpages to PDF before writing them to disk.
let pdf_documents = urls.iter()
.enumerate()
.map(|(i, url_string)| {
let filename = format!("Apple 10-K {}.pdf", i + 2010);
let path = output_directory.join(&filename);
let url = url_string.parse::<Url>()?;
let wkhtmltopdf = true;
let document = SimpleDocument::new(path, url, wkhtmltopdf);
Ok(Box::new(document))
})
.collect::<Result<Vec<Box<SimpleDocument>>>>()?;
// Combine our two vectors into one vector of Box<SimpleDocument>.
let mut documents = [&html_documents[..], &pdf_documents[..]].concat();
// Create the client.
// Here, we're showing several customization options, but if you want to use
// just the default settings, you could simply build the client with
// `let client = ClientBuilder::default().build()?;`
let client = ClientBuilder::default()
// Let the client go. It will download and write to disk all the
// documents while simultaneously respecting the 'requests per second' and
// other limits we provided. If you already have the documents on disk,
// the client will not redownload them.
client.get_documents(&mut documents)?;
// Note: Here, if you want to, you can now access the raw bytes of all the urls
// you downloaded, since they are now stored on each SimpleDocument in addition
// to being saved on your disk.
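Ok(())
}

fn main() {
// Assumed entry point: run the example and panic with the error if it fails.
run().unwrap();
}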
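To make the flow of the example above easier to follow in isolation, here is a minimal sketch that uses only the standard library. It is not how urls2disk is implemented: `PlannedDownload`, `plan_downloads`, and `min_interval` are hypothetical names introduced for illustration. It shows three ideas from the example: pairing each URL stem with a year-based filename via `enumerate`, skipping files that already exist on disk, and turning a requests-per-second budget into a minimum delay between requests.

```rust
use std::path::{Path, PathBuf};
use std::time::Duration;

/// Hypothetical type: one planned download (source URL and target path).
#[derive(Debug, Clone, PartialEq)]
struct PlannedDownload {
    url: String,
    path: PathBuf,
}

/// Pairs each URL stem with a filename of the form "Apple 10-K <year>.<ext>",
/// counting years up from `first_year`, and drops entries whose target file
/// already exists so they would not be downloaded again.
fn plan_downloads(
    base: &str,
    stems: &[&str],
    first_year: usize,
    extension: &str,
    output_directory: &Path,
) -> Vec<PlannedDownload> {
    stems
        .iter()
        .enumerate()
        .map(|(i, stem)| PlannedDownload {
            url: format!("{}{}", base, stem),
            path: output_directory.join(format!("Apple 10-K {}.{}", i + first_year, extension)),
        })
        .filter(|planned| !planned.path.exists())
        .collect()
}

/// Minimum delay between two requests for a given requests-per-second budget.
/// Returns `None` for a budget of zero, where no finite delay makes sense.
fn min_interval(requests_per_second: u32) -> Option<Duration> {
    if requests_per_second == 0 {
        None
    } else {
        Some(Duration::from_secs(1) / requests_per_second)
    }
}
```

A caller would sleep for `min_interval(n)` between consecutive requests and fetch only the entries returned by `plan_downloads`. The real client may pace and cache differently.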