2 unstable releases
0.2.0 | May 13, 2023 |
---|---|
0.1.0 | May 12, 2023 |
#7 in #archiver
24 downloads per month
41KB
790 lines of code
lolchive
A local, ad hoc web page archiver.
Windows is not supported yet.
This saves a web page under the path you specify, so
google.com/path/to/this
becomes
google.com/
|_/path
  |_/to
    |_/this
      |_/date
        |/css
        |/images
        |/js
        |_index.html
which is the folder layout that gets saved.
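The mapping from URL to folder can be sketched in a few lines. This is only an illustration of the layout described above, not lolchive's own code: `archive_dir` is a hypothetical helper, and the date label is passed in as a plain string because the format the crate uses for the date folder is not stated here.

```rust
use std::path::PathBuf;

/// Hypothetical helper (not part of lolchive's API): maps a URL and a date
/// label onto the nested folder layout host/path/segments/date.
fn archive_dir(root: &str, url: &str, date: &str) -> PathBuf {
    // Drop the scheme, if present ("https://google.com/a" -> "google.com/a").
    let without_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    // Ignore any query string or fragment.
    let clean = without_scheme
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");

    let mut dir = PathBuf::from(root);
    // The host and each path segment become nested folders; empty segments
    // and "." / ".." are skipped so the result stays under `root`.
    for segment in clean
        .split('/')
        .filter(|s| !s.is_empty() && *s != "." && *s != "..")
    {
        dir.push(segment);
    }
    // The date folder sits at the bottom and would hold index.html, css, images and js.
    dir.push(date);
    dir
}
```

Calling `archive_dir("/tmp/archive", "https://google.com/path/to/this", "2023-05-13")` gives the folder that would contain `index.html` and the `css`, `images`, and `js` subfolders.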
Usage
The Fantoccini archiver uses fantoccini, which requires a running geckodriver; the basic archiver uses only reqwest.
FantocciniArchiver
use lolchive::web_archiver::FantocciniArchiver;
use dirs;
let url = "https://www.merriam-webster.com/dictionary/fantoccini";
//connection string for the running geckodriver instance (geckodriver serves plain HTTP by default)
let connection_string = "http://127.0.0.1:4444";
//set up the absolute path where the archive should be stored
let home_dir = dirs::home_dir().expect("Failed to get home directory");
let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");
//create archiver
let archiver = FantocciniArchiver::new(connection_string).await;
//archive
let path = archiver.create_archive(url, &new_dir).await;
//path to the archive returned
println!("{:?}", path);
//close archiver
let _ = archiver.close().await;
BasicArchiver
The basic archiver uses only reqwest.
use lolchive::web_archiver::BasicArchiver;
use dirs;
let url = "https://rust-lang.net.cn/";
let home_dir = dirs::home_dir().expect("Failed to get home directory");
let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");
println!("{:?}", new_dir);
let path = BasicArchiver::create_archive(url, &new_dir).await;
println!("{:?}", path);
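Both archiver examples build the target directory with `format!` and `to_str().unwrap()`, which panics if the home directory is not valid UTF-8. A minimal alternative sketch, using only the standard library, joins path components instead. The helper names `archive_root` and `path_as_str` are hypothetical and not part of lolchive.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: builds `<home>/Projects/archive_test` from components,
/// leaving the separator handling to `Path::join`.
fn archive_root(home: &Path) -> PathBuf {
    home.join("Projects").join("archive_test")
}

/// Hypothetical helper: returns the path as `&str` for APIs that take string
/// paths, or `None` (instead of panicking) when the path is not valid UTF-8.
fn path_as_str(path: &Path) -> Option<&str> {
    path.to_str()
}
```

With `dirs::home_dir()` supplying `home`, the result of `path_as_str(&archive_root(&home))` could be passed wherever the examples above pass `&new_dir`.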
Crawlers
Fantoccini crawler: uses fantoccini and the gecko WebDriver
use lolchive::crawler::FantocciniCrawler;
use dirs;
let url = "https://en.wikipedia.org/wiki/Rust_(programming_language)";
//geckodriver serves plain HTTP by default
let connection_string = "http://127.0.0.1:4444";
let home_dir = dirs::home_dir().expect("Failed to get home directory");
let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");
let fcrawler = FantocciniCrawler::new(connection_string).await.unwrap();
let paths = fcrawler.save_crawl(url, &new_dir, 2).await.unwrap();
let _ = fcrawler.close().await;
println!("{:?}", paths);
assert!(paths.len() == 2);
Basic crawler: uses reqwest
use lolchive::crawler::BasicCrawler;
use dirs;
let url = "https://rust-lang.net.cn/";
let home_dir = dirs::home_dir().expect("Failed to get home directory");
let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");
let paths = BasicCrawler::save_crawl(url, &new_dir, 2).await.unwrap();
println!("{:?}", paths);
assert!(paths.len() == 2);
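The meaning of the third argument to `save_crawl` is not spelled out here, but the assertion `paths.len() == 2` suggests that it caps the number of pages saved. Under that assumption, a bounded breadth-first crawl can be sketched over an in-memory link graph. This is an illustration of the idea only, not lolchive's implementation; `bounded_crawl` and the `links` map are hypothetical stand-ins for fetching pages and extracting their links.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for a crawl: visits pages breadth-first from `start`
/// and stops once `max_pages` pages have been collected.
fn bounded_crawl(
    links: &HashMap<&str, Vec<&str>>,
    start: &str,
    max_pages: usize,
) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut visited: Vec<String> = Vec::new();

    seen.insert(start.to_string());
    queue.push_back(start.to_string());

    while let Some(page) = queue.pop_front() {
        // Stop as soon as the page budget is used up.
        if visited.len() >= max_pages {
            break;
        }
        // Queue every outgoing link that has not been seen before.
        if let Some(outgoing) = links.get(page.as_str()) {
            for next in outgoing {
                if seen.insert(next.to_string()) {
                    queue.push_back(next.to_string());
                }
            }
        }
        visited.push(page);
    }
    visited
}
```

The `seen` set keeps pages that link back to each other from being visited twice, and the budget check bounds the work even on a large site.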
Dependencies
27–44MB
~514K SLoC