#archiver #local #page #webpage #html #image #liminal

lolchive

Local liminal web page archiver

2 unstable releases

0.2.0 May 13, 2023
0.1.0 May 12, 2023

#7 in #archiver

24 downloads per month

MIT license

41KB
790 lines of code

lolchive

Local liminal page archiver

Windows is not supported yet.

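lolchive saves a web page under the path you specify, in folders that mirror the URL. For example,

google.com/path/to/this

is saved with the following folder layout:

google.com/
        |_/path
            |_/to
                |_/this
                    |_/date
                        |_/css
                        |_/images
                        |_/js
                        |_index.html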
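The mapping from URL to folder can be sketched as follows. This is only an illustration of the layout described above, not lolchive's own code: the helper name `archive_dir`, the `date` string format, and the handling of query strings are all assumptions.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps a URL to the folder an archive would land in.
/// `date` stands in for whatever timestamp format lolchive actually uses.
fn archive_dir(base: &Path, url: &str, date: &str) -> PathBuf {
    // Drop the scheme ("https://"), if present.
    let without_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    // Drop any query string or fragment (an assumption for this sketch).
    let clean = without_scheme
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");

    let mut dir = base.to_path_buf();
    // The host and every non-empty path segment become one folder each;
    // "." and ".." are skipped so the result stays below `base`.
    for segment in clean
        .split('/')
        .filter(|s| !s.is_empty() && *s != "." && *s != "..")
    {
        dir.push(segment);
    }
    // One folder per archive run, named after the date.
    dir.push(date);
    dir
}

fn main() {
    let dir = archive_dir(
        Path::new("/home/user/Projects/archive_test"),
        "https://google.com/path/to/this",
        "2023-05-13",
    );
    println!("{}", dir.display());
}
```

In this sketch, `index.html` and the `css`, `images`, and `js` folders would then be written inside the returned directory.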

Usage

The Fantoccini archiver uses fantoccini, which in turn needs a running geckodriver. The basic archiver uses only reqwest.
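Because the Fantoccini archiver depends on a running geckodriver, it can help to check that something is listening on the WebDriver port before creating the archiver. The following is a minimal sketch using only the standard library; the helper name `webdriver_reachable` is hypothetical, and `127.0.0.1:4444` is simply the address used in the examples here.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: returns true if something accepts TCP connections
/// at `addr` within `timeout`. It does not verify that the listener is
/// actually a WebDriver server.
fn webdriver_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        // An address that does not parse cannot be reached.
        Err(_) => false,
    }
}

fn main() {
    let addr = "127.0.0.1:4444";
    if webdriver_reachable(addr, Duration::from_secs(1)) {
        println!("WebDriver endpoint {addr} is reachable");
    } else {
        eprintln!("nothing is listening on {addr}; start geckodriver first");
    }
}
```

A check like this only reports that the port is open, so a failed session can still occur afterwards.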

FantocciniArchiver

    use lolchive::web_archiver::FantocciniArchiver;

    // `.await` needs an async runtime; Tokio is assumed here.
    #[tokio::main]
    async fn main() {
        let url = "https://www.merriam-webster.com/dictionary/fantoccini";

        // WebDriver address where geckodriver is listening (it serves plain HTTP)
        let connection_string = "http://127.0.0.1:4444";

        // absolute path of the directory that will hold the archive
        let home_dir = dirs::home_dir().expect("Failed to get home directory");
        let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");

        // create the archiver
        let archiver = FantocciniArchiver::new(connection_string).await;

        // archive the page
        let path = archiver.create_archive(url, &new_dir).await;

        // print the returned path to the archive
        println!("{:?}", path);

        // close the WebDriver session
        let _ = archiver.close().await;
    }

BasicArchiver

The basic archiver uses only reqwest.

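    use lolchive::web_archiver::BasicArchiver;

    // `.await` needs an async runtime; Tokio is assumed here.
    #[tokio::main]
    async fn main() {
        let url = "https://rust-lang.net.cn/";

        // absolute path of the directory that will hold the archive
        let home_dir = dirs::home_dir().expect("Failed to get home directory");
        let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");
        println!("{:?}", new_dir);

        // archive the page and print the returned path
        let path = BasicArchiver::create_archive(url, &new_dir).await;
        println!("{:?}", path);
    }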

Crawlers

Fantoccini crawler - uses fantoccini and the gecko WebDriver (geckodriver)

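    use lolchive::crawler::FantocciniCrawler;

    // `.await` needs an async runtime; Tokio is assumed here.
    #[tokio::main]
    async fn main() {
        let url = "https://en.wikipedia.org/wiki/Rust_(programming_language)";

        // WebDriver address where geckodriver is listening (it serves plain HTTP)
        let connection_string = "http://127.0.0.1:4444";

        // absolute path of the directory that will hold the archives
        let home_dir = dirs::home_dir().expect("Failed to get home directory");
        let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");

        // crawl starting at `url`, then close the WebDriver session
        let fcrawler = FantocciniCrawler::new(connection_string).await.unwrap();
        let paths = fcrawler.save_crawl(url, &new_dir, 2).await.unwrap();
        let _ = fcrawler.close().await;

        println!("{:?}", paths);
        assert_eq!(paths.len(), 2);
    }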

Basic crawler - uses reqwest

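    use lolchive::crawler::BasicCrawler;

    // `.await` needs an async runtime; Tokio is assumed here.
    #[tokio::main]
    async fn main() {
        let url = "https://rust-lang.net.cn/";

        // absolute path of the directory that will hold the archives
        let home_dir = dirs::home_dir().expect("Failed to get home directory");
        let new_dir = format!("{}{}", home_dir.to_str().unwrap(), "/Projects/archive_test");

        // crawl starting at `url` and collect the archive paths
        let paths = BasicCrawler::save_crawl(url, &new_dir, 2).await.unwrap();

        println!("{:?}", paths);
        assert_eq!(paths.len(), 2);
    }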
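Both crawler examples pass 2 as the third argument and then expect two returned paths. The source does not document what that number means. Assuming it caps the number of archived pages, the general idea of a bounded crawl can be sketched over an in-memory link graph as below. The function `crawl_order` and the sample graph are hypothetical and unrelated to lolchive's internals.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: breadth-first crawl over an in-memory link graph,
/// stopping once `max_pages` pages have been visited.
fn crawl_order(links: &HashMap<&str, Vec<&str>>, start: &str, max_pages: usize) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut order = Vec::new();

    queue.push_back(start.to_string());
    while let Some(page) = queue.pop_front() {
        // Stop as soon as the page cap is reached.
        if order.len() >= max_pages {
            break;
        }
        // Skip pages that were already visited.
        if !visited.insert(page.clone()) {
            continue;
        }
        // Queue this page's outgoing links for later visits.
        if let Some(next) = links.get(page.as_str()) {
            for link in next {
                queue.push_back(link.to_string());
            }
        }
        order.push(page);
    }
    order
}

fn main() {
    let mut links = HashMap::new();
    links.insert("/", vec!["/a", "/b"]);
    links.insert("/a", vec!["/", "/c"]);

    // With a cap of 2, only the start page and its first link are visited.
    let pages = crawl_order(&links, "/", 2);
    println!("{:?}", pages);
}
```

A real crawler would fetch each page and extract its links instead of reading them from a map, but the queue, visited set, and cap work the same way.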

Dependencies

27–44MB
~514K SLoC