1 unstable version
0.1.2 (May 27, 2023)
16KB
328 lines
Web Crawler
Finds every page, image, and script on a website (and downloads them)
Usage
Rust Web Crawler
Usage: web-crawler [OPTIONS] <URL>
Arguments:
<URL>
Options:
-d, --download
Download all files
-c, --crawl-external
Whether to crawl other websites it finds links to. Might result in downloading the entire internet
-m, --max-url-length <MAX_URL_LENGTH>
Maximum URL length allowed. Pages whose URL reaches this limit are ignored [default: 300]
-e, --exclude <EXCLUDE>
Will ignore paths that start with these strings (comma-separated)
--export <EXPORT>
Where to export found URLs
--export-internal <EXPORT_INTERNAL>
Where to export internal URLs
--export-external <EXPORT_EXTERNAL>
Where to export external URLs
-t, --timeout <TIMEOUT>
Timeout between requests in milliseconds [default: 100]
-h, --help
Print help
-V, --version
Print version
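For example, a typical invocation that crawls a site, downloads every file it finds, and exports the discovered URLs (the URL and output file name here are placeholders):
web-crawler https://example.com --download --export urls.txt --timeout 250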
How to compile it yourself
- Download Rust
- Run
cargo build -r
- The executable will be in
target/release
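A quick way to check the build from the project root (the URL is a placeholder):
./target/release/web-crawler --help
./target/release/web-crawler https://example.com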
Dependencies
~7–24MB
~326K SLoC