#category #url #web-page #js #tech #analysis #fetch

bin+lib wappalyzer

识别网页上使用的科技

1 个不稳定版本

0.1.0 2020年1月14日

#13 in #tech

MIT/Apache

91KB
392

wappalyzer

这个包将从url抓取内容,并使用Wappalyzer项目中的定义来识别技术。

与JS(主)库相比,Rust版本有限,因为我们不是在无头浏览器中运行规则,而是只针对主页最初返回的HTML。它更像Go端口

权衡的是速度和效率,因为这样可以对大量网页运行得更加高效。

let url = Url::parse(&String::from("http://google.com"))?;
let res = wappalyzer::scan(url).await;
println!("{:?}", res);

// Analysis { url: "http://google.com/", result: Ok([Tech { category: "Web Servers", 
// name: "Google Web Server" }, Tech { category: "JavaScript Frameworks", name: "ExtJS" }
//, Tech { category: "JavaScript Libraries", name: "List.js" }]) }

或者从可执行文件

> cargo run cargo run http://google.com/ | jq
{
  "url": "http://google.com/",
  "result": {
    "Ok": [
      {
        "category": "Web Servers",
        "name": "Google Web Server"
      },
      {
        "category": "JavaScript Libraries",
        "name": "List.js"
      },
      {
        "category": "JavaScript Frameworks",
        "name": "ExtJS"
      }
    ]
  }
}```

or given a list of domains in a file:
```bash
> cat urls.list
http://google.com/
http://bbc.com/
...
http://cnn.com/

> cat urls.list | cargo run
{"url":"http://google.com/","result":{"Ok":[{"category":"JavaScript Frameworks","name":"ExtJS"},{"category":"Web Servers","name":"Google Web Server"},{"category":"JavaScript Libraries","name":"List.js"}]}}
{"url":"http://bbc.com/","result":{"Ok":[{"category":"Tag Managers","name":"Google Tag Manager"},{"category":"Analytics","name":"Chartbeat"},{"category":"JavaScript Frameworks","name":"React"},{"category":"Cache Tools","name":"Varnish"},{"category":"Web Servers","name":"Apache"},{"category":"Issue Trackers","name":"Atlassian Jira"},{"category":"Analytics","name":"GrowingIO"},{"category":"JavaScript Libraries","name":"List.js"},{"category":"JavaScript Graphics","name":"Chart.js"},{"category":"Analytics","name":"Optimizely"},{"category":"Analytics","name":"Segment"}]}}
{"url":"http://cnn.com/","result":{"Ok":[{"category":"JavaScript Frameworks","name":"ExtJS"},{"category":"JavaScript Frameworks","name":"Twitter Flight"},{"category":"JavaScript Frameworks","name":"Riot"},{"category":"Advertising Networks","name":"Criteo"},{"category":"Analytics","name":"Chartbeat"},{"category":"Analytics","name":"GoSquared"},{"category":"JavaScript Libraries","name":"Moment.js"},{"category":"Ecommerce","name":"Magento"},{"category":"JavaScript Frameworks","name":"React"},{"category":"Cache Tools","name":"Varnish"},{"category":"Analytics","name":"GrowingIO"},{"category":"JavaScript Libraries","name":"List.js"},{"category":"JavaScript Graphics","name":"Chart.js"},{"category":"Comment Systems","name":"Livefyre"},{"category":"Analytics","name":"Optimizely"},{"category":"Analytics","name":"Segment"}]}}

状态

开发中。

待办事项

备注

依赖

~14–19MB
~380K SLoC