4 releases (breaking changes)
Version | Released |
---|---|
0.4.0 | Jan 3, 2022 |
0.3.0 | Sep 15, 2021 |
0.2.0 | Sep 11, 2021 |
0.1.0 | May 20, 2019 |
htmlq
Like jq, but for HTML. Uses CSS selectors to extract bits of content from HTML files.
Installation
Cargo
cargo install htmlq
Homebrew
brew install htmlq
Usage
$ htmlq -h
htmlq 0.4.0
Michael Maclean <[email protected]>
Runs CSS selectors on HTML
USAGE:
htmlq [FLAGS] [OPTIONS] [--] [selector]...
FLAGS:
-B, --detect-base Try to detect the base URL from the <base> tag in the document. If not found, default to
the value of --base, if supplied
-h, --help Prints help information
-w, --ignore-whitespace When printing text nodes, ignore those that consist entirely of whitespace
-p, --pretty Pretty-print the serialised output
-t, --text Output only the contents of text nodes inside selected elements
-V, --version Prints version information
OPTIONS:
-a, --attribute <attribute> Only return this attribute (if present) from selected elements
-b, --base <base> Use this URL as the base for links
-f, --filename <FILE> The input file. Defaults to stdin
-o, --output <FILE> The output file. Defaults to stdout
-r, --remove-nodes <SELECTOR>... Remove nodes matching this expression before output. May be specified multiple
times
ARGS:
<selector>... The CSS expression to select [default: html]
$
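The help text above lists --filename and --output, which default to stdin and stdout. As a minimal sketch, assuming htmlq is on your PATH and using a made-up sample.html created only for illustration, reading from one file and writing the matches to another might look like this:

```shell
# Hypothetical sample page, created only for this illustration.
workdir=$(mktemp -d)
cat > "$workdir/sample.html" <<'EOF'
<html>
  <body>
    <h1 id="title">Hello</h1>
    <p class="intro">A short <a href="/docs">introduction</a>.</p>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
  # Read from a file instead of stdin and write the matches to another file.
  htmlq --filename "$workdir/sample.html" --output "$workdir/intro.html" '.intro'
  cat "$workdir/intro.html"
else
  # Skip politely when the tool is not installed.
  echo "htmlq not found on PATH; skipping" >&2
fi
```

The selector is placed last, matching the order shown in the USAGE line.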
Examples
Using with cURL to find part of a page by ID
$ curl --silent https://rust-lang.net.cn/ | htmlq '#get-help'
<div class="four columns mt3 mt0-l" id="get-help">
<h4>Get help!</h4>
<ul>
<li><a href="https://doc.rust-lang.net.cn">Documentation</a></li>
<li><a href="https://users.rust-lang.org">Ask a Question on the Users Forum</a></li>
<li><a href="http://ping.rust-lang.org">Check Website Status</a></li>
</ul>
<div class="languages">
<label class="hidden" for="language-footer">Language</label>
<select id="language-footer">
<option title="English (US)" value="en-US">English (en-US)</option>
<option title="French" value="fr">Français (fr)</option>
<option title="German" value="de">Deutsch (de)</option>
</select>
</div>
</div>
Find all the links in a page
$ curl --silent https://rust-lang.net.cn/ | htmlq --attribute href a
/
/tools/install
/learn
/tools
/governance
/community
https://blog.rust-lang.net.cn/
/learn/get-started
https://blog.rust-lang.net.cn/2019/04/25/Rust-1.34.1.html
https://blog.rust-lang.net.cn/2018/12/06/Rust-1.31-and-rust-2018.html
[...]
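Many of the links in the output above are relative. The help text describes --base as "Use this URL as the base for links", so a sketch along the following lines may help when absolute URLs are wanted. The sample markup and the https://example.com/ base are illustrative assumptions, and no expected output is shown because the exact result was not verified here.

```shell
# Hypothetical markup with one relative and one absolute link.
page='<nav><a href="/learn">Learn</a> <a href="https://example.com/blog/">Blog</a></nav>'

if command -v htmlq >/dev/null 2>&1; then
  # Print every href, asking htmlq to use https://example.com/ as the base URL.
  printf '%s\n' "$page" | htmlq --base https://example.com/ --attribute href a
else
  # Skip politely when the tool is not installed.
  echo "htmlq not found on PATH; skipping" >&2
fi
```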
Get the text content of a post
$ curl --silent https://nixos.org/nixos/about.html | htmlq --text .main
About NixOS
NixOS is a GNU/Linux distribution that aims to
improve the state of the art in system configuration management. In
existing distributions, actions such as upgrades are dangerous:
upgrading a package can cause other packages to break, upgrading an
entire system is much less reliable than reinstalling from scratch,
you can’t safely test what the results of a configuration change will
be, you cannot easily undo changes to the system, and so on. We want
to change that. NixOS has many innovative features:
[...]
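Text pulled out of indented markup can include nodes that hold nothing but newlines and spaces. According to the help text, --ignore-whitespace skips text nodes that consist entirely of whitespace when printing. A small sketch, using hypothetical inline markup rather than a real page:

```shell
# Hypothetical list whose items are separated by whitespace-only text nodes.
page='<ul class="main">
  <li>First</li>
  <li>Second</li>
</ul>'

if command -v htmlq >/dev/null 2>&1; then
  # Print only the text, dropping text nodes made up entirely of whitespace.
  printf '%s\n' "$page" | htmlq --text --ignore-whitespace '.main'
else
  # Skip politely when the tool is not installed.
  echo "htmlq not found on PATH; skipping" >&2
fi
```

Running the same command without --ignore-whitespace is a quick way to compare the two behaviours on your own input.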
Remove a node before output
There's a big SVG image in this page that I don't need, so here's how to remove it.
$ curl --silent https://nixos.org/ | htmlq '.whynix' --remove-nodes svg
<ul class="whynix">
<li>
<h2>Reproducible</h2>
<p>
Nix builds packages in isolation from each other. This ensures that they
are reproducible and don't have undeclared dependencies, so <strong>if a
package works on one machine, it will also work on another</strong>.
</p>
</li>
<li>
<h2>Declarative</h2>
<p>
Nix makes it <strong>trivial to share development and build
environments</strong> for your projects, regardless of what programming
languages and tools you’re using.
</p>
</li>
<li>
<h2>Reliable</h2>
<p>
Nix ensures that installing or upgrading one package <strong>cannot
break other packages</strong>. It allows you to <strong>roll back to
previous versions</strong>, and ensures that no package is in an
inconsistent state during an upgrade.
</p>
</li>
</ul>
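The help text notes that --remove-nodes may be specified multiple times. As a sketch, assuming hypothetical markup that contains both an svg and a script element, two removals could be combined like this:

```shell
# Hypothetical card containing two elements we do not want in the output.
page='<div class="card"><svg></svg><script>track()</script><p>Keep me</p></div>'

if command -v htmlq >/dev/null 2>&1; then
  # The selector comes first so that it is not read as another --remove-nodes value.
  printf '%s\n' "$page" | htmlq '.card' --remove-nodes svg --remove-nodes script
else
  # Skip politely when the tool is not installed.
  echo "htmlq not found on PATH; skipping" >&2
fi
```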
Pretty print HTML
(This is a bit of a work in progress)
$ curl --silent https://mgdm.net | htmlq --pretty '#posts'
<section id="posts">
<h2>I write about...
</h2>
<ul class="post-list">
<li>
<time datetime="2019-04-29 00:%i:1556496000" pubdate="">
29/04/2019</time><a href="/weblog/nettop/">
<h3>Debugging network connections on macOS with nettop
</h3></a>
<p>Using nettop to find out what network connections a program is trying to make.
</p>
</li>
[...]
Syntax highlighting with bat
$ curl --silent example.com | htmlq 'body' | bat --language html