#markdown-converter #html #markdown #converter #convert-html

htmd

An HTML to Markdown converter inspired by turndown.js

7 releases

0.1.6 Jul 11, 2024
0.1.5 Jun 27, 2024

#735 in Text processing

1,194 downloads per month
Used in 3 crates (2 directly)

Apache-2.0

285KB
1K SLoC

htmd

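An HTML to Markdown converter for Rust, inspired by turndown.js.

Features

  • Rich options, the same as turndown.js
  • Reliable: it passes all of turndown.js's test cases
  • Minimal dependencies: it uses only html5ever
  • Fast: it converts a 1.37 MB Wikipedia page in about 70 ms on a 7th-generation i5 CPU (see the benchmark README)

Looking for a command-line tool? Try htmd-cli.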

Usage

Add the dependency

htmd = "0.1"

Basic usage

fn main() {
    assert_eq!("# Heading", htmd::convert("<h1>Heading</h1>").unwrap());
}

Skipping tags

use htmd::HtmlToMarkdown;

let converter = HtmlToMarkdown::builder()
    .skip_tags(vec!["script", "style"])
    .build();
assert_eq!("", converter.convert("<script>let x = 0;</script>").unwrap());

Options

use htmd::{options::Options, HtmlToMarkdown};

let converter = HtmlToMarkdown::builder()
    .options(Options {
        heading_style: htmd::options::HeadingStyle::Setex,
        ..Default::default()
    })
    .build();
assert_eq!("Heading\n=======", converter.convert("<h1>Heading</h1>").unwrap());
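
To make the difference between the two heading styles concrete, here is a small standalone sketch of how ATX and Setext headings are conventionally written in Markdown. The `Style` enum and `render_heading` function are hypothetical stand-ins for illustration, not htmd's types or implementation; the fallback to ATX for levels 3 and up is an assumption based on Setext only defining two levels.

```rust
/// Hypothetical stand-in for the two heading styles; not htmd's own type.
#[derive(Clone, Copy)]
enum Style {
    Atx,
    Setext,
}

/// Render a heading the way each style is conventionally written in Markdown.
fn render_heading(text: &str, level: usize, style: Style) -> String {
    match (style, level) {
        // Setext only exists for levels 1 and 2: underline with '=' or '-'.
        (Style::Setext, 1) => format!("{}\n{}", text, "=".repeat(text.chars().count())),
        (Style::Setext, 2) => format!("{}\n{}", text, "-".repeat(text.chars().count())),
        // Everything else uses ATX: one '#' per heading level.
        _ => format!("{} {}", "#".repeat(level), text),
    }
}

fn main() {
    assert_eq!(render_heading("Heading", 1, Style::Atx), "# Heading");
    assert_eq!(render_heading("Heading", 1, Style::Setext), "Heading\n=======");
    println!("{}", render_heading("Heading", 2, Style::Setext));
}
```

The Setext result for a level-1 heading matches the string asserted in the snippet above: the text followed by an underline of the same length.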

Custom tag handlers

use htmd::{Element, HtmlToMarkdown};

let converter = HtmlToMarkdown::builder()
    .add_handler(vec!["svg"], |_: Element| Some("[Svg Image]".to_string()))
    .build();
assert_eq!("[Svg Image]", converter.convert("<svg></svg>").unwrap());

Multithreading

When you use only the built-in tag handlers, you can safely share an HtmlToMarkdown across multiple threads.

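use std::sync::Arc;
use htmd::HtmlToMarkdown;

let converter = Arc::new(HtmlToMarkdown::new());
let mut handles = Vec::new();
for _ in 0..10 {
    let converter_clone = converter.clone();
    handles.push(std::thread::spawn(move || {
        let md = converter_clone.convert("<h1>Hello</h1>").unwrap();
        assert_eq!("# Hello", md);
    }));
}
// Wait for every worker thread to finish.
for handle in handles {
    handle.join().unwrap();
}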

If you have custom tag handlers that are not stateless, you may need a thread-safety mechanism. See AnchorElementHandler for an example.
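
As a rough illustration of what such a mechanism can look like, the sketch below guards a handler's mutable state with a `Mutex` so it can be shared through an `Arc`. It uses only the standard library: `LinkCollector`, its `handle` method, and `collect_in_parallel` are hypothetical names invented for this example and are not taken from htmd's API or from AnchorElementHandler's source.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stateful handler: numbers every link it sees, the way a
/// reference-style link collector might. Not part of htmd's API.
#[derive(Default)]
struct LinkCollector {
    // The Mutex makes the shared list safe to mutate from any thread.
    links: Mutex<Vec<String>>,
}

impl LinkCollector {
    /// Records the URL and returns a reference-style marker such as "[text][1]".
    fn handle(&self, text: &str, url: &str) -> String {
        let mut links = self.links.lock().unwrap();
        links.push(url.to_string());
        format!("[{}][{}]", text, links.len())
    }
}

/// Calls `handle` once from each of `threads` threads and returns how many
/// links were recorded in total.
fn collect_in_parallel(threads: usize) -> usize {
    let collector = Arc::new(LinkCollector::default());
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let collector = Arc::clone(&collector);
            thread::spawn(move || collector.handle("link", &format!("https://example.com/{}", i)))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let count = collector.links.lock().unwrap().len();
    count
}

fn main() {
    // No update is lost even though eight threads write concurrently.
    assert_eq!(collect_in_parallel(8), 8);
    let single = LinkCollector::default();
    assert_eq!(single.handle("docs", "https://example.com"), "[docs][1]");
}
```

A `Mutex` is the simplest choice here; if the state were a single counter, an atomic integer would avoid locking altogether.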

Credits

License

Copyright 2024 letmutex

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Dependencies

~1.5–6.5MB
~34K SLoC