4 versions
0.23.1 | May 30, 2022 |
---|---|
0.23.0 | May 8, 2022 |
0.23.0-alpha3 | May 2, 2022 |
0.22.0 | May 2, 2022 |
#87 in #writer
50,828 downloads per month
Used in 4 crates (2 directly)
455KB
9K SLoC
quick-xml has been revived and is actively maintained again. Please use it instead.
fast-xml -- successor of quick-xml
High-performance XML pull reader and writer.
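To make the "pull" model concrete, here is a minimal std-only sketch, not fast-xml's implementation: the caller asks for one event at a time instead of registering callbacks. `ToyEvent` and `ToyReader` are hypothetical names invented for this illustration, and the toy ignores attributes, comments, and escaping.

```rust
/// One parsing event, borrowing its text from the input (hypothetical type).
#[derive(Debug, PartialEq)]
enum ToyEvent<'a> {
    Start(&'a str),
    End(&'a str),
    Text(&'a str),
    Eof,
}

/// A toy pull reader over a string slice (hypothetical type).
struct ToyReader<'a> {
    rest: &'a str,
}

impl<'a> ToyReader<'a> {
    fn new(input: &'a str) -> Self {
        ToyReader { rest: input }
    }

    /// Returns the next event; the caller decides when to pull again.
    fn next_event(&mut self) -> ToyEvent<'a> {
        if self.rest.is_empty() {
            return ToyEvent::Eof;
        }
        if let Some(after_lt) = self.rest.strip_prefix('<') {
            // Markup: everything up to the next '>' is the tag body.
            let end = after_lt.find('>').unwrap_or(after_lt.len());
            let tag = &after_lt[..end];
            // Skip the closing '>' when there is one.
            self.rest = after_lt.get(end + 1..).unwrap_or("");
            match tag.strip_prefix('/') {
                Some(name) => ToyEvent::End(name),
                None => ToyEvent::Start(tag),
            }
        } else {
            // Character data: everything up to the next '<'.
            let end = self.rest.find('<').unwrap_or(self.rest.len());
            let (text, rest) = self.rest.split_at(end);
            self.rest = rest;
            ToyEvent::Text(text)
        }
    }
}
```

Calling `next_event` repeatedly on `<a>hi</a>` yields a start event, a text event, an end event, and then `Eof`, which is the same loop shape the real `Reader` uses.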
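The reader:
- is almost zero-copy (uses `Cow` whenever possible)
- is easy on memory allocation (the API provides a way to reuse buffers)
- supports various encodings (with the `encoding` feature), namespace resolution, and special characters.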
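The "almost zero-copy" point can be illustrated with a small std-only sketch. It is not fast-xml's code: `unescape_sketch` is a hypothetical helper that handles only the five predefined XML entities and shows how `Cow` lets the common case (nothing to unescape) avoid any allocation.

```rust
use std::borrow::Cow;

/// The five predefined XML entities and their replacement characters.
const ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&quot;", '"'),
    ("&apos;", '\''),
];

/// Replaces predefined entities, borrowing the input when nothing changes.
/// Hypothetical helper, not part of fast-xml.
fn unescape_sketch(input: &str) -> Cow<'_, str> {
    // Fast path: no '&' means no entity, so the input is returned as-is.
    if !input.contains('&') {
        return Cow::Borrowed(input);
    }
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('&') {
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown or malformed entity: keep the '&' verbatim.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    Cow::Owned(out)
}
```

Only inputs that actually contain an `&` pay for a new `String`; everything else stays a borrow of the original data.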
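Syntax is inspired by xml-rs.
Migration from quick-xml
If you are using quick-xml 0.22.0 or 0.23.0-alpha3, you only need to replace `quick-xml` with `fast-xml` in your Cargo.toml, then replace every occurrence of the `quick_xml` crate name in your code base with `fast_xml`.
These two versions of fast-xml were made specifically for migration. They contain the same code as the original quick-xml, except for updated cargo metadata and updated extern crate names in the tests, benches, and examples.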
Example
Reader
```rust
use fast_xml::Reader;
use fast_xml::events::Event;

let xml = r#"<tag1 att1 = "test">
<tag2><!--Test comment-->Test</tag2>
<tag2>
Test 2
</tag2>
</tag1>"#;

let mut reader = Reader::from_str(xml);
reader.trim_text(true);

let mut count = 0;
let mut txt = Vec::new();
let mut buf = Vec::new();

// The `Reader` does not implement `Iterator` because it outputs borrowed data (`Cow`s)
loop {
    // NOTE: this is the generic case when we don't know about the input BufRead.
    // when the input is a &str or a &[u8], we don't actually need to use another
    // buffer, we could directly call `reader.read_event_unbuffered()`
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) => {
            match e.name() {
                b"tag1" => println!("attributes values: {:?}",
                                    e.attributes().map(|a| a.unwrap().value).collect::<Vec<_>>()),
                b"tag2" => count += 1,
                _ => (),
            }
        },
        Ok(Event::Text(e)) => txt.push(e.unescape_and_decode(&reader).unwrap()),
        Ok(Event::Eof) => break, // exits the loop when reaching end of file
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
        _ => (), // There are several other `Event`s we do not consider here
    }

    // if we don't keep a borrow elsewhere, we can clear the buffer to keep memory usage low
    buf.clear();
}
```
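The `buf.clear()` call at the end of each iteration is the buffer-reuse pattern mentioned earlier. As a std-only sketch of the same idea, unrelated to fast-xml's internals, the hypothetical helper `count_nonempty_lines` below reads from any `BufRead` into one `Vec<u8>` that is allocated once and cleared after every read.

```rust
use std::io::{self, BufRead};

/// Counts lines containing at least one non-whitespace byte while reusing
/// a single buffer for every read. Hypothetical helper for illustration.
fn count_nonempty_lines<R: BufRead>(mut input: R) -> io::Result<usize> {
    let mut buf = Vec::new(); // created once, reused for every line
    let mut count = 0;
    loop {
        // `read_until` appends to `buf` and returns 0 at end of input.
        if input.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        if buf.iter().any(|b| !b.is_ascii_whitespace()) {
            count += 1;
        }
        // `clear` drops the contents but keeps the capacity for the next read.
        buf.clear();
    }
    Ok(count)
}
```

Because the data handed out on each iteration lives in that shared buffer, it must not be borrowed past the `clear` call, which is also why such a reader cannot be a plain `Iterator`.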
Writer
```rust
use fast_xml::Writer;
use fast_xml::Reader;
use fast_xml::events::{Event, BytesEnd, BytesStart};
use std::io::Cursor;
use std::iter;

let xml = r#"<this_tag k1="v1" k2="v2"><child>text</child></this_tag>"#;
let mut reader = Reader::from_str(xml);
reader.trim_text(true);
let mut writer = Writer::new(Cursor::new(Vec::new()));
let mut buf = Vec::new();
loop {
    match reader.read_event(&mut buf) {
        Ok(Event::Start(ref e)) if e.name() == b"this_tag" => {
            // creates a new element ... alternatively we could reuse `e` by calling
            // `e.into_owned()`
            let mut elem = BytesStart::owned(b"my_elem".to_vec(), "my_elem".len());

            // copies the existing attributes onto the new element
            elem.extend_attributes(e.attributes().map(|attr| attr.unwrap()));

            // adds a new my-key="some value" attribute
            elem.push_attribute(("my-key", "some value"));

            // writes the event to the writer
            assert!(writer.write_event(Event::Start(elem)).is_ok());
        },
        Ok(Event::End(ref e)) if e.name() == b"this_tag" => {
            assert!(writer.write_event(Event::End(BytesEnd::borrowed(b"my_elem"))).is_ok());
        },
        Ok(Event::Eof) => break,
        // you can use either `e` or `&e` if you don't want to move the event
        Ok(e) => assert!(writer.write_event(&e).is_ok()),
        Err(e) => panic!("Error at position {}: {:?}", reader.buffer_position(), e),
    }
    buf.clear();
}

let result = writer.into_inner().into_inner();
let expected = r#"<my_elem k1="v1" k2="v2" my-key="some value"><child>text</child></my_elem>"#;
assert_eq!(result, expected.as_bytes());
```
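For readers who want to see what emitting a start tag involves at the byte level, here is a std-only sketch. It is not how fast-xml's `Writer` is implemented: `escape_attr` and `write_start_tag` are hypothetical helpers, and the escaping covers only the characters that are unsafe inside a double-quoted attribute value.

```rust
use std::io::{self, Write};

/// Escapes the characters that are unsafe inside a double-quoted attribute.
/// Hypothetical helper for illustration.
fn escape_attr(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '"' => out.push_str("&quot;"),
            other => out.push(other),
        }
    }
    out
}

/// Writes `<name key="value" ...>` to any `Write` sink.
/// Hypothetical helper for illustration.
fn write_start_tag<W: Write>(
    sink: &mut W,
    name: &str,
    attrs: &[(&str, &str)],
) -> io::Result<()> {
    write!(sink, "<{}", name)?;
    for (key, value) in attrs {
        write!(sink, " {}=\"{}\"", key, escape_attr(value))?;
    }
    write!(sink, ">")
}
```

Writing into a `Vec<u8>` (or a `Cursor` around one, as in the example above) makes the output easy to compare against an expected string in tests.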
Serde
When the `serialize` feature is enabled, fast-xml can be used with serde's `Serialize`/`Deserialize` traits.
Here is an example that deserializes the crates.io page source:
```rust
// Cargo.toml
// [dependencies]
// serde = { version = "1.0", features = [ "derive" ] }
// fast-xml = { version = "0.22", features = [ "serialize" ] }
use serde::Deserialize;
use fast_xml::de::{from_str, DeError};

#[derive(Debug, Deserialize, PartialEq)]
struct Link {
    rel: String,
    href: String,
    sizes: Option<String>,
}

#[derive(Debug, Deserialize, PartialEq)]
#[serde(rename_all = "lowercase")]
enum Lang {
    En,
    Fr,
    De,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Head {
    title: String,
    #[serde(rename = "link", default)]
    links: Vec<Link>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Script {
    src: String,
    integrity: String,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Body {
    #[serde(rename = "script", default)]
    scripts: Vec<Script>,
}

#[derive(Debug, Deserialize, PartialEq)]
struct Html {
    lang: Option<String>,
    head: Head,
    body: Body,
}

fn crates_io() -> Result<Html, DeError> {
    let xml = "<!DOCTYPE html>
<html lang=\"en\">
<head>
<meta charset=\"utf-8\">
<meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">
<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">
<title>crates.io: Rust Package Registry</title>
<!-- EMBER_CLI_FASTBOOT_TITLE --><!-- EMBER_CLI_FASTBOOT_HEAD -->
<link rel=\"manifest\" href=\"/manifest.webmanifest\">
<link rel=\"apple-touch-icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" sizes=\"227x227\">
<link rel=\"stylesheet\" href=\"/assets/vendor-8d023d47762d5431764f589a6012123e.css\" integrity=\"sha256-EoB7fsYkdS7BZba47+C/9D7yxwPZojsE4pO7RIuUXdE= sha512-/SzGQGR0yj5AG6YPehZB3b6MjpnuNCTOGREQTStETobVRrpYPZKneJwcL/14B8ufcvobJGFDvnTKdcDDxbh6/A==\" >
<link rel=\"stylesheet\" href=\"/assets/cargo-cedb8082b232ce89dd449d869fb54b98.css\" integrity=\"sha256-S9K9jZr6nSyYicYad3JdiTKrvsstXZrvYqmLUX9i3tc= sha512-CDGjy3xeyiqBgUMa+GelihW394pqAARXwsU+HIiOotlnp1sLBVgO6v2ZszL0arwKU8CpvL9wHyLYBIdfX92YbQ==\" >
<link rel=\"shortcut icon\" href=\"/favicon.ico\" type=\"image/x-icon\">
<link rel=\"icon\" href=\"/cargo-835dd6a18132048a52ac569f2615b59d.png\" type=\"image/png\">
<link rel=\"search\" href=\"/opensearch.xml\" type=\"application/opensearchdescription+xml\" title=\"Cargo\">
</head>
<body>
<!-- EMBER_CLI_FASTBOOT_BODY -->
<noscript>
<div id=\"main\">
<div class='noscript'>
This site requires JavaScript to be enabled.
</div>
</div>
</noscript>
<script src=\"/assets/vendor-bfe89101b20262535de5a5ccdc276965.js\" integrity=\"sha256-U12Xuwhz1bhJXWyFW/hRr+Wa8B6FFDheTowik5VLkbw= sha512-J/cUUuUN55TrdG8P6Zk3/slI0nTgzYb8pOQlrXfaLgzr9aEumr9D1EzmFyLy1nrhaDGpRN1T8EQrU21Jl81pJQ==\" ></script>
<script src=\"/assets/cargo-4023b68501b7b3e17b2bb31f50f5eeea.js\" integrity=\"sha256-9atimKc1KC6HMJF/B07lP3Cjtgr2tmET8Vau0Re5mVI= sha512-XJyBDQU4wtA1aPyPXaFzTE5Wh/mYJwkKHqZ/Fn4p/ezgdKzSCFu6FYn81raBCnCBNsihfhrkb88uF6H5VraHMA==\" ></script>
</body>
</html>";
    let html: Html = from_str(xml)?;
    assert_eq!(&html.head.title, "crates.io: Rust Package Registry");
    Ok(html)
}
```
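Credits
The serde support was largely inspired by serde-xml-rs. fast-xml follows its conventions for deserialization, including the `$value` special name.
The original quick-xml was developed by @tafia and stopped being maintained around the end of 2021.
Parsing the "value" of a tag
If you have an input of the form `<foo abc="xyz">bar</foo>` and you want to access `bar`, you can use the special name `$value`: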
```rust
struct Foo {
    pub abc: String,
    #[serde(rename = "$value")]
    pub body: String,
}
```
Unflattening structs into verbose XML
If your XML files look like `<root><first>value</first><second>value</second></root>`, you can serialize and deserialize them with the special name prefix `$unflatten=`:
```rust
struct Root {
    #[serde(rename = "$unflatten=first")]
    first: String,
    #[serde(rename = "$unflatten=second")]
    other_field: String,
}
```
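Serializing unit variants as primitives
With the `$primitive` prefix, enum variants without associated values (internally referred to as unit variants) can be serialized as primitive strings rather than self-closing tags. Consider the following definitions: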
```rust
enum Foo {
    #[serde(rename = "$primitive=Bar")]
    Bar
}

struct Root {
    foo: Foo
}
```
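Serializing `Root { foo: Foo::Bar }` then yields `<Root foo="Bar"/>` instead of `<Root><Bar/></Root>`.
Performance
Note that although the serde support is not focused on performance (it makes several unnecessary copies), it is still about 10x faster than serde-xml-rs.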
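Features
- `encoding`: support for non-UTF-8 XML
- `serialize`: support for serde `Serialize`/`Deserialize`
Performance
Benchmarking is hard, and the results depend on your input file and your machine.
Here, on my particular file, fast-xml is around 50 times faster than the xml-rs crate. (The measurements were taken while this crate was still named quick-xml.)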
```
// quick-xml benches
test bench_quick_xml            ... bench:     198,866 ns/iter (+/- 9,663)
test bench_quick_xml_escaped    ... bench:     282,740 ns/iter (+/- 61,625)
test bench_quick_xml_namespaced ... bench:     389,977 ns/iter (+/- 32,045)

// same bench with xml-rs
test bench_xml_rs               ... bench:  14,468,930 ns/iter (+/- 321,171)

// serde-xml-rs vs serialize feature
test bench_serde_quick_xml      ... bench:   1,181,198 ns/iter (+/- 138,290)
test bench_serde_xml_rs         ... bench:  15,039,564 ns/iter (+/- 783,485)
```
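The speedup factors quoted in the text can be recomputed from the listed timings. The sketch below only divides the ns/iter figures copied from the listing; the function and constant names are made up for this illustration, and the ratios ignore the reported +/- variance.

```rust
/// Timings copied from the benchmark listing, in ns/iter.
const XML_RS: f64 = 14_468_930.0;
const QUICK_XML: f64 = 198_866.0;
const QUICK_XML_ESCAPED: f64 = 282_740.0;
const QUICK_XML_NAMESPACED: f64 = 389_977.0;
const SERDE_XML_RS: f64 = 15_039_564.0;
const SERDE_QUICK_XML: f64 = 1_181_198.0;

/// How many times faster the `fast_ns` timing is than the `slow_ns` timing.
fn speedup(slow_ns: f64, fast_ns: f64) -> f64 {
    slow_ns / fast_ns
}

/// Ratios for the plain, escaped, namespaced, and serde benches, in that order.
fn bench_speedups() -> [f64; 4] {
    [
        speedup(XML_RS, QUICK_XML),
        speedup(XML_RS, QUICK_XML_ESCAPED),
        speedup(XML_RS, QUICK_XML_NAMESPACED),
        speedup(SERDE_XML_RS, SERDE_QUICK_XML),
    ]
}
```

The resulting ratios are roughly 73x, 51x, 37x, and 12.7x, so "around 50 times" matches the escaped bench and "about 10x" matches the serde comparison.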
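For a comparison of features and performance, you can also have a look at RazrFalcon's parser comparison table.
Contribute
Any PR is welcome!
License
MIT
Dependencies
~0.2–1.3MB
~39K SLoC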