html5ever-stream

Easily stream data into the html5ever parser

1 unstable release

Uses old Rust 2015

0.1.0 June 4, 2018

MIT license

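Adapters to easily stream data into an html5ever parser.

Overview

This crate provides shims that make it relatively painless to parse HTML from a stream of data. The stream can be consumed either through the standard IO Read/Write traits or through a Stream from the futures crate.

  • Support for any Stream that emits items implementing AsRef<[u8]>
    • Automatic support for hyper and unstable reqwest types
  • Support for reqwest's copy_to method
  • Helper wrappers for RcDom to make it easier to work with.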
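
To illustrate why the AsRef<[u8]> bound is enough to cover several chunk types, here is a minimal std-only sketch. It is not the crate's code: `feed_chunks` is a hypothetical helper, a plain iterator stands in for a futures Stream, and a `Vec<u8>` stands in for the parser.

```rust
// Illustrative only: `feed_chunks` is a hypothetical helper, not part of
// html5ever-stream. An iterator stands in for a futures `Stream`, and a
// `Vec<u8>` stands in for the parser that would receive the bytes.
fn feed_chunks<I, T>(chunks: I, sink: &mut Vec<u8>) -> usize
where
    I: IntoIterator<Item = T>,
    T: AsRef<[u8]>,
{
    let mut total = 0;
    for chunk in chunks {
        // Every chunk type is viewed as a plain byte slice here.
        let bytes = chunk.as_ref();
        sink.extend_from_slice(bytes);
        total += bytes.len();
    }
    total
}
```

The same function accepts `&str`, `String`, `Vec<u8>` or `&[u8]` chunks without any conversion at the call site, which is presumably the property that lets the crate take hyper and reqwest body chunks as they arrive.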

Examples

Using Hyper 0.11

extern crate futures;
extern crate html5ever;
extern crate html5ever_stream;
extern crate hyper;
extern crate hyper_tls;
extern crate tokio_core;
extern crate num_cpus;

use html5ever::rcdom;
use futures::{Future, Stream};
use hyper::Client;
use hyper_tls::HttpsConnector;
use tokio_core::reactor::Core;
use html5ever_stream::{ParserFuture, NodeStream};

fn main() {
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let client = Client::configure()
        .connector(HttpsConnector::new(num_cpus::get(), &handle).unwrap())
        .build(&handle);


    // NOTE: Errors are thrown away in two places here. You are better off converting them
    // into your own custom error type so that they can be propagated.
    let req_fut = client.get("https://github.com".parse().unwrap()).map_err(|_| ());
    let parser_fut = req_fut.and_then(|res| {
        ParserFuture::new(res.body().map_err(|_| ()), rcdom::RcDom::default())
    });
    let nodes = parser_fut.and_then(|dom| {
        NodeStream::new(&dom).collect()
    });
    let print_fut = nodes.and_then(|vn| {
        println!("found {} elements", vn.len());
        Ok(())
    });
    core.run(print_fut).unwrap();
}
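
The note in the example recommends a custom error type over discarding errors with `map_err(|_| ())`. A minimal std-only sketch of that pattern follows. `FetchError` and the helper functions are hypothetical, and `io::Error` and `ParseIntError` are only stand-ins for the request and body error types; with real hyper types you would implement `From` for those instead.

```rust
use std::error::Error;
use std::fmt;
use std::io;
use std::num::ParseIntError;

// Hypothetical error type. The wrapped std errors are stand-ins for the
// request error and the body error that the example above discards.
#[derive(Debug)]
enum FetchError {
    Request(io::Error),
    Body(ParseIntError),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::Request(e) => write!(f, "request failed: {}", e),
            FetchError::Body(e) => write!(f, "body failed: {}", e),
        }
    }
}

impl Error for FetchError {}

impl From<io::Error> for FetchError {
    fn from(e: io::Error) -> Self {
        FetchError::Request(e)
    }
}

impl From<ParseIntError> for FetchError {
    fn from(e: ParseIntError) -> Self {
        FetchError::Body(e)
    }
}

// Stand-in for the request step: fails with an io::Error on demand.
fn request_step(fail: bool) -> Result<u32, FetchError> {
    if fail {
        Err(io::Error::new(io::ErrorKind::Other, "connection reset")).map_err(FetchError::from)
    } else {
        Ok(200)
    }
}

// Stand-in for the body step: fails with a ParseIntError on bad input.
fn body_step(raw: &str) -> Result<usize, FetchError> {
    raw.trim().parse::<usize>().map_err(FetchError::from)
}

// Both steps now share one error type, so `?` can chain them.
fn run(fail: bool, raw: &str) -> Result<usize, FetchError> {
    request_step(fail)?;
    body_step(raw)
}
```

With futures combinators the conversion would take the same shape, `.map_err(FetchError::from)` in place of `.map_err(|_| ())`, so the final future reports which stage failed instead of returning `()`.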

Using unstable async reqwest 0.8.6

extern crate futures;
extern crate html5ever;
extern crate html5ever_stream;
extern crate reqwest;
extern crate tokio_core;

use html5ever::rcdom;
use futures::{Future, Stream};
use reqwest::unstable::async as async_reqwest;
use tokio_core::reactor::Core;
use html5ever_stream::{ParserFuture, NodeStream};

fn main() {
    let mut core = Core::new().unwrap();
    let client = async_reqwest::Client::new(&core.handle());

    // NOTE: Errors are thrown away in two places here. You are better off converting them
    // into your own custom error type so that they can be propagated.
    let req_fut = client.get("https://github.com").send().map_err(|_| ());
    let parser_fut = req_fut.and_then(|res| {
        ParserFuture::new(res.into_body().map_err(|_| ()), rcdom::RcDom::default())
    });
    let nodes = parser_fut.and_then(|dom| {
        NodeStream::new(&dom).collect()
    });
    let print_fut = nodes.and_then(|vn| {
        println!("found {} elements", vn.len());
        Ok(())
    });
    core.run(print_fut).unwrap();
}

Using stable reqwest 0.8.6

extern crate html5ever;
extern crate html5ever_stream;
extern crate reqwest;

use html5ever::rcdom;
use html5ever_stream::{ParserSink, NodeIter};

fn main() {
    let mut resp = reqwest::get("https://github.com").unwrap();
    let mut parser = ParserSink::new(rcdom::RcDom::default());
    resp.copy_to(&mut parser).unwrap();
    let document = parser.finish();
    let nodes: Vec<rcdom::Handle> = NodeIter::new(&document).collect();
    println!("found {} elements", nodes.len());
}
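
The `resp.copy_to(&mut parser)` call above suggests that ParserSink is used as a `std::io::Write` target. As a rough std-only illustration of that pattern (not the crate's implementation), the sketch below defines a hypothetical `ChunkSink` that collects whatever is written to it and drives it with `std::io::copy`, which plays the role of `copy_to` here.

```rust
use std::io::{self, Read, Write};

// Hypothetical sink, not part of html5ever-stream: it stores the bytes a
// parser sink would otherwise hand to the tokenizer.
struct ChunkSink {
    bytes: Vec<u8>,
}

impl ChunkSink {
    fn new() -> Self {
        ChunkSink { bytes: Vec::new() }
    }

    // Consumes the sink and returns what was collected, loosely mirroring
    // the `finish` call in the example above.
    fn finish(self) -> Vec<u8> {
        self.bytes
    }
}

impl Write for ChunkSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Accept the whole buffer on every call.
        self.bytes.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Copies everything from any reader into the sink and returns the result.
fn pump<R: Read>(mut source: R) -> io::Result<Vec<u8>> {
    let mut sink = ChunkSink::new();
    io::copy(&mut source, &mut sink)?;
    Ok(sink.finish())
}
```

Because the sink only depends on `Write`, any source that can copy itself into a writer (a file, a byte slice, or a blocking HTTP response) can feed it without the sink knowing where the bytes came from.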

License

Licensed under the MIT License
