3 unstable releases
| Version | Date |
|---|---|
| 0.2.0 | January 2, 2024 |
| 0.1.1 | January 1, 2024 |
| 0.1.0 | January 1, 2024 |
#1162 in Web programming
26KB
456 lines
rust-recipe
rust-recipe is a Rust crate for scraping recipes from websites. It is inspired by the Golang library "go-recipe".
Adding to your project
cargo add rust-recipe
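Optionally, you can enable the blocking or async feature.
Usage
Custom fetching
By default, the crate provides the scrape_recipe function, which takes HTML you have already fetched from a website and attempts to parse it. The methods for reading information from the scraped recipe are provided through the RecipeInformationProvider trait.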
use rust_recipe::scrape_recipe;
use std::error::Error;
fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let html = ureq::get(url).call()?.into_string()?;
    let recipe = scrape_recipe(&html).unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
    Ok(())
}
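The example above calls unwrap() on every field, which panics when a page does not provide that field. Below is a minimal sketch of handling missing fields instead. It assumes only that description() and ingredients() return Option values, as the signatures in this README show. `RecipeInfo`, `StubRecipe`, and `summarize` are hypothetical names defined locally so the sketch compiles without the crate; they are not part of rust-recipe.

```rust
// Hypothetical stand-in mirroring two RecipeInformationProvider methods;
// defined locally so this sketch compiles without the crate.
trait RecipeInfo {
    fn description(&self) -> Option<String>;
    fn ingredients(&self) -> Option<Vec<String>>;
}

// Hypothetical recipe value with optional fields, for illustration only.
struct StubRecipe {
    description: Option<String>,
    ingredients: Option<Vec<String>>,
}

impl RecipeInfo for StubRecipe {
    fn description(&self) -> Option<String> {
        self.description.clone()
    }
    fn ingredients(&self) -> Option<Vec<String>> {
        self.ingredients.clone()
    }
}

// Builds a printable summary, substituting placeholders for missing
// fields instead of panicking.
fn summarize(recipe: &dyn RecipeInfo) -> String {
    let desc = recipe
        .description()
        .unwrap_or_else(|| String::from("(no description)"));
    let mut out = format!("Description: {}\n\nIngredients:\n", desc);
    match recipe.ingredients() {
        Some(items) if !items.is_empty() => {
            for item in &items {
                out.push_str(&format!("- {}\n", item));
            }
        }
        _ => out.push_str("(none listed)\n"),
    }
    out
}

fn main() {
    let recipe = StubRecipe {
        description: None,
        ingredients: Some(vec![String::from("carrots"), String::from("potatoes")]),
    };
    print!("{}", summarize(&recipe));
}
```

With the real crate, the same match and unwrap_or_else pattern should apply directly to the value returned by scrape_recipe, provided its methods return Option as shown elsewhere in this README.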
You can also use a custom scraper by implementing the RecipeScraper trait.
use rust_recipe::{custom_scrape_recipe, RecipeInformationProvider, RecipeScraper};
use std::{collections::HashMap, error::Error};
fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let html = ureq::get(url).call()?.into_string()?;
    let scraper = CustomScraper {};
    let recipe = custom_scrape_recipe(&html, scraper).unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
    Ok(())
}
pub struct CustomScraper {...}
pub struct CustomRecipeInfoProvider {
    vals: HashMap<String, String>,
}
impl RecipeScraper for CustomScraper {
    fn scrape_recipe(
        self,
        html: &str,
    ) -> Result<Box<dyn rust_recipe::RecipeInformationProvider>, serde_json::Error> {
        let mut m = HashMap::new();
        m.insert(
            String::from("description"),
            String::from("My favourite recipe"),
        );
        m.insert(
            String::from("ingredients"),
            String::from("carrots, potatoes"),
        );
        ...
        Ok(Box::new(CustomRecipeInfoProvider { vals: m }))
    }
}
impl RecipeInformationProvider for CustomRecipeInfoProvider {
    ...
    fn description(&self) -> Option<String> {
        self.vals.get("description").cloned()
    }
    fn ingredients(&self) -> Option<Vec<String>> {
        self.vals
            .get("ingredients")
            .cloned()
            .map(|s| s.split(", ").map(String::from).collect())
    }
    ...
}
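The ingredients() implementation above splits on the exact separator ", ", so a value such as "carrots,potatoes" would come back as a single item. A minimal sketch of a more tolerant split follows, assuming a comma-separated string is the storage format. `parse_ingredients` is a hypothetical helper, not part of rust-recipe.

```rust
// Hypothetical helper: splits a comma-separated ingredient string,
// trimming surrounding whitespace and dropping empty entries.
fn parse_ingredients(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

fn main() {
    for item in parse_ingredients("carrots,potatoes , onions") {
        println!("- {}", item);
    }
}
```

Inside a custom provider, the closure passed to map in ingredients() could call such a helper in place of the split(", ") chain.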
Async
The async feature uses reqwest to make an asynchronous request to the provided URL.
use rust_recipe::scrape_recipe_from_url;
#[tokio::main]
async fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let recipe = scrape_recipe_from_url(url).await.unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}
Blocking
The blocking feature uses the ureq crate to make a blocking request to the provided URL.
use rust_recipe::scrape_recipe_from_url_blocking;
fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let recipe = scrape_recipe_from_url_blocking(url).unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}
Dependencies
~8–23MB
~320K SLoC