#recipe #website #scrape #blocking #scraping #async

rust-recipe

Rust 包,用于从网站抓取食谱

3 个不稳定版本

0.2.0 2024年1月2日
0.1.1 2024年1月1日
0.1.0 2024年1月1日

#1162 in 网页开发

MIT 许可证

26KB
456

rust-recipe

crates.io Documentation CI Status

rust-recipe 是一个 Rust 包,用于从网站抓取食谱。它受到了 Golang 库 "go-recipe" 的启发。

添加到项目中

cargo add rust-recipe

可选地,您可以使用 blockingasync 功能。

使用方法

自定义抓取

默认情况下,该包提供了 scrape_recipe 方法,它接受您从网站抓取的 HTML,并尝试解析它。通过 RecipeInformationProvider 特性提供了可用于抓取后获取信息的方法。

use rust_recipe::scrape_recipe;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    let html = ureq::get(url).call()?.into_string()?;
    let recipe = scrape_recipe(&html).unwrap();

    println!("Fetching {:?}...\n", url);
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }

    Ok(())
}

还可以通过实现 RecipeScraper 特性来使用自定义抓取器。

use rust_recipe::{custom_scrape_recipe, RecipeInformationProvider, RecipeScraper};
use std::{collections::HashMap, error::Error};

fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    let html = ureq::get(url).call()?.into_string()?;
    let scraper = CustomScraper {};
    let recipe = custom_scrape_recipe(&html, scraper).unwrap();

    println!("Fetching {:?}...\n", url);
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }

    Ok(())
}

pub struct CustomScraper {...}

pub struct CustomRecipeInfoProvider {
    vals: HashMap<String, String>,
}

impl RecipeScraper for CustomScraper {
    fn scrape_recipe(
        self,
        html: &str,
    ) -> Result<Box<dyn rust_recipe::RecipeInformationProvider>, serde_json::Error> {
        let mut m = HashMap::new();
        m.insert(
            String::from("description"),
            String::from("My favourite recipe"),
        );
        m.insert(
            String::from("ingredients"),
            String::from("carrots, potatoes"),
        );
        ...
        Ok(Box::new(CustomRecipeInfoProvider { vals: m }))
    }
}

impl RecipeInformationProvider for CustomRecipeInfoProvider {
    ...
    fn description(&self) -> Option<String> {
        self.vals.get("description").cloned()
    }

    fn ingredients(&self) -> Option<Vec<String>> {
        self.vals
            .get("ingredients")
            .cloned()
            .map(|s| s.split(", ").map(String::from).collect())
    }
    ...
}

异步

async 功能使用 reqwest 对提供的 URL 进行异步调用。

use rust_recipe::scrape_recipe_from_url;

#[tokio::main]
async fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let recipe = scrape_recipe_from_url(url).await.unwrap();

    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}

阻塞

blocking 功能使用 ureq 包对提供的 URL 进行阻塞调用。

use rust_recipe::{scrape_recipe_from_url_blocking, RecipeScraper};

fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    let recipe = scrape_recipe_from_url_blocking(url).unwrap();

    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}

依赖项

~8–23MB
~320K SLoC