3 unstable releases
Version | Date |
---|---|
0.2.0 | Jan 2, 2024 |
0.1.1 | Jan 1, 2024 |
0.1.0 | Jan 1, 2024 |
#1162 in Web programming
26KB
456 lines
rust-recipe
rust-recipe is a Rust crate that scrapes recipes from websites. It is inspired by the Golang library "go-recipe".
Adding to your project
cargo add rust-recipe
Optionally, you can enable the blocking or async feature.
Usage
Custom scraping
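By default, the crate provides a scrape_recipe function that takes the HTML you have fetched from a website and attempts to parse it. The RecipeInformationProvider trait supplies the methods for reading information out of the scraped result.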
use rust_recipe::scrape_recipe;
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    // Fetch the page yourself (here with ureq), then hand the HTML to the parser.
    let html = ureq::get(url).call()?.into_string()?;
    let recipe = scrape_recipe(&html).unwrap();
    println!("Fetching {:?}...\n", url);
    // Each accessor returns an Option; unwrap() panics if the field is missing.
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
    Ok(())
}
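The example above calls unwrap() on every field, which panics when a page does not provide that field. Since description() and ingredients() return Option values, one alternative is to substitute fallbacks. The following is a minimal standalone sketch that does not call rust-recipe at all; render_recipe is a hypothetical helper, not part of the crate, and the placeholder strings are arbitrary choices.

```rust
/// Hypothetical helper (not part of rust-recipe): renders the two fields
/// used in the examples, substituting fallbacks instead of panicking.
fn render_recipe(description: Option<String>, ingredients: Option<Vec<String>>) -> String {
    let mut out = String::new();

    // Fall back to a placeholder when no description was found.
    let desc = description.unwrap_or_else(|| String::from("(no description found)"));
    out.push_str(&format!("Description: {}\n\nIngredients:\n", desc));

    // Treat a missing ingredient list the same as an empty one.
    let items = ingredients.unwrap_or_default();
    if items.is_empty() {
        out.push_str("(none found)\n");
    }
    for item in &items {
        out.push_str(&format!("- {}\n", item));
    }
    out
}
```

With a scraped recipe in hand, the call would presumably look like render_recipe(recipe.description(), recipe.ingredients()).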
You can also use a custom scraper by implementing the RecipeScraper trait.
use rust_recipe::{custom_scrape_recipe, RecipeInformationProvider, RecipeScraper};
use std::{collections::HashMap, error::Error};

fn main() -> Result<(), Box<dyn Error>> {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    let html = ureq::get(url).call()?.into_string()?;
    // Pass your own RecipeScraper implementation alongside the HTML.
    let scraper = CustomScraper {};
    let recipe = custom_scrape_recipe(&html, scraper).unwrap();
    println!("Fetching {:?}...\n", url);
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
    Ok(())
}
pub struct CustomScraper {...}

pub struct CustomRecipeInfoProvider {
    vals: HashMap<String, String>,
}

impl RecipeScraper for CustomScraper {
    fn scrape_recipe(
        self,
        html: &str,
    ) -> Result<Box<dyn rust_recipe::RecipeInformationProvider>, serde_json::Error> {
        let mut m = HashMap::new();
        m.insert(
            String::from("description"),
            String::from("My favourite recipe"),
        );
        m.insert(
            String::from("ingredients"),
            String::from("carrots, potatoes"),
        );
        ...
        Ok(Box::new(CustomRecipeInfoProvider { vals: m }))
    }
}

impl RecipeInformationProvider for CustomRecipeInfoProvider {
    ...
    fn description(&self) -> Option<String> {
        self.vals.get("description").cloned()
    }
    fn ingredients(&self) -> Option<Vec<String>> {
        self.vals
            .get("ingredients")
            .cloned()
            .map(|s| s.split(", ").map(String::from).collect())
    }
    ...
}
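Because the listing above elides parts of both impl blocks, it does not compile as shown. To illustrate the map-backed provider pattern on its own, here is a standalone sketch that avoids the crate entirely. InfoProvider is a hypothetical stand-in for RecipeInformationProvider, reduced to the two methods this README shows; MapProvider and provider_from are likewise illustrative names, not crate API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `RecipeInformationProvider` trait,
/// reduced to the two methods shown in this README.
trait InfoProvider {
    fn description(&self) -> Option<String>;
    fn ingredients(&self) -> Option<Vec<String>>;
}

/// Illustrative provider that stores every field as a plain string.
struct MapProvider {
    vals: HashMap<String, String>,
}

impl InfoProvider for MapProvider {
    fn description(&self) -> Option<String> {
        self.vals.get("description").cloned()
    }

    fn ingredients(&self) -> Option<Vec<String>> {
        // Split the comma-separated string into one entry per ingredient.
        self.vals
            .get("ingredients")
            .map(|s| s.split(", ").map(String::from).collect())
    }
}

/// Builds a provider from key/value pairs; a missing key later yields `None`.
fn provider_from(pairs: &[(&str, &str)]) -> MapProvider {
    let vals = pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    MapProvider { vals }
}
```

Storing the ingredients as one delimited string keeps the map type simple, at the cost of breaking on any ingredient that itself contains ", ". A provider holding a Vec<String> directly would avoid that.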
Async
The async feature uses reqwest to make an asynchronous request to the provided URL.
use rust_recipe::scrape_recipe_from_url;

#[tokio::main]
async fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    // Fetches and parses the page in one awaited call.
    let recipe = scrape_recipe_from_url(url).await.unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}
Blocking
The blocking feature uses the ureq crate to make a blocking request to the provided URL.
use rust_recipe::{scrape_recipe_from_url_blocking, RecipeScraper};

fn main() {
    let url = "https://www.bbcgoodfood.com/recipes/crab-lasagne";
    println!("Fetching {:?}...\n", url);
    // Fetches and parses the page in one blocking call.
    let recipe = scrape_recipe_from_url_blocking(url).unwrap();
    let desc = recipe.description().unwrap();
    println!("Description: {}", desc);
    println!();
    println!("Ingredients:");
    for i in recipe.ingredients().unwrap().iter() {
        println!("- {}", i);
    }
}
Dependencies
~8–23MB
~320K SLoC