7个版本

0.2.2	2023年12月19日
0.2.1	2023年12月14日
0.1.11	2023年12月14日

#419 in 机器学习

每月71次下载

MIT/Apache

385KB
115 行

Image Captioner

Image Captioner使用Salesforce的BLIP模型从Rust代码中轻松生成图像标题。所有处理都在您的设备上完成。在初始模型下载后，使用M1 MacBook Pro处理图像大约需要5秒，无需GPU。

标题相当不错。对于这张图片，自动生成的标题是“一个着火的笔记本电脑”。

A laptop on fire

示例用法

假设您在crate根目录下有一个名为image.jpg的图像

use image_captioner::get_caption;
use std::path::Path;

fn main() {
    // This path is relative to the directory you run your Rust application from,
    // usually the crate root.
    let image_path = Path::new("./image.jpg");

    // The first time you run this will be slow since it has to download the model,
    // which is 990 MB.
    match get_caption(image_path) {
        Ok(caption) => println!("Caption: {}", caption),
        Err(err) => eprintln!("Error: {:?}", err),
    }
}

关于BLIP深度学习模型

BLIP（Bootstrapping Language-Image Pre-training）是Salesforce于2022年发布的一个模型，在图像标题等众多视觉+语言任务上表现优异。它采用宽松的许可协议（BSD 3-Clause），允许在个人和商业项目中使用。

有关更多信息，请参阅Hugging Face上的BLIP模型卡。

模型下载

第一次运行image_captioner时，它会自动下载BLIP模型。此过程需要互联网连接。模型大小为990MB，因此下载可能需要一些时间。

模型由transformers Python库下载并缓存。默认缓存位置通常是

Linux/macOS： ~/.cache/huggingface/hub
Windows： C:\Users\<username>\.cache\huggingface\hub

此位置可能因您的系统配置和环境设置而异。《code>transformers库会自动管理缓存，在后续运行中高效存储和检索模型。

依赖项

~4–15MB
~165K SLoC