#bert #transformer #ai #llm #nlp #language-model

rbert

A simple interface for BERT embeddings

6 versions

0.3.3 Aug 18, 2024
0.3.2 Aug 14, 2024
0.2.1 Feb 28, 2024
0.1.0 Dec 16, 2023

#936 in Machine learning


415 downloads per month
Used in 3 crates (via kalosm-language)

MIT/Apache license

175KB
3.5K SLoC

rbert

A Rust wrapper for BERT sentence transformers, implemented with Candle

Usage

use kalosm_language_model::Embedder;
use rbert::*;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
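    // Load the default BERT embedding model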
    let mut bert = Bert::new().await?;
    let sentences = [
        "Cats are cool",
        "The geopolitical situation is dire",
        "Pets are great",
        "Napoleon was a tyrant",
        "Napoleon was a great general",
    ];
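    // Embed all sentences in a single batch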
    let embeddings = bert.embed_batch(sentences).await?;
    println!("embeddings {:?}", embeddings);

    // Compute the cosine similarity between every pair of sentences
    let mut similarities = vec![];
    let n_sentences = sentences.len();
    for (i, e_i) in embeddings.iter().enumerate() {
        for j in (i + 1)..n_sentences {
            let e_j = embeddings.get(j).unwrap();
            let cosine_similarity = e_j.cosine_similarity(e_i);
            similarities.push((cosine_similarity, i, j))
        }
    }
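    // Sort sentence pairs from most to least similar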
    similarities.sort_by(|u, v| v.0.total_cmp(&u.0));
    for &(score, i, j) in similarities.iter() {
        println!("score: {score:.2} '{}' '{}'", sentences[i], sentences[j])
    }

    Ok(())
}
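The same pattern extends naturally to simple semantic search: embed a corpus once, then embed each query and rank the documents by cosine similarity. Below is a minimal sketch under the assumption that the Embedder trait also provides a single-input embed method alongside embed_batch; the query string and document list are purely illustrative.

use kalosm_language_model::Embedder;
use rbert::*;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut bert = Bert::new().await?;
    let documents = [
        "Cats are cool",
        "The geopolitical situation is dire",
        "Pets are great",
    ];
    // Embed the corpus once; the vectors can be cached and reused across queries.
    let doc_embeddings = bert.embed_batch(documents).await?;

    // Embed the query with the same model so the vectors live in the same space.
    // (A single-input `embed` method is assumed here, mirroring `embed_batch` above.)
    let query = bert.embed("I love animals").await?;

    // Rank documents by cosine similarity to the query, highest first
    let mut ranked: Vec<_> = doc_embeddings
        .iter()
        .zip(documents)
        .map(|(embedding, text)| (query.cosine_similarity(embedding), text))
        .collect();
    ranked.sort_by(|a, b| b.0.total_cmp(&a.0));

    for (score, text) in ranked {
        println!("score: {score:.2} '{text}'");
    }

    Ok(())
}

Because the corpus embeddings are computed up front, each query costs one embed call plus a linear scan over the stored vectors.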

Dependencies

~33–53MB
~1M SLoC