13 versions (5 stable)

1.1.0	December 24, 2020
1.0.3	December 13, 2019
1.0.0	November 9, 2019
0.3.0	November 6, 2019
0.1.4	October 27, 2019

#178 in Science

54 downloads per month

MIT license

1MB
162 lines

bayespam

A simple Bayesian spam classifier.

About

Bayespam is inspired by naive Bayes classifiers, a popular statistical technique for e-mail filtering.

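The message to be identified is split into simple words, also called tokens. These tokens are compared against the whole corpus of training messages (spam and non-spam) to determine how frequently each token occurs in the two categories.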
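The tokenize-and-count step can be sketched in a few lines of Rust. This is only an illustration under stated assumptions, not Bayespam's actual implementation: the `tokenize` and `train` helpers and the `Counter` struct are hypothetical names, and the normalization rules (lowercasing, dropping non-alphanumeric characters) are a guess that may differ from what the crate does. The `ham`/`spam` fields mirror the shape of the saved JSON model shown in the usage section.

```rust
use std::collections::HashMap;

/// Per-token occurrence counts in each class (hypothetical type for this sketch).
#[derive(Debug, Default, Clone, PartialEq)]
struct Counter {
    ham: u32,
    spam: u32,
}

/// Assumed tokenizer: split on whitespace, keep only alphanumeric
/// characters, lowercase, and drop tokens that end up empty.
fn tokenize(msg: &str) -> Vec<String> {
    msg.split_whitespace()
        .map(|word| {
            word.chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        .filter(|token| !token.is_empty())
        .collect()
}

/// Record every token of `msg` in the table under the given class.
fn train(table: &mut HashMap<String, Counter>, msg: &str, is_spam: bool) {
    for token in tokenize(msg) {
        let counter = table.entry(token).or_default();
        if is_spam {
            counter.spam += 1;
        } else {
            counter.ham += 1;
        }
    }
}

fn main() {
    let mut table = HashMap::new();
    train(&mut table, "Special promotion, only today!", true);
    train(&mut table, "Our meeting is today at 4pm.", false);

    // "today" was seen once in each class.
    println!("{:?}", table.get("today"));
}
```

A token that appears in only one class, such as "promotion" here, carries most of the signal; a token seen equally often in both classes, such as "today", carries almost none.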

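A probabilistic formula is then used to compute the probability that the message is spam. When that probability is high enough, the classifier labels the message as likely spam; otherwise it labels it as likely ham. By default, the probability threshold is set to 0.8.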
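The text does not spell out the formula, so the following is a sketch of one common naive Bayes combination, P = Πp / (Πp + Π(1 − p)), and not necessarily the one Bayespam uses. The function names, the add-one smoothing, and the sample counts are all assumptions made for illustration; only the 0.8 threshold comes from the description above.

```rust
/// Default threshold quoted in the description above.
const SPAM_THRESHOLD: f64 = 0.8;

/// Per-token spam probability with add-one smoothing (an assumed choice),
/// so that unseen or one-sided tokens never yield exactly 0 or 1.
fn token_spamminess(spam: u32, ham: u32) -> f64 {
    (spam as f64 + 1.0) / ((spam + ham) as f64 + 2.0)
}

/// Combine per-token probabilities as P = prod(p) / (prod(p) + prod(1 - p)).
/// The products are accumulated in log space to avoid underflow on long messages.
fn combine(probs: &[f64]) -> f64 {
    let log_spam: f64 = probs.iter().map(|p| p.ln()).sum();
    let log_ham: f64 = probs.iter().map(|p| (1.0 - p).ln()).sum();
    1.0 / (1.0 + (log_ham - log_spam).exp())
}

/// A message is flagged when the combined score exceeds the threshold.
fn is_spam(probs: &[f64]) -> bool {
    combine(probs) > SPAM_THRESHOLD
}

fn main() {
    // Made-up (spam count, ham count) pairs for the tokens of one message.
    let counts = [(3, 0), (2, 1), (0, 1)];
    let probs: Vec<f64> = counts
        .iter()
        .map(|&(spam, ham)| token_spamminess(spam, ham))
        .collect();

    let score = combine(&probs);
    println!("{:.4} {}", score, is_spam(&probs));
}
```

With these sample counts the per-token probabilities are 0.8, 0.6 and 1/3, which combine to 0.75. That is below the 0.8 threshold, so the message would be treated as likely ham.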

Documentation

Learn more about Bayespam here: https://docs.rs/bayespam.

Usage

Add it to your Cargo.toml manifest:

[dependencies]
bayespam = "1.1.0"

Use a pre-trained model

Add a model.json file to your package root. You can then use it to score and identify messages:

extern crate bayespam;

use bayespam::classifier;

fn main() -> Result<(), std::io::Error> {
    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier::score(spam)?;
    let is_spam = classifier::identify(spam)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier::score(ham)?;
    let is_spam = classifier::identify(ham)?;
    println!("{:.4?}", score);
    println!("{:?}", is_spam);

    Ok(())
}

$> cargo run
0.9993
true
0.6311
false

Train your own model and save it as a JSON file

You can train a new model from scratch and save it as JSON so that it can be reloaded later:

extern crate bayespam;

use bayespam::classifier::Classifier;
use std::fs::File;

fn main() -> Result<(), std::io::Error> {
    // Create a new classifier with an empty model
    let mut classifier = Classifier::new();

    // Train the classifier with a new spam example
    let spam = "Don't forget our special promotion: -30% on men shoes, only today!";
    classifier.train_spam(spam);

    // Train the classifier with a new ham example
    let ham = "Hi Bob, don't forget our meeting today at 4pm.";
    classifier.train_ham(ham);

    // Identify a typical spam message
    let spam = "Lose up to 19% weight. Special promotion on our new weightloss.";
    let score = classifier.score(spam);
    let is_spam = classifier.identify(spam);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Identify a typical ham message
    let ham = "Hi Bob, can you send me your machine learning homework?";
    let score = classifier.score(ham);
    let is_spam = classifier.identify(ham);
    println!("{:.4}", score);
    println!("{}", is_spam);

    // Serialize the model and save it as JSON into a file
    let mut file = File::create("my_super_model.json")?;
    classifier.save(&mut file, false)?;

    Ok(())
}

$> cargo run
0.9999
true
0.0001
false

$> cat my_super_model.json
{"token_table":{"forget":{"ham":1,"spam":1},"only":{"ham":0,"spam":1},"meeting":{"ham":1,"spam":0},"our":{"ham":1,"spam":1},"dont":{"ham":1,"spam":1},"bob":{"ham":1,"spam":0},"men":{"ham":0,"spam":1},"today":{"ham":1,"spam":1},"shoes":{"ham":0,"spam":1},"special":{"ham":0,"spam":1},"promotion:":{"ham":0,"spam":1}}}
