2 unstable releases
Uses old Rust 2015
0.2.1 | Aug 9, 2017 |
---|---|
0.1.0 | Oct 31, 2016 |
#1446 in Algorithms
23KB
322 lines
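Dedup signature
This library implements algorithms that generate a hash/signature/fingerprint for detecting duplicate documents. The algorithms are suited to long texts such as news articles or web pages. A signature can be generated in the following ways:
- TextProfileSignature: a fuzzy hashing implementation from Apache Nutch for near-duplicate detection. It is tunable, but works best on longer texts.
- Lookup3: a 64-bit hash for exact-duplicate detection. It is much faster than MD5 and needs less storage.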
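As a rough illustration of how signatures like these are typically used, the sketch below keeps the first document seen for each distinct signature. The names `dedup_by_signature` and `toy_sign` are hypothetical and not part of this library; the signature function is passed in as a closure so the example does not depend on the crate, and `toy_sign` is only a placeholder, not one of the algorithms above.

```rust
use std::collections::HashSet;

/// Keeps the first document for each distinct signature and drops the rest.
/// `sign` stands in for any signature function (for example a wrapper
/// around a generator from this library); it is an assumption of this sketch.
fn dedup_by_signature<'a, F>(docs: &[&'a str], sign: F) -> Vec<&'a str>
where
    F: Fn(&str) -> String,
{
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for &doc in docs {
        // `insert` returns false when the signature was already present.
        if seen.insert(sign(doc)) {
            unique.push(doc);
        }
    }
    unique
}

/// Toy signature for illustration only: lowercases the text and collapses
/// runs of whitespace. It is NOT one of the library's algorithms.
fn toy_sign(text: &str) -> String {
    text.to_lowercase()
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}
```

With a real generator, the closure passed as `sign` would call the generator's signing method instead of `toy_sign`.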
Installation
Cargo
Add the following to your Cargo.toml:
[dependencies]
dedup_signature = "^0.2.1"
Getting started
Follow the installation instructions, then run the following code:
extern crate dedup_signature;
use dedup_signature::generator::*;
fn main(){
    let profile_generator = TextProfileSignature{ ..TextProfileSignature::default() };
    let text = r#"Liberty, in philosophy, involves free will as contrasted with determinism.[1] In politics, liberty consists of the social and political freedoms enjoyed by all citizens.[2] In theology, liberty is freedom from the bondage of sin.[3] Generally, liberty seems to be distinct from freedom in that freedom concerns itself primarily, if not exclusively, with the ability to do as one wills and what one has the power to do; whereas liberty also takes into account the rights of all involved. As such, liberty can be thought of as freedom limited by rights, and therefore cannot be abused."#;
    let sign = profile_generator.generate_sign(&text);
    assert_eq!("6274be1f2560d8c9b8d344513d0b3942", sign);
}
Documentation
Text Profile Signature
First, import the Text Profile Signature generator:
extern crate dedup_signature;
use dedup_signature::generator::text_profile_signature::TextProfileSignature;
Then create a TextProfileSignature struct with the default parameters:
let profile_generator = TextProfileSignature{ ..TextProfileSignature::default() };
Finally, generate the signature by calling the generate_sign method:
let text = r#"Liberty, in philosophy, involves free will as contrasted with determinism.[1] In politics, liberty consists of the social and political freedoms enjoyed by all citizens.[2] In theology, liberty is freedom from the bondage of sin.[3] Generally, liberty seems to be distinct from freedom in that freedom concerns itself primarily, if not exclusively, with the ability to do as one wills and what one has the power to do; whereas liberty also takes into account the rights of all involved. As such, liberty can be thought of as freedom limited by rights, and therefore cannot be abused."#;
let sign = profile_generator.generate_sign(&text);
assert_eq!("6274be1f2560d8c9b8d344513d0b3942", sign);
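Options
Name | Type | Description | Default |
---|---|---|---|
min_token_length | usize | Minimum length of a token for it to be considered | 2 |
quant_rate | f32 | Multiplied by the maximum token frequency, this determines the quantization step for token counts | 0.01 |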
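To make the two options more concrete, here is a minimal sketch of a quantized token-frequency profile in the spirit of the Nutch approach. It is an illustration under stated assumptions, not the crate's code: the function name `text_profile` is hypothetical, the `>=` length check, the rounding, the floor on the quantization step, and the alphabetical tie-breaking are all choices made here, and the final hashing of the profile string is left out because it would need an external MD5 crate.

```rust
use std::collections::HashMap;

/// Builds a quantized token-frequency profile for `text`.
/// Hypothetical helper for illustration; not part of dedup_signature.
fn text_profile(text: &str, min_token_length: usize, quant_rate: f32) -> String {
    // Split on anything that is not a letter or digit, lowercase the
    // tokens, and count those that meet the minimum length (assumed `>=`).
    let mut counts: HashMap<String, usize> = HashMap::new();
    for token in text.split(|c: char| !c.is_alphanumeric()) {
        if !token.is_empty() && token.chars().count() >= min_token_length {
            *counts.entry(token.to_lowercase()).or_insert(0) += 1;
        }
    }

    // Quantization step: quant_rate times the highest token count.
    // Assumption: the step is floored at 2 when any token repeats, else 1.
    let max_freq = counts.values().copied().max().unwrap_or(0);
    let mut quant = (max_freq as f32 * quant_rate).round() as usize;
    if quant < 2 {
        quant = if max_freq > 1 { 2 } else { 1 };
    }

    // Round each count down to a multiple of `quant` and drop tokens
    // whose rounded count falls below it.
    let mut profile: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(token, count)| (token, (count / quant) * quant))
        .filter(|(_, count)| *count >= quant)
        .collect();

    // Most frequent tokens first; ties are broken alphabetically so the
    // output is deterministic.
    profile.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));

    profile
        .iter()
        .map(|(token, count)| format!("{} {}", token, count))
        .collect::<Vec<_>>()
        .join("\n")
}
```

In this sketch, rare tokens fall below the quantization step and disappear from the profile, so two texts that differ only in a few infrequent words produce the same profile string, and would therefore hash to the same signature.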
Lookup3
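The Lookup3 hash generator produces a 64-bit hash for exact-duplicate detection. It is much faster than MD5 and needs less storage.
First, import the Lookup3 generator: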
extern crate dedup_signature;
use dedup_signature::generator::lookup3_signature::Lookup3Signature;
Then create a Lookup3Signature struct with the default parameters:
let profile_generator = Lookup3Signature { ..Lookup3Signature::default() };
Finally, generate the signature by calling the generate_sign_64 method:
let text = r#"Liberty, in philosophy, involves free will as contrasted with determinism.[1] In politics, liberty consists of the social and political freedoms enjoyed by all citizens.[2] In theology, liberty is freedom from the bondage of sin.[3] Generally, liberty seems to be distinct from freedom in that freedom concerns itself primarily, if not exclusively, with the ability to do as one wills and what one has the power to do; whereas liberty also takes into account the rights of all involved. As such, liberty can be thought of as freedom limited by rights, and therefore cannot be abused."#;
let sign = profile_generator.generate_sign_64(&text);
assert_eq!("0682d4d013cb3b2c", sign);
Options
Name | Type | Description | Default |
---|---|---|---|
seed | u64 | Initial seed used to generate the hash | 12345 |
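The storage point can be illustrated with a small sketch. Assuming the signature comes back as a 16-character hexadecimal string, as the assertion in the example above suggests, it can be kept as a plain `u64` (8 bytes), whereas an MD5 digest is 16 bytes. The helper names below are hypothetical and use only the standard library.

```rust
use std::num::ParseIntError;

/// Parses a hexadecimal signature string into a u64 for compact storage.
/// Hypothetical helper; not part of dedup_signature.
fn signature_to_u64(hex: &str) -> Result<u64, ParseIntError> {
    u64::from_str_radix(hex, 16)
}

/// Formats a u64 back into a zero-padded, 16-character lowercase hex string.
fn u64_to_signature(value: u64) -> String {
    format!("{:016x}", value)
}
```

The zero padding matters for round-tripping: a signature such as the one in the example starts with `0`, which would be lost with a plain `{:x}` format.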
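License
Copyright 2016-2017 Hamed Ramezanian Nik
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.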
Dependencies
~4MB
~50K SLoC