18 releases
Version | Released |
---|---|
0.0.1-alpha.20 | Feb 24, 2024 |
0.0.1-alpha.18 | Mar 12, 2023 |
0.0.1-alpha.13 | Feb 26, 2023 |
Quickner Core
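This is the core of the Quickner project. The Rust code lives in the src directory, which contains the following:
- config.rs - Configuration file parser and validator
- models.rs - Data models used in the project
- utils.rs - Utility functions used in the project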
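Build
To build the project, you need to have Rust installed. You can install it by following the official installation instructions. Once Rust is installed, build the project by running the following command: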
cargo build --release
License
This project is licensed under the Mozilla Public License 2.0. See the LICENSE file for details.
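lib.rs:
quickner is an NER annotation library that provides a command-line interface and a Python API. It ships with a default configuration file that you can modify to suit your needs.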
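Batch annotation
You can use quickner to annotate a batch of texts.
Provide a configuration file and a folder containing your input files:
- A CSV file containing the texts you want to annotate.
- A CSV file containing the entities you want to annotate.
- A CSV file containing the entities you want to exclude from the annotation.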
Configuration
The configuration file is a TOML file with the following fields:
[logging]
level = "info" # level of logging (debug, info, warning, error, fatal)
[texts]
[texts.input]
filter = false # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file
[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false # if true, only strictly alphanumeric texts will be used
case_sensitive = false # if true, case sensitive search will be used
max_length = 1024 # maximum length of the text
min_length = 0 # minimum length of the text
numbers = false # if true, texts with numbers will not be used
punctuation = false # if true, texts with punctuation will not be used
special_characters = false # if true, texts with special characters will not be used
[annotations]
format = "spacy" # format of the output file (jsonl, spaCy, brat, conll)
[annotations.output]
path = "annotations.jsonl" # path to the output file
[entities]
[entities.input]
filter = true # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true # if true, the entities found will be saved in the output file
[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false # if true, only strictly alphanumeric entities will be used
case_sensitive = false # if true, case sensitive search will be used
max_length = 20 # maximum length of the entity
min_length = 0 # minimum length of the entity
numbers = false # if true, entities with numbers will not be used
punctuation = false # if true, entities with punctuation will not be used
special_characters = true # if true, entities with special characters will not be used
[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search
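To make the filter flags concrete, here is a minimal sketch of how the [entities.filters] values could be applied to a candidate string. The Filters struct and the passes and entity_filters functions are hypothetical names, and the rules are only one reading of the comments in the config (in particular, a "special character" is assumed to be anything that is neither alphanumeric nor whitespace). quickner's actual implementation may differ.

```rust
/// Hypothetical mirror of the [entities.filters] table; not quickner's own type.
struct Filters {
    accept_special_characters: String,
    alphanumeric: bool,
    max_length: usize,
    min_length: usize,
    numbers: bool,
    punctuation: bool,
    special_characters: bool,
}

/// Values copied from the [entities.filters] table in the config.
fn entity_filters() -> Filters {
    Filters {
        accept_special_characters: ".-".to_string(),
        alphanumeric: false,
        max_length: 20,
        min_length: 0,
        numbers: false,
        punctuation: false,
        special_characters: true,
    }
}

/// Returns true if `text` survives every enabled filter.
/// Assumption: a "special character" is any character that is neither
/// alphanumeric nor whitespace.
fn passes(text: &str, f: &Filters) -> bool {
    let len = text.chars().count();
    if len < f.min_length || len > f.max_length {
        return false;
    }
    if f.alphanumeric && !text.chars().all(char::is_alphanumeric) {
        return false;
    }
    if f.numbers && text.chars().any(|c| c.is_numeric()) {
        return false;
    }
    if f.punctuation && text.chars().any(|c| c.is_ascii_punctuation()) {
        return false;
    }
    if f.special_characters {
        // Reject special characters unless they are explicitly accepted.
        let is_rejected = |c: char| {
            !c.is_alphanumeric()
                && !c.is_whitespace()
                && !f.accept_special_characters.contains(c)
        };
        if text.chars().any(is_rejected) {
            return false;
        }
    }
    true
}

fn main() {
    let filters = entity_filters();
    let candidates = ["Rust", "C++", "node.js", "a name far longer than twenty characters"];
    for candidate in candidates.iter() {
        println!("{:?} -> {}", candidate, passes(candidate, &filters));
    }
}
```

Under these assumptions, "C++" is rejected because '+' is not listed in accept_special_characters, while "node.js" passes because '.' is.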
Example
use quickner::models::Quickner;
let quick = Quickner::new("./config.toml");
let annotations = quick.process(true);
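Single annotation
You can also use quickner to annotate a single text. This is useful when you want to annotate just one text and use the annotations in your code.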
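use std::collections::HashMap;
use quickner::Document;

// `mut` on the document assumes annotate() updates it in place.
let mut annotation = Document::from_string("Rust is maintained by Mozilla");
// Map each entity name to its label; `mut` is required to insert.
let mut entities = HashMap::new();
entities.insert("Rust", "Programming Language");
entities.insert("Mozilla", "Organization");
annotation.annotate(entities);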
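To see what annotating with an entity dictionary amounts to, the following self-contained sketch finds every occurrence of each entity name in a text and records its byte offsets and label. The annotate function and the (start, end, label) tuple layout are hypothetical and use only the standard library. They are not quickner's API, and this version matches case-sensitively with no word-boundary checks.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns (start, end, label) byte spans for every
/// exact, case-sensitive occurrence of each entity name in `text`.
fn annotate<'a>(
    text: &str,
    entities: &HashMap<&str, &'a str>,
) -> Vec<(usize, usize, &'a str)> {
    let mut spans = Vec::new();
    for (name, label) in entities {
        // An empty pattern would match at every position, so skip it.
        if name.is_empty() {
            continue;
        }
        for (start, matched) in text.match_indices(*name) {
            spans.push((start, start + matched.len(), *label));
        }
    }
    // HashMap iteration order is unspecified, so sort spans by position.
    spans.sort();
    spans
}

fn main() {
    let mut entities = HashMap::new();
    entities.insert("Rust", "Programming Language");
    entities.insert("Mozilla", "Organization");

    let spans = annotate("Rust is maintained by Mozilla", &entities);
    for (start, end, label) in &spans {
        println!("{}..{} {}", start, end, label);
    }
}
```

Sorting the spans makes the output deterministic. A real annotator would likely also need to handle overlapping matches and the case_sensitive option from the configuration.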