#nlp #config-file #ner #cli #named-entity

quickner-core

A fast and simple named-entity recognition (NER) tool

18 releases

0.0.1-alpha.20 (Feb 24, 2024)
0.0.1-alpha.18 (Mar 12, 2023)
0.0.1-alpha.13 (Feb 26, 2023)

#1139 in Text processing

42 downloads per month

Custom license

60KB
1K SLoC

Quickner Core

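This is the core of the Quickner project. The Rust code lives in the src directory, which contains the following:

  • config.rs - parser and validator for the configuration file
  • models.rs - data models used in the project
  • utils.rs - utility functions used in the project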

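Build

To build the project, you need to have Rust installed. You can install it by following the official installation instructions. Once Rust is installed, build the project by running the following command: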

cargo build --release

License

This project is licensed under the Mozilla Public License 2.0. See the LICENSE file for details.


lib.rs:

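quickner is an NER annotation library that provides a command-line interface and a Python API. It ships with a default configuration file that you can modify to suit your needs.

Batch annotation

You can use quickner to annotate a batch of texts.

Provide a configuration file and a folder containing your data:

  • A CSV file with the texts you want to annotate.
  • A CSV file with the entities you want to annotate.
  • A CSV file with the entities you want to exclude from annotation.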
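
To make the role of the three inputs concrete, here is a minimal sketch of how texts, entities, and excludes could interact. It is not quickner's implementation: the `Entity` struct and the `annotate_batch` function are hypothetical names, CSV loading is left out, and matching is assumed to be a case-insensitive substring search.

```rust
use std::collections::HashSet;

/// One row of the entities file: a name and its label (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub name: String,
    pub label: String,
}

/// For each text, list the (name, label) pairs whose name occurs in it,
/// skipping every entity whose name appears in `excludes`.
/// Matching is a case-insensitive substring search, so "rust" would also
/// match inside "trust"; a real tool would likely check word boundaries.
pub fn annotate_batch(
    texts: &[&str],
    entities: &[Entity],
    excludes: &[&str],
) -> Vec<Vec<(String, String)>> {
    let excluded: HashSet<String> = excludes.iter().map(|e| e.to_lowercase()).collect();

    // Drop excluded entities once, before scanning any text.
    let active: Vec<&Entity> = entities
        .iter()
        .filter(|e| !excluded.contains(&e.name.to_lowercase()))
        .collect();

    texts
        .iter()
        .map(|text| {
            let haystack = text.to_lowercase();
            active
                .iter()
                .filter(|e| haystack.contains(e.name.to_lowercase().as_str()))
                .map(|e| (e.name.clone(), e.label.clone()))
                .collect()
        })
        .collect()
}
```

The result has one inner list per input text, in the same order as `texts`, so it can be zipped back with the rows of the texts file.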

Configuration

The configuration file is a TOML file with the following fields:

[logging]
level = "info" # level of logging (debug, info, warning, error, fatal)

[texts]

[texts.input]
filter = false     # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file

[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false              # if true, only strictly alphanumeric texts will be used
case_sensitive = false            # if true, case sensitive search will be used
max_length = 1024                 # maximum length of the text
min_length = 0                    # minimum length of the text
numbers = false                   # if true, texts with numbers will not be used
punctuation = false               # if true, texts with punctuation will not be used
special_characters = false        # if true, texts with special characters will not be used

[annotations]
format = "spacy" # format of the output file (jsonl, spaCy, brat, conll)

[annotations.output]
path = "annotations.jsonl" # path to the output file

[entities]

[entities.input]
filter = true         # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true           # if true, the entities found will be saved in the output file

[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false             # if true, only strictly alphanumeric entities will be used
case_sensitive = false           # if true, case sensitive search will be used
max_length = 20                  # maximum length of the entity
min_length = 0                   # minimum length of the entity
numbers = false                  # if true, entities with numbers will not be used
punctuation = false              # if true, entities with punctuation will not be used
special_characters = true        # if true, entities with special characters will not be used

[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search
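
The filter fields are easier to follow when read as a single predicate. The sketch below is one possible reading of the comments in the configuration above, not quickner's actual code. The `Filters` struct and its `is_valid` method are hypothetical, and `case_sensitive` is left out because it affects searching rather than whether a text is accepted.

```rust
/// Hypothetical mirror of the `[texts.filters]` / `[entities.filters]` fields.
#[derive(Debug, Clone)]
pub struct Filters {
    pub accept_special_characters: String,
    pub alphanumeric: bool,
    pub max_length: usize,
    pub min_length: usize,
    pub numbers: bool,
    pub punctuation: bool,
    pub special_characters: bool,
}

impl Filters {
    /// Returns true when `text` passes every enabled filter.
    pub fn is_valid(&self, text: &str) -> bool {
        // Length is counted in characters, not bytes (an assumption).
        let len = text.chars().count();
        if len < self.min_length || len > self.max_length {
            return false;
        }
        if self.alphanumeric && !text.chars().all(|c| c.is_alphanumeric()) {
            return false;
        }
        if self.numbers && text.chars().any(|c| c.is_numeric()) {
            return false;
        }
        if self.punctuation && text.chars().any(|c| c.is_ascii_punctuation()) {
            return false;
        }
        if self.special_characters {
            // Reject anything that is neither alphanumeric, whitespace,
            // nor explicitly listed in `accept_special_characters`.
            let has_unaccepted = text.chars().any(|c| {
                !c.is_alphanumeric()
                    && !c.is_whitespace()
                    && !self.accept_special_characters.contains(c)
            });
            if has_unaccepted {
                return false;
            }
        }
        true
    }
}
```

Under this reading and the `[entities.filters]` values shown above, an entity such as "Node.js" is kept because "." is in the accepted list, while "C#" is dropped.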

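Example

use quickner::models::Quickner;

fn main() {
    // Load the settings from the TOML configuration file.
    let quick = Quickner::new("./config.toml");
    // Annotate every text listed in the configuration.
    let annotations = quick.process(true);
}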

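Single annotation

You can also use quickner to annotate a single text. This is useful when you only need to annotate one text and want to use the annotations directly in your code.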

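use std::collections::HashMap;

use quickner::Document;

fn main() {
    let mut annotation = Document::from_string("Rust is maintained by Mozilla");
    // Map each entity name to its label.
    let mut entities = HashMap::new();
    entities.insert("Rust", "Programming Language");
    entities.insert("Mozilla", "Organization");
    // `mut` on `annotation` assumes `annotate` updates the document in place.
    annotation.annotate(entities);
}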
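
For readers who want to see what annotating a single text amounts to, the following standalone sketch computes labeled spans using only the standard library. It does not use or reproduce quickner's API: `find_spans` is a hypothetical helper, matching is exact and case-sensitive, and offsets are byte positions.

```rust
use std::collections::HashMap;

/// Find every occurrence of each entity name in `text` and return
/// (start, end, label) triples as byte offsets, sorted by start position.
pub fn find_spans(text: &str, entities: &HashMap<&str, &str>) -> Vec<(usize, usize, String)> {
    let mut spans = Vec::new();
    for (name, label) in entities {
        // An empty pattern would match at every position, so skip it.
        if name.is_empty() {
            continue;
        }
        for (start, matched) in text.match_indices(*name) {
            spans.push((start, start + matched.len(), (*label).to_string()));
        }
    }
    // HashMap iteration order is unspecified, so sort for a stable result.
    spans.sort();
    spans
}
```

With the same text and entities as in the example above, this yields one span for "Rust" at bytes 0 to 4 and one for "Mozilla" at bytes 22 to 29.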

Dependencies

~9–18MB
~224K SLoC