#sqlite #bioinformatics #ncbi #taxonomy #database #read #copy

bin+lib ncbitaxonomy

Read the NCBI Taxonomy database from files and work with the NCBI Taxonomy database

18 releases (8 stable)

Uses old Rust 2015

1.0.7 Jul 12, 2020
1.0.6 Jul 11, 2020
1.0.5 Jun 17, 2020
1.0.3 May 4, 2020
0.1.5 Jan 13, 2019

#111 in Biology

MIT license

55KB
1K SLoC


ncbitaxonomy

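This is a Rust crate (i.e. a library) for working with a local copy of the NCBI Taxonomy database. The database can be downloaded from NCBI (as either taxdump.zip or taxdump.tar.gz) and converted to a SQLite database with the to_sqlite subcommand of the taxonomy_util utility.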

Documentation is available on crates.io.
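
As an illustration of the kind of input that to_sqlite consumes, the sketch below parses one line of nodes.dmp, the file in the NCBI dump that links each taxon to its parent. Fields in that file are separated by a tab, a pipe and a tab, and each line ends with a tab and a pipe. The NodeRecord type and the parse_node_line function are hypothetical names chosen for this example; they are not part of the ncbitaxonomy API, and the crate's own loader may work differently.

```rust
/// One record from `nodes.dmp`, reduced to the first three fields.
/// Hypothetical type for illustration; not part of the ncbitaxonomy crate.
#[derive(Debug, PartialEq)]
pub struct NodeRecord {
    pub tax_id: u32,
    pub parent_tax_id: u32,
    pub rank: String,
}

/// Parse a single `nodes.dmp` line, returning `None` if it is malformed.
pub fn parse_node_line(line: &str) -> Option<NodeRecord> {
    // Drop the line ending, then the trailing "\t|" record terminator.
    let line = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
    let line = line.strip_suffix("\t|").unwrap_or(line);

    // Fields are delimited by "\t|\t"; only the first three are used here.
    let mut fields = line.split("\t|\t");
    let tax_id = fields.next()?.trim().parse().ok()?;
    let parent_tax_id = fields.next()?.trim().parse().ok()?;
    let rank = fields.next()?.trim().to_string();

    Some(NodeRecord { tax_id, parent_tax_id, rank })
}
```

Calling parse_node_line on every line of nodes.dmp would give the (taxon, parent) pairs needed to rebuild the tree; a line that does not start with two integers yields None instead of panicking.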

taxonomy_filter_refseq

(new since version 0.1.1)

A tool for filtering an NCBI RefSeq FASTA file so that only sequences from taxa descending from a given ancestor are retained.

$ taxonomy_filter_refseq --help
taxonomy_filter_refseq 1.0.0
Peter van Heusden <[email protected]>
Filter NCBI RefSeq FASTA files by taxonomic lineage

USAGE:
    taxonomy_filter_refseq [FLAGS] [OPTIONS] <INPUT_FASTA> <ANCESTOR_NAME> [OUTPUT_FASTA]

FLAGS:
        --no_curated      Don't accept curated RNAs and proteins (NM_, NR_ and NP_ accessions)
        --no_predicted    Don't accept computationally predicted RNAs and proteins (XM_, XR_ and XP_ accessions)
    -h, --help            Prints help information
    -V, --version         Prints version information

OPTIONS:
    -d, --db <TAXDB_URL>    URL for SQLite taxonomy database

ARGS:
    <INPUT_FASTA>      FASTA file with RefSeq sequences
    <ANCESTOR_NAME>    Name of ancestor to use as ancestor filter
    <OUTPUT_FASTA>     Output FASTA filename (or stdout if omitted)
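
The --no_curated and --no_predicted flags select records by RefSeq accession prefix. A minimal sketch of that check follows. The names AccessionKind, classify_accession and accept are hypothetical and do not come from the crate, and accepting every accession with some other prefix is an assumption of this example, not documented behaviour of the tool.

```rust
/// Category of a RefSeq accession, judged by its three-character prefix.
/// Hypothetical type for illustration; not part of the ncbitaxonomy crate.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum AccessionKind {
    Curated,   // NM_, NR_, NP_
    Predicted, // XM_, XR_, XP_
    Other,     // any other prefix
}

/// Classify an accession such as "NM_000546.6" by its prefix.
pub fn classify_accession(accession: &str) -> AccessionKind {
    // `get(..3)` returns None for strings shorter than three bytes.
    match accession.get(..3) {
        Some("NM_") | Some("NR_") | Some("NP_") => AccessionKind::Curated,
        Some("XM_") | Some("XR_") | Some("XP_") => AccessionKind::Predicted,
        _ => AccessionKind::Other,
    }
}

/// Decide whether a record passes the two flags.
/// Assumption: accessions with other prefixes are always accepted.
pub fn accept(accession: &str, no_curated: bool, no_predicted: bool) -> bool {
    match classify_accession(accession) {
        AccessionKind::Curated => !no_curated,
        AccessionKind::Predicted => !no_predicted,
        AccessionKind::Other => true,
    }
}
```

With both flags off every record passes; with no_curated set, an accession starting with NM_ is rejected while one starting with XM_ still passes.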

taxonomy_filter_fastq

(new since version 0.2.0)

$ taxonomy_filter_fastq --help
taxonomy_filter_fastq 1.0.0
Peter van Heusden <[email protected]>
Filter FASTQ files whose reads have been classified by Centrifuge or Kraken2, only retaining reads in taxa descending
from given ancestor

USAGE:
    taxonomy_filter_fastq [FLAGS] [OPTIONS] <INPUT_FASTQ>... --ancestor_taxid <ANCESTOR_ID> --tax_report_filename <TAXONOMY_REPORT_FILENAME> <--centrifuge|--kraken2>

FLAGS:
    -d, --output_dir    Directory to deposited filtered output files in
    -C, --centrifuge    Filter using report from Centrifuge
    -h, --help          Prints help information
    -K, --kraken2       Filter using report from Kraken2
    -V, --version       Prints version information

OPTIONS:
    -A, --ancestor_taxid <ANCESTOR_ID>                      Name of ancestor to use as ancestor filter
    -d, --db <TAXDB_URL>                                    URL for SQLite taxonomy database
    -F, --tax_report_filename <TAXONOMY_REPORT_FILENAME>    Output from Kraken2 (default) or Centrifuge

ARGS:
    <INPUT_FASTQ>...    FASTA file with RefSeq sequences
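
Keeping only reads "in taxa descending from given ancestor" comes down to walking from a read's assigned taxon up towards the root and checking whether the ancestor is met on the way. The sketch below shows that walk over a plain child-to-parent map. The descends_from and toy_tree names, the map representation and the rule that a taxon counts as its own descendant are all assumptions of this example; the tool itself queries the SQLite database and may differ.

```rust
use std::collections::HashMap;

/// Return true if `taxid` equals `ancestor` or lies below it in the tree.
/// `parents` maps each taxon ID to its parent; the root maps to itself.
/// Hypothetical helper for illustration; not part of the ncbitaxonomy crate.
pub fn descends_from(parents: &HashMap<u32, u32>, taxid: u32, ancestor: u32) -> bool {
    let mut current = taxid;
    // Bounding the number of steps guards against cycles in malformed input.
    for _ in 0..=parents.len() {
        if current == ancestor {
            return true;
        }
        match parents.get(&current) {
            Some(&parent) if parent != current => current = parent,
            // Reached the root (its own parent) or an unknown taxon.
            _ => return false,
        }
    }
    false
}

/// A heavily abbreviated toy tree: real lineages have many more levels.
pub fn toy_tree() -> HashMap<u32, u32> {
    [
        (1, 1),         // root
        (131567, 1),    // cellular organisms
        (2, 131567),    // Bacteria
        (2759, 131567), // Eukaryota
        (9606, 2759),   // Homo sapiens (intermediate ranks omitted)
    ]
    .iter()
    .cloned()
    .collect()
}
```

In this toy tree, descends_from(&toy_tree(), 9606, 2759) is true and descends_from(&toy_tree(), 9606, 2) is false, so a read classified as taxon 9606 would be kept for ancestor 2759 and dropped for ancestor 2.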

taxonomy_util

(new since version 1.0.0)

A utility for converting NCBI Taxonomy database files into a SQLite database (the input format used by the other tools).

taxonomy_util 1.0.0
Peter van Heusden <[email protected]>
Utilities for working with the NCBI taxonomy database

USAGE:
    taxonomy_util [OPTIONS] [SUBCOMMAND]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -d, --db <TAXDB_URL>    URL for SQLite taxonomy database

SUBCOMMANDS:
    common_ancestor_distance    find the tree distance to te common ancestor between two taxa
    get_id                      find taxonomy ID for name
    get_lineage                 get lineage for name
    get_name                    find name for taxonomy ID
    help                        Prints this message or the help of the given subcommand(s)
    to_sqlite                   save taxonomy database loaded from files to SQLite database file
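
The help text does not spell out how common_ancestor_distance measures distance. One plausible reading, sketched below, is to find the lowest common ancestor of two taxa and count the steps from each taxon up to it. The lineage, common_ancestor and sample_parents names, the child-to-parent map and the returned tuple are all assumptions of this example and are not taken from the crate.

```rust
use std::collections::HashMap;

/// Path from `taxid` up to the root, starting with `taxid` itself.
/// `parents` maps each taxon ID to its parent; the root maps to itself.
fn lineage(parents: &HashMap<u32, u32>, taxid: u32) -> Vec<u32> {
    let mut path = vec![taxid];
    let mut current = taxid;
    while let Some(&parent) = parents.get(&current) {
        // Stop at the root, and bail out if malformed input forms a cycle.
        if parent == current || path.len() > parents.len() {
            break;
        }
        path.push(parent);
        current = parent;
    }
    path
}

/// Lowest common ancestor of `a` and `b`, with the number of steps from
/// each taxon up to it. Returns None if the two lineages never meet.
/// Hypothetical helper for illustration; not part of the ncbitaxonomy crate.
pub fn common_ancestor(
    parents: &HashMap<u32, u32>,
    a: u32,
    b: u32,
) -> Option<(u32, usize, usize)> {
    // Record how many steps above `a` each of its ancestors sits.
    let steps_from_a: HashMap<u32, usize> = lineage(parents, a)
        .into_iter()
        .enumerate()
        .map(|(steps, taxid)| (taxid, steps))
        .collect();
    // The first ancestor of `b` that is also above `a` is the lowest one.
    lineage(parents, b)
        .into_iter()
        .enumerate()
        .find_map(|(steps_b, taxid)| {
            steps_from_a.get(&taxid).map(|&steps_a| (taxid, steps_a, steps_b))
        })
}

/// Toy tree with made-up IDs: 1 is the root, 10 has children 20 and 30,
/// and 40 is a child of 30.
pub fn sample_parents() -> HashMap<u32, u32> {
    [(1, 1), (10, 1), (20, 10), (30, 10), (40, 30)]
        .iter()
        .cloned()
        .collect()
}
```

For the toy tree, common_ancestor(&sample_parents(), 20, 40) returns Some((10, 1, 2)): taxon 10 is the lowest shared ancestor, one step above 20 and two steps above 40.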

Dependencies

~34MB
~631K SLoC