1 unstable release
0.1.0 | Jul 13, 2022
#233 in Biology
190KB
2.5K SLoC
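BioGarden
BioGarden is a collection of algorithms, created as a project for learning bioinformatics and Rust. It currently supports algorithms for sequence alignment, analysis, statistics, and pattern matching.
Installation
- Cargo.toml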
[dependencies]
biogarden = "0.1.0"
- Command-line application
$ cargo install biogarden
Usage
For simple cases, sequences can be processed directly in the source file.
use biogarden::ds::sequence::Sequence;
use biogarden::analysis::seq::*;
use biogarden::processing::patterns::*;
use biogarden::processing::transformers::*;
fn main() {
    let a = Sequence::from("TTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGAAGTACGGGCATCAACCCAGTT");
    let b = Sequence::from("TCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGCGGTACGAGTGTTCCTTTGGGT");

    // Get some properties for sequence A
    let gc_a = gc_content(&a);
    let lc_a = linguistic_complexity(&a).unwrap();
    println!("[A] GC Content: {}, Linguistic complexity: {}", gc_a, lc_a);

    // Comparative metrics
    let edit_dist = edit_distance(&a, &b).unwrap();
    let tt_ratio = transition_transversion_ratio(&a, &b).unwrap();
    println!("[A-B] Edit Distance: {}, TT Ratio: {}", edit_dist, tt_ratio);

    // Pattern finding
    let positions_tcg = find_motif(&a, &Sequence::from("TCG"));
    println!("[A] Positions TCG: {:?}", positions_tcg);
    let rev_cs = reverse_complement_substrings(&a, 4, 6);
    println!("[A] Reverse complement substrings: {:?}", rev_cs);

    // Pattern-based comparison
    let lcss = longest_common_subsequence(&a, &b);
    println!("[A-B] Longest common subsequence: {}", lcss);

    // Transcribe (consumes `b`, so this is its last use)
    let b_rna = transcribe_dna(b);
    println!("[B] RNA: {}", b_rna);
}
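To make the metrics in the example above more concrete, here is a minimal standalone sketch of three of them using only the Rust standard library. It is not BioGarden's implementation: the function names `gc_fraction`, `ts_tv_ratio`, and `motif_positions` are hypothetical, and the conventions chosen here (GC content as a fraction between 0 and 1, 0-based motif positions, `None` when the ratio is undefined) are assumptions that may differ from what the crate returns.

```rust
/// Fraction of G and C bases in a DNA sequence (0.0 for an empty input).
/// Assumption: upper-case ASCII bases; the crate may report a percentage instead.
fn gc_fraction(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq.iter().filter(|&&b| b == b'G' || b == b'C').count();
    gc as f64 / seq.len() as f64
}

/// Purines are A and G; every other base is treated as a pyrimidine here.
fn is_purine(base: u8) -> bool {
    matches!(base, b'A' | b'G')
}

/// Ratio of transitions to transversions between two equal-length sequences.
/// Returns `None` if the lengths differ or if there are no transversions.
fn ts_tv_ratio(a: &[u8], b: &[u8]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let (mut transitions, mut transversions) = (0u32, 0u32);
    for (&x, &y) in a.iter().zip(b) {
        if x == y {
            continue; // identical bases are not substitutions
        }
        if is_purine(x) == is_purine(y) {
            transitions += 1; // purine <-> purine or pyrimidine <-> pyrimidine
        } else {
            transversions += 1; // purine <-> pyrimidine
        }
    }
    if transversions == 0 {
        None
    } else {
        Some(transitions as f64 / transversions as f64)
    }
}

/// 0-based start positions of every occurrence of `motif` in `seq`,
/// including overlapping occurrences.
fn motif_positions(seq: &[u8], motif: &[u8]) -> Vec<usize> {
    if motif.is_empty() || motif.len() > seq.len() {
        return Vec::new();
    }
    seq.windows(motif.len())
        .enumerate()
        .filter(|(_, window)| *window == motif)
        .map(|(i, _)| i)
        .collect()
}
```

For instance, `gc_fraction(b"ATGC")` evaluates to 0.5, and `motif_positions(b"ATATAT", b"ATA")` yields the overlapping matches at positions 0 and 2.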
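When working with multiple long sequences, reading the data from a file is usually more practical. For example, a semiglobal alignment of inputs of roughly 10,000 base pairs can be computed as follows: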
use biogarden::alignment::*;
use biogarden::ds::tile::Tile;
use biogarden::io::fasta::{FastaRead, Reader};
use biogarden::processing::patterns::*;
use std::collections::HashSet;
use std::path::Path;

fn main() {
    // Allocate container for sequences
    let mut tile = Tile::new();

    // Read input
    let path = Path::new("tests/data/input/semiglobal_alignment.fasta");
    let mut reader = Reader::from_file(path).unwrap();
    reader.read_all(&mut tile);

    // Sequence alignment
    let mut aligner = aligner::SequenceAligner::new();
    let gap_penalty_open = -1;
    let gap_penalty_enlarge = -2;

    // Semiglobal alignment of the first two sequences in the tile
    let (align_score, a_align, b_align) = aligner
        .semiglobal_alignment(
            &tile[0],
            &tile[1],
            &score::blosum62,
            gap_penalty_open,
            gap_penalty_enlarge,
        )
        .unwrap();
    println!("[A-B] Align score: {}", align_score);
    println!("[A] Aligned: {}", a_align);
    println!("[B] Aligned: {}", b_align);

    let alphabet = HashSet::from([b'A', b'C', b'G', b'T']);
    // Minimum number of times the common substring has to occur
    let minimum_frequency = 6;
    let lcs = longest_common_substring(&tile, &alphabet, minimum_frequency).unwrap();
    println!("[TILE] LCS: {}", lcs);
}
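For readers unfamiliar with semiglobal alignment, the following standalone sketch illustrates the idea with a score-only dynamic program in which gaps at either end of either sequence are free. It is a simplification and not BioGarden's implementation: the function name `semiglobal_score` is hypothetical, it uses a plain match/mismatch score instead of a substitution matrix such as BLOSUM62, a single linear gap penalty instead of separate open and enlarge penalties, and it does not reconstruct the aligned strings.

```rust
/// Semiglobal alignment score of `a` and `b`.
/// Leading and trailing gaps in either sequence are not penalized.
/// Assumption: linear gap penalty (`gap` is added per gap position, so it
/// should be negative) and a simple match/mismatch substitution score.
fn semiglobal_score(a: &[u8], b: &[u8], match_score: i32, mismatch: i32, gap: i32) -> i32 {
    let (n, m) = (a.len(), b.len());

    // Row 0 is all zeros: skipping a prefix of `b` costs nothing.
    let mut prev = vec![0i32; m + 1];
    // Best score seen in the last column so far; 0 covers the empty overlap.
    let mut best = 0;

    for i in 1..=n {
        // Column 0 stays zero: skipping a prefix of `a` costs nothing.
        let mut curr = vec![0i32; m + 1];
        for j in 1..=m {
            let sub = if a[i - 1] == b[j - 1] { match_score } else { mismatch };
            curr[j] = (prev[j - 1] + sub) // align a[i-1] with b[j-1]
                .max(prev[j] + gap) // a[i-1] against a gap
                .max(curr[j - 1] + gap); // b[j-1] against a gap
        }
        // Ending in the last column leaves the rest of `a` as a free overhang.
        best = best.max(curr[m]);
        prev = curr;
    }

    // Ending anywhere in the last row leaves the rest of `b` as a free overhang.
    best.max(prev.iter().copied().max().unwrap_or(0))
}
```

With scores of +1 for a match and -1 for a mismatch or gap, `semiglobal_score(b"ACGT", b"TTACGTTT", 1, -1, -1)` is 4, because the shorter sequence matches entirely inside the longer one and the unaligned flanks are free.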
Dependencies
~8.5MB
~163K SLoC