3 releases (1 stable)

1.0.0	Aug 8, 2024
0.4.1	Mar 2, 2022
0.4.0	Feb 23, 2022

#53 in Biology

120 downloads per month

GPL-3.0 license

2MB
1K SLoC

Gene GEM Correlation Analysis (GGCA)

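Efficiently computes the correlation coefficient (Pearson, Spearman or Kendall) and the two-tailed p-value for every pair of rows across two datasets. It also supports CpG site IDs.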
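As a rough illustration of what "every pair" means here, the pure-Python sketch below correlates each row of one dataset with each row of the other. It is not GGCA's implementation: the dict-of-lists input, the function names and the toy data are hypothetical, Spearman is computed the textbook way (Pearson on average ranks, so ties share a rank), Kendall is omitted, and p-values are left out (for Pearson, the two-tailed p-value is usually derived from t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom).

```python
import math
from itertools import product


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when a standard deviation is 0
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate_all_pairs(genes, gems, method=pearson):
    """Correlate every gene row with every GEM row; returns (gene, gem, r) tuples."""
    return [
        (gene, gem, method(gene_values, gem_values))
        for (gene, gene_values), (gem, gem_values) in product(genes.items(), gems.items())
    ]


# Hypothetical toy data: one row per gene/GEM, one value per sample.
genes = {"GENE1": [1.0, 2.0, 3.0, 4.0], "GENE2": [4.0, 3.0, 2.0, 1.0]}
gems = {"miR-A": [2.0, 4.0, 6.0, 8.0], "miR-B": [1.0, 1.0, 2.0, 2.0]}

for gene, gem, r in correlate_all_pairs(genes, gems):
    print(gene, gem, round(r, 3))
```

With m genes and k GEMs this evaluates m * k combinations, which is why the library streams and sorts results on disk rather than holding everything in memory.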

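Important: GGCA is the core of a platform called Multiomix. On its official website you can use this library quickly and easily through a friendly graphical interface (plus many extra features!). Go to https://multiomix.org/ to get started!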

Python PyPi | Rust Crate

Index

Requirements
Usage
- Python
- Rust
Contributing
Considerations

Usage

The examples folder contains some examples in both languages.

Python

Install: pip install ggca
Configure and call the correlate method

import ggca


mrna_file_path = "mrna.csv"
gem_file_path = "mirna.csv"

try:
	(result_combinations, evaluated_combinations) = ggca.correlate(
		mrna_file_path,
		gem_file_path,
		correlation_method=ggca.CorrelationMethod.Pearson,
		correlation_threshold=0.5,
		sort_buf_size=2_000_000,
		adjustment_method=ggca.AdjustmentMethod.BenjaminiHochberg,
		all_vs_all=True,
		gem_contains_cpg=False,
		collect_gem_dataset=None,
		keep_top_n=2  # Keeps only top 2 elements
	)

	print(f'Number of resulting combinations: {len(result_combinations)} of {evaluated_combinations} evaluated combinations')
	for combination in result_combinations:
		print(
			combination.gene,
			combination.gem,
			combination.correlation,
			combination.p_value,
			combination.adjusted_p_value
		)
except ggca.GGCADiffSamplesLength as ex:
	print('Raised GGCADiffSamplesLength:', ex)
except ggca.GGCADiffSamples as ex:
	print('Raised GGCADiffSamples:', ex)
except ggca.InvalidCorrelationMethod as ex:
	print('Raised InvalidCorrelationMethod:', ex)
except ggca.InvalidAdjustmentMethod as ex:
	print('Raised InvalidAdjustmentMethod:', ex)
except ggca.GGCAError as ex:
	print('Raised GGCAError:', ex)
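Each result's adjusted_p_value comes from the multiple-testing correction chosen with adjustment_method (ggca.AdjustmentMethod.BenjaminiHochberg above). To show what that correction does, here is a minimal pure-Python sketch of the standard Benjamini-Hochberg step-up procedure, written to match R's p.adjust(method = "BH"). The function name is hypothetical and this is an illustration, not the library's code.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank of p_values[i] in ascending order
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The running minimum is what makes the adjusted values non-decreasing in the raw p-values, mirroring the cummin step in R's implementation.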

Rust

Add the crate to Cargo.toml: ggca = { version = "1.0.0", default-features = false }
Create an analysis and run it

use ggca::adjustment::AdjustmentMethod;
use ggca::analysis::Analysis;
use ggca::correlation::CorrelationMethod;

// Assumes the error type returned by compute() implements std::error::Error
fn main() -> Result<(), Box<dyn std::error::Error>> {
	// File's paths
	let df1_path = "mrna.csv";
	let df2_path = "mirna.csv";

	// Some parameters
	let gem_contains_cpg = false;
	let is_all_vs_all = true;
	let keep_top_n = Some(10); // Keeps the top 10 correlations (sorted by absolute value)
	let collect_gem_dataset = None; // None: keep the GEM dataset in memory only when it is small (better performance)

	let analysis = Analysis::new_from_files(df1_path.to_string(), df2_path.to_string(), gem_contains_cpg);
	let (result, number_of_elements_evaluated) = analysis.compute(
		CorrelationMethod::Pearson,
		0.7,       // correlation threshold
		2_000_000, // sort buffer size
		AdjustmentMethod::BenjaminiHochberg,
		is_all_vs_all,
		collect_gem_dataset,
		keep_top_n,
	)?;

	println!("Number of elements -> {} of {} combinations evaluated", result.len(), number_of_elements_evaluated);

	for cor_p_value in result.iter() {
		println!("{}", cor_p_value);
	}

	Ok(())
}
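The correlation threshold (0.7 above) and keep_top_n trim the output, and the comment on keep_top_n says results are ranked by absolute correlation. The Python sketch below illustrates that idea on plain (gene, gem, r) tuples. Comparing |r| (rather than r) against the threshold, and the order of filtering and ranking, are assumptions of this sketch, not documented library behaviour; the function name and data are hypothetical.

```python
import heapq


def filter_and_keep_top(results, threshold, keep_top_n=None):
    """Drop results with |r| below threshold, then keep the n strongest by |r|.

    results: iterable of (gene, gem, r) tuples. Comparing |r| to the threshold
    is an assumption made for this sketch. NaN correlations are dropped because
    abs(nan) >= threshold is always False.
    """
    kept = [res for res in results if abs(res[2]) >= threshold]
    if keep_top_n is None:
        # No limit: return every surviving result, strongest first.
        return sorted(kept, key=lambda res: abs(res[2]), reverse=True)
    # nlargest avoids sorting the whole list when only n items are needed.
    return heapq.nlargest(keep_top_n, kept, key=lambda res: abs(res[2]))


# Hypothetical results
results = [
    ("GENE1", "miR-A", 0.95),
    ("GENE1", "miR-B", -0.82),
    ("GENE2", "miR-A", 0.40),
    ("GENE2", "miR-B", -0.99),
]
print(filter_and_keep_top(results, threshold=0.7, keep_top_n=2))
```

Ranking by absolute value keeps strong negative correlations (such as a miRNA repressing its target gene), which a plain descending sort on r would push to the bottom.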

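Note that the env_logger crate is used to emit warnings when some mRNA/GEM combinations produce NaN values (for example, because an input array has a standard deviation of 0). In that case, you can prefix your command with RUST_LOG=warn to print those warnings to stderr. For example:

RUST_LOG=warn cargo test --tests

or

RUST_LOG=warn cargo run --example basic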
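Why such NaN values appear: Pearson's r divides by the product of the two standard deviations, so a constant row (standard deviation 0) turns it into 0/0. The Python sketch below mimics that behaviour, using the standard logging module as a stand-in for env_logger to send a warning to stderr. The function name, logger name and message text are hypothetical and do not reproduce ggca's actual log output.

```python
import logging
import math
import statistics

logger = logging.getLogger("ggca_sketch")  # hypothetical logger name


def pearson_or_nan(gene, gem, x, y):
    """Return Pearson's r, or nan (with a warning) if either input is constant."""
    if statistics.pstdev(x) == 0 or statistics.pstdev(y) == 0:
        logger.warning("NaN correlation for %s / %s: zero standard deviation", gene, gem)
        return math.nan
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


# Similar in spirit to RUST_LOG=warn: show WARNING-level messages (on stderr).
logging.basicConfig(level=logging.WARNING)
print(pearson_or_nan("GENE1", "miR-A", [5.0, 5.0, 5.0], [1.0, 2.0, 3.0]))  # nan
```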

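Development and contributing

Any kind of help is welcome! Feel free to open issues or submit PRs.

Building for Rust: use the cargo build [--release] command, or run the examples in the examples folder with cargo run --example [example name]

Building and running in Python: run cargo build [--release] and follow the official instructions to import the compiled library into a Python script. Python builds use Maturin and are generated in CI by maturin-actions.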

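Tests

All correlations, p-values and adjusted p-values used in the tests were obtained from the cor.test and p.adjust functions of the R programming language and from the statsmodels package for Python. The data in the small_files folder was obtained by randomly sampling the colorectal cancer (TCGA, Nature 2012) dataset, which can be downloaded from the cBioPortal datasets page or from this direct link. All correlation results were compared directly against the output of R-Multiomics (an older, R-only version of multiomix.org).

Performance

We use criterion.rs for benchmarking. If you contribute to the project, you can check whether your changes introduce a regression by running cargo bench before and after them.

Considerations

If you use any part of our code, or if this tool is useful for your research, please consider citing:

@article{camele2022multiomix,
  title={Multiomix: a cloud-based platform to infer cancer genomic and epigenomic events associated with gene expression modulation},
  author={Camele, Genaro and Menazzi, Sebastian and Chanfreau, Hern{\'a}n and Marraco, Agustin and Hasperu{\'e}, Waldo and Butti, Matias D and Abba, Martin C},
  journal={Bioinformatics},
  volume={38},
  number={3},
  pages={866--868},
  year={2022},
  publisher={Oxford University Press}
}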

Dependencies

~15–27MB, ~404K SLoC: bincode, csv, env_logger 0.9, extsort, fast-float, itertools 0.9, lazy_static, log, pyo3 0.22.2+extension-module, rayon, serde, serde_derive, statrs 0.17.1
dev: approx, criterion 0.5.1
Other features: extension-module