9 versions
Version | Release date |
---|---|
0.2.5 | July 28, 2022 |
0.2.3 | May 13, 2022 |
0.1.11 | May 11, 2022 |
0.1.10 | December 10, 2021 |
0.1.6 | November 10, 2021 |
DeepFilterNet
A low-complexity speech enhancement framework for full-band audio (48 kHz) based on deep filtering.
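News
- Original DeepFilterNet paper: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering
- New DeepFilterNet2 paper: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio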
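Usage
This framework supports Linux, MacOS and Windows. Training is only tested under Linux. The framework is structured as follows:
- libDF contains the Rust code used for data loading and augmentation.
- DeepFilterNet contains the DeepFilterNet code for training, evaluation and visualization, as well as the pretrained model weights.
- pyDF contains a Python wrapper of the libDF STFT/ISTFT processing loop.
- pyDF-data contains a Python wrapper of the libDF dataset functionality and provides a PyTorch data loader.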
PyPI
Install the DeepFilterNet Python package via pip:
# Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install DeepFilterNet
pip install deepfilternet
# Or install DeepFilterNet including data loading functionality for training (Linux only)
pip install deepfilternet[train]
To enhance noisy audio files using DeepFilterNet, run:
# Specify an output directory with --output-dir [OUTPUT_DIR]
deepFilter path/to/noisy_audio.wav
Manual installation
Install cargo via rustup. Using a conda or virtualenv environment is recommended.
Install the Python dependencies and libDF:
cd path/to/DeepFilterNet/ # cd into repository
# Recommended: Install or activate a python env
# Mandatory: Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install build dependencies used to compile libdf and DeepFilterNet python wheels
pip install maturin poetry
# Build and install libdf python package required for enhance.py
maturin develop --release -m pyDF/Cargo.toml
# Optional: Install libdfdata python package with dataset and dataloading functionality for training
# Required build dependency: HDF5 headers (e.g. ubuntu: libhdf5-dev)
maturin develop --release -m pyDF-data/Cargo.toml
# Install remaining DeepFilterNet python dependencies
cd DeepFilterNet
poetry install -E train -E eval # Note: This globally installs DeepFilterNet in your environment
# Alternatively for development: Install only dependencies and work with the repository version
poetry install -E train -E eval --no-root
# You may need to set the python path
export PYTHONPATH=$PWD
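After the build steps above, it can help to confirm that the compiled packages are visible to the active Python environment before running anything heavier. The following is a minimal sketch, not part of the repository. The import names are assumptions: `libdf` and `libdfdata` are taken from the build comments above, and `df` from the `DeepFilterNet/df` directory.

```python
from importlib.util import find_spec

# Assumed import names (not confirmed by the README):
# `libdf` and `libdfdata` come from the build comments, `df` from DeepFilterNet/df.
REQUIRED = ("torch", "libdf", "df")
OPTIONAL = ("libdfdata",)  # only needed for training


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} modules: {status}")
```

If `df` is reported missing after `poetry install --no-root`, setting `PYTHONPATH` as shown above is the likely remedy.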
To enhance noisy audio files using DeepFilterNet, run:
$ python DeepFilterNet/df/enhance.py --help
usage: enhance.py [-h] [--model-base-dir MODEL_BASE_DIR] [--pf] [--output-dir OUTPUT_DIR] [--log-level LOG_LEVEL] [--compensate-delay]
                  noisy_audio_files [noisy_audio_files ...]

positional arguments:
  noisy_audio_files     List of noise files to mix with the clean speech file.

optional arguments:
  -h, --help            show this help message and exit
  --model-base-dir MODEL_BASE_DIR, -m MODEL_BASE_DIR
                        Model directory containing checkpoints and config.
                        To load a pretrained model, you may just provide the model name, e.g. `DeepFilterNet`.
                        By default, the pretrained DeepFilterNet2 model is loaded.
  --pf                  Post-filter that slightly over-attenuates very noisy sections.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory in which the enhanced audio files will be stored.
  --log-level LOG_LEVEL
                        Logger verbosity. Can be one of (debug, info, error, none)
  --compensate-delay, -D
                        Add some paddig to compensate the delay introduced by the real-time STFT/ISTFT implementation.

# Enhance audio with original DeepFilterNet
python DeepFilterNet/df/enhance.py -m DeepFilterNet path/to/noisy_audio.wav
# Enhance audio with DeepFilterNet2
python DeepFilterNet/df/enhance.py -m DeepFilterNet2 path/to/noisy_audio.wav
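To enhance a whole folder of recordings, the command line above can be assembled from Python. This is a minimal sketch that only uses the flags listed in the help output (`-m`, `--output-dir`, `--pf`, `--compensate-delay`). The helper names `build_enhance_cmd` and `enhance_directory` are hypothetical, and the default script path assumes the working directory is the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_enhance_cmd(noisy_files, output_dir=None, model=None,
                      post_filter=False, compensate_delay=False,
                      script="DeepFilterNet/df/enhance.py"):
    """Assemble the enhance.py command line as a list of arguments."""
    cmd = [sys.executable, script]
    if model is not None:
        cmd += ["-m", model]  # e.g. "DeepFilterNet" or "DeepFilterNet2"
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if post_filter:
        cmd.append("--pf")
    if compensate_delay:
        cmd.append("--compensate-delay")
    # Positional arguments (the noisy audio files) go last.
    cmd += [str(f) for f in noisy_files]
    return cmd


def enhance_directory(noisy_dir, output_dir, **kwargs):
    """Run enhance.py once on every .wav file found directly in noisy_dir."""
    files = sorted(Path(noisy_dir).glob("*.wav"))
    if not files:
        raise FileNotFoundError(f"no .wav files found in {noisy_dir}")
    subprocess.run(build_enhance_cmd(files, output_dir, **kwargs), check=True)
```

For example, `enhance_directory("noisy/", "enhanced/", model="DeepFilterNet2", post_filter=True)` would pass all files in a single invocation, so the model is loaded only once.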
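Training
The entry point is DeepFilterNet/df/train.py. It expects a data directory containing the HDF5 datasets as well as a dataset configuration JSON file.
So, first you need to create your datasets in HDF5 format. Each dataset typically holds only the training, validation, or test set of noise, speech, or RIRs.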
# Install additional dependencies for dataset creation
pip install h5py librosa soundfile
# Go to DeepFilterNet python package
cd path/to/DeepFilterNet/DeepFilterNet
# Prepare text file (e.g. called training_set.txt) containing paths to .wav files
#
# usage: prepare_data.py [-h] [--num_workers NUM_WORKERS] [--max_freq MAX_FREQ] [--sr SR] [--dtype DTYPE]
# [--codec CODEC] [--mono] [--compression COMPRESSION]
# type audio_files hdf5_db
#
# where:
# type: One of `speech`, `noise`, `rir`
# audio_files: Text file containing paths to audio files to include in the dataset
# hdf5_db: Output HDF5 dataset.
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
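The text file passed to `prepare_data.py` can be generated rather than written by hand. The sketch below collects all `.wav` files under a directory and writes their absolute paths to a file. One path per line is an assumption about the expected format, and `write_file_list` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def write_file_list(audio_dir, list_path, pattern="*.wav"):
    """Write the absolute paths of all matching audio files, one per line.

    Returns the number of paths written. The one-path-per-line layout is an
    assumption about what prepare_data.py expects.
    """
    paths = sorted(p.resolve() for p in Path(audio_dir).rglob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} under {audio_dir}")
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

Calling `write_file_list("path/to/speech_wavs", "training_set.txt")` would produce the `training_set.txt` used in the command above.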
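All datasets should be made available to the training script in one dataset folder.
The dataset configuration file should contain 3 entries: "train", "valid", "test". Each of them contains a list of datasets (e.g. one speech, one noise and one RIR dataset). You can use multiple speech or noise datasets. Optionally, a sampling factor can be specified to over- or undersample a dataset. For example, if you have a specific dataset with transient noises, you can oversample it to increase the amount of non-stationary noise. In most cases you will want to set this factor to 1.
Dataset config example
dataset.cfg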
{
  "train": [
    ["TRAIN_SET_SPEECH.hdf5", 1.0],
    ["TRAIN_SET_NOISE.hdf5", 1.0],
    ["TRAIN_SET_RIR.hdf5", 1.0]
  ],
  "valid": [
    ["VALID_SET_SPEECH.hdf5", 1.0],
    ["VALID_SET_NOISE.hdf5", 1.0],
    ["VALID_SET_RIR.hdf5", 1.0]
  ],
  "test": [
    ["TEST_SET_SPEECH.hdf5", 1.0],
    ["TEST_SET_NOISE.hdf5", 1.0],
    ["TEST_SET_RIR.hdf5", 1.0]
  ]
}
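A configuration with this layout can also be generated programmatically, which is convenient when several noise datasets with different sampling factors are involved. The following is a minimal sketch with hypothetical helper names. The check that every factor is positive is an assumption about what the training script accepts, and `TRAIN_SET_NOISE_TRANSIENT.hdf5` in the usage note is a made-up file name.

```python
import json

SPLITS = ("train", "valid", "test")


def build_dataset_config(datasets):
    """Build the config dict from a mapping: split -> [(hdf5 name, factor), ...]."""
    config = {}
    for split in SPLITS:
        entries = datasets.get(split)
        if not entries:
            raise ValueError(f"missing datasets for split '{split}'")
        config[split] = []
        for name, factor in entries:
            # Assumption: sampling factors must be positive numbers.
            if factor <= 0:
                raise ValueError(f"sampling factor for {name} must be positive")
            config[split].append([name, float(factor)])
    return config


def write_dataset_config(datasets, path="dataset.cfg"):
    """Serialize the config as JSON to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_dataset_config(datasets), f, indent=2)
```

To oversample a transient-noise dataset, one could add an entry such as `("TRAIN_SET_NOISE_TRANSIENT.hdf5", 2.0)` to the `"train"` list and leave all other factors at 1.0.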
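Finally, start the training script. If the model base directory does not exist, the training script may create it and use it for logging, some audio samples, model checkpoints and the config. If no config file is found, it will create a default config. See DeepFilterNet/pretrained_models/DeepFilterNet for an example config file.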
# usage: train.py [-h] [--debug] data_config_file data_dir base_dir
python df/train.py path/to/dataset.cfg path/to/data_dir/ path/to/base_dir/
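Because the training script expects every dataset named in the configuration to be present in the data directory, a quick consistency check before launching a long run can save time. This is a minimal sketch with a hypothetical helper name. It assumes the config file is plain JSON in the layout shown above and that datasets are referenced by file name relative to the data directory.

```python
import json
from pathlib import Path


def missing_datasets(config_file, data_dir):
    """Return (split, file name) pairs listed in the config but absent from data_dir."""
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    data_dir = Path(data_dir)
    missing = []
    for split in ("train", "valid", "test"):
        for entry in config.get(split, []):
            name = entry[0]  # entry layout: [file name, sampling factor]
            if not (data_dir / name).is_file():
                missing.append((split, name))
    return missing
```

An empty result from `missing_datasets("path/to/dataset.cfg", "path/to/data_dir/")` means every referenced HDF5 file was found.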
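Citation guide
If you use this framework, please cite: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering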
@inproceedings{schroeter2022deepfilternet,
title={DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering},
author={Hendrik Schröter and Alberto N. Escalante-B. and Tobias Rosenkranz and Andreas Maier},
booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2022},
organization={IEEE}
}
If you use the DeepFilterNet2 model, please cite: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio
@misc{schroeter2022deepfilternet2,
title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
publisher = {arXiv},
year = {2022},
url = {https://arxiv.org/abs/2205.05474},
}
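License
DeepFilterNet is free and open source! All code in this repository is dual-licensed under either:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or https://apache.ac.cn/licenses/LICENSE-2.0)
at your option. This means you can select the license you prefer!
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.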