DeepFilterNet

A low-complexity speech enhancement framework for full-band audio (48kHz) based on deep filtering.

News

Original DeepFilterNet paper: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering
- Paper: https://arxiv.org/abs/2110.05588
- Samples: https://rikorose.github.io/DeepFilterNet-Samples/
- Demo: https://hugging-face.cn/spaces/hshr/DeepFilterNet
- Video lecture: https://youtu.be/it90gBqkY6k
New DeepFilterNet2 paper: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio
- Paper: https://arxiv.org/abs/2205.05474
- Samples: https://rikorose.github.io/DeepFilterNet2-Samples/
- Demo: https://hugging-face.cn/spaces/hshr/DeepFilterNet2

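Usage

This framework supports Linux, macOS, and Windows. Training has only been tested on Linux. The framework is structured as follows:

libDF contains the Rust code used for data loading and augmentation.
DeepFilterNet contains the DeepFilterNet code for training, evaluation, and visualization, as well as the pretrained model weights.
pyDF contains a Python wrapper of the libDF STFT/ISTFT processing loop.
pyDF-data contains a Python wrapper of the libDF dataset functionality and provides a PyTorch data loader.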

PyPI

Install the DeepFilterNet Python package via pip:

# Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install DeepFilterNet
pip install deepfilternet
# Or install DeepFilterNet including data loading functionality for training (Linux only)
pip install deepfilternet[train]

To enhance noisy audio files using DeepFilterNet, run

# Specify an output directory with --output-dir [OUTPUT_DIR]
deepFilter path/to/noisy_audio.wav
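
For batch processing, the command above can be driven from a small Python script. The following is a minimal sketch, not part of DeepFilterNet: the helper names `build_deepfilter_cmd` and `enhance_directory` are hypothetical, and passing several files in one call is an assumption based on the `noisy_audio_files [noisy_audio_files ...]` usage string of enhance.py.

```python
import shutil
import subprocess
from pathlib import Path


def build_deepfilter_cmd(noisy_dir, output_dir):
    """Build a `deepFilter` command line covering every .wav file in `noisy_dir`."""
    wav_files = sorted(str(p) for p in Path(noisy_dir).glob("*.wav"))
    if not wav_files:
        raise FileNotFoundError(f"no .wav files found in {noisy_dir}")
    # --output-dir is the documented flag; passing multiple input files
    # in a single call is an assumption (see lead-in).
    return ["deepFilter", "--output-dir", str(output_dir), *wav_files]


def enhance_directory(noisy_dir, output_dir):
    """Run the CLI on a whole directory, failing early if it is not installed."""
    if shutil.which("deepFilter") is None:
        raise RuntimeError("deepFilter not found on PATH; install deepfilternet first")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_deepfilter_cmd(noisy_dir, output_dir), check=True)
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before anything is run.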

Manual Installation

Install cargo via rustup. Using conda or virtualenv is recommended.

Install the Python dependencies and libDF:

cd path/to/DeepFilterNet/  # cd into repository
# Recommended: Install or activate a python env
# Mandatory: Install cpu/cuda pytorch (>=1.8) dependency from pytorch.org, e.g.:
pip install torch torchaudio -f https://download.pytorch.org/whl/cpu/torch_stable.html
# Install build dependencies used to compile libdf and DeepFilterNet python wheels
pip install maturin poetry
# Build and install libdf python package required for enhance.py
maturin develop --release -m pyDF/Cargo.toml
# Optional: Install libdfdata python package with dataset and dataloading functionality for training
# Required build dependency: HDF5 headers (e.g. ubuntu: libhdf5-dev)
maturin develop --release -m pyDF-data/Cargo.toml
# Install remaining DeepFilterNet python dependencies
cd DeepFilterNet
poetry install -E train -E eval # Note: This globally installs DeepFilterNet in your environment
# Alternatively for development: Install only dependencies and work with the repository version
poetry install -E train -E eval --no-root
# You may need to set the python path
export PYTHONPATH=$PWD
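
After these steps, a quick check of which modules are importable can save a confusing failure later. The sketch below is illustrative only: `check_modules` is a hypothetical helper, and the module names `libdf`, `libdfdata`, and `df` are assumptions taken from the package names mentioned in the comments above.

```python
import importlib.util


def check_modules(names):
    """Return a mapping of top-level module name -> whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Module names are assumptions; adjust them to match your installation.
    wanted = ["torch", "torchaudio", "libdf", "libdfdata", "df"]
    for name, found in check_modules(wanted).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

`libdfdata` is only needed for training, so a missing entry there is expected when only enhancement is used.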

To enhance noisy audio files using DeepFilterNet, run

$ python DeepFilterNet/df/enhance.py --help
usage: enhance.py [-h] [--model-base-dir MODEL_BASE_DIR] [--pf] [--output-dir OUTPUT_DIR] [--log-level LOG_LEVEL] [--compensate-delay]
                  noisy_audio_files [noisy_audio_files ...]

positional arguments:
  noisy_audio_files     List of noise files to mix with the clean speech file.

optional arguments:
  -h, --help            show this help message and exit
  --model-base-dir MODEL_BASE_DIR, -m MODEL_BASE_DIR
                        Model directory containing checkpoints and config.
                        To load a pretrained model, you may just provide the model name, e.g. `DeepFilterNet`.
                        By default, the pretrained DeepFilterNet2 model is loaded.
  --pf                  Post-filter that slightly over-attenuates very noisy sections.
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Directory in which the enhanced audio files will be stored.
  --log-level LOG_LEVEL
                        Logger verbosity. Can be one of (debug, info, error, none)
  --compensate-delay, -D
                        Add some paddig to compensate the delay introduced by the real-time STFT/ISTFT implementation.

# Enhance audio with original DeepFilterNet
python DeepFilterNet/df/enhance.py -m DeepFilterNet path/to/noisy_audio.wav

# Enhance audio with DeepFilterNet2
python DeepFilterNet/df/enhance.py -m DeepFilterNet2 path/to/noisy_audio.wav
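
To compare both pretrained models on the same file, it helps to write each model's output to its own directory. The following sketch only assembles the command lines using the `-m`, `-o`, and `--pf` flags listed in the help text above. The helper name `build_comparison_cmds` and the per-model directory layout are assumptions for illustration.

```python
from pathlib import Path

# Script path as used in the commands above, relative to the repository root.
ENHANCE_SCRIPT = "DeepFilterNet/df/enhance.py"


def build_comparison_cmds(noisy_file, out_root,
                          models=("DeepFilterNet", "DeepFilterNet2"),
                          post_filter=False):
    """Return one enhance.py command per model, each with its own output dir."""
    cmds = []
    for model in models:
        cmd = ["python", ENHANCE_SCRIPT, "-m", model,
               "-o", str(Path(out_root) / model)]
        if post_filter:
            cmd.append("--pf")  # optional post-filter from the help text
        cmd.append(str(noisy_file))
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    for cmd in build_comparison_cmds("path/to/noisy_audio.wav", "enhanced"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.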

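Training

The entry point is DeepFilterNet/df/train.py. It expects a data directory containing the HDF5 datasets as well as a dataset configuration JSON file.

So, you first need to create your datasets in HDF5 format. Each dataset typically holds only the training, validation, or test split of a single type: noise, speech, or RIRs (room impulse responses).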

# Install additional dependencies for dataset creation
pip install h5py librosa soundfile
# Go to DeepFilterNet python package
cd path/to/DeepFilterNet/DeepFilterNet
# Prepare text file (e.g. called training_set.txt) containing paths to .wav files
#
# usage: prepare_data.py [-h] [--num_workers NUM_WORKERS] [--max_freq MAX_FREQ] [--sr SR] [--dtype DTYPE]
#                        [--codec CODEC] [--mono] [--compression COMPRESSION]
#                        type audio_files hdf5_db
#
# where:
#   type: One of `speech`, `noise`, `rir`
#   audio_files: Text file containing paths to audio files to include in the dataset
#   hdf5_db: Output HDF5 dataset.
python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5
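
The text file passed as `audio_files` can be generated with a few lines of Python. This is a minimal sketch: `write_file_list` is a hypothetical helper, and writing one absolute path per line is an assumption about the format prepare_data.py expects.

```python
from pathlib import Path


def write_file_list(audio_root, list_path):
    """Write all .wav paths below `audio_root` to `list_path`, one per line.

    Returns the number of files listed.
    """
    wav_paths = sorted(p.resolve() for p in Path(audio_root).rglob("*.wav"))
    # One absolute path per line is an assumed format (see lead-in).
    Path(list_path).write_text(
        "".join(f"{p}\n" for p in wav_paths), encoding="utf-8"
    )
    return len(wav_paths)


if __name__ == "__main__":
    count = write_file_list("path/to/speech_wavs", "training_set.txt")
    print(f"listed {count} files")
```

Sorting the paths keeps the list reproducible across runs, which makes it easier to rebuild an identical dataset later.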

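All datasets should be made available to the training script in a single dataset folder.

The dataset configuration file should contain 3 entries: "train", "valid", and "test". Each of these contains a list of datasets (e.g. one speech, one noise, and one RIR dataset). You can use multiple speech or noise datasets. Optionally, a sampling factor can be specified to over- or undersample a dataset. For example, if you have a specific dataset with transient noises, you can oversample it to increase the amount of non-stationary noise. In most cases, you want to set this factor to 1.

Dataset config example

dataset.cfg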

{
  "train": [
    [
      "TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TRAIN_SET_RIR.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "VALID_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "VALID_SET_NOISE.hdf5",
      1.0
    ],
    [
      "VALID_SET_RIR.hdf5",
      1.0
    ]
  ],
  "test": [
    [
      "TEST_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "TEST_SET_NOISE.hdf5",
      1.0
    ],
    [
      "TEST_SET_RIR.hdf5",
      1.0
    ]
  ]
}
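
A configuration with this shape can also be generated programmatically, which helps when several speech or noise datasets are combined. The sketch below mirrors the structure shown above. The helpers `make_dataset_config` and `write_dataset_config` are hypothetical, and the file names in the usage example are placeholders.

```python
import json

SPLITS = ("train", "valid", "test")


def make_dataset_config(datasets):
    """Build the config dict from {split: [(hdf5_name, sampling_factor), ...]}."""
    config = {}
    for split in SPLITS:
        entries = datasets.get(split)
        if not entries:
            raise ValueError(f"missing or empty split: {split!r}")
        # Each entry becomes a [file name, sampling factor] pair as shown above.
        config[split] = [[str(name), float(factor)] for name, factor in entries]
    return config


def write_dataset_config(config, path):
    """Write the config as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    cfg = make_dataset_config({
        s: [(f"{s.upper()}_SET_{kind}.hdf5", 1.0)
            for kind in ("SPEECH", "NOISE", "RIR")]
        for s in SPLITS
    })
    write_dataset_config(cfg, "dataset.cfg")
```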

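Finally, start the training script. If the base directory does not exist, the training script may create it. The base directory is used for logging, some audio samples, model checkpoints, and the config. If no config file is found, a default config is created. See DeepFilterNet/pretrained_models/DeepFilterNet for a config file.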

# usage: train.py [-h] [--debug] data_config_file data_dir base_dir
python df/train.py path/to/dataset.cfg path/to/data_dir/ path/to/base_dir/
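
Before launching a long training run, it can be worth checking that every dataset named in the configuration is actually present in the data directory. This is a minimal pre-flight sketch, not part of the framework: `missing_datasets` is a hypothetical helper and assumes each config entry is a `[file name, sampling factor]` pair as in the example above.

```python
import json
from pathlib import Path


def missing_datasets(config_path, data_dir):
    """Return (split, file name) pairs listed in the config but absent on disk."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    missing = []
    for split in ("train", "valid", "test"):
        # Assumes two-element entries: [file name, sampling factor].
        for name, _factor in config.get(split, []):
            if not (Path(data_dir) / name).is_file():
                missing.append((split, name))
    return missing


if __name__ == "__main__":
    problems = missing_datasets("path/to/dataset.cfg", "path/to/data_dir/")
    for split, name in problems:
        print(f"[{split}] not found: {name}")
```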

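Citation Guide

If you use this framework, please cite: DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering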

@inproceedings{schroeter2022deepfilternet,
      title={DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering}, 
      author={Hendrik Schröter and Alberto N. Escalante-B. and Tobias Rosenkranz and Andreas Maier},
      booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      year={2022},
      organization={IEEE}
}

If you use the DeepFilterNet2 model, please cite: DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio

@misc{schroeter2022deepfilternet2,
  title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
  author = {Schröter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  publisher = {arXiv},
  year = {2022},
  url = {https://arxiv.org/abs/2205.05474},
}

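License

DeepFilterNet is free and open source! All code in this repository is dual-licensed, at your option, under either of the following licenses:

MIT License (LICENSE-MIT or https://open-source.org.cn/licenses/MIT)
Apache License, Version 2.0 (LICENSE-APACHE or https://apache.ac.cn/licenses/LICENSE-2.0)

This means you can select the license you prefer!

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual-licensed as above, without any additional terms or conditions.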
