6 versions

new 0.3.3	Aug 24, 2024
0.3.2	Aug 14, 2024
0.2.1	Feb 28, 2024
0.1.0	Dec 16, 2023

#233 in Audio

274 downloads per month
Used in 3 crates (via kalosm)

MIT/Apache

725KB
2.5K SLoC

Kalosm Sound

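Kalosm Sound is a collection of audio models and utilities for the Kalosm framework. It supports several voice activity detection models and provides utilities for transcribing audio into text.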

Audio Streams

Models in Kalosm Sound work with any AsyncSource. You can use MicInput::stream to stream audio from the microphone, or use any synchronous audio source that implements rodio::Source, such as an mp3 or wav file.

You can transform audio streams with:

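VoiceActivityDetectorExt::voice_activity_stream: Detect voice activity in the audio data
DenoisedExt::denoise_and_detect_voice_activity: Denoise the audio data and detect voice activity
AsyncSourceTranscribeExt::transcribe: Chunk an audio stream based on voice activity, then transcribe the chunked audio data
VoiceActivityStreamExt::rechunk_voice_activity: Chunk an audio stream based on voice activity
VoiceActivityStreamExt::filter_voice_activity: Filter chunks of audio data based on voice activity
TranscribeChunkedAudioStreamExt::transcribe: Transcribe a chunked audio stream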
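
To make the idea of filtering by voice activity concrete, here is a minimal sketch in plain Rust. It is not Kalosm's implementation: ScoredChunk, keep_speech, and the threshold parameter are hypothetical names introduced only for illustration, and the real VoiceActivityStreamExt::filter_voice_activity operates on an async stream rather than a slice.

```rust
/// Hypothetical stand-in for one chunk of audio plus the speech
/// probability a VAD model assigned to it (not a Kalosm type).
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredChunk {
    pub samples: Vec<f32>,
    pub probability: f32,
}

/// Keep only the chunks whose speech probability is at least `threshold`.
/// Chunks below the threshold are dropped; the order of the rest is preserved.
pub fn keep_speech(chunks: &[ScoredChunk], threshold: f32) -> Vec<ScoredChunk> {
    chunks
        .iter()
        .filter(|chunk| chunk.probability >= threshold)
        .cloned()
        .collect()
}
```

Filtering treats each chunk independently, so neighboring speech chunks are not merged into longer segments.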

Voice Activity Detection

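VAD models detect when a speaker is talking in a given audio stream. The simplest way to use a VAD model is to create an audio stream and call VoiceActivityDetectorExt::voice_activity_stream, which yields each audio chunk together with the probability that it contains speech: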

use kalosm::sound::*;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Detect voice activity in the audio stream
    let mut vad = stream.voice_activity_stream();
    while let Some(input) = vad.next().await {
        println!("Probability: {}", input.probability);
    }
}

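Kalosm also provides VoiceActivityStreamExt::rechunk_voice_activity to collect runs of consecutive audio samples that have a high VAD probability. This is useful for applications such as speech recognition, where the context between consecutive audio samples matters.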

use kalosm::sound::*;
use rodio::Source;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Chunk the audio into chunks of speech
    let vad = stream.voice_activity_stream();
    let mut audio_chunks = vad.rechunk_voice_activity();
    // Print the chunks as they are streamed in
    while let Some(input) = audio_chunks.next().await {
        println!("New voice activity chunk with duration {:?}", input.total_duration());
    }
}
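
The grouping step can be pictured with a small synchronous sketch. This is an illustration under stated assumptions, not how Kalosm implements rechunk_voice_activity: VadFrame, rechunk_by_voice, and the single fixed threshold are hypothetical, and the sketch ignores the async streaming that the real method handles.

```rust
/// Hypothetical stand-in for one frame of audio plus the speech
/// probability a VAD model assigned to it (not a Kalosm type).
pub struct VadFrame {
    pub samples: Vec<f32>,
    pub probability: f32,
}

/// Merge runs of consecutive frames whose probability is at least
/// `threshold` into single segments. A frame below the threshold
/// closes the segment that is currently being built.
pub fn rechunk_by_voice(frames: &[VadFrame], threshold: f32) -> Vec<Vec<f32>> {
    let mut segments = Vec::new();
    let mut current: Vec<f32> = Vec::new();
    for frame in frames {
        if frame.probability >= threshold {
            // Speech continues: append this frame to the open segment.
            current.extend_from_slice(&frame.samples);
        } else if !current.is_empty() {
            // Silence after speech: emit the finished segment.
            segments.push(std::mem::take(&mut current));
        }
    }
    // Emit a trailing segment if the input ended during speech.
    if !current.is_empty() {
        segments.push(current);
    }
    segments
}
```

Each returned segment holds the samples of one uninterrupted stretch of speech, which is the kind of input a speech recognizer benefits from.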

Transcription

You can use the Whisper model to transcribe audio into text. Kalosm can turn any AsyncSource into a stream of transcriptions with the AsyncSourceTranscribeExt::transcribe method:

use kalosm::sound::*;
#[tokio::main]
async fn main() {
    // Get the default microphone input
    let mic = MicInput::default();
    // Stream the audio from the microphone
    let stream = mic.stream().unwrap();
    // Transcribe the audio into text with the default Whisper model
    let mut transcribe = stream.transcribe(Whisper::new().await.unwrap());
    // Print the text as it is streamed in
    transcribe.to_std_out().await.unwrap();
}

Dependencies

~34–73MB
~1.5M SLoC