1 unstable release

0.0.1	Jul 23, 2024

Apache-2.0 OR MIT

97KB
2K SLoC

Norma

An easy-to-use and extensible pure-Rust library for real-time transcription (speech-to-text).

Models

Whisper (with full long-form decoding support)
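
As background on what long-form decoding involves: Whisper models consume 16 kHz mono audio in windows of at most 30 seconds, so longer recordings have to be processed window by window. The sketch below is only an illustration of that idea and not Norma's implementation. It cuts at fixed boundaries, whereas real long-form decoding typically chooses the next window start from the timestamps the model predicts. The name `split_into_windows` is hypothetical.

```rust
/// Whisper expects 16 kHz mono samples in windows of at most 30 seconds.
const SAMPLE_RATE: usize = 16_000;
const WINDOW_SECS: usize = 30;

/// Split a recording into consecutive windows of at most 30 seconds.
/// Simplification: boundaries are fixed here; a real long-form decoder
/// would pick the next start from the timestamps the model predicts.
fn split_into_windows(samples: &[f32]) -> Vec<&[f32]> {
    samples.chunks(SAMPLE_RATE * WINDOW_SECS).collect()
}
```

A 65-second recording would come back as two full windows followed by a 5-second remainder.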

Example

use std::{
    thread::{self, sleep},
    time::Duration,
};
use norma::{
    mic::Settings,
    models::whisper::monolingual,
    Transcriber,
};

// Define the model that will be used for transcription
let model = monolingual::Definition::new(
    monolingual::ModelType::DistilLargeEnV3,
    norma::models::SelectedDevice::Cpu, // Replace with Cuda(0) or Metal as needed
);

// Spawn the transcriber in a new std thread
let (jh, th) = Transcriber::blocking_spawn(model).unwrap();

// Start recording using the default microphone
let mut stream = th.blocking_start(Settings::default()).unwrap();

// Print every transcription message on a separate thread
thread::spawn(move || {
    while let Some(msg) = stream.blocking_recv() {
        println!("{}", msg);
    }
});

sleep(Duration::from_secs(10));

// Stop the transcription and drop the TranscriberHandle,
// causing the transcriber to terminate
th.stop().unwrap();
drop(th);

// Join the thread that was spawned for the transcriber
jh.join().unwrap().unwrap();
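
The example relies on a common pattern: a worker thread runs for as long as a handle to it exists, and dropping the handle lets the worker finish so that it can be joined. The following is a minimal std-only sketch of that pattern, assuming a plain `std::sync::mpsc` channel. `Handle` and `spawn_worker` are hypothetical names, and this is not how Norma is implemented internally.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a handle type: it owns the command channel.
struct Handle {
    commands: Sender<&'static str>,
}

/// Spawn a worker that runs until every `Handle` has been dropped.
/// Returns the join handle, the control handle and a stream of outputs.
fn spawn_worker() -> (JoinHandle<usize>, Handle, Receiver<String>) {
    let (cmd_tx, cmd_rx) = channel::<&'static str>();
    let (out_tx, out_rx) = channel::<String>();
    let jh = thread::spawn(move || {
        let mut handled = 0;
        // `recv` returns Err once the last Sender is dropped, ending the loop.
        while let Ok(cmd) = cmd_rx.recv() {
            handled += 1;
            // Ignore send errors: the consumer may already have gone away.
            let _ = out_tx.send(format!("handled {cmd}"));
        }
        // The number of commands processed becomes the thread's result.
        handled
    });
    (jh, Handle { commands: cmd_tx }, out_rx)
}
```

With this shape, `drop(handle)` followed by `jh.join()` mirrors the last two steps of the example above: the join only returns once the worker has noticed that no handle is left.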

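Audio backends

Norma uses cpal to support multiple audio backends.

This allows us to support:

Linux (via ALSA or JACK)
Windows (via WASAPI)
macOS (via CoreAudio)
iOS (via CoreAudio)
Android (via Oboe)

Some audio backends are optional and are only compiled when the corresponding feature flag is enabled at compile time.

JACK (on Linux): jack

Oboe can use either a shared or a static runtime. The static runtime is used by default; enabling the oboe-shared-stdcxx feature switches it to the shared runtime, which requires libc++_shared.so from the Android NDK to be available at run time.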

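Accelerators

All accelerators are defined in models::SelectedDevice.

CPU

Using the CPU does not require any additional features.

However, when building on macOS, the accelerate feature can be enabled to let the resulting program make use of Apple's Accelerate framework.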

let device = SelectedDevice::Cpu;

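CUDA and cuDNN

For the following code to compile, either the cuda or the cudnn feature must be enabled.

The cuda feature flag requires CUDA to be installed and correctly configured on your machine. Once enabled, the program is built with CUDA support, and the machine running the code needs CUDA.

The cudnn feature flag requires cuDNN to be installed and correctly configured on your machine. Once enabled, the program is built with cuDNN support, and the machine running the code needs both CUDA and cuDNN.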

let device = SelectedDevice::Cuda(ord);

Here ord is the ID of the CUDA device to use. If you only have one device or want to use the default, set it to 0.

Metal

Using Metal requires compiling the program on macOS with the metal feature flag.

let device = SelectedDevice::Metal;
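
Since the CUDA and Metal variants are only usable when the matching feature is enabled, a program that targets several machines may want a single place that decides which device to request. The sketch below shows one possible way to do that. To stay self-contained it uses a local `Device` enum as a stand-in for `models::SelectedDevice`, and both `pick_device` and the `usize` type of the CUDA ordinal are assumptions rather than part of Norma's API.

```rust
/// Local stand-in mirroring the variants named above; a real program
/// would use `norma::models::SelectedDevice` instead.
#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Cuda(usize), // assumption: the device ordinal is a usize
    Metal,
}

/// Prefer CUDA, then Metal, then fall back to the CPU.
/// In a real crate the two flags could come from
/// `cfg!(feature = "cuda")` and `cfg!(feature = "metal")`
/// (hypothetical wiring, depends on how features are forwarded).
fn pick_device(cuda: bool, metal: bool, ord: usize) -> Device {
    if cuda {
        Device::Cuda(ord)
    } else if metal {
        Device::Metal
    } else {
        Device::Cpu
    }
}
```

Calling `pick_device(false, false, 0)` falls back to `Device::Cpu`, which matches the fact that the CPU needs no extra features.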

Dependencies

~29–67MB
~1M SLoC