#speech-recognition #speech #cognitive #microsoft #bot-framework #recognition #synthesis

cognitive-services-speech-sdk-rs

Rust bindings for the Microsoft Speech SDK

15 releases (5 stable)

1.0.4 Jun 12, 2024
1.0.2 May 13, 2024
0.3.1 Apr 24, 2024
0.2.3 Dec 2, 2023
0.1.4 Jun 18, 2021

#40 in Audio

75 downloads per month

MIT/Apache

455KB
10K SLoC

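Rust bindings for the Microsoft Cognitive Services Speech SDK. The crate provides a thin abstraction around the native C API and is heavily inspired by the official Go library. It covers speech to text, text to speech, and bot framework dialog management.

Pull requests welcome!

Speech to text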

use cognitive_services_speech_sdk_rs as msspeech;
use log::*;
use std::env;

async fn speech_to_text() {
    let filename = env::var("WAVFILENAME").unwrap();
    let audio_config = msspeech::audio::AudioConfig::from_wav_file_input(&filename).unwrap();

    let speech_config = msspeech::speech::SpeechConfig::from_subscription(
        env::var("MSSubscriptionKey").unwrap(),
        env::var("MSServiceRegion").unwrap(),
    )
    .unwrap();
    let mut speech_recognizer =
        msspeech::speech::SpeechRecognizer::from_config(speech_config, audio_config).unwrap();

    speech_recognizer
        .set_session_started_cb(|event| info!("set_session_started_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_session_stopped_cb(|event| info!("set_session_stopped_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_speech_start_detected_cb(|event| info!("set_speech_start_detected_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_speech_end_detected_cb(|event| info!("set_speech_end_detected_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_recognizing_cb(|event| info!("set_recognizing_cb {:?}", event.result.text))
        .unwrap();

    speech_recognizer
        .set_recognized_cb(|event| info!("set_recognized_cb {:?}", event))
        .unwrap();

    speech_recognizer
        .set_canceled_cb(|event| info!("set_canceled_cb {:?}", event))
        .unwrap();

    let result = speech_recognizer.recognize_once_async().await.unwrap();
    info!("got recognition {:?}", result);
}
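
The example above reads its settings with `env::var(...).unwrap()`, which panics with a generic message when a variable is missing. The following is a minimal sketch of an alternative that reports which variable is absent. The `MissingVar` and `SpeechEnv` types and the `require` and `load_speech_env` functions are hypothetical helpers introduced here for illustration and are not part of the crate; only the variable names `MSSubscriptionKey` and `MSServiceRegion` come from the example.

```rust
use std::env;
use std::fmt;

/// Hypothetical error type: names the variable that was missing or empty.
#[derive(Debug, PartialEq)]
struct MissingVar(String);

impl fmt::Display for MissingVar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "environment variable {} is not set or empty", self.0)
    }
}

/// Hypothetical holder for the two settings the README examples read.
#[derive(Debug, PartialEq)]
struct SpeechEnv {
    subscription_key: String,
    region: String,
}

/// Accept a looked-up value only if it is present and not blank.
fn require(name: &str, value: Option<String>) -> Result<String, MissingVar> {
    match value {
        Some(v) if !v.trim().is_empty() => Ok(v),
        _ => Err(MissingVar(name.to_string())),
    }
}

/// Collect both settings through a caller-supplied lookup function,
/// so the logic can be exercised without touching the real environment.
fn load_speech_env(lookup: impl Fn(&str) -> Option<String>) -> Result<SpeechEnv, MissingVar> {
    Ok(SpeechEnv {
        subscription_key: require("MSSubscriptionKey", lookup("MSSubscriptionKey"))?,
        region: require("MSServiceRegion", lookup("MSServiceRegion"))?,
    })
}

fn main() {
    // Use the process environment as the lookup source.
    match load_speech_env(|name| env::var(name).ok()) {
        // Print only the region; the subscription key is a secret.
        Ok(cfg) => println!("region = {}", cfg.region),
        Err(err) => eprintln!("{err}"),
    }
}
```

The resulting strings could then be passed to `SpeechConfig::from_subscription` in place of the two `unwrap()` calls.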

Text to speech

use cognitive_services_speech_sdk_rs as msspeech;
use log::*;
use std::env;

async fn text_to_speech() {
    let pull_stream = msspeech::audio::PullAudioOutputStream::create_pull_stream().unwrap();
    let audio_config = msspeech::audio::AudioConfig::from_stream_output(&pull_stream).unwrap();

    let speech_config = msspeech::speech::SpeechConfig::from_subscription(
        env::var("MSSubscriptionKey").unwrap(),
        env::var("MSServiceRegion").unwrap(),
    )
    .unwrap();
    let mut speech_synthesizer =
        msspeech::speech::SpeechSynthesizer::from_config(speech_config, audio_config).unwrap();

    speech_synthesizer
        .set_synthesizer_started_cb(|event| info!("synthesizer_started_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_synthesizing_cb(|event| info!("synthesizer_synthesizing_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_completed_cb(|event| info!("synthesizer_completed_cb {:?}", event))
        .unwrap();

    speech_synthesizer
        .set_synthesizer_canceled_cb(|event| info!("synthesizer_canceled_cb {:?}", event))
        .unwrap();

    match speech_synthesizer.speak_text_async("Hello Rust!").await {
        Err(err) => error!("speak_text_async error {:?}", err),
        Ok(speech_audio_bytes) => {
            info!("speech_audio_bytes {:?}", speech_audio_bytes);
        }
    }
}
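
The example above only logs the synthesized audio. As a rough sketch of a next step, the helper below checks whether a byte buffer starts with a RIFF/WAVE header before writing it to disk. It assumes you already hold the audio as a plain byte slice, and that the configured output format produces a RIFF container; both are assumptions, since the actual format depends on the speech configuration. `looks_like_wav` and `save_if_wav` are hypothetical names, not crate APIs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True when the buffer begins with "RIFF", four size bytes, then "WAVE".
fn looks_like_wav(bytes: &[u8]) -> bool {
    bytes.starts_with(b"RIFF") && bytes.get(8..12) == Some(&b"WAVE"[..])
}

/// Write the buffer to `path` only if it looks like a WAV file.
/// Returns Ok(true) when the file was written, Ok(false) when it was skipped.
fn save_if_wav(path: &Path, bytes: &[u8]) -> io::Result<bool> {
    if !looks_like_wav(bytes) {
        return Ok(false);
    }
    fs::write(path, bytes)?;
    Ok(true)
}

fn main() -> io::Result<()> {
    // Stand-in data: a bare 12-byte RIFF/WAVE header, not real audio.
    let fake_audio = b"RIFF\x04\x00\x00\x00WAVE";
    let target = std::env::temp_dir().join("hello_rust_sketch.wav");
    let written = save_if_wav(&target, fake_audio)?;
    println!("written: {written}");
    Ok(())
}
```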

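For more, see the integration tests (tests folder) and the examples (examples folder) on GitHub.

Build prerequisites

Building is currently supported on Windows, Linux and MacOS. The build uses Clang and the Microsoft Speech SDK shared libraries. See the Microsoft Speech SDK installation documentation for details.

On Debian/Ubuntu, install the following prerequisites before running cargo build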

sudo apt-get update 
sudo apt-get install clang build-essential libssl1.0.0 libasound2 wget

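The build process generates Rust bindings for the Speech SDK native functions. These bindings are already prebuilt and checked in as ffi/bindings.rs, so in most cases there is no need to regenerate them. Set the following to skip bindings regeneration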

export MS_COG_SVC_SPEECH_SKIP_BINDGEN=1
cargo build
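
If the build is driven from a Rust helper (for example an xtask-style wrapper) rather than a shell, the same two steps can be sketched with `std::process::Command`. The function name `cargo_build_skipping_bindgen` is hypothetical; only the variable name `MS_COG_SVC_SPEECH_SKIP_BINDGEN` and the value `1` come from the instructions above.

```rust
use std::process::Command;

/// Prepare `cargo build` with the skip-bindgen variable set for the child process only.
fn cargo_build_skipping_bindgen() -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .env("MS_COG_SVC_SPEECH_SKIP_BINDGEN", "1");
    cmd
}

fn main() {
    let cmd = cargo_build_skipping_bindgen();
    // Only show the prepared command here; call `.status()` on a mutable
    // binding to actually run the build.
    println!("{:?}", cmd);
}
```

Setting the variable on the `Command` keeps it scoped to that one build instead of exporting it for the whole shell session.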

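The build process downloads the MS Speech SDK into the target folder. From there you can copy it to another folder, e.g. ./SpeechSDK. The compiled binary links to the SDK dynamically, so point the loader at that folder when running it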

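Linux (set this to the folder that contains the SDK's .so files; the example path below shows the macOS layout)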

export LD_LIBRARY_PATH=/Users/xxx/cognitive-services-speech-sdk-rs/SpeechSDK/macOS/sdk_output/MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64

MacOS

export DYLD_FALLBACK_FRAMEWORK_PATH=/Users/xxx/cognitive-services-speech-sdk-rs/SpeechSDK/macOS/sdk_output/MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64

Windows (pointing to the SpeechSDK in the target folder)

set PATH=%PATH%;"C:\Users\xxx\cognitive-services-speech-sdk-rs\target\debug\build\cognitive-services-speech-sdk-rs-b9c946c378fbb4f1\out\sdk_output\runtimes\win-x64\native"
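
The three platform settings above can be summarised in a small lookup. This is a sketch only: `loader_path_var` is a hypothetical helper, and it merely reports whether the relevant variable is set for the current process; it does not verify that the Speech SDK library is actually present in that location.

```rust
use std::env;

/// Map an OS name (as reported by `std::env::consts::OS`) to the
/// environment variable this README uses to locate the Speech SDK library.
fn loader_path_var(os: &str) -> Option<&'static str> {
    match os {
        "linux" => Some("LD_LIBRARY_PATH"),
        "macos" => Some("DYLD_FALLBACK_FRAMEWORK_PATH"),
        "windows" => Some("PATH"),
        _ => None,
    }
}

fn main() {
    let os = env::consts::OS;
    match loader_path_var(os) {
        Some(var) => match env::var_os(var) {
            Some(value) => println!("{var} = {}", value.to_string_lossy()),
            None => println!("{var} is not set; the Speech SDK library may not be found at run time"),
        },
        None => println!("no loader variable listed for {os}"),
    }
}
```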

How to build on MacOS

Both the ARM (aarch64) and x86_64 architectures are supported on MacOS.

Run the following command to build

cargo build

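The Speech SDK library is linked dynamically both at build time and at run time. When running the application, use the following environment variable to point to a custom library location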

export DYLD_FALLBACK_FRAMEWORK_PATH=/Users/xxx/cognitive-services-speech-sdk-rs/SpeechSDK/macOS/sdk_output/MicrosoftCognitiveServicesSpeech.xcframework/macos-arm64_x86_64

Then run your application that uses cognitive-services-speech-sdk-rs, or one of the examples, e.g.

cargo run --example recognizer

What's new in this release

See the changelog

Dependencies

~3–12MB
~115K SLoC