5 releases

0.2.1	Apr 27, 2025
0.2.0	Apr 27, 2025
0.1.2	Apr 22, 2025
0.1.1	Apr 22, 2025
0.1.0	Apr 22, 2025

STT CLI (Speech-to-Text Command Line Interface)

A command-line tool for real-time speech-to-text transcription, using Groq or OpenAI as the transcription provider.

Features

- Real-time audio capture from microphone
- Support for multiple transcription providers:
  - Groq (using whisper-large-v3)
  - OpenAI (using Whisper)
- Efficient audio processing with proper chunking
- Configurable activation modes: always-on or hotkey
- Clean shutdown handling with Ctrl+C
- Optional text insertion, auto-capitalization, and auto-punctuation
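
To illustrate what chunking means for captured audio, here is a minimal sketch that splits a buffer of samples into fixed-duration pieces. It is not taken from the stt-cli source: the function name `chunk_samples`, the `f32` sample type, and the sample rate and chunk length are all assumptions chosen for illustration.

```rust
/// Hypothetical helper (not from stt-cli): split a mono sample buffer into
/// chunks of `chunk_secs` seconds each. The last chunk may be shorter.
fn chunk_samples(samples: &[f32], sample_rate: usize, chunk_secs: usize) -> Vec<&[f32]> {
    // Guard against a zero chunk length, which `chunks` would reject with a panic.
    let chunk_len = (sample_rate * chunk_secs).max(1);
    samples.chunks(chunk_len).collect()
}
```

With an assumed 16 kHz sample rate and 1-second chunks, 2.5 seconds of audio yields two full chunks and one half-length chunk, each of which could be sent to a transcription provider separately.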

Installation

Via cargo

```
cargo install stt-cli
```

Build from source

Make sure Rust is installed on your system. If it is not, install it from rustup.rs.

Clone the repository:

```
git clone https://github.com/TwistingTwists/stt-cli
cd stt-cli
```

Build the project. With Cargo's default settings, the release binary is written to target/release/:
```
cargo build --release
```

Usage

Run the CLI with your desired options:

```
stt-cli [OPTIONS]
```

Command-Line Options

| Flag / Option | Description |
| --- | --- |
| `-d, --device <DEVICE>` | Audio device name to use |
| `-m, --mode <MODE>` | Transcription activation mode [default: always-on] [possible values: always-on, hotkey] |
| `-k, --hotkey <HOTKEY>` | Hotkey for toggling recording (when in hotkey mode) [default: ctrl+space] |
| `--data-dir <DATA_DIR>` | Directory to store data [default: data_dir] |
| `--debug` | Enable debug mode |
| `-t, --transcription-provider <TRANSCRIPTION_PROVIDER>` | Transcription provider [default: groq] [possible values: groq, open-ai] |
| `--enable-text-insertion` | Enable text insertion at cursor position |
| `--auto-capitalize` | Automatically capitalize first letter of transcribed text |
| `--auto-punctuate` | Automatically add trailing punctuation if missing |
| `-h, --help` | Print help |
| `-V, --version` | Print version |
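
As a rough illustration of what `--auto-capitalize` and `--auto-punctuate` describe, the sketch below applies both steps to a transcribed string. It is an assumption about the behavior based only on the option descriptions, not the tool's actual implementation; the function names and the choice of `.` as the added punctuation mark are hypothetical.

```rust
/// Hypothetical sketch: uppercase the first character of the text, if any.
fn auto_capitalize(text: &str) -> String {
    let mut chars = text.chars();
    match chars.next() {
        // `to_uppercase` can yield more than one char, so collect into a String.
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Hypothetical sketch: append a period when the text does not already end
/// in `.`, `!`, or `?`. Trailing whitespace is trimmed first.
fn auto_punctuate(text: &str) -> String {
    let trimmed = text.trim_end();
    match trimmed.chars().last() {
        None => String::new(),
        Some(c) if matches!(c, '.' | '!' | '?') => trimmed.to_string(),
        Some(_) => format!("{trimmed}."),
    }
}
```

Applied in sequence, `auto_punctuate(&auto_capitalize("hello world"))` turns the raw text into `Hello world.`, while text that already ends in punctuation is left unchanged by the second step.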

Examples

```
# Using Groq
stt-cli -t groq

# Using OpenAI
stt-cli -t open-ai

# Using hotkey mode with a custom hotkey
stt-cli -m hotkey -k "alt+r"
```
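
Hotkey values such as `ctrl+space` and `alt+r` read as modifier names joined to a final key by `+`. The sketch below shows one way such a string could be split. The `Hotkey` struct and `parse_hotkey` function are hypothetical and are not the tool's own parser, so the set of key names stt-cli actually accepts may differ.

```rust
/// Hypothetical representation of a parsed hotkey (not from stt-cli).
#[derive(Debug, PartialEq)]
struct Hotkey {
    modifiers: Vec<String>,
    key: String,
}

/// Split a spec like "ctrl+space" into modifiers and a final key.
/// Returns None for empty input or empty segments such as "ctrl+".
fn parse_hotkey(spec: &str) -> Option<Hotkey> {
    let mut parts: Vec<String> = spec
        .split('+')
        .map(|part| part.trim().to_lowercase())
        .collect();
    if parts.iter().any(|part| part.is_empty()) {
        return None;
    }
    // The last segment is treated as the key; everything before it is a modifier.
    let key = parts.pop()?;
    Some(Hotkey { modifiers: parts, key })
}
```

Under this sketch, `parse_hotkey("alt+r")` yields a single modifier `alt` with key `r`, and malformed input such as `ctrl+` is rejected rather than guessed at.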

Environment Variables

Before running the application, make sure to set up the required API keys:

For Groq:

```
export GROQ_API_KEY='your-groq-api-key'
```

For OpenAI:

```
export OPENAI_API_KEY='your-openai-api-key'
```
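
Each provider value accepted by `-t` pairs with one of the variables above. The following sketch shows how a program could map the provider name to its variable and fail early with a clear message when the key is missing. The function names and error messages are hypothetical, and this is not a claim about how stt-cli itself reads its configuration.

```rust
use std::env;

/// Map a provider name (as passed to `-t`) to the environment variable
/// that holds its API key. Hypothetical helper, not from stt-cli.
fn api_key_var(provider: &str) -> Option<&'static str> {
    match provider {
        "groq" => Some("GROQ_API_KEY"),
        "open-ai" => Some("OPENAI_API_KEY"),
        _ => None,
    }
}

/// Read the API key for a provider, returning a descriptive error when the
/// provider is unknown or the variable is unset or blank.
fn load_api_key(provider: &str) -> Result<String, String> {
    let var = api_key_var(provider)
        .ok_or_else(|| format!("unknown provider: {provider}"))?;
    match env::var(var) {
        Ok(key) if !key.trim().is_empty() => Ok(key),
        _ => Err(format!("{var} is not set; export it before running")),
    }
}
```

Checking the key before opening the audio device means a missing variable surfaces immediately instead of on the first API request.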

Expected Output

When running the application, you'll see:

- Initialization messages for audio device setup
- Real-time transcription of your speech
- Status messages for audio processing and API requests

Example:

```
Initializing audio device...
Audio capture started. Speak into your microphone.
[Transcription] "Hello, this is a test of the speech to text system."
...
```

Press Ctrl+C to gracefully stop the application.

Contributing

Contributions are welcome! Please feel free to submit an issue.

License

This project is licensed under either of

- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.
