whisper-overlay - Rust command-line utility // Lib.rs

1 stable release

1.0.0	Jun 23, 2024

#783 in Command line utilities

MIT license

60KB
1K SLoC

Install and use

💬 whisper-overlay

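A Wayland overlay that provides speech-to-text for any application via a global push-to-talk hotkey. Anything you say while holding the hotkey is transcribed in real time and shown on screen. The live transcription uses a faster but less accurate model; as soon as you pause speaking or release the hotkey, the transcription is updated using a second, more accurate model. The resulting text is then typed into the currently focused window.

On-screen, real-time live transcription via CUDA and faster-whisper
Server-client architecture that lets you host the models on another machine
Integrates with waybar to show the status
Supports most Wayland compositors via layer-shell and virtual-keyboard-v1

Live transcription is provided by the RealtimeSTT Python library, which in turn uses faster-whisper for both the live and the high-fidelity transcription models.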

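Requirements

A Wayland compositor (sway, hyprland, ...)
A CUDA-capable GPU is strongly recommended. Without one, transcription is noticeably delayed even on modern CPUs (about 1 second for live transcription and ~5 seconds for the final result)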

🚀 Quick start

Clone the repository

git clone https://github.com/oddlama/whisper-overlay
cd whisper-overlay

Run the realtime-stt-server using docker
```
docker-compose up
```

Install and run whisper-overlay

cargo install whisper-overlay
whisper-overlay overlay
# Or alternatively select a hotkey:
#whisper-overlay overlay --hotkey KEY_F12

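Now press and hold Right Ctrl to transcribe. For a permanent installation, I recommend starting the server as a systemd service and adding whisper-overlay overlay to the startup commands of your desktop environment or compositor.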

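⚙️ Usage

In principle, you only need to start ./realtime-stt-server.py, which listens for requests on localhost:7007. You can then start whisper-overlay overlay to transcribe text. The default hotkey is Right Ctrl, but you can change it by specifying any key name from evdev::Key, for example KEY_F12 for F12. Note that the hotkey is only observed, so it is still passed through to the currently focused application.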
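Because the overlay depends on the server being up, it can help to confirm that something is accepting connections on the configured address before launching it. The following is a minimal sketch using only the Rust standard library. The helper name `server_reachable` is made up for this example and is not part of whisper-overlay; the address is the default mentioned above.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical helper: returns true if a TCP connection to `address`
/// (in host:port form) can be opened within `timeout`.
fn server_reachable(address: &str, timeout: Duration) -> bool {
    // Resolve first: a name like "localhost" may map to several addresses
    // (IPv4 and IPv6), and any one of them accepting is good enough.
    let Ok(addrs) = address.to_socket_addrs() else {
        return false;
    };
    addrs
        .into_iter()
        .any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok())
}

fn main() {
    // Default address from the usage notes; adjust if you pass --host/--port.
    let address = "localhost:7007";
    if server_reachable(address, Duration::from_secs(1)) {
        println!("{address} is accepting connections");
    } else {
        println!("{address} is not reachable; is realtime-stt-server.py running?");
    }
}
```

This only proves that a TCP listener exists, not that it speaks the expected protocol. The probe also opens and immediately closes a connection of its own, so it is probably best not to run it in a tight loop against a server that handles one client at a time.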

Server (realtime-stt-server)

If you want to change the server settings, it has the following options:

> realtime-stt-server.py --help
usage: realtime-stt-server.py [-h] [--host HOST] [--port PORT] [--device DEVICE] [--model MODEL]
                              [--model-realtime MODEL_REALTIME] [--language LANGUAGE] [--debug]

options:
  -h, --help            show this help message and exit
  --host HOST           The host to listen on [default: 'localhost']
  --port PORT           The port to listen on [default: 7007]
  --device DEVICE       Device to run the models on, defaults to cuda if available, else cpu [default: 'cuda']
  --model MODEL         Main model used to generate the final transcription [default: 'large-v3']
  --model-realtime MODEL_REALTIME
                        Faster model used to generate live transcriptions [default: 'base']
  --language LANGUAGE   Set the spoken language. Leave empty to auto-detect. [default: '']
  --debug               Enable debug log output [default: unset]

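Client (whisper-overlay)

The overlay itself can also be customized, for example by providing your own gtk style (see the built-in style.css) or by changing the hotkey. It has the following options: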

> whisper-overlay overlay --help
Usage: whisper-overlay overlay [OPTIONS]

Options:
  -a, --address <ADDRESS>  The address of the the whisper streaming instance (host:port) [default: localhost:7007]
  -s, --style <STYLE>      An optional stylesheet for the overlay, which replaces the internal style
      --hotkey <HOTKEY>    Specifies the hotkey to activate voice input. You can use any key or button name from [evdev::Key](https://docs.rs/evdev/latest/evdev/struct.Key.html) [default: KEY_RIGHTCTRL]
  -h, --help               Print help

📦 Installation

❄️ 🐳 Docker & cargo

For a quick and simple installation, you can run the server with Docker and install the overlay directly via cargo.

git clone https://github.com/oddlama/whisper-overlay
cd whisper-overlay

# Start realtime-stt-server
docker-compose up

# Install and run overlay
cargo install whisper-overlay
whisper-overlay overlay

❄️ NixOS

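This application ships a NixOS module and an overlay, which give you easy access to the package and let you host the realtime-stt-server. First, add this flake as an input: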

{
  inputs = {
    # ...
    whisper-overlay.url = "github:oddlama/whisper-overlay";
    whisper-overlay.inputs.nixpkgs.follows = "nixpkgs";
  };
}

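Then add the nixos module provided by this flake and enable the realtime-stt-server in your configuration.nix. Also add the package to your system or user packages so that you can launch it later.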

{
  imports = [
    inputs.whisper-overlay.nixosModules.default
  ];

  # Also make sure to enable cuda support in nixpkgs, otherwise transcription will
  # be painfully slow. But be prepared to let your computer build packages for 2-3 hours.
  nixpkgs.config.cudaSupport = true;

  services.realtime-stt-server.enable = true;
  environment.systemPackages = [pkgs.whisper-overlay];
}

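The server will now start automatically with your system, and you can run whisper-overlay overlay as your user. You will probably want to add this command to the startup commands of your desktop environment or compositor.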

🧰 Manual

First, install and start the server:

# Create virtualenv
python -m venv venv
source venv/bin/activate

# Install RealtimeSTT (fork)
# Follow this for GPU support:
# https://github.com/KoljaB/RealtimeSTT?tab=readme-ov-file#gpu-support-with-cuda-recommended
git clone https://github.com/oddlama/RealtimeSTT
cd RealtimeSTT
pip install -r requirements.txt
cd ..

# Run server script (it lives in the whisper-overlay checkout)
git clone https://github.com/oddlama/whisper-overlay
cd whisper-overlay
python ./realtime-stt-server.py

Second, build the client from source and start the overlay:

# Clone repository (or reuse the previous checkout)
git clone https://github.com/oddlama/whisper-overlay
cd whisper-overlay
cargo build --release
./target/release/whisper-overlay overlay

🌟 Waybar integration

whisper-overlay natively supports a waybar status command, which lets you show the server status in waybar.

Add this to your waybar configuration:

"custom/whisper_overlay": {
    "escape": true,
    "exec": "/path/to/whisper-overlay waybar-status",
    "format": "{icon} {}",
    "format-icons": {
        "disconnected": "<span foreground='gray'></span>",
        "connected": "<span foreground='#4ab0fa'></span>",
        "connected-active": "<span foreground='red'></span>"
    },
    "return-type": "json",
    "tooltip": true
},

And instantiate the module somewhere:

"modules-left": [
    // ...
    "custom/whisper_overlay"
    // ...
],
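For context on how the configuration above is consumed: with "return-type": "json", waybar reads one JSON object per line from the command and, when "format-icons" is an object, picks the icon by the object's "alt" field. The sketch below shows what producing such a line could look like in Rust. It is an illustration only; the `Status` enum, `escape_json` and `waybar_line` are hypothetical names, and the exact fields whisper-overlay emits are an assumption, not taken from its source.

```rust
/// Hypothetical connection states, named after the "format-icons" keys above.
#[derive(Clone, Copy)]
enum Status {
    Disconnected,
    Connected,
    ConnectedActive,
}

impl Status {
    /// The string placed in the "alt" field, which waybar uses to select an icon.
    fn alt(self) -> &'static str {
        match self {
            Status::Disconnected => "disconnected",
            Status::Connected => "connected",
            Status::ConnectedActive => "connected-active",
        }
    }
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one status line in the shape waybar expects from a custom module.
fn waybar_line(status: Status, tooltip: &str) -> String {
    format!(
        "{{\"text\":\"\",\"alt\":\"{}\",\"tooltip\":\"{}\"}}",
        status.alt(),
        escape_json(tooltip)
    )
}

fn main() {
    // A long-running status command would print a new line on every state change.
    for status in [Status::Disconnected, Status::Connected, Status::ConnectedActive] {
        println!("{}", waybar_line(status, "realtime-stt-server at localhost:7007"));
    }
}
```

If the icon never changes in your bar, comparing the "alt" values printed by the real waybar-status command against the keys in "format-icons" is a reasonable first check.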

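❌ Limitations

Requires the RealtimeSTT fork

Currently, you need to use my fork of RealtimeSTT, which lets the client read token probabilities and fixes some shutdown issues. A request to merge these changes upstream has been made, so hopefully the fork will not be needed for long.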

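Single active client

The provided realtime-stt-server implementation lets you host the server on your own machine or on another machine in your network. Our implementation is technically prepared to support multiple clients, but because of how RealtimeSTT works, it currently cannot process more than one request at a time. You will therefore have to wait for other clients to disconnect before your transcription can start.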

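Wayland only

Currently, this project requires a wayland compositor that supports the layer-shell and virtual-keyboard-v1 protocol extensions. It should therefore work on wlroots-based compositors (sway, ...) and on hyprland. There are currently no plans to support X11. A branch with a partial X11 implementation exists, but getting GTK4 to create a reliable overlay window has proven difficult, and automatic typing did not work properly with enigo (a Rust library for virtual input). If anyone knows how to solve the remaining issues, I will of course be happy to accept contributions.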

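Global hotkey via evdev

The global hotkey is detected via evdev, because I was unable to get the GlobalShortcuts desktop portal to work with windows that use the layer-shell protocol (related issue). This may change in the future, but for now your user must be in the input group for the hotkey to work.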
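If the hotkey does nothing, group membership is worth checking first. Below is a minimal sketch, using only the Rust standard library, that looks up the members listed for the input group in /etc/group. The function name `group_members` is made up for this example. It also assumes that groups are defined in /etc/group (not in LDAP or another NSS source) and does not see a user whose primary group is input.

```rust
use std::fs;

/// Hypothetical helper: returns the members listed for `group` in
/// /etc/group-style text, or None if no such group line exists.
fn group_members<'a>(group_file: &'a str, group: &str) -> Option<Vec<&'a str>> {
    group_file.lines().find_map(|line| {
        // Line format: name:password:gid:member1,member2,...
        let mut fields = line.split(':');
        if fields.next()? != group {
            return None;
        }
        // Skip the password and gid fields; the member list may be empty or missing.
        let members = fields.nth(2).unwrap_or("");
        Some(members.split(',').filter(|m| !m.is_empty()).collect())
    })
}

fn main() {
    let user = std::env::var("USER").unwrap_or_default();
    let contents = fs::read_to_string("/etc/group").unwrap_or_default();
    match group_members(&contents, "input") {
        Some(members) if members.contains(&user.as_str()) => {
            println!("{user} is listed in the input group")
        }
        Some(_) => println!("{user} is not listed in the input group"),
        None => println!("no input group found in /etc/group"),
    }
}
```

Keep in mind that group changes usually take effect only after logging out and back in, so a freshly added user may be listed here while running sessions still lack the permission.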

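📜 License

Licensed under the MIT license (LICENSE or https://open-source.org.cn/licenses/MIT). Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you shall be licensed as above, without any additional terms or conditions.

Dependencies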

~30–66MB
~1M SLoC