7 releases

0.3.2	Apr 29, 2024
0.3.1	Feb 28, 2024
0.1.3	Nov 8, 2023
0.1.0	Oct 25, 2023

#87 in Machine learning

158 downloads per month
Used in aio-cli

MIT/Apache license

9.5MB
105K SLoC

llama_cpp-rs

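Safe, high-level Rust bindings to the C++ project of the same name, llama.cpp, designed to be as easy to use as possible. Run GGUF-based large language models on your CPU in just fifteen lines of code, with no ML experience required!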

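use std::io::{self, Write};

use llama_cpp::standard_sampler::StandardSampler;
use llama_cpp::{LlamaModel, LlamaParams, SessionParams};

// Create a model from anything that implements `AsRef<Path>`: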
let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default()).expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model.create_session(SessionParams::default()).expect("Failed to create session");

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.start_completing_with` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let completions = ctx.start_completing_with(StandardSampler::default(), 1024).into_strings();

for completion in completions {
    print!("{completion}");
    let _ = io::stdout().flush();

    decoded_tokens += 1;

    // Stop once `max_tokens` completions have been printed.
    if decoded_tokens >= max_tokens {
        break;
    }
}

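This repository hosts the high-level bindings (crates/llama_cpp) as well as automatically generated bindings to llama.cpp's low-level C API (crates/llama_cpp_sys). Contributions are welcome; just keep the UX clean!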

Building

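Keep in mind that llama.cpp is very computationally heavy, so standard debug builds (running just cargo build/cargo run) suffer greatly from the lack of optimizations. Unless you really need to debug, building and running with Cargo's --release flag is strongly recommended.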
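As a small illustration of the advice above, an application could warn its users when it was compiled without optimizations. This is only a sketch: `unoptimized_build_warning` is a hypothetical helper, not part of llama_cpp, and it relies on `debug_assertions`, which Cargo enables in the default dev profile and disables in the default release profile (a customized profile can change that, so treat it as a proxy).

```rust
/// Returns a warning when the binary appears to be a debug build.
///
/// `debug_assertions` is on in Cargo's default dev profile and off in the
/// default release profile, so it serves as a proxy for "built without
/// optimizations". Hypothetical helper, not part of llama_cpp.
pub fn unoptimized_build_warning() -> Option<&'static str> {
    if cfg!(debug_assertions) {
        Some("debug build detected: inference will be slow, rebuild with `cargo run --release`")
    } else {
        None
    }
}
```

Calling it once at startup and printing the message to stderr is enough to catch an accidental `cargo run` before a long model load.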

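Cargo Features

Several of llama.cpp's backends are supported through features:

cuda - Enables the CUDA backend; the CUDA Toolkit is required for compilation if this feature is enabled.
vulkan - Enables the Vulkan backend; the Vulkan SDK is required for compilation if this feature is enabled.
metal - Enables the Metal backend, macOS only.
hipblas - Enables the hipBLAS/ROCm backend; ROCm is required for compilation if this feature is enabled.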
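To make the list above concrete, the sketch below shows, as a comment, how a dependent crate would typically turn on one of these features with standard Cargo syntax (the version number is taken from the release list at the top), followed by a hypothetical lookup, `backend_prerequisite`, that encodes the build requirements stated above. The helper is illustrative only and is not part of llama_cpp.

```rust
// Cargo.toml of a dependent crate (TOML, shown here as a comment):
//
//   [dependencies]
//   llama_cpp = { version = "0.3.2", features = ["cuda"] }

/// Build-time prerequisite for each backend feature in the list above.
/// Returns `None` for names that are not in that list.
/// Hypothetical helper, not part of llama_cpp.
pub fn backend_prerequisite(feature: &str) -> Option<&'static str> {
    match feature {
        "cuda" => Some("CUDA Toolkit"),
        "vulkan" => Some("Vulkan SDK"),
        "metal" => Some("macOS"),
        "hipblas" => Some("ROCm"),
        _ => None,
    }
}
```

A setup script could use such a table to print what needs to be installed before attempting a build with a given feature.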

Experimental

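These bindings can predict the size of a context in memory. Note that this is a highly experimental feature, since llama.cpp itself does not provide it. The returned values may be highly inaccurate, but the bindings try to never return a value lower than the real size.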
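To give a feel for what such an estimate involves, here is a back-of-the-envelope sketch of the largest component of a context, the key/value cache. It is not the method these bindings use: `kv_cache_bytes` and `with_margin` are hypothetical names, the formula is the textbook one (one key and one value vector per layer, per token, per KV head), and it ignores compute buffers and other allocations.

```rust
/// Rough size in bytes of a transformer key/value cache.
/// The leading factor of 2 accounts for storing both keys and values.
/// Hypothetical helper, not the estimator used by llama_cpp.
pub fn kv_cache_bytes(
    n_layers: u64,
    n_ctx: u64,
    n_kv_heads: u64,
    head_dim: u64,
    bytes_per_element: u64,
) -> u64 {
    2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_element
}

/// Pads an estimate by `margin_percent` so that it errs on the high side,
/// mirroring the goal of never reporting less than the real size.
pub fn with_margin(estimate: u64, margin_percent: u64) -> u64 {
    estimate + estimate * margin_percent / 100
}
```

With illustrative values of 32 layers, a 2048-token context, 32 KV heads of dimension 128, and 2-byte (f16) elements, `kv_cache_bytes(32, 2048, 32, 128, 2)` evaluates to 1,073,741,824 bytes (1 GiB). Padding the result reflects the same design choice described above: an overestimate wastes some headroom, while an underestimate can lead to a failed allocation.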

License

MIT or Apache-2.0, at your option (the "Rust" license). See LICENSE-MIT and LICENSE-APACHE.

Dependencies

~3–11MB
~221K SLoC