llm_utils: no chains, just tools
- Tokenizers for the major open source models.
- A SotA balanced text chunker with a fast, parallelized implementation.
- Presets for loading models locally. Calculates the optimal quant for your GPU.
- Advanced prompting tools; chat templates for open source or OpenAI/Anthropic models, accurate prompt token counting, and building grammars and logit biases.
- Parsing and cleaning of HTML and text.
Installation
[dependencies]
llm_utils = "*"
Tokenizers 🧮
- Hugging Face's Tokenizers library for local models, and Tiktoken-rs for OpenAI and Anthropic (Anthropic does not have a publicly available tokenizer).
- A simple, abstracted API for encoding and decoding, allowing LLMs to be consumed abstractly across multiple architectures.
- Safely set an LLM's max_token parameter to ensure requests never fail from exceeding token limits! (A sketch of this check follows the example below.)
// Get a Tiktoken tokenizer
//
let tokenizer: LlmTokenizer = LlmTokenizer::new_tiktoken("gpt-4o");
// Get a Hugging Face tokenizer from local path
//
let tokenizer: LlmTokenizer = LlmTokenizer::new_from_tokenizer_json("path/to/tokenizer.json");
// Or load from repo
//
let tokenizer: LlmTokenizer = LlmTokenizer::new_from_hf_repo(hf_token, "meta-llama/Meta-Llama-3-8B-Instruct");
// Get tokenizin'
//
let token_ids: Vec<u32> = tokenizer.tokenize("Hello there");
let count: u32 = tokenizer.count_tokens("Hello there");
let word_probably: String = tokenizer.detokenize_one(token_ids[0])?;
let words_probably: String = tokenizer.detokenize_many(token_ids)?;
// These functions are used for generating logit bias
let token_id: u32 = tokenizer.try_into_single_token("hello");
let word_probably: String = tokenizer.try_from_single_token_id(1234);
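Building on count_tokens, here is a minimal sketch of the max_token safety check mentioned above. The context window and requested value are hypothetical; the crate's own helper for this, get_and_check_max_tokens_for_response, is shown in the Prompting section.
// Hypothetical numbers for illustration; only count_tokens comes from the tokenizer above.
let context_length: u32 = 8192; // assumed model context window
let requested_max_tokens: u32 = 4096; // what the caller asked for
let prompt_tokens: u32 = tokenizer.count_tokens("Hello there");
// Cap the response budget so prompt + response never exceeds the window (with a small margin).
let safety_margin: u32 = 10;
let available = context_length.saturating_sub(prompt_tokens + safety_margin);
let safe_max_tokens = requested_max_tokens.min(available);
assert!(prompt_tokens + safe_max_tokens <= context_length);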
Text chunking 🪓
Balanced text chunking means that all chunks are roughly the same size.
let text = "one, two, three, four, five, six, seven, eight, nine";
// Given a max token count of four, other text chunkers would split this into three chunks.
assert_eq!(["one, two, three, four", "five, six, seven, eight", "nine"], // "nine" is orphaned!
OtherChunkers::new()
.max_chunk_token_size(4)
.chunk(text));
// A balanced text chunker, however, would also split the text into three chunks, but of even sizes.
assert_eq!(["one, two, three", "four, five, six", "seven, eight, nine"],
TextChunker::new()
.max_chunk_token_size(4)
.run(&text)?);
Whenever the total token length of the incoming text isn't evenly divisible by the max token count, the final chunk will be smaller than the rest. In some cases it can be so small that it ends up "orphaned" and useless. If you asked a RAG implementation "What did seven eat?", the final chunk needed to answer the question would fail to be retrieved.
The TextChunker first attempts semantic splits in the following order: paragraphs, newlines, sentences. If that fails, it builds the chunks linearly from the largest splits available, splitting where required.
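As a rough conceptual sketch (not the crate's actual implementation), that fallback order can be expressed with the TextSplitter API documented in the Text splitting section below:
// Mirrors the paragraph -> newline -> sentence order described above:
// try a coarse split first, then fall back to a finer one if it yields a single piece.
let mut splits: Vec<String> = TextSplitter::new()
    .on_two_plus_newline()
    .split_text(&text)?;
if splits.len() < 2 {
    splits = TextSplitter::new().on_single_newline().split_text(&text)?;
}
if splits.len() < 2 {
    splits = TextSplitter::new().on_sentences_rule_based().split_text(&text)?;
}
// TextChunker then packs the resulting splits into chunks of roughly equal token counts.
The TextSplitter's recursive mode, shown later, automates this same try-the-next-separator behavior.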
Model presets 🛤️
- Presets for open source LLMs from Hugging Face, or API models like OpenAI and Anthropic.
- Load and/or download a model with metadata, a tokenizer, and a local path (for local LLMs like llama.cpp, vllm, mistral.rs).
- Automatically selects the largest quantized GGUF that will fit in your VRAM!
Supported open source models
⚪ Llama 3
⚪ Mistral and Mixtral
⚪ Phi 3
// Load the largest quantized Mistral-7B-Instruct model that will fit in your vram
//
let model: OsLlm = PresetModelBuilder::new()
.mistral_7b_instruct()
.vram(48)
.ctx_size(9001) // ctx_size impacts vram usage!
.load()
.await?;
not_a_real_assert_eq!(model, OsLlm {
pub model_id: String,
pub model_url: String,
pub local_model_path: String, // Use this to load the llama.cpp server
pub model_config_json: OsLlmConfigJson,
pub chat_template: OsLlmChatTemplate,
pub tokenizer: Option<LlmTokenizer>,
})
// Or Openai
//
let model: OpenAiLlm = OpenAiLlm::gpt_4_o();
not_a_real_assert_eq!(model, OpenAiLlm {
model_id: "gpt-4o".to_string(),
context_length: 128000,
cost_per_m_in_tokens: 5.00,
max_tokens_output: 4096,
cost_per_m_out_tokens: 15.00,
tokens_per_message: 3,
tokens_per_name: 1,
tokenizer: Option<LlmTokenizer>,
})
// Or Anthropic
//
let model: AnthropicLlm = AnthropicLlm::claude_3_opus();
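Because the API presets carry pricing metadata (the cost_per_m_in_tokens and cost_per_m_out_tokens fields shown above), a request's cost can be estimated directly from the preset. A minimal sketch, using hypothetical token counts and assuming the fields are publicly accessible:
let openai = OpenAiLlm::gpt_4_o();
// Hypothetical token counts for illustration.
let prompt_tokens: u32 = 1_500;
let completion_tokens: u32 = 300;
// Prices are per million tokens, as in the struct above.
let cost_usd = prompt_tokens as f64 / 1_000_000.0 * openai.cost_per_m_in_tokens
    + completion_tokens as f64 / 1_000_000.0 * openai.cost_per_m_out_tokens;
println!("estimated cost: ${cost_usd:.4}");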
GGUF models from Hugging Face or a local path 🚤
// From HF
//
let model_url = "https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF/blob/main/Meta-Llama-3-8B-Instruct.Q6_K.gguf";
let model: OsLlm = GGUFModelBuilder::new()
.hf_quant_file_url(model_url)
.load()
.await?;
// Note: because we can't instantiate a tokenizer from a GGUF file, the returned model will not have a tokenizer!
// However, if we provide the base model's repo, we load from there.
let repo_id = "meta-llama/Meta-Llama-3-8B-Instruct";
let model: OsLlm = GGUFModelBuilder::new()
.hf_quant_file_url(model_url)
.hf_config_repo_id(repo_id)
.load()
.await?;
// From Local
//
let local_path = "/root/.cache/huggingface/hub/models--MaziyarPanahi--Meta-Llama-3-8B-Instruct-GGUF/blobs/c2ca99d853de276fb25a13e369a0db2fd3782eff8d28973404ffa5ffca0b9267";
let model: OsLlm = GGUFModelBuilder::new()
.local_quant_file_path(local_path)
.load()
.await?;
// Again, we require a tokenizer.json. This can also be loaded from a local path.
let local_config_path = "/llm_utils/src/models/open_source/llama/llama_3_8b_instruct";
let model: OsLlm = GGUFModelBuilder::new()
.local_quant_file_path(local_path)
.local_config_path(local_config_path)
.load()
.await?;
Prompting 🎶
- Generate correctly formatted prompts for GGUF models, OpenAI, and Anthropic.
- Uses the GGUF's chat template and Jinja templates to format prompts to the model's spec.
- Build prompts from a combination of dynamic inputs and/or static inputs from files.
// Default formatted prompt (Openai and Anthropic format)
//
let default_formatted_prompt: HashMap<String, HashMap<String, String>> = prompting::default_formatted_prompt(
"You are a nice robot.",
"path/to/a/file/no_birds_and_bees_yap.yaml",
"Where do robots come from?"
)?;
// Get total tokens in prompt
//
let total_prompt_tokens: u32 = model.openai_token_count_of_prompt(&tokenizer, &default_formatted_prompt);
// Then convert it to be used for a GGUF model
//
let gguf_formatted_prompt: String = prompting::convert_default_prompt_to_model_format(
&default_formatted_prompt,
&model.chat_template,
)?;
// Since the GGUF formatted prompt is just a string, we can just use the generic count_tokens function
//
let total_prompt_tokens: u32 = tokenizer.count_tokens(&gguf_formatted_prompt);
// Validate the requested max_tokens for a generation. If it exceeds the model's limits, reduce max_tokens to a safe value.
//
let safe_max_tokens = get_and_check_max_tokens_for_response(
model.context_length,
model.max_tokens_output, // If using a GGUF model use either model.context_length or the ctx_size of the server.
total_prompt_tokens,
10,
None,
requested_max_tokens,
)?;
Grammars 🤓
- Grammars are the most capable method for structuring the output of an LLM. This was designed for use with LlamaCpp, but support for other models is planned.
- Create lists of N items, restrict character types.
- More to be added (JSON, classification, restricting characters, words, phrases).
// Return a list of between 1 and 4 items
//
let grammar = llm_utils::grammar::create_list_grammar(1, 4);
// List will be formatted: `- <list text>\n`
//
let response: String = text_generation_request(&req_config, Some(&grammar)).await?;
// So you can easily split like:
//
let response_items: Vec<String> = response
.lines()
.map(|line| line[1..].trim().to_string())
.collect();
// Exclude numbers from text generation
//
let grammar = llm_utils::grammar::create_text_structured_grammar(vec![RestrictedCharacterSet::PunctuationExtended]);
let response: String = text_generation_request(&req_config, Some(&grammar)).await?;
assert!(!response.contains('0'));
assert!(!response.contains("1234"));
// Exclude a list of common, and commonly unwanted characters from text generation
//
let grammar = llm_utils::grammar::create_text_structured_grammar(vec![RestrictedCharacterSet::PunctuationExtended]);
let response: String = text_generation_request(&req_config, Some(&grammar)).await?;
assert!(!response.contains('@'));
assert!(!response.contains('['));
assert!(!response.contains('*'));
Logit bias #️⃣
- Create correctly formatted logit bias requests for LlamaCpp and OpenAI.
- Functions for adding logit bias from various sources, along with validation.
// Exclude some tokens from text generation
//
let mut words = HashMap::new();
words.entry("delve").or_insert(-100.0);
words.entry("as an ai model").or_insert(-100.0);
// Build and validate
//
let logit_bias = logit_bias::logit_bias_from_words(&tokenizer, &words);
let validated_logit_bias = logit_bias::validate_logit_bias_values(&logit_bias)?;
// Convert
//
let openai_logit_bias = logit_bias::convert_logit_bias_to_openai_format(&validated_logit_bias)?;
let llama_logit_bias = logit_bias::convert_logit_bias_to_llama_format(&validated_logit_bias)?;
Text splitting 🔪
Split text by paragraphs, sentences, words, and characters.
let paragraph_splits: Vec<String> = TextSplitter::new()
.on_two_plus_newline()
.split_text(&text)?;
let newline_splits: Vec<String> = TextSplitter::new()
.on_single_newline()
.split_text(&text)?;
// There is no good implementation of sentence splitting in Rust!
// This implementation is better than unicode-segmentation crate or any other crate I tested.
// But still not as good as a model based approach like Spacy or other NLP libraries.
//
let sentence_splits: Vec<String> = TextSplitter::new()
.on_sentences_rule_based()
.split_text(&text)?;
// Unicode
let sentence_splits: Vec<String> = TextSplitter::new()
.on_sentences_unicode()
.split_text(&text)?;
let word_splits: Vec<String> = TextSplitter::new()
.on_words_unicode()
.split_text(&text)?;
let graphemes_splits: Vec<String> = TextSplitter::new()
.on_graphemes_unicode()
.split_text(&text)?;
// If the split separator produces less than two splits,
// this mode tries the next separator.
// It does this until it produces more than one split.
//
let paragraph_splits: Vec<String> = TextSplitter::new()
.on_two_plus_newline()
.recursive(true)
.split_text(&text)?;
Text cleaning 📝
// Normalizes all whitespace chars.
// Reduce the number of newlines to singles or doubles (paragraphs) or convert them to " ".
// Optionally, remove all characters besides alphabetic, numbers, and punctuation.
//
let mut text_cleaner = llm_utils::text_utils::clean_text::TextCleaner::new();
let cleaned_text: String = text_cleaner
.reduce_newlines_to_single_space()
.remove_non_basic_ascii()
.run(some_dirty_text);
// Convert HTML to cleaned text.
// Uses an implementation of Mozilla's readability mode and HTML2Text.
//
let cleaned_text: String = llm_utils::text_utils::clean_html::clean_html(raw_html);
License
This project is licensed under the MIT License.
Contributing
My motivation for publishing this is the hope that someone will point out if I'm doing something wrong!