1 unstable release
0.1.0 | Jul 1, 2024
#84 in Text editors
23KB
571 lines of code
lm-proxy
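- A (large) language model proxy
A proxy for (large) language models that forwards requests to external servers. It manages the external servers, starting them when they are needed and shutting them down when they are not.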
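The start-on-demand and shutdown behaviour can be pictured with a small bookkeeping sketch in Python. This is not lm-proxy's actual implementation: the `ManagedServer` class and its methods are hypothetical, and the reading of the two timeouts (named after the `keep_alive` and `request_keep_alive` settings in the configuration) is an assumption based on the comments in the example config.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ManagedServer:
    """Hypothetical bookkeeping for one external model server."""

    keep_alive: float = 60.0           # assumed: timeout when no request is running
    request_keep_alive: float = 300.0  # assumed: timeout while a request is running
    running: bool = False
    active_requests: int = 0
    last_activity: float = field(default_factory=time.monotonic)

    def begin_request(self, now: float) -> None:
        # Start on demand; a real proxy would spawn the server process here.
        self.running = True
        self.active_requests += 1
        self.last_activity = now

    def end_request(self, now: float) -> None:
        self.active_requests -= 1
        self.last_activity = now

    def should_stop(self, now: float) -> bool:
        # Pick the timeout that applies to the current state and compare it
        # with the time elapsed since the last activity.
        if not self.running:
            return False
        limit = self.request_keep_alive if self.active_requests > 0 else self.keep_alive
        return now - self.last_activity > limit
```

A supervisor loop could call `should_stop(time.monotonic())` periodically and terminate the process when it returns `True`.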
Configuration
[proxy]
port = 8080
# without running requests, keep models alive for 60s
keep_alive = 60
# with a running request, keep models alive for 300s
request_keep_alive = 300
[models.phi3]
args = [
    "llama-server",
    "--model",
    "phi-3-mini-4k-instruct-q4.gguf",
    "--port",
    "{{ port }}",
]

[models.gemma2]
args = [
    "llama-server",
    "--model",
    "gemma-2-9b-it-q5_k_m.gguf",
    "--port",
    "{{ port }}",
]
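The `{{ port }}` placeholder in `args` suggests that the proxy fills in a port before launching each server. How lm-proxy chooses that port is not stated here, so the following Python sketch is only an illustration under that assumption: the helpers `find_free_port` and `render_args` are hypothetical names, and asking the operating system for a free port is one possible strategy, not necessarily the one lm-proxy uses.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()[1]


def render_args(args: list[str], port: int) -> list[str]:
    """Replace the `{{ port }}` placeholder in every argument."""
    return [arg.replace("{{ port }}", str(port)) for arg in args]


# Arguments copied from the [models.phi3] section of the example config.
template = [
    "llama-server",
    "--model",
    "phi-3-mini-4k-instruct-q4.gguf",
    "--port",
    "{{ port }}",
]
command = render_args(template, find_free_port())
```

The resulting `command` list could then be passed to something like `subprocess.Popen` to start the model server.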
Starting the server
lm-proxy serve config.toml
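When the proxy is started from a script, it can help to wait until it accepts connections before sending requests. The helper below is a hypothetical sketch, not part of lm-proxy; it only assumes that the proxy listens on the TCP port given in the `[proxy]` section (8080 in the example config).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet; retry shortly.
            time.sleep(0.1)
    return False
```

After running `lm-proxy serve config.toml` in the background, `wait_for_port("127.0.0.1", 8080)` would report whether the proxy became reachable within ten seconds.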
Using the server
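from openai import OpenAI

# the config above sets no TLS, so the proxy is addressed over plain HTTP;
# the API key is not used, but the client requires a value
client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="unused",
)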

# use the phi3 model
response = client.chat.completions.create(
    model="phi3",
    messages=[{"role": "user", "content": "What is 2 + 3?"}],
)
print(response.choices[0].message.content)

# use the gemma2 model
response = client.chat.completions.create(
    model="gemma2",
    messages=[{"role": "user", "content": "How can I add 2 and 3 in Python?"}],
)
print(response.choices[0].message.content)
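The same kind of request can be sketched without the `openai` package, using only the Python standard library. This assumes the proxy serves the OpenAI-style `/v1/chat/completions` endpoint over plain HTTP on port 8080, which is what the client example implies; `build_chat_request` is a hypothetical helper introduced for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {
        "model": model,  # model name as configured under [models.<name>]
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("http://127.0.0.1:8080/v1", "phi3", "What is 2 + 3?")

# Sending the request requires a running lm-proxy instance:
# with urllib.request.urlopen(request) as response:
#     body = json.loads(response.read())
#     print(body["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the example inspectable even when no server is running.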
Dependencies
~10–22MB
~329K SLoC