
louis_lifu/BentoVLLM

This repository has not declared an open-source license file (LICENSE); before using it, check the project description and the upstream dependencies of its code.
service.py 1.68 KB
Zhao Shenyang committed on 2024-02-01 06:22 · service rename

import bentoml
from typing import AsyncGenerator, List, Optional

MAX_TOKENS = 1024

PROMPT_TEMPLATE = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{user_prompt} [/INST] """


@bentoml.service(
    traffic={
        "timeout": 300,
    },
    resources={
        "gpu": 1,
        "memory": "16Gi",
    },
)
class VLLM:
    def __init__(self) -> None:
        # Import lazily so the module can be loaded without vLLM installed.
        from vllm import AsyncEngineArgs, AsyncLLMEngine

        ENGINE_ARGS = AsyncEngineArgs(
            model="meta-llama/Llama-2-7b-chat-hf",
            max_model_len=MAX_TOKENS,
        )
        self.engine = AsyncLLMEngine.from_engine_args(ENGINE_ARGS)
        self.request_id = 0

    @bentoml.api
    async def generate(
        self,
        prompt: str = "Explain superconductors like I'm five years old",
        tokens: Optional[List[int]] = None,
    ) -> AsyncGenerator[str, None]:
        from vllm import SamplingParams

        SAMPLING_PARAM = SamplingParams(max_tokens=MAX_TOKENS)
        prompt = PROMPT_TEMPLATE.format(user_prompt=prompt)
        # vLLM expects the request id as a string; a per-process counter keeps ids unique.
        stream = await self.engine.add_request(
            str(self.request_id), prompt, SAMPLING_PARAM, prompt_token_ids=tokens
        )
        self.request_id += 1
        async for request_output in stream:
            # Each RequestOutput carries the cumulative text generated so far.
            yield request_output.outputs[0].text
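
For reference, a minimal client sketch (not part of the repository): assuming the service is started locally with "bentoml serve service:VLLM" on BentoML's default port 3000, the streaming endpoint can be consumed with bentoml.SyncHTTPClient; the prompt value is illustrative.

# Hypothetical client usage, not part of this repository. Assumes the service
# is running locally on BentoML's default port 3000.
import bentoml

with bentoml.SyncHTTPClient("http://localhost:3000") as client:
    # The streaming endpoint yields chunks; each chunk is the cumulative text so far.
    for chunk in client.generate(prompt="Explain superconductors like I'm five years old"):
        print(chunk)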
Clone/Download:
https://gitee.com/louis_lifu/BentoVLLM.git
git@gitee.com:louis_lifu/BentoVLLM.git
