"""
tokenizer 和 训练数据配置文件
"""
LANGUAGE = "enzh" # [en, zh, enzh]
# https://huggingface.co/baichuan-inc/Baichuan-7B/blob/main/tokenizer.model
TOKENIZER_MODEL = "tokenizers/baichuan/tokenizer.model" # the baichuan sentencepiece tokenizer model
TOKENIZER_BIN = "tokenizers/baichuan/tokenizer.bin" # binary version of the tokenizer for inference in C
# https://huggingface.co/ziqingyang/chinese-llama-2-7b/blob/main/tokenizer.model
# TOKENIZER_MODEL = "tokenizers/llama2enzh/tokenizer.model" # the chinese-llama-2 (en+zh) sentencepiece tokenizer model
# TOKENIZER_BIN = "tokenizers/llama2enzh/tokenizer.bin" # binary version of the tokenizer for inference in C
# base llama2 (English-only)
# TOKENIZER_MODEL = "tokenizers/llama2en/tokenizer.model" # the base llama2 sentencepiece tokenizer model
# TOKENIZER_BIN = "tokenizers/llama2en/tokenizer.bin" # binary version of the tokenizer for inference in C
# custom Chinese vocabulary (trained on 红楼梦.txt)
#TOKENIZER_MODEL = "tokenizers/custom_tokenizer/meng.model" # the custom Chinese sentencepiece tokenizer model
#TOKENIZER_BIN = "tokenizers/custom_tokenizer/meng.bin" # binary version of the tokenizer for inference in C
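
A small sanity-check sketch of how the `LANGUAGE` setting could be mapped to the tokenizer pairs listed above. The `TOKENIZER_CHOICES` dict and `select_tokenizer` helper are hypothetical names introduced here for illustration, and the assignment of `"en"` and `"zh"` to the commented-out paths is an assumption based on the comments, not code from this repo:

```python
# Hypothetical helper (assumption, not part of the original config): pick the
# (tokenizer.model, tokenizer.bin) pair matching a LANGUAGE value, using the
# candidate paths listed in the comments above.
TOKENIZER_CHOICES = {
    # bilingual: baichuan sentencepiece tokenizer (the active default above)
    "enzh": ("tokenizers/baichuan/tokenizer.model",
             "tokenizers/baichuan/tokenizer.bin"),
    # English-only: base llama2 tokenizer
    "en": ("tokenizers/llama2en/tokenizer.model",
           "tokenizers/llama2en/tokenizer.bin"),
    # Chinese-only: custom tokenizer trained on 红楼梦.txt
    "zh": ("tokenizers/custom_tokenizer/meng.model",
           "tokenizers/custom_tokenizer/meng.bin"),
}

def select_tokenizer(language):
    """Return (model_path, bin_path) for a LANGUAGE value, or raise KeyError."""
    if language not in TOKENIZER_CHOICES:
        raise KeyError(
            f"unknown LANGUAGE {language!r}; expected one of {sorted(TOKENIZER_CHOICES)}"
        )
    return TOKENIZER_CHOICES[language]
```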