MagiCodeX/word2vec-pytorch

word2vec_train.py 1.14 KB
MagiCodeX committed on 2024-07-07 15:52: Improve code
from word2vec.word2vec import WordTokenizer, CorpusData, Word2vec
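# Path to the corpus file
CORPUS_DATA_PATH = './resources/corpus/text8.txt'
# Path where the model parameters are saved
MODEL_DICT_PATH = './resources/model/word2vec-latest.pth'
# Maximum distance between a positive sample and the center word
#MAX_WINDOW_SIZE = 3
MAX_WINDOW_SIZE = 5
# Number of negative samples per positive sample
NEGATIVE_SAMPLE_NUM = 15
# Maximum vocabulary size
MAX_VOCAB_SIZE = 10000
# Dimensionality of the word vectors
EMBEDDING_SIZE = 100
# Number of training epochs
EPOCH_NUM = 1
# Batch size
BATCH_SIZE = 32
# Learning rate
LEARNING_RATE = 0.2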
with open(CORPUS_DATA_PATH, 'r', encoding='utf-8') as f:
    file_content = f.read()
word_tokenizer = WordTokenizer()
corpus_data = CorpusData(word_tokenizer, MAX_VOCAB_SIZE)
corpus_data.load_data(file_content)
word2vec = Word2vec(corpus_data, EMBEDDING_SIZE)
# Train the model
word2vec.train_model(
    output_file_path=MODEL_DICT_PATH,
    max_window_size=MAX_WINDOW_SIZE,
    negative_sample_num=NEGATIVE_SAMPLE_NUM,
    epoch_num=EPOCH_NUM,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE,
)
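
The window-size and negative-sample settings in this script determine how training examples are built. The sketch below is only an illustration, not the repository's code: it assumes a skip-gram setup in which every word within the window is a positive sample and negatives are drawn uniformly from the vocabulary. The helper `build_samples` and its parameters are hypothetical, and the actual `CorpusData` and `Word2vec` classes may sample differently (for example, by word frequency or with a randomly shrunk window).

```python
import random


def build_samples(token_ids, vocab_size, max_window_size, negative_sample_num, seed=0):
    """Hypothetical helper: return (center, positive, negatives) triples."""
    rng = random.Random(seed)
    samples = []
    for i, center in enumerate(token_ids):
        # Positive samples: every token within max_window_size positions of the center.
        start = max(0, i - max_window_size)
        end = min(len(token_ids), i + max_window_size + 1)
        for j in range(start, end):
            if j == i:
                continue
            positive = token_ids[j]
            # Negative samples: drawn uniformly from the vocabulary (an assumption;
            # implementations often use a frequency-based distribution instead),
            # skipping the center word and the positive word.
            negatives = []
            while len(negatives) < negative_sample_num:
                candidate = rng.randrange(vocab_size)
                if candidate != center and candidate != positive:
                    negatives.append(candidate)
            samples.append((center, positive, negatives))
    return samples


# Toy corpus of six token ids with a vocabulary of ten words.
toy_ids = [0, 1, 2, 3, 4, 5]
toy_samples = build_samples(toy_ids, vocab_size=10, max_window_size=2, negative_sample_num=3)
print(len(toy_samples))  # 18 (center, positive) pairs: 2 + 3 + 4 + 4 + 3 + 2
```

With the values used in this script (window of 5, 15 negatives), each center word away from the corpus edges would yield up to 10 positive pairs under this scheme, each paired with 15 negatives.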