
David_Zhung/bayes_spam

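This repository does not declare an open-source license file (LICENSE); before use, check the project description and the upstream dependencies of its code.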
examples.py 1.66 KB
David_Zhung committed on 2020-04-29 00:38: updates the codes
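#!/usr/bin/python
# -*- coding: utf-8 -*-
__author__ = 'David Zhang'

from spam.impl import *
import logging

# np is used in test(); import it explicitly rather than relying on the wildcard import.
import numpy as np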


def train() -> BayesSpamModel:
    # get the paths of the normal (non-spam) emails in the train set
    train_normal = MailFileHelper.get_all_files_under_dir('data/train/normal')
    # get the paths of the spam emails in the train set
    train_spam = MailFileHelper.get_all_files_under_dir('data/train/spam')

    # get a train instance
    bayes_spam = BayesSpamTrain()

    # read data from the train set: label 0 marks normal mail, label 1 marks spam
    # the arg log_step_interval is used to help logging
    bayes_spam.read_train_set_data(train_normal, 0, log_step_interval=5)
    bayes_spam.read_train_set_data(train_spam, 1, log_step_interval=5)

    model = bayes_spam.train()
    # export the model into specified/default local files.
    model.export_model()
    return model


def test(model=None):
    if model is None:
        # import the model from specified/default local files.
        model = BayesSpamModel.import_model()

    test_normal = MailFileHelper.get_all_files_under_dir('data/test/normal')
    test_spam = MailFileHelper.get_all_files_under_dir('data/test/spam')

    # all test files, normal mail first, then spam
    file_list = np.concatenate((np.array(test_normal), np.array(test_spam))).flatten()
    # matching labels: 0 for each normal mail, 1 for each spam mail
    y = np.concatenate((np.zeros(len(test_normal)), np.ones(len(test_spam)))).flatten()

    acc = model.evaluate(file_list, y=y)
    return acc
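

if __name__ == '__main__':
    # make sure INFO-level messages are emitted (no effect if logging is already configured)
    logging.basicConfig(level=logging.INFO)

    model = BayesSpamModel.import_model()
    # set the number of high-probability words used at prediction time
    # model.set_threshold(-1)
    logging.info(test(model))
    # train()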
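
The `test` function above pairs a list of file paths with a label vector (0 for normal mail, 1 for spam) and passes both to `model.evaluate`. The internals of `evaluate` are not shown in this file, so the following is only a minimal sketch of what an accuracy check of this shape might compute. The names `build_labelled_set`, `accuracy`, and `toy_predict` are hypothetical and are not part of the `spam.impl` package.

```python
import numpy as np


def build_labelled_set(normal_files, spam_files):
    """Return (file_list, y) with label 0 for normal mail and 1 for spam."""
    file_list = np.concatenate(
        (np.array(normal_files, dtype=str), np.array(spam_files, dtype=str))
    )
    y = np.concatenate((np.zeros(len(normal_files)), np.ones(len(spam_files))))
    return file_list, y


def accuracy(predict, file_list, y):
    """Fraction of files whose predicted label equals the true label."""
    predictions = np.array([predict(path) for path in file_list])
    return float(np.mean(predictions == y))


def toy_predict(path):
    # Hypothetical stand-in classifier: flags a file as spam (1) when its
    # path contains "spam". A real model would inspect the mail contents.
    return 1 if "spam" in path else 0


if __name__ == "__main__":
    files, labels = build_labelled_set(
        ["data/test/normal/a.txt", "data/test/normal/b.txt"],
        ["data/test/spam/c.txt"],
    )
    print(accuracy(toy_predict, files, labels))
```

Passing the predictor in as a plain callable keeps the sketch independent of any particular model class; with the real project, the trained model's own prediction method would take the place of `toy_predict`.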
https://gitee.com/ChiZhung/bayes_spam.git
git@gitee.com:ChiZhung/bayes_spam.git