---
language: ja
license: cc-by-sa-4.0
datasets:
widget:
---
This is a BERT model pretrained on texts in the Japanese language.
The code for pretraining is available at retarfi/language-pretraining.
The model architecture is the same as that of BERT small in the original ELECTRA paper: 12 layers, 256-dimensional hidden states, and 4 attention heads.
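For reference, these hyperparameters correspond roughly to the following `transformers.BertConfig`. This is a minimal sketch for illustration only: the intermediate (feed-forward) size of 1024 is assumed from the ELECTRA small configuration and is not stated in this card.

```python
from transformers import BertConfig

# Sketch of the architecture described above (BERT small from the ELECTRA paper).
# intermediate_size=1024 is an assumption based on the ELECTRA small setup,
# not a value stated in this card; vocab_size matches the size reported below.
config = BertConfig(
    vocab_size=32768,
    hidden_size=256,         # 256-dimensional hidden states
    num_hidden_layers=12,    # 12 layers
    num_attention_heads=4,   # 4 attention heads
    intermediate_size=1024,  # assumed (ELECTRA small convention)
)
```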
The models are trained on the Japanese version of Wikipedia.
The training corpus is generated from the Japanese version of Wikipedia, using the Wikipedia dump file as of June 1, 2021.
The corpus file is 2.9GB, consisting of approximately 20M sentences.
The texts are first tokenized by MeCab with the IPA dictionary and then split into subwords by the WordPiece algorithm.
The vocabulary size is 32768.
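As a hedged sketch of this tokenization pipeline, `transformers.BertJapaneseTokenizer` can combine MeCab word segmentation with WordPiece subword splitting. The repository id below is a placeholder, and `fugashi` plus `ipadic` are assumed to be installed so that MeCab can use the IPA dictionary.

```python
from transformers import BertJapaneseTokenizer

# Placeholder repository id; substitute the actual model repository.
tokenizer = BertJapaneseTokenizer.from_pretrained(
    "your-org/bert-small-japanese",        # hypothetical id
    word_tokenizer_type="mecab",           # word-level segmentation with MeCab
    subword_tokenizer_type="wordpiece",    # subword split with WordPiece
    mecab_kwargs={"mecab_dic": "ipadic"},  # IPA dictionary
)

print(tokenizer.tokenize("日本語のテキストをトークン化する。"))
print(tokenizer.vocab_size)  # expected: 32768
```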
The models are trained with the same configuration as BERT small in the original ELECTRA paper: 128 tokens per instance, 128 instances per batch, and 1.45M training steps.
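A minimal masked-language-modeling usage sketch follows, assuming the pretrained weights are published on the Hugging Face Hub; the repository id is again a placeholder rather than the official one.

```python
from transformers import pipeline

# Placeholder repository id; substitute the actual model repository.
fill_mask = pipeline("fill-mask", model="your-org/bert-small-japanese")

# Predict candidates for the masked token in a Japanese sentence.
for candidate in fill_mask("東京大学で自然言語処理を[MASK]している。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```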
@article{Suzuki-etal-2023-ipm,
  title = {Constructing and analyzing domain-specific language model for financial text mining},
  author = {Masahiro Suzuki and Hiroki Sakaji and Masanori Hirano and Kiyoshi Izumi},
  journal = {Information Processing \& Management},
  volume = {60},
  number = {2},
  pages = {103194},
  year = {2023},
  doi = {10.1016/j.ipm.2022.103194}
}
The pretrained models are distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License.
This work was supported by JSPS KAKENHI Grant Number JP21K12010.