https://github.com/embedchain/embedchain
https://github.com/dandelionsllm/pandallm
https://github.com/Clouditera/secgpt
https://github.com/firechecking/CleanParallel
https://github.com/fighting41love/funNLP
https://github.com/pytorch-labs/gpt-fast
https://github.com/mindspore-courses/CradleOfAI
https://github.com/DataEngineer-io/data-engineer-handbook
https://github.com/vectorch-ai/ScaleLLM
https://github.com/bigscience-workshop/data-preparation
https://github.com/FlagAI-Open/Aquila2
https://github.com/neelsjain/NEFTune
https://github.com/Mythos-Rudy/mnbvc-fasttext-classification
https://github.com/harvardnlp/annotated-transformer/
https://github.com/gaussic/Chinese-Lyric-Corpus
https://github.com/ffzs/dataset/