chinese-nlp

Here are 186 public repositories matching this topic...

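MNBVC (Massive Never-ending BT Vast Chinese corpus) is a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.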

  • Updated Oct 30, 2024

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction based on the TextRank algorithm. Jcseg has a built-in HTTP server and search modules for Lucene, Solr, Elasticsearch, and OpenSearch.

  • Updated Sep 18, 2023
  • Java
