weixin_public_corpus

by nonamestreet

nonamestreet / weixin_public_corpus

微信公众号语料库

446 Stars 140 Forks Last release: Not found 19 Commits 0 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

微信公众号语料库

部分网络抓取的微信公众号的文章,已经去除HTML,只包含了纯文本。每行一篇,是JSON格式,name是微信公众号名字,account是微信公众号ID,title是题目,content是正文。

数据用zip分卷压缩过的, 没有密码。预览可以看preview.json。

目前数据大约3G,数据会定期更新增加。

请只用于研究用途。

有问题或者特殊需求直接建Issue。

[email protected]

欢迎志同道合的小伙伴加入校宝一起来搞有意思的事情!https://www.xiaobaoonline.com/pc/contactjoin

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.