[data] add coig-p dataset (#7657)

hoshi-hiyouga
2025-04-09 21:18:25 +08:00
committed by GitHub
parent 89a4f9ec7f
commit 4eec541857
9 changed files with 314 additions and 12 deletions

@@ -85,7 +85,7 @@ Regarding the above dataset, the *dataset description* in `dataset_info.json` sh
 ### Pre-training Dataset
-- [Example dataset](c4_demo.json)
+- [Example dataset](c4_demo.jsonl)
 In pre-training, only the `text` column will be used for model learning.
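
The hunk above points the example at a `.jsonl` file and notes that only the `text` column is used. As a rough sketch of what that implies, assuming the file follows the usual JSON Lines convention of one JSON object per line, the `text` values could be pulled out as shown below. The helper name `iter_pretraining_texts` and the sample records are hypothetical; they are not part of the repository, and the actual contents of `c4_demo.jsonl` are not shown in this diff.

```python
import json
from typing import Iterable, Iterator


def iter_pretraining_texts(lines: Iterable[str], column: str = "text") -> Iterator[str]:
    """Yield the `column` value of each JSON Lines record, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate empty lines between records
        record = json.loads(line)  # assumption: one JSON object per line
        if column in record:
            yield record[column]  # every other key is ignored


# Hypothetical records standing in for the real file contents.
sample = [
    '{"text": "First document."}',
    "",
    '{"text": "Second document.", "source": "ignored"}',
]
texts = list(iter_pretraining_texts(sample))
```

Because an open text file iterates line by line, the same helper should also accept a file handle, for example `iter_pretraining_texts(open("c4_demo.jsonl", encoding="utf-8"))`.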

@@ -85,7 +85,7 @@
 ### Pre-training Dataset
-- [Example dataset](c4_demo.json)
+- [Example dataset](c4_demo.jsonl)
 In pre-training, only the content of the `text` column is used for model learning.

data/c4_demo.jsonl (new file, 300 lines added)

File diff suppressed because one or more lines are too long