This model was released on 2025-06-06 and added to Hugging Face Transformers on 2025-06-25.

dots.llm1

Overview

The dots.llm1 model was proposed in the dots.llm1 technical report by the rednote-hilab team.

The abstract from the report is the following:

Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.
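The abstract's central idea is that an MoE layer runs only a few experts for each token. For dots.llm1, 14B of 142B parameters are active, which is roughly 10% of the model per token. The sketch below illustrates this with a generic top-k, softmax-gated MoE layer in plain Python. It is an illustration only and not dots.llm1's actual router or expert layout. The helper names (`moe_forward`, `matvec`, `random_matrix`) and the toy sizes are hypothetical, and the sketch omits any extra routing components the real model may use; `Dots1Config` and the technical report describe those.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: Matrix, experts: List[Matrix], k: int) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and mix their outputs (hypothetical helper).

    Only the k selected expert matrices are multiplied, which is the sense
    in which an MoE layer activates a subset of its parameters per token.
    Returns the mixed output and the indices of the experts that ran.
    """
    scores = matvec(router, x)  # one router logit per expert
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over the chosen experts only
    out = [0.0] * len(experts[top[0]])
    for gate, idx in zip(gates, top):
        y = matvec(experts[idx], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration; these are NOT dots.llm1's configuration.
rng = random.Random(0)
d_model, n_experts, top_k = 4, 8, 2
router = random_matrix(n_experts, d_model, rng)
experts = [random_matrix(d_model, d_model, rng) for _ in range(n_experts)]

token = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
output, used = moe_forward(token, router, experts, top_k)
print(f"experts used: {used}")
print(f"active expert fraction: {len(used) / n_experts:.2f}")
```

Here 2 of 8 experts run for the token. The same trade-off, applied at scale, lets dots.llm1 keep 142B parameters of capacity while paying the compute cost of about 14B per token.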

Dots1Config

autodoc Dots1Config

Dots1Model

autodoc Dots1Model - forward

Dots1ForCausalLM

autodoc Dots1ForCausalLM - forward
