
This model was released on 2025-08-05 and added to Hugging Face Transformers on 2025-08-05, contributed by [](https://huggingface.co/).

*Supported attention backends: FlashAttention, SDPA*

# GptOss

GPT-OSS is a family of open-weight reasoning models built on a mixture-of-experts transformer for improved accuracy and lower inference cost. Training combines large-scale distillation with reinforcement learning, and the models are optimized for agentic tasks such as web research, Python execution, and integration with external developer tools. They use a rendered chat format to improve instruction following and role clarity, and they demonstrate strong performance across math, coding, and safety benchmarks. All components, including model weights, inference systems, tool environments, and tokenizers, are released under Apache 2.0 for broad accessibility.

```python
import torch
from transformers import pipeline

pipeline = pipeline(task="text-generation", model="openai/gpt-oss-20b", dtype="auto")
pipeline("Plants create energy through a process known as photosynthesis.")
```

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
# "auto" loads the checkpoint in its native dtype
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", dtype="auto")

inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```
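
Because the models rely on the rendered chat format described above, conversational prompts should go through the tokenizer's chat template rather than raw strings. A minimal sketch, assuming the checkpoint ships a chat template; the messages and generation settings here are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", dtype="auto")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain photosynthesis in one sentence."},
]

# Render the conversation with the model's chat template and append
# the generation prompt so the model responds as the assistant.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))
```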

## Usage tips

- Attention sinks with flex attention require special handling. Unlike standard attention implementations, where sink logits are added directly to the attention scores, flex attention's `score_mod` function operates on individual score elements rather than the full attention matrix, so the sinks cannot be folded into the score computation.
- Instead, apply the attention-sink renormalization after the flex attention call, rescaling the outputs with the log-sum-exp (LSE) values returned by flex attention, as sketched below.
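
A minimal sketch of that renormalization, assuming per-head sink logits of shape `(num_heads,)` and PyTorch's `flex_attention` with `return_lse=True`; the tensor names and shapes are illustrative rather than the exact Transformers implementation:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# q, k, v: (batch, num_heads, seq_len, head_dim); sinks: (num_heads,) learned logits
def attention_with_sinks(q, k, v, sinks):
    # flex_attention normalizes each query's weights by sum(exp(scores)) = exp(lse)
    out, lse = flex_attention(q, k, v, return_lse=True)  # lse: (batch, num_heads, seq_len)
    # With a sink, the softmax denominator becomes exp(lse) + exp(sink), so rescale
    # by exp(lse) / (exp(lse) + exp(sink)) = sigmoid(lse - sink)
    scale = torch.sigmoid(lse - sinks[None, :, None])
    return out * scale.unsqueeze(-1)
```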

## GptOssConfig

[[autodoc]] GptOssConfig

## GptOssModel

[[autodoc]] GptOssModel
    - forward

## GptOssForCausalLM

[[autodoc]] GptOssForCausalLM
    - forward

## GptOssForSequenceClassification

[[autodoc]] GptOssForSequenceClassification
    - forward

## GptOssForTokenClassification

[[autodoc]] GptOssForTokenClassification
    - forward