This model was released on 2024-04-22 and added to Hugging Face Transformers on 2024-04-24.
Phi-3
Phi-3 introduces phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens that achieves performance comparable to Mixtral 8x7B and GPT-3.5 on benchmarks such as MMLU and MT-bench. The training data combines heavily filtered web data with synthetic data, and the model is further aligned for robustness, safety, and the chat format. The larger phi-3-small (7B parameters) and phi-3-medium (14B parameters) models, trained on 4.8 trillion tokens, score higher still on MMLU and MT-bench.
import torch
from transformers import pipeline

# Text-generation pipeline; dtype="auto" loads the weights in the dtype stored in the checkpoint
pipeline = pipeline(task="text-generation", model="microsoft/Phi-3-mini-4k-instruct", dtype="auto")
pipeline("Plants create energy through a process known as photosynthesis.")
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", dtype="auto")

# Tokenize the prompt and generate a continuation
inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
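On GPUs with limited memory, the same model can be loaded with 4-bit quantization through bitsandbytes. This is a minimal sketch, not part of the original example; it assumes the bitsandbytes package is installed and a CUDA device is available:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (requires the bitsandbytes package)
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=quantization_config,
    device_map="auto",
)

inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))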
Usage tips
- This model is very similar to Llama. The main difference is [Phi3SuScaledRotaryEmbedding] and [Phi3YarnScaledRotaryEmbedding], which extend the context of the rotary embeddings (see the sketch after this list).
- The query, key, and value projections are fused, and the MLP's up and gate projection layers are also fused.
- The tokenizer is identical to [LlamaTokenizer], except for additional tokens.
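A quick way to see both points is to inspect a checkpoint's configuration and module names. This is an illustrative sketch, assuming the publicly available microsoft/Phi-3-mini-128k-instruct checkpoint, whose config enables long-context rope scaling; the exact contents of rope_scaling depend on the checkpoint:

from transformers import AutoConfig, AutoModelForCausalLM

# The 128k-context checkpoint sets `rope_scaling` in its config, which selects the
# scaled rotary embedding class; the 4k checkpoint leaves it unset.
config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
print(config.rope_scaling)

# The module names show the fused projections: `qkv_proj` for attention and
# `gate_up_proj` for the MLP's gate and up projections.
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct", dtype="auto")
print(model.model.layers[0].self_attn.qkv_proj)
print(model.model.layers[0].mlp.gate_up_proj)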
Phi3Config
autodoc Phi3Config
Phi3Model
autodoc Phi3Model - forward
Phi3ForCausalLM
autodoc Phi3ForCausalLM - forward - generate
Phi3ForSequenceClassification
autodoc Phi3ForSequenceClassification - forward
Phi3ForTokenClassification
autodoc Phi3ForTokenClassification - forward