mirror of
https://github.com/huggingface/transformers.git
synced 2025-10-20 17:13:56 +08:00
* Fix white space Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Revert changes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Fix autodoc Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> --------- Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2.4 KiB
2.4 KiB
This model was released on 2024-08-23 and added to Hugging Face Transformers on 2025-02-14.
GraniteMoeShared
Overview
The GraniteMoe model was proposed in Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.
Additionally this class GraniteMoeSharedModel adds shared experts for Moe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "ibm-research/moe-7b-1b-active-shared-experts"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()
# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."
# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
print(i)
This HF implementation is contributed by Mayank Mishra, Shawn Tan and Sukriti Sharma.
GraniteMoeSharedConfig
autodoc GraniteMoeSharedConfig
GraniteMoeSharedModel
autodoc GraniteMoeSharedModel - forward
GraniteMoeSharedForCausalLM
autodoc GraniteMoeSharedForCausalLM - forward