Compare commits

...

2 Commits

SHA1         Message                                                   Date
dd94152e96   Update src/transformers/models/llama/modeling_llama.py   2024-09-23 13:27:38 -07:00
             Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
633f4363ec   fix comment                                               2024-09-09 13:17:29 -07:00

src/transformers/models/llama/modeling_llama.py

@@ -490,8 +490,8 @@ class LlamaFlashAttention2(LlamaAttention):
         value_states = self.v_proj(hidden_states)

         # Flash attention requires the input to have the shape
-        # batch_size x seq_length x head_dim x hidden_dim
-        # therefore we just need to keep the original shape
+        # batch_size, seq_length, num_heads, head_dim
+        # but rotary embeddings require batch_size, num_heads, seq_length, head_dim
         query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
         key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
         value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
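
For context on what the corrected comment describes, below is a minimal standalone shape sketch (not the transformers source; the sizes and the single `q_proj` layer are hypothetical stand-ins): rotary embeddings are applied on tensors laid out as (batch_size, num_heads, seq_length, head_dim), which is why the code transposes after the view, while the flash-attention kernel expects (batch_size, seq_length, num_heads, head_dim), so the tensors are transposed back later in the same method.

    import torch

    # Hypothetical sizes, for illustration only.
    bsz, q_len, num_heads, head_dim = 2, 16, 8, 64

    hidden_states = torch.randn(bsz, q_len, num_heads * head_dim)
    q_proj = torch.nn.Linear(num_heads * head_dim, num_heads * head_dim)

    # Projection output is (bsz, q_len, num_heads * head_dim).
    query_states = q_proj(hidden_states)

    # Rotary embeddings operate on (bsz, num_heads, q_len, head_dim),
    # hence the .view(...).transpose(1, 2) shown in the diff above.
    query_states = query_states.view(bsz, q_len, num_heads, head_dim).transpose(1, 2)
    assert query_states.shape == (bsz, num_heads, q_len, head_dim)

    # The flash-attention kernel instead expects (bsz, q_len, num_heads, head_dim),
    # so the tensor is transposed back before the attention call
    # (the modeling code does this further down in the same forward method).
    query_states = query_states.transpose(1, 2)
    assert query_states.shape == (bsz, q_len, num_heads, head_dim)

The transpose round-trip is the point of the comment fix: the original comment claimed the view "keeps the original shape" for flash attention, while the corrected comment attributes the (bsz, num_heads, seq_length, head_dim) layout to the rotary-embedding step.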