mirror of
https://github.com/huggingface/accelerate.git
synced 2025-10-20 18:13:46 +08:00
Manim animation of big model inference (#671)
* Manim animation of big model inference * Make into big section, not small * Revert back to old style of headers
This commit is contained in:
@ -35,7 +35,13 @@ While this works very well for regularly sized models, this workflow has some cl
|
||||
|
||||
</Tip>
|
||||
|
||||
## Instantiating an empty model
|
||||
## How the Process Works: A Quick Overview
|
||||
|
||||
<Youtube id="MWCSGj9jEAo" />
|
||||
|
||||
## How the Process Works: Working with Code
|
||||
|
||||
### Instantiating an empty model
|
||||
|
||||
The first tool 🤗 Accelerate introduces to help with big models is a context manager [`init_empty_weights`] that helps you initialize a model without using any RAM, so that step 1 can be done on models of any size. Here is how it works:
|
||||
|
||||
@ -61,7 +67,7 @@ initializes an empty model with a bit more than 100B parameters. Behind the scen
|
||||
|
||||
</Tip>
|
||||
|
||||
## Sharded checkpoints
|
||||
### Sharded checkpoints
|
||||
|
||||
It's possible your model is so big that even a single copy won't fit in RAM. That doesn't mean it can't be loaded: if you have one or several GPUs, this is more memory available to store your model. In this case, it's better if your checkpoint is split in several smaller files that we call checkpoint shards.
|
||||
|
||||
@ -86,7 +92,7 @@ with index.json being the following file:
|
||||
|
||||
and `first_state_dict.bin` containing the weights for `"linear1.weight"` and `"linear1.bias"`, `second_state_dict.bin` the ones for `"linear2.weight"` and `"linear2.bias"`.
|
||||
|
||||
## Loading weights
|
||||
### Loading weights
|
||||
|
||||
The second tool 🤗 Accelerate introduces is a function [`load_checkpoint_and_dispatch`], that will allow you to load a checkpoint inside your empty model. This supports full checkpoints (a single file containing the whole state dict) as well as sharded checkpoints. It will also automatically dispatch those weights across the devices you have available (GPUs, CPU RAM), so if you are loading a sharded checkpoint, the maximum RAM usage will be the size of the biggest shard.
|
||||
|
||||
@ -176,7 +182,7 @@ You can also design your `device_map` yourself, if you prefer to explicitly deci
|
||||
model = load_checkpoint_and_dispatch(model, "sharded-gpt-j-6B", device_map=my_device_map)
|
||||
```
|
||||
|
||||
## Run the model
|
||||
### Run the model
|
||||
|
||||
Now that we have done this, our model lies across several devices, and maybe the hard drive. But it can still be used as a regular PyTorch model:
|
||||
|
||||
@ -203,7 +209,7 @@ This way, you model can run for inference even if it doesn't fit on one of the G
|
||||
|
||||
</Tip>
|
||||
|
||||
## Designing a device map
|
||||
### Designing a device map
|
||||
|
||||
You can let 🤗 Accelerate handle the device map computation by setting `device_map` to one of the supported options (`"auto"`, `"balanced"`, `"balanced_low_0"`, `"sequential"`) or create one yourself, if you want more control over where each layer should go.
|
||||
|
||||
|
108
manim_animations/big_model_inference/stage_1.py
Normal file
108
manim_animations/big_model_inference/stage_1.py
Normal file
@ -0,0 +1,108 @@
|
||||
# Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from manim import *
|
||||
|
||||
|
||||
class Stage1(Scene):
    """Stage 1 of the big-model-inference animation.

    Shows an empty model skeleton being created: each model cell turns
    yellow and a tiny yellow marker (the near-zero RAM footprint of an
    empty weight) flies into the CPU memory block.
    """

    def construct(self):
        # Template for one memory cell; everything below is copies of it.
        # (The original also defined an unused 0.46x0.46 `fill` rectangle;
        # it was dead code in this scene and has been removed.)
        mem = Rectangle(height=0.5, width=0.5)

        # CPU: two 6-cell columns side by side, labelled underneath.
        cpu_left_col_base = [mem.copy() for i in range(6)]
        cpu_right_col_base = [mem.copy() for i in range(6)]
        cpu_left_col = VGroup(*cpu_left_col_base).arrange(UP, buff=0)
        cpu_right_col = VGroup(*cpu_right_col_base).arrange(UP, buff=0)
        cpu_rects = VGroup(cpu_left_col, cpu_right_col).arrange(RIGHT, buff=0)
        cpu_text = Text("CPU", font_size=24)
        cpu = Group(cpu_rects, cpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        cpu.move_to([-2.5, -.5, 0])
        self.add(cpu)

        # GPU: a single cell, bottom-aligned with the CPU block.
        gpu_base = [mem.copy() for i in range(1)]
        gpu_rect = VGroup(*gpu_base).arrange(UP, buff=0)
        gpu_text = Text("GPU", font_size=24)
        gpu = Group(gpu_rect, gpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        gpu.align_to(cpu, DOWN)
        gpu.set_x(gpu.get_x() - 1)
        self.add(gpu)

        # Model: a horizontal row of 6 cells (added to the scene later,
        # after the explanatory text).
        model_base = [mem.copy() for i in range(6)]
        model_rect = VGroup(*model_base).arrange(RIGHT, buff=0)
        model_text = Text("Model", font_size=24)
        model = Group(model_rect, model_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        model.move_to([3, -1., 0])

        self.play(
            Create(cpu_left_col, run_time=1),
            Create(cpu_right_col, run_time=1),
            Create(gpu_rect, run_time=1),
        )

        step_1 = MarkupText(
            f"First, an empty model skeleton is loaded\ninto <span fgcolor='{YELLOW}'>memory</span> without using much RAM.",
            font_size=24
        )

        # Legend box in the top-left corner.
        key = Square(side_length=2.2)
        key.move_to([-5, 2, 0])

        key_text = MarkupText(
            f"<b>Key:</b>\n\n<span fgcolor='{YELLOW}'>●</span> Empty Model",
            font_size=18,
        )
        key_text.move_to([-5, 2.4, 0])

        step_1.move_to([2, 2, 0])
        self.play(
            Write(step_1, run_time=2.5),
            Write(key_text),
            Write(key)
        )

        self.add(model)

        cpu_targs = []
        first_animations = []
        second_animations = []
        for i, rect in enumerate(model_base):
            # Tiny yellow marker representing one empty weight landing in CPU RAM.
            cpu_target = Rectangle(height=0.46, width=0.46).set_stroke(width=0.).set_fill(YELLOW, opacity=0.7)
            cpu_target.move_to(rect)
            cpu_target.generate_target()
            # Shrink the marker dramatically: an empty weight uses almost no RAM.
            cpu_target.target.height = 0.46 / 4
            cpu_target.target.width = 0.46 / 3

            if i == 0:
                # First marker anchors at the bottom-left CPU cell.
                cpu_target.target.next_to(cpu_left_col_base[0].get_corner(DOWN + LEFT), buff=0.02, direction=UP)
                cpu_target.target.set_x(cpu_target.target.get_x() + 0.1)
            elif i == 3:
                # Fourth marker starts a second row above the first one.
                cpu_target.target.next_to(cpu_targs[0].target, direction=UP, buff=0.)
            else:
                # Remaining markers line up to the right of the previous one.
                cpu_target.target.next_to(cpu_targs[i - 1].target, direction=RIGHT, buff=0.)
            cpu_targs.append(cpu_target)

            first_animations.append(rect.animate(run_time=0.5).set_stroke(YELLOW))
            second_animations.append(MoveToTarget(cpu_target, run_time=1.5))

        # Highlight the model cells first, then fly the markers into the CPU.
        self.play(*first_animations)
        self.play(*second_animations)

        self.wait()
|
126
manim_animations/big_model_inference/stage_2.py
Normal file
126
manim_animations/big_model_inference/stage_2.py
Normal file
@ -0,0 +1,126 @@
|
||||
# Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from manim import *
|
||||
|
||||
class Stage2(Scene):
    """Stage 2 of the big-model-inference animation.

    Starting from the end state of stage 1 (empty model markers in CPU RAM),
    a checkpoint shard is loaded: blue weight squares grow in the checkpoint
    row and are moved into the CPU memory cells.
    """

    def construct(self):
        # Templates: full-size memory cell and the slightly smaller "filled
        # weight" square drawn inside a cell.
        mem = Rectangle(height=0.5, width=0.5)
        fill = Rectangle(height=0.46, width=0.46).set_stroke(width=0)

        # CPU: two 6-cell columns side by side, labelled underneath.
        cpu_left_col_base = [mem.copy() for i in range(6)]
        cpu_right_col_base = [mem.copy() for i in range(6)]
        cpu_left_col = VGroup(*cpu_left_col_base).arrange(UP, buff=0)
        cpu_right_col = VGroup(*cpu_right_col_base).arrange(UP, buff=0)
        cpu_rects = VGroup(cpu_left_col, cpu_right_col).arrange(RIGHT, buff=0)
        cpu_text = Text("CPU", font_size=24)
        cpu = Group(cpu_rects, cpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        cpu.move_to([-2.5, -.5, 0])
        self.add(cpu)

        # GPU: 4 stacked cells.
        gpu_base = [mem.copy() for i in range(4)]
        gpu_rect = VGroup(*gpu_base).arrange(UP, buff=0)
        gpu_text = Text("GPU", font_size=24)
        gpu = Group(gpu_rect, gpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        gpu.move_to([-1, -1, 0])
        self.add(gpu)

        # Model: a horizontal row of 6 cells.
        model_base = [mem.copy() for i in range(6)]
        model_rect = VGroup(*model_base).arrange(RIGHT, buff=0)
        model_text = Text("Model", font_size=24)
        model = Group(model_rect, model_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        model.move_to([3, -1., 0])
        self.add(model)

        # Recreate stage 1's end state statically: yellow model outlines and
        # the tiny empty-weight markers already sitting in CPU RAM.
        # (Dead commented-out code from the original has been removed.)
        cpu_targs = []
        for i, rect in enumerate(model_base):
            rect.set_stroke(YELLOW)

            cpu_target = Rectangle(height=0.46 / 4, width=0.46 / 3).set_stroke(width=0.).set_fill(YELLOW, opacity=0.7)

            if i == 0:
                cpu_target.next_to(cpu_left_col_base[0].get_corner(DOWN + LEFT), buff=0.02, direction=UP)
                cpu_target.set_x(cpu_target.get_x() + 0.1)
            elif i == 3:
                cpu_target.next_to(cpu_targs[0], direction=UP, buff=0.)
            else:
                cpu_target.next_to(cpu_targs[i - 1], direction=RIGHT, buff=0.)
            self.add(cpu_target)
            cpu_targs.append(cpu_target)

        # Checkpoint: a second row of 6 cells above the model.
        checkpoint_base = [mem.copy() for i in range(6)]
        checkpoint_rect = VGroup(*checkpoint_base).arrange(RIGHT, buff=0)
        checkpoint_text = Text("Loaded Checkpoint", font_size=24)
        checkpoint = Group(checkpoint_rect, checkpoint_text).arrange(DOWN, aligned_edge=DOWN, buff=0.4)
        checkpoint.move_to([3, .5, 0])

        # Legend box in the top-left corner.
        key = Square(side_length=2.2)
        key.move_to([-5, 2, 0])

        key_text = MarkupText(
            f"<b>Key:</b>\n\n<span fgcolor='{YELLOW}'>●</span> Empty Model",
            font_size=18,
        )
        key_text.move_to([-5, 2.4, 0])
        self.add(key_text, key)

        blue_text = MarkupText(
            f"<span fgcolor='{BLUE}'>●</span> Checkpoint",
            font_size=18,
        )
        blue_text.next_to(key_text, DOWN * 2.4, aligned_edge=key_text.get_left())

        step_2 = MarkupText(
            f'Next, a <i><span fgcolor="{BLUE}">second</span></i> model is loaded into memory,\nwith the weights of a <span fgcolor="{BLUE}">single shard</span>.',
            font_size=24
        )
        step_2.move_to([2, 2, 0])
        self.play(
            Write(step_2),
            Write(blue_text)
        )

        self.play(
            Write(checkpoint_text, run_time=1),
            Create(checkpoint_rect, run_time=1)
        )

        first_animations = []
        second_animations = []
        for i, rect in enumerate(checkpoint_base):
            # Blue filled square = a real (non-empty) checkpoint weight.
            target = fill.copy().set_fill(BLUE, opacity=0.7)
            target.move_to(rect)
            first_animations.append(GrowFromCenter(target, run_time=1))

            # Copy of the weight flies into the CPU: left column first
            # (row 0 is taken by the empty-model markers), then right column.
            cpu_target = target.copy()
            cpu_target.generate_target()
            if i < 5:
                cpu_target.target.move_to(cpu_left_col_base[i + 1])
            else:
                cpu_target.target.move_to(cpu_right_col_base[i - 5])
            second_animations.append(MoveToTarget(cpu_target, run_time=1.5))

        self.play(*first_animations)
        self.play(*second_animations)
        self.wait()
|
158
manim_animations/big_model_inference/stage_3.py
Normal file
158
manim_animations/big_model_inference/stage_3.py
Normal file
@ -0,0 +1,158 @@
|
||||
# Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from manim import *
|
||||
|
||||
class Stage3(Scene):
    """Stage 3 of the big-model-inference animation.

    Starting from the end state of stage 2 (checkpoint weights in CPU RAM),
    the weights are offloaded to disk (as np.memmaps) or a device, and the
    in-memory checkpoint copy is then garbage collected.
    """

    def construct(self):
        # Templates: full-size memory cell, half-size disk cell, and the
        # "filled weight" square drawn inside a cell.
        mem = Rectangle(height=0.5, width=0.5)
        meta_mem = Rectangle(height=0.25, width=0.25)
        fill = Rectangle(height=0.46, width=0.46).set_stroke(width=0)

        # CPU: two 6-cell columns side by side, labelled underneath.
        cpu_left_col_base = [mem.copy() for i in range(6)]
        cpu_right_col_base = [mem.copy() for i in range(6)]
        cpu_left_col = VGroup(*cpu_left_col_base).arrange(UP, buff=0)
        cpu_right_col = VGroup(*cpu_right_col_base).arrange(UP, buff=0)
        cpu_rects = VGroup(cpu_left_col, cpu_right_col).arrange(RIGHT, buff=0)
        cpu_text = Text("CPU", font_size=24)
        cpu = Group(cpu_rects, cpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        cpu.move_to([-2.5, -.5, 0])
        self.add(cpu)

        # GPU: 4 stacked cells.
        gpu_base = [mem.copy() for i in range(4)]
        gpu_rect = VGroup(*gpu_base).arrange(UP, buff=0)
        gpu_text = Text("GPU", font_size=24)
        gpu = Group(gpu_rect, gpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        gpu.move_to([-1, -1, 0])
        self.add(gpu)

        # Model: a horizontal row of 6 cells.
        model_base = [mem.copy() for i in range(6)]
        model_rect = VGroup(*model_base).arrange(RIGHT, buff=0)
        model_text = Text("Model", font_size=24)
        model = Group(model_rect, model_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        model.move_to([3, -1., 0])
        self.add(model)

        # Recreate the empty-model markers from the previous stages.
        # (The original also kept empty `model_arr`/`model_meta_arr` lists
        # that were only ever splatted as no-ops; they have been removed.)
        model_cpu_arr = []
        for i, rect in enumerate(model_base):
            rect.set_stroke(YELLOW)

            cpu_target = Rectangle(height=0.46 / 4, width=0.46 / 3).set_stroke(width=0.).set_fill(YELLOW, opacity=0.7)

            if i == 0:
                cpu_target.next_to(cpu_left_col_base[0].get_corner(DOWN + LEFT), buff=0.02, direction=UP)
                cpu_target.set_x(cpu_target.get_x() + 0.1)
            elif i == 3:
                cpu_target.next_to(model_cpu_arr[0], direction=UP, buff=0.)
            else:
                cpu_target.next_to(model_cpu_arr[i - 1], direction=RIGHT, buff=0.)
            self.add(cpu_target)
            model_cpu_arr.append(cpu_target)

        self.add(*model_cpu_arr)

        # Checkpoint row, already loaded (end state of stage 2).
        checkpoint_base = [mem.copy() for i in range(6)]
        checkpoint_rect = VGroup(*checkpoint_base).arrange(RIGHT, buff=0)
        checkpoint_text = Text("Loaded Checkpoint", font_size=24)
        checkpoint = Group(checkpoint_rect, checkpoint_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        checkpoint.move_to([3, .5, 0])
        self.add(checkpoint)

        ckpt_arr = []      # blue weights inside the checkpoint row
        ckpt_cpu_arr = []  # blue weights already copied into CPU RAM
        for i, rect in enumerate(checkpoint_base):
            target = fill.copy().set_fill(BLUE, opacity=0.7)
            target.move_to(rect)
            ckpt_arr.append(target)

            cpu_target = target.copy()
            # Left column rows 1..5 first, then the right column.
            if i < 5:
                cpu_target.move_to(cpu_left_col_base[i + 1])
            else:
                cpu_target.move_to(cpu_right_col_base[i - 5])
            ckpt_cpu_arr.append(cpu_target)
        self.add(*ckpt_arr, *ckpt_cpu_arr)

        # Legend box in the top-left corner.
        key = Square(side_length=2.2)
        key.move_to([-5, 2, 0])

        key_text = MarkupText(
            f"<b>Key:</b>\n\n<span fgcolor='{YELLOW}'>●</span> Empty Model",
            font_size=18,
        )
        key_text.move_to([-5, 2.4, 0])
        self.add(key_text, key)

        blue_text = MarkupText(
            f"<span fgcolor='{BLUE}'>●</span> Checkpoint",
            font_size=18,
        )
        blue_text.next_to(key_text, DOWN * 2.4, aligned_edge=key_text.get_left())
        self.add(blue_text)

        step_3 = MarkupText(
            'Based on the passed in configuration, weights are stored in\na variety of np.memmaps on disk or to a particular device.',
            font_size=24
        )
        step_3.move_to([2, 2, 0])

        # Disk: two columns of half-size cells below/left of the CPU.
        disk_left_col_base = [meta_mem.copy() for i in range(6)]
        disk_right_col_base = [meta_mem.copy() for i in range(6)]
        disk_left_col = VGroup(*disk_left_col_base).arrange(UP, buff=0)
        disk_right_col = VGroup(*disk_right_col_base).arrange(UP, buff=0)
        disk_rects = VGroup(disk_left_col, disk_right_col).arrange(RIGHT, buff=0)
        disk_text = Text("Disk", font_size=24)
        disk = Group(disk_rects, disk_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        disk.move_to([-4., -1.25, 0])
        self.play(
            Write(step_3, run_time=3),
            Write(disk_text, run_time=1),
            Create(disk_rects, run_time=1)
        )

        # Copies of the CPU-resident weights shrink and fly onto the disk.
        animations = []
        for i, rect in enumerate(ckpt_cpu_arr):
            target = rect.copy()
            target.generate_target()
            target.target.move_to(disk_left_col_base[i]).scale(0.5)
            animations.append(MoveToTarget(target, run_time=1.5))
        self.play(*animations)

        self.play(FadeOut(step_3))

        step_4 = MarkupText(
            'Then, the checkpoint is removed from memory\nthrough garbage collection.',
            font_size=24
        )
        step_4.move_to([2, 2, 0])

        self.play(
            Write(step_4, run_time=3)
        )

        # Garbage collection: the checkpoint row and its CPU copies vanish.
        self.play(
            FadeOut(checkpoint_rect, checkpoint_text, *ckpt_arr, *ckpt_cpu_arr),
        )

        self.wait()
|
156
manim_animations/big_model_inference/stage_4.py
Normal file
156
manim_animations/big_model_inference/stage_4.py
Normal file
@ -0,0 +1,156 @@
|
||||
# Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from manim import *
|
||||
|
||||
class Stage4(Scene):
    """Stage 4 of the big-model-inference animation.

    The weights offloaded to disk in stage 3 are moved back into the CPU
    and into the model cells, then red arrows mark the hooks added to each
    weight for CPU<->GPU transfer at inference time.
    """

    def construct(self):
        # Templates: full-size memory cell, "filled weight" square, and
        # half-size disk cell.
        mem = Rectangle(height=0.5, width=0.5)
        fill = Rectangle(height=0.46, width=0.46).set_stroke(width=0)
        meta_mem = Rectangle(height=0.25, width=0.25)

        # CPU: two 6-cell columns side by side, labelled underneath.
        cpu_left_col_base = [mem.copy() for i in range(6)]
        cpu_right_col_base = [mem.copy() for i in range(6)]
        cpu_left_col = VGroup(*cpu_left_col_base).arrange(UP, buff=0)
        cpu_right_col = VGroup(*cpu_right_col_base).arrange(UP, buff=0)
        cpu_rects = VGroup(cpu_left_col, cpu_right_col).arrange(RIGHT, buff=0)
        cpu_text = Text("CPU", font_size=24)
        cpu = Group(cpu_rects, cpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        cpu.move_to([-2.5, -.5, 0])
        self.add(cpu)

        # GPU: 4 stacked cells.
        gpu_base = [mem.copy() for i in range(4)]
        gpu_rect = VGroup(*gpu_base).arrange(UP, buff=0)
        gpu_text = Text("GPU", font_size=24)
        gpu = Group(gpu_rect, gpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        gpu.move_to([-1, -1, 0])
        self.add(gpu)

        # Model: a horizontal row of 6 cells.
        model_base = [mem.copy() for i in range(6)]
        model_rect = VGroup(*model_base).arrange(RIGHT, buff=0)
        model_text = Text("Model", font_size=24)
        model = Group(model_rect, model_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        model.move_to([3, -1., 0])
        self.add(model)

        # Empty-model markers from the earlier stages.
        # (The original also kept an empty `model_meta_arr` list that was
        # only ever splatted as a no-op; it has been removed.)
        model_cpu_arr = []
        for i, rect in enumerate(model_base):
            rect.set_stroke(YELLOW)

            cpu_target = Rectangle(height=0.46 / 4, width=0.46 / 3).set_stroke(width=0.).set_fill(YELLOW, opacity=0.7)

            if i == 0:
                cpu_target.next_to(cpu_left_col_base[0].get_corner(DOWN + LEFT), buff=0.02, direction=UP)
                cpu_target.set_x(cpu_target.get_x() + 0.1)
            elif i == 3:
                cpu_target.next_to(model_cpu_arr[0], direction=UP, buff=0.)
            else:
                cpu_target.next_to(model_cpu_arr[i - 1], direction=RIGHT, buff=0.)
            self.add(cpu_target)
            model_cpu_arr.append(cpu_target)

        self.add(*model_cpu_arr)

        # Disk: two columns of half-size cells (end state of stage 3).
        disk_left_col_base = [meta_mem.copy() for i in range(6)]
        disk_right_col_base = [meta_mem.copy() for i in range(6)]
        disk_left_col = VGroup(*disk_left_col_base).arrange(UP, buff=0)
        disk_right_col = VGroup(*disk_right_col_base).arrange(UP, buff=0)
        disk_rects = VGroup(disk_left_col, disk_right_col).arrange(RIGHT, buff=0)
        disk_text = Text("Disk", font_size=24)
        disk = Group(disk_rects, disk_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        disk.move_to([-4., -1.25, 0])
        self.add(disk_text, disk_rects)

        # Blue weights already sitting on disk (scaled-down copies).
        cpu_disk_arr = []
        for i in range(6):
            target = fill.copy().set_fill(BLUE, opacity=0.8)
            target.move_to(disk_left_col_base[i]).scale(0.5)
            cpu_disk_arr.append(target)

        self.add(*cpu_disk_arr)

        # Legend box in the top-left corner.
        key = Square(side_length=2.2)
        key.move_to([-5, 2, 0])

        key_text = MarkupText(
            f"<b>Key:</b>\n\n<span fgcolor='{YELLOW}'>●</span> Empty Model",
            font_size=18,
        )
        key_text.move_to([-5, 2.4, 0])
        self.add(key_text, key)

        blue_text = MarkupText(
            f"<span fgcolor='{BLUE}'>●</span> Checkpoint",
            font_size=18,
        )
        blue_text.next_to(key_text, DOWN * 2.4, aligned_edge=key_text.get_left())
        self.add(blue_text)

        step_5 = MarkupText(
            'The offloaded weights are all sent to the CPU.',
            font_size=24
        )
        step_5.move_to([2, 2, 0])

        self.play(Write(step_5, run_time=3))

        # One by one: a disk weight grows back to full size in the CPU while
        # a copy lands in the model cell, whose outline turns white (filled).
        for i in range(6):
            rect = cpu_disk_arr[i]
            cp2 = rect.copy().set_fill(BLUE, opacity=0.8).scale(2.0)
            cp2.generate_target()
            cp2.target.move_to(model_base[i])

            if i == 0:
                rect.set_fill(BLUE, opacity=0.8)
                rect.generate_target()
                rect.target.move_to(cpu_left_col_base[0]).scale(2.0)

                # The empty-model markers disappear once real weights arrive.
                self.remove(*model_cpu_arr)

            else:
                rect.generate_target()
                rect.target.move_to(cpu_left_col_base[i]).scale(2.0)
            self.play(
                MoveToTarget(rect),
                MoveToTarget(cp2),
                model_base[i].animate.set_stroke(WHITE)
            )
        self.play(FadeOut(step_5))

        step_5 = MarkupText(
            'Finally, hooks are added to each weight in the model\nto transfer the weights from CPU to GPU\n\t\tand back when needed.',
            font_size=24
        )
        step_5.move_to([2, 2, 0])

        self.play(Write(step_5, run_time=3))

        # Red arrows above each model cell symbolise the attached hooks.
        arrows = []
        animations = []
        for i in range(6):
            a = Arrow(start=UP, end=DOWN, color=RED, buff=.5)
            a.next_to(model_base[i].get_left(), UP, buff=0.2)
            arrows.append(a)
            animations.append(Write(a))
        self.play(*animations)
        self.wait()
|
221
manim_animations/big_model_inference/stage_5.py
Normal file
221
manim_animations/big_model_inference/stage_5.py
Normal file
@ -0,0 +1,221 @@
|
||||
# Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from manim import *
|
||||
|
||||
class Stage5(Scene):
    """Stage 5 of the big-model-inference animation.

    An input square travels through the model layer by layer; as it reaches
    each layer, the hook fires and that layer's weights hop from the CPU to
    the GPU and back, illustrating dispatched inference.
    """

    def construct(self):
        # Templates: full-size memory cell, "filled weight" square, and
        # half-size disk cell.
        mem = Rectangle(height=0.5, width=0.5)
        fill = Rectangle(height=0.46, width=0.46).set_stroke(width=0)

        meta_mem = Rectangle(height=0.25, width=0.25)

        # CPU: two 6-cell columns side by side, labelled underneath.
        cpu_left_col_base = [mem.copy() for i in range(6)]
        cpu_right_col_base = [mem.copy() for i in range(6)]
        cpu_left_col = VGroup(*cpu_left_col_base).arrange(UP, buff=0)
        cpu_right_col = VGroup(*cpu_right_col_base).arrange(UP, buff=0)
        cpu_rects = VGroup(cpu_left_col, cpu_right_col).arrange(RIGHT, buff=0)
        cpu_text = Text("CPU", font_size=24)
        cpu = Group(cpu_rects, cpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        cpu.move_to([-2.5, -.5, 0])
        self.add(cpu)

        # GPU: 4 stacked cells.
        gpu_base = [mem.copy() for i in range(4)]
        gpu_rect = VGroup(*gpu_base).arrange(UP, buff=0)
        gpu_text = Text("GPU", font_size=24)
        gpu = Group(gpu_rect, gpu_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        gpu.move_to([-1, -1, 0])
        self.add(gpu)

        # Model: a horizontal row of 6 cells.
        model_base = [mem.copy() for i in range(6)]
        model_rect = VGroup(*model_base).arrange(RIGHT, buff=0)
        model_text = Text("Model", font_size=24)
        model = Group(model_rect, model_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        model.move_to([3, -1., 0])
        self.add(model)

        # End state of stage 4: every layer has a blue weight in the model
        # row and a corresponding blue weight in the CPU's left column.
        model_arr = []
        model_cpu_arr = []

        for i, rect in enumerate(model_base):
            target = fill.copy().set_fill(BLUE, opacity=0.8)
            target.move_to(rect)
            model_arr.append(target)

            cpu_target = Rectangle(height=0.46, width=0.46).set_stroke(width=0.).set_fill(BLUE, opacity=0.8)
            cpu_target.move_to(cpu_left_col_base[i])
            model_cpu_arr.append(cpu_target)

        self.add(*model_arr, *model_cpu_arr)

        # Disk: two columns of half-size cells.
        disk_left_col_base = [meta_mem.copy() for i in range(6)]
        disk_right_col_base = [meta_mem.copy() for i in range(6)]
        disk_left_col = VGroup(*disk_left_col_base).arrange(UP, buff=0)
        disk_right_col = VGroup(*disk_right_col_base).arrange(UP, buff=0)
        disk_rects = VGroup(disk_left_col, disk_right_col).arrange(RIGHT, buff=0)
        disk_text = Text("Disk", font_size=24)
        disk = Group(disk_rects, disk_text).arrange(DOWN, buff=0.5, aligned_edge=DOWN)
        disk.move_to([-4, -1.25, 0])
        self.add(disk_text, disk_rects)

        # Legend box in the top-left corner.
        key = Square(side_length=2.2)
        key.move_to([-5, 2, 0])

        key_text = MarkupText(
            f"<b>Key:</b>\n\n<span fgcolor='{YELLOW}'>●</span> Empty Model",
            font_size=18,
        )
        key_text.move_to([-5, 2.4, 0])
        self.add(key_text, key)

        blue_text = MarkupText(
            f"<span fgcolor='{BLUE}'>●</span> Checkpoint",
            font_size=18,
        )
        blue_text.next_to(key_text, DOWN * 2.4, aligned_edge=key_text.get_left())
        self.add(blue_text)

        step_6 = MarkupText(
            'Now watch as an input is passed through the model\nand how the memory is utilized and handled.',
            font_size=24
        )
        step_6.move_to([2, 2, 0])

        self.play(Write(step_6))

        # The red square is the input tensor travelling through the model.
        # (Renamed from `input`, which shadowed the builtin.)
        input_sq = Square(0.3)
        input_sq.set_fill(RED, opacity=1.)
        input_sq.set_stroke(width=0.)
        input_sq.next_to(model_base[0], LEFT, buff=.5)

        self.play(Write(input_sq))

        input_sq.generate_target()
        input_sq.target.next_to(model_arr[0], direction=LEFT, buff=0.02)
        self.play(MoveToTarget(input_sq))

        self.play(FadeOut(step_6))

        # Arrow marking the hook of the layer currently being executed.
        a = Arrow(start=UP, end=DOWN, color=RED, buff=.5)
        a.next_to(model_arr[0].get_left(), UP, buff=0.2)

        # First layer's CPU weight gets ready to move onto the GPU.
        model_cpu_arr[0].generate_target()
        model_cpu_arr[0].target.move_to(gpu_rect[0])

        step_7 = MarkupText(
            'As the input reaches a layer, the hook triggers\nand weights are moved from the CPU\nto the GPU and back.',
            font_size=24
        )
        step_7.move_to([2, 2, 0])

        self.play(Write(step_7, run_time=3))

        circ_kwargs = {"run_time": 1, "fade_in": True, "fade_out": True, "buff": 0.02}

        self.play(
            Write(a),
            Circumscribe(model_arr[0], color=ORANGE, **circ_kwargs),
            Circumscribe(model_cpu_arr[0], color=ORANGE, **circ_kwargs),
            Circumscribe(gpu_rect[0], color=ORANGE, **circ_kwargs),
        )
        self.play(
            MoveToTarget(model_cpu_arr[0])
        )

        a_c = a.copy()
        for i in range(6):
            # Advance the hook arrow and the input to the current layer.
            a_c.next_to(model_arr[i].get_right() + 0.02, UP, buff=0.2)

            input_sq.generate_target()
            input_sq.target.move_to(model_arr[i].get_right() + 0.02)

            grp = AnimationGroup(
                FadeOut(a, run_time=.5),
                MoveToTarget(input_sq, run_time=.5),
                FadeIn(a_c, run_time=.5),
                lag_ratio=0.2
            )

            self.play(grp)

            # Current layer's weight returns to its CPU slot...
            model_cpu_arr[i].generate_target()
            model_cpu_arr[i].target.move_to(cpu_left_col_base[i])

            if i < 5:
                # ...while the next layer's weight is staged onto the GPU.
                model_cpu_arr[i + 1].generate_target()
                model_cpu_arr[i + 1].target.move_to(gpu_rect[0])
                if i >= 1:
                    # Speed up the repeating highlight after the first pass.
                    circ_kwargs["run_time"] = .7

                self.play(
                    Circumscribe(model_arr[i], **circ_kwargs),
                    Circumscribe(cpu_left_col_base[i], **circ_kwargs),
                    Circumscribe(cpu_left_col_base[i + 1], color=ORANGE, **circ_kwargs),
                    Circumscribe(gpu_rect[0], color=ORANGE, **circ_kwargs),
                    Circumscribe(model_arr[i + 1], color=ORANGE, **circ_kwargs),
                )
                if i < 1:
                    self.play(
                        MoveToTarget(model_cpu_arr[i]),
                        MoveToTarget(model_cpu_arr[i + 1]),
                    )
                else:
                    self.play(
                        MoveToTarget(model_cpu_arr[i], run_time=.7),
                        MoveToTarget(model_cpu_arr[i + 1], run_time=.7),
                    )
            else:
                # Last layer: only return the weight and move the input out.
                model_cpu_arr[i].generate_target()
                model_cpu_arr[i].target.move_to(cpu_left_col_base[-1])
                input_sq.generate_target()
                input_sq.target.next_to(model_arr[-1].get_right(), RIGHT + 0.02, buff=0.2)

                self.play(
                    Circumscribe(model_arr[-1], color=ORANGE, **circ_kwargs),
                    Circumscribe(cpu_left_col_base[-1], color=ORANGE, **circ_kwargs),
                    Circumscribe(gpu_rect[0], color=ORANGE, **circ_kwargs),
                )

                self.play(
                    MoveToTarget(model_cpu_arr[i])
                )

            a = a_c
            a_c = a_c.copy()

        input_sq.generate_target()
        input_sq.target.next_to(model_base[-1], RIGHT + 0.02, buff=.5)
        self.play(
            FadeOut(step_7),
            FadeOut(a, run_time=.5),
        )

        step_8 = MarkupText(
            'Inference on a model too large for GPU memory\nis successfully completed.', font_size=24
        )
        step_8.move_to([2, 2, 0])

        self.play(
            Write(step_8, run_time=3),
            MoveToTarget(input_sq)
        )

        self.wait()
|
Reference in New Issue
Block a user