A very nice visual explanation of how Gradient Checkpointing (GC) works is in this blog post by @yaroslavvb. https://t.co/IZLkEYf8nZ Here is a brief summary from the blog of how GC stores only some activations and recomputes the rest with partial forward passes during backprop. (Visualizations are from the blog) https://t.co/tM3pXBYioY https://t.co/0UL0ez9avJ
— Prateek Yadav (@prateeky2806) Oct 29, 2023
from Twitter https://twitter.com/prateeky2806
October 28, 2023 at 10:10PM
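
The idea summarized in the tweet can be sketched in a few lines of plain Python. This is only an illustrative toy, not code from the linked blog post: the scalar `tanh` layers, the function names (`layer`, `layer_grad`, `grad_full`, `grad_checkpointed`), and the `every` parameter are all assumptions made for this example. It computes the gradient of the output with respect to the input twice, once storing every activation and once storing only every `every`-th activation and re-running a partial forward pass per segment during backprop.

```python
import math

def layer(w, x):
    # Hypothetical toy layer: y = tanh(w * x)
    return math.tanh(w * x)

def layer_grad(w, x):
    # dy/dx for the toy layer: w * (1 - tanh(w * x)^2)
    y = math.tanh(w * x)
    return w * (1.0 - y * y)

def grad_full(weights, x0):
    """Baseline backprop: keep the input of every layer in memory."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, x_in)
    return x, g, len(inputs)

def grad_checkpointed(weights, x0, every):
    """Keep only every `every`-th layer input; recompute the rest."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % every == 0:
            checkpoints[i] = x  # stored activation
        x = layer(w, x)
    out = x

    g = 1.0
    peak = 0  # rough count of values held at once
    for start in sorted(checkpoints, reverse=True):
        segment = weights[start:start + every]
        # Partial forward pass: rebuild this segment's inputs
        # starting from its checkpoint.
        inputs = []
        x = checkpoints[start]
        for w in segment:
            inputs.append(x)
            x = layer(w, x)
        peak = max(peak, len(checkpoints) + len(inputs))
        # Backprop through the segment, then drop its inputs.
        for w, x_in in zip(reversed(segment), reversed(inputs)):
            g *= layer_grad(w, x_in)
    return out, g, peak

weights = [0.5 + 0.1 * i for i in range(9)]
out_a, g_a, stored_a = grad_full(weights, 0.3)
out_b, g_b, stored_b = grad_checkpointed(weights, 0.3, every=3)
```

Both versions produce the same output and gradient; the checkpointed one holds fewer values at a time and pays for that with one extra forward pass per segment. In practice, frameworks provide this as a built-in (for example `torch.utils.checkpoint` in PyTorch or `jax.checkpoint` in JAX) rather than requiring a hand-written loop like the one above.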