Course file: `week03_micrograd/micrograd_notes.md`
Micrograd is small enough that you can actually read the whole thing. That is its power.
Focus on these ideas:
## Value Is More Than A Number

Each object stores:

- `data`: the scalar result of the computation
- `grad`: the derivative of the final output with respect to this value
- `_prev`: the parent `Value` objects this one was computed from
- `_backward`: a closure that knows how to push gradients back to those parents

That means the program keeps both the answer and the history of how the answer was made.
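A minimal sketch of what one such object might hold, loosely following micrograd's field names (`data`, `grad`, `_prev`, `_op`):

```python
class Value:
    """A scalar plus the bookkeeping needed to replay its history."""

    def __init__(self, data, _prev=(), _op=''):
        self.data = data                # the answer: a plain float
        self.grad = 0.0                 # filled in later by backprop
        self._prev = set(_prev)         # the Values this one was computed from
        self._op = _op                  # which operation produced it (for inspection)
        self._backward = lambda: None   # local gradient rule, set by the operation

x = Value(2.0)
print(x.data, x.grad)  # 2.0 0.0
```

Nothing here computes anything yet; the point is only that the number and its provenance travel together.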
Each operation only needs to know how to push gradients to its direct parents.
Examples:

- Addition (`a + b`): the incoming gradient is passed through unchanged to both parents.
- Multiplication (`a * b`): each parent receives the incoming gradient scaled by the *other* parent's value.
Backprop works because many small local rules combine into a full-chain update.
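These local rules can be sketched as closures attached by each operation, using a stripped-down micrograd-style `Value` (a sketch, not the library's exact code):

```python
class Value:
    def __init__(self, data, _prev=()):
        self.data = data
        self.grad = 0.0
        self._prev = _prev
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # addition: pass the gradient through unchanged to both parents
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # multiplication: scale the gradient by the OTHER operand
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

a, b = Value(3.0), Value(4.0)
c = a * b
c.grad = 1.0     # seed: d(c)/d(c) = 1
c._backward()    # fire the local rule once, by hand
print(a.grad, b.grad)  # 4.0 3.0
```

Note that each `_backward` only touches its own parents; the full chain rule emerges when these are fired in the right order.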
A node's gradient cannot be computed until the gradients of every node downstream of it (closer to the output) are ready. That is why a topological ordering is useful: visit nodes in reverse topological order, output first.
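One way to sketch that ordering, assuming each node exposes a `_prev` collection of its parents (the `Node` class here is a hypothetical stand-in):

```python
def topo_order(root):
    """Return nodes so every node appears AFTER the nodes it was built from."""
    order, visited = [], set()
    def build(v):
        if v not in visited:
            visited.add(v)
            for parent in v._prev:
                build(parent)   # parents first
            order.append(v)     # then this node
    build(root)
    return order

class Node:
    def __init__(self, _prev=()):
        self._prev = _prev

a, b = Node(), Node()
c = Node((a, b))   # c depends on a and b
d = Node((c,))     # d depends on c
order = topo_order(d)
print(order.index(c) < order.index(d))  # True

# backprop then walks the list in reverse, output first:
# for node in reversed(topo_order(loss)):
#     node._backward()
```

Walking `reversed(order)` guarantees every node's downstream gradients are finished before its own `_backward` fires.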
If a node affects the output through more than one path, its gradient contributions from each path must be added together, not overwritten. This is one of the most important ideas in the whole course.
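A tiny check of why accumulation (`+=`) matters, using a minimal `Value` with only addition (a sketch under the same micrograd-style field names):

```python
class Value:
    def __init__(self, data, _prev=()):
        self.data, self.grad, self._prev = data, 0.0, _prev
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # += so multiple paths accumulate
            other.grad += out.grad
        out._backward = _backward
        return out

x = Value(3.0)
y = x + x        # x reaches y through TWO paths
y.grad = 1.0
y._backward()
print(x.grad)    # 2.0 -- both contributions summed, matching dy/dx = 2
```

With `=` instead of `+=`, the second path would overwrite the first and the answer would wrongly come out as 1.0.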
You are not learning micrograd to become a micrograd expert. You are learning it so larger frameworks feel less mysterious later.