AI Course/week05_attention/attention_notes.md

Course file

attention_notes.md

week05_attention/attention_notes.md

Back to course dashboard Open course README

Attention Notes

Core Idea

Attention is a smart lookup rule. The current token asks, “Which earlier pieces of information matter most right now?”

What To Notice In The Notebook

The query is fixed for the current step.
Each key gets a score based on how well it matches the query.
The scores become weights after normalization.
The final output is not just one value vector. It is a weighted mix.

Why This Matters

Without attention, a model has a harder time deciding which earlier words are relevant. With attention, it can re-focus for each new token.

Questions To Answer In Your Own Words

Why do the weights add up to 1?
Why can one token matter more than another?
What would happen if all scores were almost the same?
How is this different from just averaging all values?

Co-Author Prompt

If this notebook felt too easy, too hard, or too abstract, write down one specific change that would improve it for the next student.