# Attention Notes

## Core Idea

Attention is a soft lookup rule. The current token asks, “Which earlier pieces of information matter most right now?”

## What To Notice In The Notebook

- The query is fixed for the current step.
- Each key gets a score based on how well it matches the query.
- The scores become weights after a normalization step (a softmax, in standard attention).
- The final output is not just one value vector. It is a weighted mix of all the value vectors.
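The steps above can be sketched in a few lines of NumPy. This is a minimal illustration for a single query step, with hypothetical toy shapes (4 earlier tokens, dimension 3), not the notebook's actual code:

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 3))    # one key per earlier token
values = rng.normal(size=(4, 3))  # one value per earlier token
query = rng.normal(size=3)        # fixed for the current step

scores = keys @ query             # one score per key: how well it matches the query
weights = softmax(scores)         # normalized scores: non-negative, sum to 1
output = weights @ values         # weighted mix of all the value vectors

print(weights)
print(output)
```

Note that `output` has the same dimension as a single value vector, but it blends information from every earlier token in proportion to its weight.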

## Why This Matters

Without attention, a model has a harder time deciding which earlier words are relevant. With attention, it can re-focus for each new token.

## Questions To Answer In Your Own Words

- Why do the weights add up to 1?
- Why can one token matter more than another?
- What would happen if all scores were almost the same?
- How is this different from just averaging all values?
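A quick experiment can make the last two questions concrete. The sketch below (toy numbers chosen for illustration, not from the notebook) compares sharply peaked scores against nearly identical ones:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

values = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0]])

# One score much larger than the rest: the weights concentrate on one token.
peaked = softmax(np.array([5.0, 0.0, 0.0]))

# Scores almost the same: the weights are nearly uniform, and the output
# degenerates toward a plain average of all the values.
flat = softmax(np.array([0.01, 0.0, 0.0]))

print(peaked, peaked @ values)  # output close to values[0]
print(flat, flat @ values)      # output close to values.mean(axis=0)
```

This is the difference from plain averaging: attention *can* behave like an average when no token stands out, but it can also re-focus almost entirely on one token when the scores say so.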

## Co-Author Prompt

If this notebook felt too easy, too hard, or too abstract, write down one specific change that would improve it for the next student.