Week 5: Attention
Objective
Understand attention with concrete numbers before thinking about giant transformer models.
Required Videos
- One short transformer attention explainer from StatQuest, 3Blue1Brown, or another trusted source
- Optional: rewatch the explainer after working through the notebook
Tasks
- Open week5_attention.ipynb in Jupyter.
- Run every cell and inspect the printed scores and weights.
- Change the query vector and predict which token should get the most attention.
- Read and annotate attention_notes.md.
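The notebook itself isn't reproduced here, but the computation it walks through can be sketched with concrete numbers. This is a minimal dot-product attention example, assuming the standard recipe (scores from query-key dot products, softmax into weights, weighted sum of values); the variable names q, K, and V are illustrative, not necessarily the notebook's.

```python
import numpy as np

# Tiny dot-product attention with concrete numbers (illustrative sketch;
# the notebook's exact vectors may differ).
q = np.array([1.0, 0.0])                 # query vector
K = np.array([[1.0, 0.0],                # key for token A
              [0.0, 1.0],                # key for token B
              [0.7, 0.7]])               # key for token C
V = np.array([[10.0, 0.0],               # value for token A
              [0.0, 10.0],               # value for token B
              [5.0, 5.0]])               # value for token C

scores = K @ q                           # one similarity score per token
weights = np.exp(scores) / np.exp(scores).sum()  # softmax: weights sum to 1
output = weights @ V                     # weighted mix of the value vectors

print(scores)   # token A scores highest: its key points the same way as q
print(weights)
print(output)
```

Predicting the winner before running the cell is the whole exercise: since q matches token A's key exactly, A should receive the largest weight.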
Deliverables
- A completed notebook run
- Notes in plain language on queries, keys, and values
- One modified numeric example
Checkpoint Questions
- What are attention scores measuring?
- Why do we normalize scores into weights?
- How do values differ from keys?
- What changed when you edited the query?
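The last two questions can be checked numerically. As a sketch (again assuming plain dot-product attention, with illustrative key vectors), swapping the query moves the softmax mass to whichever token's key best matches it:

```python
import numpy as np

def attention_weights(q, K):
    """Softmax over dot-product scores -- raw scores become weights summing to 1."""
    scores = K @ q
    e = np.exp(scores - scores.max())    # subtract max for numerical stability
    return e / e.sum()

K = np.array([[1.0, 0.0],    # key for token A
              [0.0, 1.0],    # key for token B
              [0.7, 0.7]])   # key for token C

w1 = attention_weights(np.array([1.0, 0.0]), K)  # query aligned with token A's key
w2 = attention_weights(np.array([0.0, 1.0]), K)  # query aligned with token B's key

print(w1.argmax())  # 0: token A gets the most attention
print(w2.argmax())  # 1: editing the query shifts attention to token B
```

Note that the keys decide *which* tokens the query matches, while the values (absent here) decide *what* gets mixed into the output once the weights are fixed.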