Course file
week05_attention/week5_attention.ipynb
# Week 5: Attention by Numbers

This notebook uses a tiny numeric example so attention feels concrete before it becomes a large matrix operation inside a transformer.
## Idea

- A **query** asks: what kind of information am I looking for?
- A **key** offers: what kind of information do I contain?
- A **value** says: if I get picked, what information should I contribute?

The query compares itself to every key, turns those scores into weights, and then builds a weighted sum of the values.
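In symbols, the three steps can be written as follows. The notation ($q$ for the query, $k_i$ and $v_i$ for the $i$-th key and value, and $s_i$, $w_i$, $o$ for the scores, weights, and output) is introduced here for illustration and does not appear in the notebook's code.

$$
s_i = q \cdot k_i, \qquad w_i = \frac{e^{s_i}}{\sum_j e^{s_j}}, \qquad o = \sum_i w_i \, v_i
$$

The middle step is the softmax function: it makes every weight positive and forces the weights to sum to 1.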
import numpy as np
np.set_printoptions(precision=4, suppress=True)
query = np.array([1.0, 0.5])
keys = np.array([
[1.4, 0.6],
[0.1, 0.2],
[0.7, 0.2],
])
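# Score each key by its dot product with the query (one score per key).
scores = keys @ query
# Softmax: exponentiate, then normalize so the weights are positive and sum to 1.
weights = np.exp(scores) / np.exp(scores).sum()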
print('Raw attention scores:', scores)
print('Attention weights:', weights)
Raw attention scores: [1.7 0.2 0.8]
Attention weights: [0.6136 0.1369 0.2495]
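The softmax line above exponentiates the raw scores directly. That works for small scores like these, but `np.exp` overflows once the inputs get large. A common remedy, sketched here as an assumption rather than something this notebook does, is to subtract the largest score before exponentiating; the shift cancels in the ratio, so the weights are unchanged. The helper name `softmax` and the `big` array are hypothetical and used only for illustration.

```python
import numpy as np

def softmax(scores):
    """Softmax over a 1-D array, shifted by the max for numerical stability."""
    shifted = scores - np.max(scores)  # largest entry becomes 0, so exp cannot overflow
    exps = np.exp(shifted)
    return exps / exps.sum()

# The raw scores from the cell above.
scores = np.array([1.7, 0.2, 0.8])

naive = np.exp(scores) / np.exp(scores).sum()  # the notebook's direct formula
stable = softmax(scores)                       # the shifted version
print('Same weights:', np.allclose(naive, stable))

# Illustrative large scores: the direct formula would overflow here,
# while the shifted version still returns finite weights that sum to 1.
big = np.array([1000.0, 999.0, 998.0])
print('Weights for large scores:', softmax(big))
```

Only the differences between scores matter to softmax, which is why shifting every score by the same constant leaves the weights unchanged.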
values = np.array([
[0.9, 0.1],
[0.2, 0.8],
[0.5, 0.1],
])
# Blend the value rows, using the attention weights as mixing coefficients.
output = weights @ values
print('Weighted output vector:', output)
Weighted output vector: [0.7044 0.1958]
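The same three steps extend to many queries at once, which is the "large matrix operation" mentioned in the introduction. The sketch below is a minimal illustration under stated assumptions: the function name `attention`, the optional `scale` flag, and the second query row are all hypothetical and not part of the original notebook. The `scale` flag divides scores by the square root of the key dimension, as is common in transformers; it is off by default so the first row reproduces the single-query computation above.

```python
import numpy as np

def attention(Q, K, V, scale=False):
    """Dot-product attention for a batch of queries.

    Q: (n_queries, d), K: (n_keys, d), V: (n_keys, d_v).
    Returns (outputs, weights) with shapes (n_queries, d_v) and (n_queries, n_keys).
    """
    scores = Q @ K.T  # one row of scores per query
    if scale:
        scores = scores / np.sqrt(K.shape[-1])  # optional scaling (assumption)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stabilize exp
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

keys = np.array([[1.4, 0.6], [0.1, 0.2], [0.7, 0.2]])
values = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.1]])

# Row 0 is the notebook's query; row 1 is a made-up second query.
Q = np.array([[1.0, 0.5],
              [0.0, 1.0]])

outputs, weights = attention(Q, keys, values)
print('Weights per query:\n', weights)
print('Output per query:\n', outputs)
```

Each row of `weights` sums to 1, and each row of `outputs` is the weighted blend of the value rows for the corresponding query.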