# Glossary

Course file: `resources/glossary.md`
**Activation function**: A function applied after a weighted sum that adds nonlinearity. Without it, a neural network would behave like a simple linear model, no matter how many layers it has.
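A minimal sketch of a single neuron with a ReLU activation (the input values and weights below are made up for illustration):

```python
def relu(x):
    # ReLU keeps positive values and zeroes out negatives, adding nonlinearity.
    return max(0.0, x)

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, followed by the activation.
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(z)

print(neuron([1.0, -2.0], [0.5, 0.25], 0.1))  # weighted sum is 0.1, so output is 0.1
```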
**Attention**: A mechanism that lets a model decide which earlier pieces of information matter most for the current step.
**Backpropagation**: The process of computing how much each parameter contributed to the final error, so the model can update itself.
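A sketch of the idea on the smallest possible model, `y = w * x` with a squared-error loss (the numbers are toy values chosen for illustration):

```python
def forward(w, x):
    # Forward pass: the model's prediction.
    return w * x

def loss(y, target):
    # Squared error: how wrong the prediction is.
    return (y - target) ** 2

def backward(w, x, target):
    # Chain rule: dL/dw = dL/dy * dy/dw = 2 * (y - target) * x
    y = forward(w, x)
    return 2 * (y - target) * x

print(backward(3.0, 2.0, 10.0))  # y = 6, so dL/dw = 2 * (6 - 10) * 2 = -16.0
```

The negative gradient says the parameter should increase to reduce the loss.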
**Context window**: The amount of recent input a language model can look at when predicting the next token.
**Epoch**: One full pass through a dataset during training.
**Gradient**: The direction and size of change that tells a parameter how to reduce the loss.
**Inference**: Using a trained model to make predictions, generate text, or answer a question.
**Learning rate**: The step size used when updating parameters during training.
**Loss**: A number that measures how wrong the model currently is.
**Overfitting**: When a model memorizes training data so strongly that it performs worse on new examples.
**Parameter**: A learned value inside the model, such as a weight or a bias.
**Prompt**: The input instruction or context given to a language model.
**Query, Key, Value**: Three vector roles used in attention: the query describes what the current position is looking for, each key describes what a position offers for matching, and each value carries the information that is blended into the output once queries and keys have been scored.
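A minimal sketch of these three roles with scaled dot-product attention, using made-up 2-dimensional vectors:

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    d = len(query)
    # Score each key against the query (dot product, scaled by sqrt of dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the values according to the attention weights.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)  # the first value gets more weight because its key matches the query
```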
**Retrieval-augmented generation (RAG)**: A pattern where a system retrieves relevant source material first and then uses it to answer more accurately.
**Token**: A chunk of text used by language models. A token can be a word, part of a word, punctuation, or even whitespace.
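For illustration only, a naive regex splitter that separates words, punctuation, and whitespace; real tokenizers (such as BPE) learn subword pieces from data and behave differently:

```python
import re

def toy_tokenize(text):
    # Split into word runs, single punctuation marks, and whitespace runs.
    return re.findall(r"\w+|[^\w\s]|\s+", text)

print(toy_tokenize("Hello, world!"))  # ['Hello', ',', ' ', 'world', '!']
```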
**Training loop**: The repeated cycle of forward pass, loss calculation, gradient calculation, and parameter update.
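The full cycle can be sketched on a toy model `y = w * x` fitted to made-up data where the true weight is 2 (all values below are assumptions for illustration):

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs with target = 2 * input
w = 0.0      # the parameter, starting from scratch
lr = 0.05    # the learning rate

for epoch in range(100):           # one epoch = one full pass through the dataset
    for x, target in data:
        y = w * x                  # forward pass
        # loss would be (y - target) ** 2; its gradient with respect to w is:
        grad = 2 * (y - target) * x    # gradient calculation
        w -= lr * grad                 # parameter update

print(round(w, 3))  # converges toward 2.0
```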