VI. Mathematical Structure in GPT Training
Tokenization in GPT-3:
GPT operates on a linguistic unit known as a token: a sequence of characters that can represent a full word or a fragment of a word.
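The sketch below illustrates the idea of splitting text into tokens. It assumes the open-source tiktoken library and its "r50k_base" byte-pair encoding as a stand-in for GPT-3's tokenizer; it is an illustrative example, not the model's internal implementation.

```python
import tiktoken  # assumption: tiktoken used here as a stand-in tokenizer

# "r50k_base" is a byte-pair encoding associated with the original GPT-3 models (assumption).
enc = tiktoken.get_encoding("r50k_base")

text = "Tokenization splits text into word and subword pieces."
token_ids = enc.encode(text)                    # a list of integer token ids
pieces = [enc.decode([t]) for t in token_ids]   # the character span each token covers

print(token_ids)
print(pieces)  # longer or rarer words are typically broken into several fragments
```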
Attention/Occurrence in Tokens:
The purpose of attention is to determine which input tokens to focus on, and by how much, when producing each output position in the sequence.
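As a concrete illustration, the following is a minimal single-head scaled dot-product attention sketch in NumPy. The query, key, and value matrices and their sizes are made up for demonstration; the point is that each output is a weighted average of the value vectors, with the weights expressing how much each input token is attended to.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: outputs are weighted averages of the value vectors,
    where the weights say how strongly each input token is attended to."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax: each row sums to 1
    return weights @ V, weights

# Toy example: 4 tokens with 8-dimensional vectors (sizes chosen only for illustration).
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))

output, attn_weights = scaled_dot_product_attention(Q, K, V)
print(attn_weights.round(2))  # row i shows how much output i focuses on each input token
```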
Logistic Probability behind Guessing:
Through its training, GPT-3 learns a probability distribution over its vocabulary that gives the likelihood of each word appearing next, and it selects the next word accordingly.
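The sketch below shows how raw model scores (logits) can be turned into a probability distribution with a softmax and how the next word might then be chosen, either greedily or by sampling. The five-word vocabulary and the logit values are made-up numbers for illustration only.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into a probability distribution over the vocabulary."""
    z = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical tiny vocabulary and logits for the continuation of "The cat sat on the ___".
vocab = ["mat", "dog", "roof", "idea", "sat"]
logits = np.array([3.1, 0.4, 1.7, -2.0, -1.0])  # made-up values for illustration

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word:>5}: {p:.3f}")

# Greedy decoding picks the most likely word; sampling draws from the distribution.
greedy_choice = vocab[int(np.argmax(probs))]
sampled_choice = np.random.default_rng(0).choice(vocab, p=probs)
print("greedy:", greedy_choice, "| sampled:", sampled_choice)
```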