VI.    Mathematical Structure in GPT Training






Tokenization in GPT-3:  

GPT operates on a linguistic unit known as a token: a sequence of characters that can represent a whole word or a fragment of a word.
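
The sketch below illustrates the idea in Python using the open-source tiktoken library (the library choice, the example sentence, and the printed output are assumptions for illustration; GPT-3 uses a similar byte-pair-encoding vocabulary). Common words often map to a single token, while rarer words split into several fragments.

    import tiktoken

    # Load a GPT-2/GPT-3-style byte-pair encoding
    enc = tiktoken.get_encoding("gpt2")

    text = "Tokenization splits text into subword units."
    token_ids = enc.encode(text)                  # integer id for each token

    print(token_ids)
    print([enc.decode([t]) for t in token_ids])   # the character fragment behind each id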



Attention / Occurrence in Tokens:  

The purpose of attention is to determine which input tokens to focus on, and to what degree, when producing each output token in the sequence.
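
A minimal sketch of scaled dot-product attention in Python/NumPy follows (the library choice and the random toy matrices are assumptions; GPT applies this formula inside each attention head). The softmax weights express how strongly each output position attends to each input token.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)     # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)             # similarity of each query to each key
        weights = softmax(scores, axis=-1)          # each row sums to 1: "how much to focus"
        return weights @ V, weights                 # weighted mix of the value vectors

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 output positions, dimension 8
    K = rng.normal(size=(4, 8))   # 4 input tokens
    V = rng.normal(size=(4, 8))

    out, weights = attention(Q, K, V)
    print(weights.round(2))       # attention weights over the input tokens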


Logistic Probability behind Guessing:  

Through its training, GPT-3 learns to assign a probability to each candidate token's appearance given the preceding context, and then selects the next token according to those probabilities.
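
As a hedged sketch of this step, the Python snippet below turns hypothetical per-token scores (logits) into probabilities with a softmax and then picks the next token either greedily or by sampling (the tiny vocabulary and the logit values are made up for illustration).

    import numpy as np

    vocab = ["cat", "dog", "car", "tree"]
    logits = np.array([2.1, 1.3, 0.2, -0.5])    # hypothetical scores from the model

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax: probabilities sum to 1

    rng = np.random.default_rng(0)
    next_greedy = vocab[int(np.argmax(probs))]  # greedy: take the most likely token
    next_sampled = rng.choice(vocab, p=probs)   # sampling: draw in proportion to probability

    print(dict(zip(vocab, probs.round(3))))
    print("greedy:", next_greedy, "| sampled:", next_sampled)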