Write4U
Valued Senior Member
OpenAI's GPT-3 Language Model: A Technical Overview
GPT-3 Key Takeaways
- GPT-3 shows that language-model performance scales as a power law of model size, dataset size, and the amount of computation (a short sketch of this scaling relation follows the list).
- GPT-3 demonstrates that a language model trained on enough data can solve NLP tasks it has never encountered. That is, GPT-3 evaluates the model as a general-purpose solution for many downstream tasks without fine-tuning.
- The cost of training state-of-the-art AI models is increasing exponentially. Training GPT-3 would cost over $4.6M using a Tesla V100 cloud instance (a back-of-envelope estimate follows the list).
- The size of state-of-the-art (SOTA) language models is growing by at least a factor of 10 every year. This outpaces the growth of GPU memory. For NLP, the days of "embarrassingly parallel" training are coming to an end; model parallelism will become indispensable.
- Although there is a clear performance gain from increasing model capacity, it is not clear what is really going on under the hood. In particular, it remains an open question whether the model has learned to reason or simply memorizes training examples in a more sophisticated way.
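For the scaling-law bullet: the relation reported in this line of work (Kaplan et al., 2020) has the rough form L(N) = (N_c / N)^α, i.e. test loss falls as a power law in parameter count when data and compute are not the bottleneck. A minimal Python sketch follows; the constants are approximate values quoted from that fit and should be treated as illustrative, not authoritative.

```python
# Illustrative sketch of the power-law scaling relation L(N) = (N_c / N) ** alpha.
# The constants below are approximate values from the Kaplan et al. (2020) fit
# and serve only as an example; treat them as placeholders.

N_C = 8.8e13   # reference parameter count (approximate)
ALPHA = 0.076  # power-law exponent (approximate)

def predicted_loss(num_params: float) -> float:
    """Cross-entropy loss predicted by the power-law fit for a model with
    `num_params` parameters, assuming data and compute are not limiting."""
    return (N_C / num_params) ** ALPHA

for n in (1.5e9, 13e9, 175e9):  # roughly GPT-2 XL, GPT-3 13B, GPT-3 175B
    print(f"{n:12.3e} params -> predicted loss ~ {predicted_loss(n):.3f}")
```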
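For the cost bullet: the ~$4.6M figure can be reproduced with back-of-envelope arithmetic. The total training compute (~3.14e23 FLOPs) is the figure reported in the GPT-3 paper; the sustained V100 throughput and hourly price below are assumptions chosen for illustration so that the total lands near the quoted estimate.

```python
# Back-of-envelope reproduction of the "~$4.6M on V100" estimate.
# Assumptions (not authoritative): the GPT-3 paper reports roughly 3.14e23 FLOPs
# to train the 175B model; we assume a V100 sustains ~28 TFLOPS in mixed
# precision and rents for ~$1.50 per GPU-hour.

TOTAL_TRAIN_FLOPS = 3.14e23    # total training compute (from the paper, approx.)
V100_SUSTAINED_FLOPS = 28e12   # assumed sustained throughput per V100
PRICE_PER_GPU_HOUR = 1.50      # assumed cloud price in USD

gpu_seconds = TOTAL_TRAIN_FLOPS / V100_SUSTAINED_FLOPS
gpu_hours = gpu_seconds / 3600
gpu_years = gpu_hours / (24 * 365)
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR

print(f"~{gpu_years:.0f} V100-years, ~${cost_usd / 1e6:.1f}M")  # roughly 355 years, ~$4.7M
```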
One novel challenge GPT-3 has to deal with is data contamination. Since its training dataset is sourced from the internet, it is possible that the training data will overlap with some of the testing datasets. Although the GPT-2 work touched on this topic, it is particularly relevant to GPT-3 175B because its dataset and model size are about two orders of magnitude larger than those used for GPT-2, creating increased potential for contamination and memorization.
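On the contamination point: overlap between web-scraped training data and benchmark test sets is typically screened with some form of n-gram matching. Below is a minimal sketch of that idea using a simple word-level n-gram check; the helper is hypothetical and is not the actual filtering procedure used for GPT-3.

```python
# Minimal sketch of n-gram based contamination screening: flag test examples
# whose word n-grams also occur in the training corpus. Hypothetical helper,
# not the actual GPT-3 deduplication pipeline.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """All word-level n-grams of `text`, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(test_example: str, train_ngrams: set[tuple[str, ...]], n: int = 13) -> bool:
    """True if any n-gram of the test example also appears in the training data."""
    return not ngrams(test_example, n).isdisjoint(train_ngrams)

# Toy usage: build train_ngrams once from the training corpus, then screen the test set.
train_ngrams = ngrams("the quick brown fox jumps over the lazy dog", n=3)
print(is_contaminated("a quick brown fox appeared", train_ngrams, n=3))  # True: shares "quick brown fox"
print(is_contaminated("an entirely unrelated sentence", train_ngrams, n=3))  # False
```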
https://lambdalabs.com/blog/demystifying-gpt-3/
Note: Data contamination is very much a problem with human learning.