OpenAI's GPT-3 Language Model: A Technical Overview

GPT-3 Key Takeaways

- GPT-3 shows that language model performance scales as a power-law of model size, dataset size, and the amount of computation (a worked example of the fit follows below).
- GPT-3 demonstrates that a language model trained on enough data can solve NLP tasks it has never encountered; that is, the paper treats GPT-3 as a general-purpose solution for many downstream tasks without fine-tuning.
- The cost of AI is increasing exponentially. Training GPT-3 would cost over $4.6M using Tesla V100 cloud instances (a back-of-envelope reconstruction follows below).
- The size of state-of-the-art (SOTA) language models is growing by at least a factor of 10 every year, which outpaces the growth of GPU memory. For NLP, the days of "embarrassingly parallel" training are coming to an end; model parallelization will become indispensable (see the memory arithmetic below).
- Although there is a clear performance gain from increasing model capacity, it is not clear what is really going on under the hood. In particular, it remains an open question whether the model has learned to do reasoning, or simply memorizes training examples in a more intelligent way.

More: https://lambdalabs.com/blog/demystifying-gpt-3/

Note: Data contamination is very much a problem with human learning.
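The power-law claim can be made concrete with the fits reported in Kaplan et al. (2020), "Scaling Laws for Neural Language Models", which the GPT-3 paper builds on. The sketch below is illustrative only: the constants (alpha_N ≈ 0.076 with N_c ≈ 8.8e13 non-embedding parameters, alpha_D ≈ 0.095 with D_c ≈ 5.4e13 tokens) are the published fits, while the function names and example sizes are hypothetical.

```python
# Illustrative sketch of the Kaplan et al. (2020) scaling-law fits the
# GPT-3 paper builds on. The constants are the published fits; function
# names and example model sizes are hypothetical choices for this sketch.

def loss_vs_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy loss as a power-law of non-embedding parameters."""
    return (n_c / n_params) ** alpha_n

def loss_vs_data(n_tokens: float, d_c: float = 5.4e13, alpha_d: float = 0.095) -> float:
    """Predicted cross-entropy loss as a power-law of dataset size in tokens."""
    return (d_c / n_tokens) ** alpha_d

if __name__ == "__main__":
    # Each 10x increase in parameter count shaves a roughly constant
    # multiplicative factor off the predicted loss:
    for n in (1.5e9, 1.75e10, 1.75e11):  # roughly GPT-2 scale up to GPT-3 scale
        print(f"{n:.2e} params -> predicted loss {loss_vs_params(n):.3f} nats/token")
```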
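The $4.6M figure is a back-of-envelope estimate, not a vendor quote. A minimal reconstruction, assuming the ~3.14e23 FLOPs of training compute reported for GPT-3 175B, a V100 sustaining roughly 28 TFLOPS in practice, and an illustrative $1.50 per GPU-hour cloud price:

```python
# Back-of-envelope reconstruction of the "$4.6M on V100s" estimate.
# All three inputs are approximations: 3.14e23 training FLOPs (reported
# for GPT-3 175B), ~28 TFLOPS sustained per V100, ~$1.50 per GPU-hour.

TRAIN_FLOPS = 3.14e23          # total training compute for GPT-3 175B
V100_FLOPS = 28e12             # assumed sustained throughput per V100, FLOPs/s
PRICE_PER_GPU_HOUR = 1.50      # assumed cloud price, USD

gpu_seconds = TRAIN_FLOPS / V100_FLOPS
gpu_hours = gpu_seconds / 3600
cost = gpu_hours * PRICE_PER_GPU_HOUR

print(f"{gpu_hours:,.0f} GPU-hours (~{gpu_hours / 8760:,.0f} GPU-years)")
print(f"~${cost / 1e6:.1f}M at ${PRICE_PER_GPU_HOUR}/GPU-hour")
```

Running this gives roughly 3.1M GPU-hours (about 355 GPU-years) and ~$4.7M, consistent with the headline figure.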
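Why pure data parallelism (the "embarrassingly parallel" regime) stops working: the model's weights alone no longer fit on a single GPU, so the model itself must be split across devices. A minimal sketch of the arithmetic, assuming fp16 weights and a 32 GB V100; gradients, optimizer state, and activations only widen the gap:

```python
import math

# Why data parallelism alone breaks down for GPT-3: the fp16 weights
# exceed a single GPU's memory before counting gradients, optimizer
# state, or activations. The GPU size below is an assumption (32 GB V100).

N_PARAMS = 175e9        # GPT-3 parameter count
BYTES_PER_PARAM = 2     # fp16
GPU_MEM_GB = 32         # 32 GB V100

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = math.ceil(weights_gb / GPU_MEM_GB)

print(f"fp16 weights alone: {weights_gb:.0f} GB")
print(f"minimum GPUs just to hold the weights: {min_gpus}")
# Adam-style optimizer state adds roughly another 12 bytes/param in
# common mixed-precision setups, so real systems shard far more than this.
```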