Released in mid-2020, GPT-3, or Generative Pre-trained Transformer 3, is OpenAI’s third-generation language prediction model in its GPT-n series.
The tool uses pre-trained algorithms and deep learning to generate human-like text. GPT-3 was fed an enormous amount of data, roughly 570GB of filtered text, drawn largely from Common Crawl (a dataset built by crawling the public web) alongside other text collections.
GPT-3’s capacity exceeds that of Microsoft’s Turing NLG roughly ten times over, making it the largest neural network model in existence at the time of its release.
The GPT-3 model is so large that it cannot be stored and run on a standard laptop, which is why OpenAI released only an API for it rather than the model itself, as it had done for GPT-2.
2019’s GPT-2 had already piqued people’s interest by proving to be a ready-to-go solution for multiple downstream natural language processing tasks without fine-tuning. Its successor, however, has completely surpassed expectations by using a more refined prompting approach to draw inferences. While other language prediction models such as Google’s BERT and Microsoft’s Turing NLG require fine-tuning to perform downstream tasks, GPT-3 does not: it needs no additional task-specific layers running on top of sentence encodings, and instead uses a single model for all downstream tasks.
GPT-3 can produce practically any type of content that has a language structure. Therefore, it can be used to answer questions, summarize large pieces of information, create computer code and even produce original texts such as poems and creative fiction.
This is GPT-3's most impressive feature: its ability to generate the most human-like text ever produced by a computer. The model can write coherent articles when given just a title, a subtitle, and the prompt word ‘Article’. The quality of these articles is so high that they manage to fool human readers much of the time.
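The article-writing setup described above amounts to assembling a short text prompt for the model to continue. A minimal sketch, assuming an illustrative prompt layout (the exact format OpenAI used is not given in the text):

```python
def build_article_prompt(title: str, subtitle: str) -> str:
    """Assemble a completion prompt from a title and subtitle.

    The model is then asked to continue the text after the
    'Article:' cue word, as described above. The field labels
    here are an assumption for illustration.
    """
    return f"Title: {title}\nSubtitle: {subtitle}\nArticle:"


prompt = build_article_prompt(
    "The Rise of Large Language Models",
    "How scale changed natural language processing",
)
print(prompt)
```

Everything the model needs to know about the task lives in this string; no task-specific training step is involved.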
Although producing original articles is an incredibly impressive feature in itself, GPT-3 can also be redirected to other general natural language processing tasks without being explicitly fine-tuned. This is a revolutionary leap within the computational space. GPT-3 can tackle many tasks with few or even no prior examples of what is required of it. For example, it can translate text and answer questions, then quickly pivot to performing arithmetic computations.
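The task-switching described above happens entirely in the prompt text: the same model interface serves translation and arithmetic alike, with no task-specific layers. A sketch, with illustrative prompt formats (these exact templates are an assumption, not taken from the text):

```python
# Translation task, specified purely by example in the prompt.
translation_prompt = (
    "Translate English to French:\n"
    "cheese =>"
)

# Arithmetic task, specified the same way: plain text in, text out.
arithmetic_prompt = (
    "Q: What is 48 plus 76?\n"
    "A:"
)

# A single completion interface would serve both prompts;
# switching tasks means only switching the input string.
for prompt in (translation_prompt, arithmetic_prompt):
    print(prompt)
    print("---")
```

This is the practical meaning of "no fine-tuning required": the task description travels with the request rather than being baked into model weights.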
GPT-3, at its core, is a language prediction model. Although it uses the same attention-based architecture as its GPT-2 predecessor, it surpasses it in scale by roughly two orders of magnitude. Putting this into perspective: GPT-2 has 1.5 billion parameters and was trained on 40GB of internet text (the equivalent of about 10 billion tokens, one token being roughly 4 characters), while GPT-3 has 175 billion parameters and was trained on 499 billion tokens.
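The scale comparison above can be checked with simple arithmetic:

```python
# Back-of-the-envelope check of the GPT-2 vs GPT-3 comparison.
gpt2_params = 1.5e9   # 1.5 billion parameters
gpt3_params = 175e9   # 175 billion parameters

# ~117x more parameters, i.e. roughly two orders of magnitude.
param_ratio = gpt3_params / gpt2_params
print(f"parameter ratio: {param_ratio:.0f}x")

# 40 GB of text at ~4 characters per token is ~10 billion tokens.
gpt2_training_chars = 40e9
tokens = gpt2_training_chars / 4
print(f"GPT-2 training tokens: {tokens:.0e}")
```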
Let that sink in. 175 billion parameters. What does that even mean?
It means that in order to absorb such an enormous amount of information, GPT-3 required more computing power than any other language processing model preceding it. To be exact, training GPT-3 required 3.14e23 floating-point operations. Considering that hardware sustaining a mere 15 TFLOPS would take around 665 years to perform that many operations, the amount of compute needed to train the model in a practical timeframe is staggering.
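The 665-year figure follows directly from dividing total operations by throughput:

```python
# Verifying the training-time figure quoted above: 3.14e23
# floating-point operations on hardware sustaining 15 TFLOPS
# (15e12 floating-point operations per second).
total_flops = 3.14e23
throughput = 15e12          # operations per second

seconds = total_flops / throughput
years = seconds / (365 * 24 * 3600)
print(f"{years:.0f} years")  # on the order of 665 years
```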
Unfortunately, time is not the only daunting factor; storage is a challenging issue as well. The 175 billion parameters need 700GB of memory (each parameter stored as a 4-byte float), which is more than ten times the maximum memory of a single GPU.
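The memory figure checks out the same way. The GPU capacity below is an assumption (48GB, roughly the largest single-GPU memory available around GPT-3's release):

```python
# Memory footprint estimate from the text: 175 billion parameters,
# each stored as a 4-byte (32-bit) float.
params = 175e9
bytes_per_param = 4

total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")  # 700 GB

# Compared with an assumed 48 GB high-end GPU of the era, the model
# alone needs well over ten GPUs' worth of memory.
gpu_memory_gb = 48
print(f"{total_gb / gpu_memory_gb:.1f} GPUs' worth of memory")
```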
How did OpenAI manage to handle the inconceivable amount of storage and time required for GPT-3 to operate?
OpenAI has yet to disclose the exact details of its training infrastructure and model implementation, but a single training run is estimated to have cost a hefty $4.6 million.
Although its ability to generate language has been dubbed the best AI has yet produced, GPT-3 still comes with a few caveats that should not be ignored.
Firstly, the price of this tool far exceeds many organizations’ budgets, making it unattainable for the majority of potential users.
Secondly, the output produced by GPT-3 is still imperfect, especially in long-form copy. There have been instances where the model produced insensitive content (racist, sexist, and so on), suggesting that it has little reliable control over the sentiment and bias of what it generates.
Thirdly, once the GPT-3 model has been trained, additional information cannot simply be added to its store of knowledge. The repercussion is that the model is not very adaptable; when something changes, the entire model must be trained from scratch.
Nevertheless, these issues are not insurmountable and are likely to be addressed over time as the price drops and the models are refined on ever-larger volumes of training data.