What is GPT-3?

Released in mid-2020, GPT-3 (Generative Pretrained Transformer 3) is OpenAI’s third-generation language prediction model in its GPT-n series.

The model uses deep learning, pre-trained on a very large body of text, to generate human-like text. Its training data included roughly 570GB of filtered text from Common Crawl (a dataset created by crawling the internet), supplemented by other curated corpora such as books and Wikipedia.

With 175 billion parameters, GPT-3’s capacity exceeds that of Microsoft’s Turing NLG (17 billion parameters) roughly ten times over, and when it was released it was the largest language model built to date.

The GPT-3 model is so large that it cannot be stored and run on a standard laptop, which is one reason OpenAI released only an API for it rather than the model weights, as it had done for GPT-2.

What Can GPT-3 Do?

2019’s GPT-2 had already sparked interest by showing that a single pre-trained model could handle multiple downstream natural language processing tasks without fine-tuning. Its successor went much further, performing well on many tasks when given only a task description and, optionally, a few examples in its prompt. While other language models such as Google’s BERT and Microsoft’s Turing NLG require fine-tuning to perform downstream tasks, GPT-3 does not. Rather than adding task-specific layers on top of sentence encodings, it uses a single model for all downstream tasks.

GPT-3 can produce practically any type of content that has a language structure. It can therefore be used to answer questions, summarize large pieces of information, write computer code, and even produce original texts such as poems and creative fiction.

Text Generation

This is GPT-3’s most impressive feature: its ability to generate the most human-like text a computer had produced up to that point. Given just a title, a subtitle, and the prompt word ‘Article’, it can write a coherent news article. In OpenAI’s own evaluation, people asked to tell these articles apart from human-written ones were right only about 52% of the time, barely better than chance.
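Under the hood, a model like GPT-3 writes text one token at a time, each prediction conditioned on what came before. The sketch below illustrates that autoregressive loop with a deliberately tiny stand-in: a word-bigram counter (the helpers `train_bigram` and `generate` are hypothetical) that looks only at the previous word, whereas GPT-3 uses a Transformer conditioned on the whole context. Greedy decoding, always picking the most frequent next word, is an assumption made for simplicity; real systems usually sample.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count which word follows which: a toy stand-in for a trained language model."""
    counts: dict[str, Counter] = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(model: dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Greedy autoregressive decoding: repeatedly append the most likely next word."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the toy model has never seen anything follow this word
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


corpus = "the model predicts the next word and the next word follows the prompt"
model = train_bigram(corpus)
print(generate(model, "the model", max_new_words=4))
```

Here the prompt "the model" is extended, one word per step, to "the model predicts the next word".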

Common Natural Language Processing Tasks

Although producing original articles is an incredibly impressive feature, GPT-3 can also be redirected to other general natural language processing tasks without being explicitly fine-tuned. This was a major leap at the time. Instead of needing task-specific training, GPT-3 can often work out what is required from a plain instruction (zero-shot) or from a handful of examples placed in its prompt (few-shot). For example, it can translate text and answer questions, then quickly pivot to simple arithmetic.
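Few-shot prompting packs a task description and a handful of solved examples into the prompt itself; the model is then expected to continue the pattern. The helper below, `build_few_shot_prompt`, is hypothetical and only assembles the text: the `=>` layout is one illustrative convention, and sending the prompt to a model is out of scope.

```python
def build_few_shot_prompt(task: str, examples: list[tuple[str, str]], query: str) -> str:
    """Combine an instruction, solved examples, and one unsolved query into a prompt."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    "cheese",
)
print(prompt)
```

No weights change here: the "learning" happens entirely within the prompt, which is why the same model can switch tasks just by switching examples.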

How Does GPT-3 Work?

GPT-3, at its core, is a language prediction model. Although it uses essentially the same attention-based architecture as its predecessor GPT-2, it is more than 100 times larger, roughly two orders of magnitude. To put this into perspective, GPT-2 has 1.5 billion parameters and was trained on 40GB of internet text (about 10 billion tokens, with one token averaging roughly 4 characters), while GPT-3 has 175 billion parameters and its training dataset contained about 499 billion tokens, of which the model saw about 300 billion during training.
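The scale comparison above is simple arithmetic. This sketch reproduces it, assuming 1GB = 10^9 bytes and roughly one byte per character, so that 4 characters per token works out to about 4 bytes per token.

```python
import math

GPT2_PARAMS = 1.5e9
GPT3_PARAMS = 175e9
GPT2_TEXT_BYTES = 40e9  # 40GB, taking 1GB = 10**9 bytes
CHARS_PER_TOKEN = 4     # rough average; assumes ~1 byte per character

scale_up = GPT3_PARAMS / GPT2_PARAMS
gpt2_tokens = GPT2_TEXT_BYTES / CHARS_PER_TOKEN

print(f"GPT-3 is ~{scale_up:.0f}x larger than GPT-2")
print(f"That is ~{math.log10(scale_up):.1f} orders of magnitude")
print(f"40GB of text is ~{gpt2_tokens / 1e9:.0f} billion tokens")
```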

Let that sink in. 175 billion parameters. What does that even mean?

It means that for GPT-3 to absorb such an enormous amount of information, it required more computing power than any language model before it. To be exact, training GPT-3 took about 3.14e23 floating-point operations (FLOPs). Considering that a single device sustaining 15 teraflops (15 trillion operations per second) would need more than 660 years to perform that much computation, training it in a practical amount of time required an enormous amount of hardware working in parallel.
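The training-time figure can be checked the same way. The sketch assumes one device sustaining 15 TFLOPS for the entire run and, for the multi-device lines, perfect linear scaling across devices, which real clusters never quite achieve.

```python
TOTAL_FLOPS = 3.14e23     # total floating-point operations to train GPT-3
THROUGHPUT = 15e12        # assumed sustained throughput of one device: 15 TFLOPS
SECONDS_PER_YEAR = 365.25 * 24 * 3600

seconds = TOTAL_FLOPS / THROUGHPUT
years = seconds / SECONDS_PER_YEAR
print(f"~{years:.0f} years on a single 15-TFLOPS device")

# Spreading the same work across many devices, assuming perfect scaling:
for devices in (1_000, 10_000):
    print(f"{devices:>6} devices: ~{years * 365.25 / devices:.0f} days")
```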

Unfortunately, time is not the only daunting factor; storage is a challenge as well. Storing 175 billion parameters at 4 bytes each (32-bit floating point) takes 700GB of memory, more than ten times the memory of a single high-end GPU at the time.
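The memory estimate is the parameter count multiplied by the bytes per parameter. This sketch also shows 16-bit storage and assumes a 40GB GPU as a typical high-end card of 2020; the helpers `weight_memory_gb` and `min_gpus_for_weights` are hypothetical. It counts weights only, ignoring activations, gradients, and optimizer state, which add substantially more during training.

```python
import math

GPU_MEMORY_GB = 40  # assumption: one 40GB accelerator


def weight_memory_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params: float, bytes_per_param: int, gpu_gb: float) -> int:
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_gb)


for name, nbytes in (("fp32", 4), ("fp16", 2)):
    gb = weight_memory_gb(175e9, nbytes)
    gpus = min_gpus_for_weights(175e9, nbytes, GPU_MEMORY_GB)
    print(f"{name}: {gb:.0f}GB of weights, at least {gpus} GPUs to hold them")
```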

How did OpenAI manage to handle the inconceivable amount of storage and time required for GPT-3 to operate?

OpenAI have yet to disclose exactly how they operate their training infrastructure and model implementation, but one widely cited third-party estimate put the compute cost of a single training run at around $4.6 million.

How Was GPT-3 Trained?

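GPT-3 was trained through unsupervised (more precisely, self-supervised) learning on an extensive corpus of text data. The process can be described in two stages: pre-training, which produced GPT-3 itself, and optional fine-tuning, which adapts the pre-trained model to a narrower task.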


  1. Pre-training: During the pre-training phase, GPT-3 learned to predict the next word (more precisely, the next token) in a sequence from the words that precede it. This was accomplished using a deep neural network architecture known as a Transformer. The model was exposed to a vast dataset of diverse text from sources such as books, articles, and websites. Through this process, it learned to pick up patterns, follow grammar rules, absorb factual information, and even exhibit a certain level of reasoning based on the dataset. The Transformer architecture made it possible to model long-range dependencies and contextual cues effectively. Using self-attention, the model could assign different levels of importance to different words in a sentence, letting it draw on context spread across long passages of text.
  2. Fine-tuning: The GPT-3 model presented in OpenAI’s original paper was not fine-tuned for specific tasks; its results on translation, text completion, question answering, and more came from prompting alone, with zero, one, or a few examples in the prompt. Fine-tuning remains an option, though: OpenAI later allowed developers to fine-tune GPT-3 models through its API, a supervised process that trains the model further on a narrower, labeled dataset tailored to a particular application.
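The self-attention mechanism from the pre-training step can be sketched in plain Python. This is a single attention head with a causal mask, so each position sees only itself and earlier positions, which is what makes next-token prediction well-posed. For brevity the sketch passes the token vectors in directly as queries, keys, and values; in a real Transformer these are separate learned projections, and GPT-3 stacks many multi-head layers.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of vectors, one per token position.
    Position i may attend only to positions 0..i.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        visible = range(i + 1)  # causal mask: later positions are excluded
        # similarity of this query to each visible key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # output is the attention-weighted average of the visible value vectors
        out = [
            sum(w * values[j][k] for w, j in zip(weights, visible))
            for k in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(tokens, tokens, tokens)
print(out)
```

The first output is exactly the first value vector, because the first position can attend only to itself; changing the third token would leave the first two outputs untouched.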

It’s crucial to note that GPT-3 did not receive task-specific labeled data during pre-training. Instead, it gained a general grasp of language and text generation, which is why it can tackle many tasks from a prompt alone; fine-tuning, when used, adapts that general knowledge to perform a specific task more reliably.

GPT-3 Challenges 

Although its language generation has been called the best AI has produced so far, GPT-3 still comes with a few considerations that should not be ignored.

Firstly, the price of this tool far exceeds many organizations’ budgets, making it unattainable for most would-be users.

Secondly, the output produced by GPT-3 is still not perfect, especially in long-form copy. There have been instances where it produced insensitive content (racist, sexist, etc.), reflecting biases present in its internet-scale training data.

Thirdly, once GPT-3 has been trained, its knowledge is fixed: there is no separate knowledge base that new information can simply be added to. The repercussion is that the model is not very adaptable; when something changes, bringing it up to date requires further training or retraining, both of which are costly.

Nevertheless, these issues are not drastic and are likely to be addressed over time as prices drop and the models are refined with increasing volumes of training data.
