Mythbusting Google’s New Trillion-Parameter AI Language Model
Earlier this month, Google announced that its researchers have developed and benchmarked techniques enabling them to train a language model containing more than one trillion parameters. If you're not sure what parameters are, we highly recommend that you check out our previously published blog post about OpenAI's language model, GPT-3. In the context of advanced language models, parameters are the key to machine learning algorithms: they are the part of the model that's learned from historical training data. As a general rule, more parameters mean a more sophisticated language model, which is why, when Google claimed it had trained a language model with over one trillion parameters, the conversational AI community was flush with excitement and prognostications.
So, in service to all those curious to learn more about Google's latest proclamation, we sifted through the available facts and research to distinguish between what it is and what it isn't.
What It Is
This new language model (as of now nameless) is a roaring accomplishment thanks to the techniques its developers employed to train it. It contains a staggering 1.6 trillion parameters (the most to date), and training achieved an up to 4x speedup over the previously largest Google-developed language model, T5-XXL.
A paper released by the language model's researchers states that large-scale training is still one of the most effective paths toward powerful models. However, these mammoth language models are few and far between because such massive-scale training is computationally intensive, often prohibitively so.
To tackle this obstacle, the developers implemented the Switch Transformer technique, which uses only a subset of the model's parameters to transform any given piece of input data. The Switch Transformer is based on an AI approach first introduced in the early '90s called Mixture of Experts. These experts (or learners) are not of the Homo sapiens variety but are instead composed of various neural networks and machine learning models. In the simplest terms possible, these models, each responsible for a specific task, live within a larger "mothership" model and are orchestrated by a "gating network" that determines which experts to consult to produce the desired output.
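To make the gating idea concrete, here is a minimal sketch in plain Python. It is not Google's implementation: the `SwitchLayer` class, the toy "experts" (which just scale their input), and the randomly initialized router weights are all hypothetical stand-ins. The sketch assumes the simplest form of routing, where each input is sent to the single highest-scoring expert, so only that expert's parameters do any work for that input.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a real one would be a small neural network."""
    def expert(x):
        return [scale * v for v in x]
    return expert


class SwitchLayer:
    """Toy mixture-of-experts layer that routes each input to one expert."""

    def __init__(self, num_experts, dim, seed=0):
        rng = random.Random(seed)
        # Gating network: one weight vector per expert (random here, learned in practice).
        self.router = [[rng.uniform(-1, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        self.experts = [make_expert(i + 1) for i in range(num_experts)]

    def route(self, x):
        """Return the index of the chosen expert and its gate probability."""
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        return best, probs[best]

    def forward(self, x):
        """Run only the chosen expert and scale its output by the gate value."""
        idx, gate = self.route(x)
        out = self.experts[idx](x)
        return idx, [gate * v for v in out]


layer = SwitchLayer(num_experts=4, dim=3)
chosen, output = layer.forward([0.5, -1.0, 2.0])
print(f"input routed to expert {chosen}; the other 3 experts stayed idle")
```

The point of the design is that adding more experts increases the total parameter count, while the work done per input stays roughly the same, because only one expert runs each time.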
To quote Tristan Greene of TNW: "Put simply, the Brain team (the model's researchers) has figured out a way to make the model itself as simple as possible while squeezing in as much raw compute power as possible to make the increased number of parameters possible."
Applying the Switch Transformer gave the developers an over 7x speedup without requiring exorbitant computational resources. In one test where a Switch Transformer model was trained to translate between over 100 different languages, the researchers observed "a universal improvement" across 101 languages, with 91% of the languages benefiting from an over 4x speedup compared with a baseline model.
To tie this all together, Google's new language model is a laudable achievement in computational linguistics and AI, but should you expect to witness this force in action during your next exchange with an AI-powered chatbot?
What It Isn’t
To the disappointment of conversational AI enthusiasts, this language model isn't suitable for real-world or business settings. A model with over one trillion parameters has to be trained on enormous amounts of public data, which means it absorbed the biases ingrained in that data. It is therefore highly likely that, when used "in the wild", human-to-model exchanges could turn sour, leaving the user ill-informed or offended.
Furthermore, regulating and policing such an extensive cache of data is a tremendous challenge, one that opens the door to malicious actors using the model to spread misinformation and sow chaos and discord.
As it stands, this language model is geared towards academic study, and even if it were offered to the general public for sale, like OpenAI's GPT-3 (Generative Pre-trained Transformer 3) which preceded it, its design is inherently ill-suited to customer-facing conversational AI.
Although over one trillion parameters can give the impression that this language model was trained on all the data available online, it cannot automatically update itself, meaning Google's new model is only as good as the data it was fed (plentiful as it may be). One variable, one new product, an update of service, or a change in content can topple the entire house of cards and mislead a user seeking to accomplish a task. Furthermore, these models do not accommodate small, frequent adjustments, and judging by GPT-3's pricing model, any such adjustments will come with considerable costs.
The Tip of the Iceberg
This past year has seen inspiring innovation in language models and conversational AI that would have been inconceivable only a decade ago. With every milestone reached and each obstacle overcome, the appetite for more outstanding achievements grows steadily in the boardrooms of tech giants Google and OpenAI (which are locked in a language model arms race).
Whether and how these models will be put into widespread commercial use is still unclear, but their mere existence shows just how much focus, attention, and funding are currently being funneled towards conversational AI. The good news is that while we wait to see how this timeline progresses, there are already exceptional conversational AI solutions for businesses on the market, some fueled by technologies and computational linguistic acrobatics that are no less impressive than what Google and its tech giant peers are rolling out.
Want to learn more about language models, machine learning, and computational linguistics? Check out these related thought pieces from Hyro:
- Infographic: The State of Conversational AI in 2021
- Webinar: Appointments by Conversational AI - Understanding Everything Your Patients Say
- What’s the Difference Between Chatbots and Conversational AI
- Graph Programming