The recent launches in AI have created a tremendous buzz all around. Most of the interest is around the category of LLMs, like ChatGPT. It is hard not to feel excited as well as scared while using ChatGPT as it completes your code, suggests food recipes, writes emails, essays, project plans and what not. Every day a new use case gets posted on Twitter, and projects like BabyAGI & AutoGPT are showing the potential of stacking multiple GPTs together to take on complicated tasks. This rapid rate of releases poses an important question: is AI on a path of exponential growth to AGI, or is it going to plateau after a great productivity boost, as all technological trends do?

Image credit - waitbutwhy.com

Considering that AI as a term encompasses lots of trends, let's focus on the trend closest to AGI, i.e. LLMs.

For anything to improve at an exponential rate, there has to be some kind of network scale effect. For example, organisms, cities, and social networks keep getting better on some important parameters as they keep growing (till systemic constraints hit). Is there a similar kind of flywheel dynamic for LLMs? That is: the smarter LLMs become, the more users use them, which makes them even smarter, until they reach a stage where LLMs start improving themselves and become godlike AGIs. To answer this, it is important to briefly see what's happening inside LLMs.

LLMs are trained on textual human knowledge from the web. At their core, LLMs are neural nets that link words to each other through probabilistic linkages. So for any given input text, they predict the next word, add that word to their input, and continue this recursively until the answer is judged to be complete. This concept of a text-predicting neural net is decades old, but with a huge amount of training data (45TB of text), LLMs suddenly seem capable of completing code and writing essays, poems, movie scripts, etc.
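As a toy illustration of that predict-and-append loop (not any real model's code), here is a minimal sketch assuming a hypothetical `next_token_probs()` function that stands in for the trained neural net:

```python
import random

def next_token_probs(tokens):
    # Hypothetical stand-in for the trained neural net: returns a dict
    # mapping each candidate next word to its probability given the input.
    raise NotImplementedError

def generate(prompt_tokens, max_new_tokens=50, end_token="<end>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)                 # predict the next word
        words, weights = zip(*probs.items())
        nxt = random.choices(words, weights=weights)[0]  # sample one word from the distribution
        if nxt == end_token:                             # the model signals the answer is complete
            break
        tokens.append(nxt)                               # add it to the input and repeat
    return tokens
```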

Baby GPT with 2 tokens (0/1) and context length 3, viewed as a finite-state Markov chain. It was trained on the sequence 111101111011110 for 50 iterations. For any 3 binary digits, it predicts the next digit. Source - Andrej Karpathy on Twitter
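To make the Markov-chain view concrete, here is a small counting sketch (my own analogue of that picture, not Karpathy's trained model) that estimates the next-bit probabilities for each 3-bit context directly from that training sequence:

```python
from collections import Counter, defaultdict

sequence = "111101111011110"   # the training sequence from the example above
context_len = 3

# For every 3-bit context, count which bit follows it in the sequence.
counts = defaultdict(Counter)
for i in range(len(sequence) - context_len):
    ctx = sequence[i:i + context_len]
    counts[ctx][sequence[i + context_len]] += 1

# Turn counts into next-bit probabilities, i.e. the Markov chain's transitions.
for ctx, c in sorted(counts.items()):
    total = sum(c.values())
    print(ctx, "->", {bit: round(n / total, 2) for bit, n in c.items()})
# In this sequence, "011", "101" and "110" are always followed by "1",
# while "111" is followed by "0" and "1" half the time each.
```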

After training on this huge amount of text, they are further tuned with reinforcement learning from human feedback (RLHF) to make their interactions with users feel human. This makes them an extremely efficient gateway to access knowledge, a leap over search engines.
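At the heart of RLHF is a reward model trained on pairs of responses ranked by humans. As a rough sketch (assuming PyTorch and made-up scores; this is the generic pairwise preference loss, not OpenAI's actual training code):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Push the reward model to score the human-preferred response
    # higher than the rejected one (pairwise Bradley-Terry loss).
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with made-up reward scores; in practice these come from a
# reward model scoring two candidate answers to the same prompt.
chosen = torch.tensor([1.8, 0.4])
rejected = torch.tensor([0.2, 0.9])
print(preference_loss(chosen, rejected))
```

The LLM is then optimized to produce answers that this reward model scores highly.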

But it also means that they don't create new knowledge. If there is any concept that hasn't been expressed in text form on the internet, that concept will never be expressed by an LLM. For the concepts they have been trained on, though, LLMs act like human experts in their responses. Remarkable as that is, it shouldn't lead us to over-estimate their capabilities, and we shouldn't assume that they can do exploratory reasoning or create original knowledge.

You can stack together 100 GPTs in a room communicating with each other; it won't turn into a more profound neural net. It only makes them capable of executing more complex tasks, as work gets divided among executing GPTs and monitoring GPTs. Nor would having billions of chats with millions of humans make them more profound. It only makes them better at talking to humans, i.e. understanding what exactly a user is looking for when she posts a question.
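For illustration, this is roughly what the "stacking" amounts to in AutoGPT/BabyAGI-style tools; `llm()` here is a hypothetical wrapper around any chat-completion API, and the loop only divides work between calls, it does not change the underlying neural net:

```python
def llm(prompt: str) -> str:
    # Hypothetical wrapper around a chat-completion API call.
    raise NotImplementedError

def run_agent(goal: str, max_rounds: int = 5) -> list[str]:
    results = []
    for _ in range(max_rounds):
        # "Monitoring" GPT: decide the next subtask given progress so far.
        plan = llm(f"Goal: {goal}\nDone so far: {results}\nNext subtask (or DONE):")
        if plan.strip() == "DONE":
            break
        # "Executing" GPT: carry out the subtask and report back.
        results.append(llm(f"Do this subtask and report the result: {plan}"))
    return results
```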

LLMs will become more profound only once more human discoveries happen, their explanations and results get converted into text, and that text is passed on to their training set.

Someone can ask: why then do LLMs keep getting better? GPT-2, 3… 4 each seems to be taking big leaps. How much more capability can LLMs add from current human knowledge itself?

There are signs that we are near the peak.

So far, the main lever for improving LLMs has been the amount of data to train them on. As the training data kept increasing, the capability kept improving. GPT-1 was trained on a few GB of book text, GPT-2 on about 40GB, GPT-3 on roughly 570GB filtered from 45TB of raw web text, and GPT-4 is estimated to use a few hundred TBs. But now, as OpenAI and other AI experts are admitting, we are close to the maximum here. Any additional data will yield diminishing returns to the model's capability.
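As a toy illustration of those diminishing returns (using the published data-scaling power law from Kaplan et al., 2020, as a stand-in; these are not OpenAI's internal GPT-4 numbers):

```python
# Kaplan et al. (2020) data-scaling law: loss L(D) = (D_c / D) ** alpha_D
ALPHA_D = 0.095     # exponent reported in the paper
D_C = 5.4e13        # constant (in tokens) reported in the paper

def loss(num_tokens: float) -> float:
    return (D_C / num_tokens) ** ALPHA_D

for tokens in [1e9, 1e10, 1e11, 1e12, 1e13]:
    print(f"{tokens:.0e} tokens -> loss {loss(tokens):.2f}")
# Each 10x increase in data multiplies the loss by the same factor (~0.80),
# so the absolute improvement shrinks with every additional order of magnitude.
```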

Now that we have established that LLMs don't create new knowledge themselves via scale effects, and that they are already near the top of what training on existing human knowledge can deliver, the only ways they will improve from now on are:

  1. Humans discover new knowledge, and LLMs get trained on it
  2. Innovations in neural-net architecture (for example, an upgrade to the transformer architecture)

And neither of these paths is exponential in nature.

But there will be a wave of productivity explosion from the implementation of these LLMs. Many human workflows will get upgraded within a decade through new features, products, and startups. Similar to the cloud technology shift (but much bigger in scale), we will keep upgrading till we extract the maximum out of this technological shift, and then move on to the next one.