In late October 2020, I gained access to GPT-3 after emailing Greg Brockman and pitching him on the need to distribute access more broadly. I was blown away by its potential within the first hour of tinkering in the GPT-3 playground.
Several months in, it became evident to me that OpenAI would become the most valuable company of the decade. Some said it wouldn't happen because of the company's non-profit status; others considered my remarks ridiculous, citing arguments like "LLMs don't scale" and "more data and compute isn't enough."
Fast forward to 2023, and OpenAI now expects to rake in a staggering $1 billion in revenue by 2024 while inadvertently becoming a consumer tech company with the launch of ChatGPT as a destination app.
So where do we go from here?
We have yet to realize the full potential of LLMs like GPT-4
This dynamic isn't new. In its early days, the World Wide Web was used primarily for research and document sharing. Skeuomorphic design patterns guided the transition until natively web-based experiences, like social networking and content streaming, emerged.
The shape of AI-native UX is blurry now, but early product experiments offer cues: Tome is trying to reinvent digital storytelling, and Runway is reimagining video creation. Will they succeed? Some will. So instead of asking how to build a better spreadsheet, entertain the idea of AI replacing spreadsheets entirely.
Novel use case and vertical focus trump everything
BigTech enjoys enormous distribution and can worry less about what others are doing. Startups don't have that luxury. If you are a new AI startup, competitive advantage, defensibility, and economic moats should be your sacred words. There's a saying that first-time founders are obsessed with product and second-time founders are obsessed with distribution. AI founders should be obsessed with both.
With every update, ChatGPT will wipe out another wave of AI upstarts that are simple model wrappers. To succeed, you should be building a 10x improvement for a niche use case, ideally something that wasn't possible before with non-AI tech. Bonus points if you focus on a neglected, conventionally unsexy industry vertical. Think Herdwatch for [blank].
Slow-walking and then running toward AGI
Since doing harm is easier than doing good, we need time to explore current AI capabilities and their limits. I'm sure GPT-4 is not the theoretical or practical limit of LLMs.
First, the world's yearly hardware budget is around $1 trillion, while GPT-4's training run cost ~$24 million (based on 1 trillion parameters). So we are nowhere near the limits on compute. Second, OpenAI has developed capability projections, and it's likely they deliberately chose the model's quality based on heuristics around humanity's preparedness. No one wants to sleepwalk into AGI territory without any risk management whatsoever.
Besides, LLMs make it possible to automate much of the work in ML research and engineering, which could shrink AGI timelines dramatically. Together with AI safety, this will be top-priority work for anyone in the field, given its economic potential: human labor accounts for roughly 50% of the world's GDP (~$50tr/year), and if AI automates even 10% of that work, it's worth ~$5tr/year.
Running LLMs at the edge and performance levers
Three factors drive the advance of AI: algorithms, data, and accelerated compute, and a ton of innovation is happening on all three fronts. The lion's share of today's GPUs is used for inference, and it's costly. For some use cases, cloud-based inference won't work at all due to latency, so LLMs need to run at the edge.
A possible step forward is a hybrid approach, where a small LLM runs on the device and offloads heavier workloads to a cloud-based LLM optimized for them. For example, LLaMA has been compressed and deployed on a MacBook M1, a Google Pixel 6, and even a Raspberry Pi with 4GB of RAM, thanks to a technique called quantization.
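The routing half of such a hybrid setup can be sketched in a few lines. Everything below is an illustrative assumption, not a description of any shipping system: `local_generate` and `cloud_generate` are hypothetical stand-ins for real backends (say, a quantized on-device model and a hosted API), and the token-count and keyword heuristics are placeholders for a real routing policy.

```python
def local_generate(prompt: str) -> str:
    # Placeholder for an on-device, quantized model (e.g. a 4-bit 7B LLaMA).
    return f"[edge] response to: {prompt[:40]}"

def cloud_generate(prompt: str) -> str:
    # Placeholder for a cloud-hosted, full-size model.
    return f"[cloud] response to: {prompt[:40]}"

def route(prompt: str, max_edge_tokens: int = 64) -> str:
    """Send short, simple prompts to the edge model; offload the rest."""
    approx_tokens = len(prompt.split())  # crude whitespace token estimate
    # Toy heuristic: certain verbs signal heavier reasoning workloads.
    needs_reasoning = any(
        kw in prompt.lower() for kw in ("prove", "analyze", "summarize")
    )
    if approx_tokens <= max_edge_tokens and not needs_reasoning:
        return local_generate(prompt)   # low latency, no network round-trip
    return cloud_generate(prompt)       # heavier task, offload to the cloud

print(route("What's the weather like?"))            # stays on the edge
print(route("Analyze this contract clause by clause"))  # goes to the cloud
```

The appeal of this split is that latency-sensitive, low-stakes requests never leave the device, while the cloud model is reserved for the workloads that actually need it.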
Models like GPT-4 are enormous, but the amount of data they're trained on isn't commensurate with their size, so a lot of research is going into more efficient architectures and training algorithms. Low-cost fine-tuning, however, is already at your disposal: Stanford researchers fine-tuned the smallest LLaMA model (7B) on 52K examples for ~$600. With 4-bit quantization, it compresses down to a ~4GB file and generates results that compare well to GPT-3 in initial human assessments.
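To make the compression arithmetic concrete, here is a toy sketch of 4-bit quantization in NumPy, assuming a simple per-tensor symmetric scheme; the quantizers actually used for LLaMA are more sophisticated (group-wise scales, outlier handling), so treat this as an illustration of the idea rather than the real pipeline. Storing 7B parameters at 4 bits each is ~3.5GB, which is roughly why the quantized model fits in a ~4GB file.

```python
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Map float weights to signed integers in [-7, 7] plus one scale."""
    scale = np.abs(w).max() / 7.0  # 4 bits -> 16 levels, use symmetric range
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the quantized integers."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_4bit(w)
w_hat = dequantize(q, scale)

# Rounding error per weight is at most half the scale step.
err = np.abs(w - w_hat).max()
print(f"max abs reconstruction error: {err:.4f} (scale step {scale:.4f})")
```

Each weight now costs 4 bits instead of 32, an ~8x storage reduction, at the price of a small, bounded reconstruction error.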
But cutting-edge applications will need more powerful chips to satisfy the energy and compute hunger of LLMs. For instance, Amazon Search reduced ML inference costs by 85% with AWS Inferentia. GPUs will remain mainstream for the foreseeable future, but more developments in the AI chipset arena are underway.
Data is another lever for improving AI performance, but we need richer and more diverse datasets. That's why OpenAI made Whisper so cheap: speech-to-text lets researchers tap into an enormous lake of diverse data, be it a podcast or a TV show. The more, the merrier.
It may not be evident now, but LLMs are tools of immense political influence. If data is the new oil, LLMs are oil refineries, and the companies behind the models are the Standard Oils of today. As Jack Clark pointed out, oil monopolies led to the enactment of antitrust laws and the formation of intelligence services. AI development will eventually give rise to new regulations and laws in every major jurisdiction, too.
In an industry where years are like weeks and weeks are like days, it's easy to get lost in the fine print of everything that's happening. The future, however, is painted in broad strokes; the finer details will only become clear as time passes and events unfold. Understanding the lay of the land is 80% of the job of predicting the future, and that's exactly what you've just done.