Dune 2

“He who controls the spice, controls the universe”

That quote from Frank Herbert's famous book (and its film adaptations) refers to the rare but extremely valuable spice found only on the planet Arrakis. The spice was the cause of wars, the basis of currency, and the separator of classes. Those who had it had all the power. Those who didn't, well…

Wikipedia: In Dune, Arrakis is the most important planet in the universe, as it is the only source of the drug melange. Melange (or, "the spice") is the most essential and valuable commodity in the universe, as it extends life and makes safe interstellar travel possible (among other uses).

Replace spice with anything rare and critical, and the rest of the quote applies. In today's Gen AI world, the spice is GPUs.

GPUs are powering Generative AI

The GPU (Graphics Processing Unit) - the cousin of the CPU, designed to scale at floating point and vector computation - was originally hardware for graphics-intensive applications (game play, game design, video and graphics work, etc.). But advances in AI and Gen AI require significant data and compute. Data storage is relatively cheap and commoditized; specialized compute is not.

There are only a handful of companies that make GPUs - NVIDIA being the most popular. Apple's proprietary silicon (accessed through its Metal framework), Google's TPUs, and AMD's Radeon line are also strong competitors. NVIDIA is now a trillion-dollar company (almost overnight) thanks to the sudden rise in demand from Gen AI!

Large Language Models (LLMs), image generation, and audio/video generation all require heavy compute workloads, plus the hardware and software engineering to take advantage of that compute. Massive data centers and deep wallets are needed to build and train these models.

And so, only major tech companies can afford to buy GPUs in these quantities and configurations - and see the competitive value in doing so. Therefore, almost all advances in Gen AI will be dictated and controlled by these companies (Amazon, Google, Meta, Apple, Tesla, etc.). The conclusion: making better models means having even more GPUs to train on even larger datasets and build humongous neural networks.

<aside> 🚘 Tesla just received an order of 10,000 NVIDIA H100 GPUs to power its $300MM supercomputer (Dojo). This is an investment in AI that should pay off for Tesla, since AI is its competitive advantage for enabling driverless cars.

</aside>

Eventually, software and hardware optimization will catch up, as it always has, to enable these complex workloads on traditional consumer-grade hardware (CPU or GPU). You can already run quantized versions of LLMs and Stable Diffusion models on a MacBook Pro (ex: llama.cpp). There are even prototypes running on an iPhone! The open source models are enabling this innovation while also leveling the playing field. But even the open source models are developed by well-financed and well-resourced companies / entities (ex: LLaMA is from Meta, Falcon from the UAE).
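
To make that concrete, here is a minimal sketch of running local inference against a quantized open source model using the llama-cpp-python bindings for llama.cpp. The model filename is a placeholder - point it at whatever GGUF-format model you have downloaded.

```python
# Minimal sketch: local inference with a quantized LLM via llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder -
# substitute any GGUF-format model you have on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=2048,  # context window size
)

output = llm(
    "Q: Why are GPUs so important for generative AI? A:",
    max_tokens=128,
    stop=["Q:"],  # stop generating when the model starts a new question
)

print(output["choices"][0]["text"])
```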

Don’t have hundreds of millions of dollars? No problem

You too can be a spice trader!

Here are some options for how to participate in the revolution without a $100 million investment fund.

Mileage will vary based on use case and architecture, but these are great starter options.

Because GPUs are highly specialized, they are not needed for every type of compute or all the time. That’s important when you’re thinking of how to manage this limited and expensive resource.

  1. $$$$ - Invest in one NVIDIA DGX appliance and set it up for a small team to have access to 8+ H100 GPUs and the full stack package - perfect for an in-house AI lab. Just slide it into the rack and go! You’ll rely on the NVIDIA CUDA libraries to access the GPUs (see the device-selection sketch after this list). Multiply with more boxes. With this, you can roll your own AI solutions in a private data center and also make use of the best open source foundation models available. It goes without saying that this requires a level of expertise that goes beyond just data science. If you’re outfitted with infra engineers, data engineers, ML engineers, data scientists, and other full-stack skills, then your chances of success are good.
  2. $$ - Invest in a few high-powered Macs (Mac Studio with the M2 Ultra chip) and max out the memory ($2000+ each). You’ll rely on Apple’s CoreML and Metal libraries to access the GPUs, and more and more Python frameworks support them (again, see the sketch after this list). This is not powerful enough to train / retrain large foundation models unless you have a LOT of time and patience. However, it is fast enough to run inference using open source models. Also, this assumes you’re not using a PaaS/SaaS language model service (OpenAI, PaLM, etc.), since those are simply hosted APIs and don’t require high-powered personal devices.
  3. $ - Rent a GPU. A number of services offer metered access to GPUs in the cloud with predictable costs. If you’re able to manage your usage, you reduce your upfront capital costs (and the depreciation of the technology!). A good listing of Cloud GPU service providers is available here. Your client (desktop/laptop) hardware doesn’t need to be high powered, and you can scale up as your needs grow.
  4. $0 - Try a Cloud GPU for FREE! Google Colab is an easy-to-use, quick-start online service on Google Cloud Platform (GCP) that provides a browser-based notebook interface for Python development with access to one GPU for free. It’s a great way to get started with minimal investment. Again, the barrier to entry is low on the client hardware (desktop/laptop) since this is fully browser-based and cloud-native.
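
Whichever tier you pick, most Python ML frameworks hide the hardware behind a “device” abstraction, so the same code can run on any of the options above. Here is a minimal sketch (assuming PyTorch, which none of the options strictly requires) that selects an NVIDIA/CUDA GPU (options 1, 3 and 4), an Apple Silicon GPU via Metal (option 2), or falls back to the CPU.

```python
# Minimal sketch: pick the best available accelerator with PyTorch.
# The same code runs unchanged on a DGX box, a rented cloud GPU, a free
# Colab GPU (all via CUDA), or a Mac Studio / MacBook (via the Metal "mps" backend).
import torch

def best_device() -> torch.device:
    if torch.cuda.is_available():           # NVIDIA GPUs (DGX, cloud, Colab)
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple Silicon GPUs via Metal
        return torch.device("mps")
    return torch.device("cpu")               # fallback

device = best_device()
print(f"Running on: {device}")

# Example: move a small tensor computation onto the selected device.
x = torch.randn(1024, 1024, device=device)
y = x @ x.T  # the matrix multiply runs on the GPU if one was found
print(y.shape)
```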

The More You Know