Training vs. Inference and GPU Demand

  • Everyone is trying to figure out what AI means for GPU demand.
  • It’s hard because the true picture is muddied by providers investing in their own customers, demand pull-forward, and strategic buying ahead of any real use case (see Saudi Arabia, the UAE, and the U.K.).
  • Confounding all this is Meta releasing Llama 2 essentially for free, followed most recently by its coding-focused version (coding being by far the most useful application of AI so far).
  • This matters because training is far more GPU-intensive than inference. Freely available models mean less training is needed, which specifically matters for Nvidia’s H100 chip (which, by the way, weighs over 30 kg!).
  • Qualcomm, for its part, thinks the processing might happen right on our phones (it would, of course, benefit most from that).
  • “Eventually, a lot of the AI processing will move over to the device for several use cases. The advantages of doing it on the device are very straightforward. Cost, of course, is a massive advantage. It’s, in some ways, a sunk cost. You bought the device. It’s sitting there in your pocket. It could be processing at the same time when it’s sitting there. So that’s the first one. Second is latency. You don’t have to go back to the cloud. Then privacy and security: there’s data that’s user-specific that doesn’t need to go to the cloud when you’re running it on the device. But beyond all of these, we see a different set of use cases playing out on the device.” Qualcomm CFO Akash Palkhiwala (via The Transcript).