⚙️ 𝐋𝐋𝐌, 𝐈𝐍𝐅𝐄𝐑𝐄𝐍𝐂𝐄 & 𝐂𝐎𝐌𝐏𝐔𝐓𝐄 — 𝐖𝐡𝐚𝐭’𝐬 𝐖𝐡𝐚𝐭?
- LLM (Large Language Model): The “brain” behind ChatGPT-like services—trained to predict and generate text/code/images.
- Inference: When the model actually runs. Every response costs compute every single time.
- Tokens: The unit of work (little chunks of text/data). More tokens → more compute → more cost (quick cost sketch below).
- Compute: Mostly GPUs + high-speed interconnect. Ongoing usage, not just one-off training runs, is what scales the infrastructure.
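Here's that tokens → compute → cost chain as a back-of-the-envelope sketch in Python. Every number in it (per-token prices, request volume, token counts) is an assumption for illustration, not any provider's actual pricing:

```python
# Back-of-the-envelope: how token volume turns into an inference bill.
# All figures below are illustrative assumptions, not real provider pricing.

PRICE_PER_1M_INPUT_TOKENS = 2.50    # USD, assumed blended rate
PRICE_PER_1M_OUTPUT_TOKENS = 10.00  # USD, assumed blended rate

def monthly_inference_cost(requests_per_day: int,
                           input_tokens_per_request: int,
                           output_tokens_per_request: int,
                           days: int = 30) -> float:
    """Estimate a monthly token bill for a single workload."""
    input_tokens = requests_per_day * input_tokens_per_request * days
    output_tokens = requests_per_day * output_tokens_per_request * days
    return (input_tokens / 1e6 * PRICE_PER_1M_INPUT_TOKENS
            + output_tokens / 1e6 * PRICE_PER_1M_OUTPUT_TOKENS)

# Example: a support bot handling 50,000 requests/day,
# ~1,000 tokens in and ~500 tokens out per request.
print(f"${monthly_inference_cost(50_000, 1_000, 500):,.0f} / month")
# -> roughly $11,250/month for one workload; more tokens, bigger bill.
```

Multiply that across thousands of workloads and it's clear why inference, not training, becomes the recurring bill.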
🌩 𝐖𝐡𝐞𝐫𝐞 𝐢𝐭 𝐡𝐚𝐩𝐩𝐞𝐧𝐬: 𝐂𝐥𝐨𝐮𝐝 & 𝐃𝐚𝐭𝐚 𝐂𝐞𝐧𝐭𝐞𝐫𝐬
AI-optimized data centers (hyperscalers + specialists) need:
- Parallel GPU compute
- Liquid cooling for dense racks
- Fast networking/optics for token flow
- Low latency for real-time apps
Capex is pouring in to expand this stack. The short version: it’s a compute arms race.
🤖 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 = 24/7 𝐔𝐬𝐞 → 24/7 𝐃𝐞𝐦𝐚𝐧𝐝
We’re moving from occasional prompts to autonomous agents:
- Customer support bots running all day
- Background AI coders and assistants
- HR/scheduling/analytics workflows
Agents keep the model running even when humans aren’t typing—usage multiplies. That’s where demand can 10× over the next few years as enterprises embed AI into daily operations.
🔌 𝐏𝐨𝐰𝐞𝐫 𝐢𝐬 𝐭𝐡𝐞 𝐛𝐨𝐭𝐭𝐥𝐞𝐧𝐞𝐜𝐤
Inference is power-hungry:
- GPUs draw ~700–1500W each
- Billions of tokens/day = real electricity (rough math in the sketch below)
- 24/7 agents create base-load requirements
This is why you see:
- Race toward direct liquid cooling
- Data centers colocating near reliable power
- Growing focus on efficiency per token
The AI boom is inseparable from an energy & cooling build-out.
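Here is that rough power math, again with assumed numbers (per-GPU wattage, utilization, PUE) purely for illustration:

```python
# Rough sketch: electricity footprint of a 24/7 inference fleet.
# GPU wattage, utilization and PUE below are assumptions for illustration only.

def fleet_energy_mwh_per_day(num_gpus: int,
                             watts_per_gpu: float = 1_000.0,  # within the ~700-1500W range
                             utilization: float = 0.6,        # assumed average load
                             pue: float = 1.2) -> float:
    """Daily energy in MWh, including cooling/overhead via PUE."""
    it_power_watts = num_gpus * watts_per_gpu * utilization
    facility_power_watts = it_power_watts * pue
    return facility_power_watts * 24 / 1e6  # watts * hours -> MWh

# Example: 10,000 GPUs serving agents around the clock.
print(f"{fleet_energy_mwh_per_day(10_000):,.0f} MWh/day")
# -> ~173 MWh/day, i.e. a steady ~7 MW base load before any growth.
```

Even a mid-sized fleet behaves like an industrial base load, which is exactly why power and cooling sit on the critical path.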
💰 𝐖𝐡𝐨 𝐛𝐞𝐧𝐞𝐟𝐢𝐭𝐬 (𝐦𝐲 𝐥𝐞𝐧𝐬)
I invest across the inference stack so returns scale with usage, not one-off builds:
- GPUs / Accelerators: Nvidia (NVDA) – the compute engine that turns tokens into results.
- Networking & Interconnect: Arista (ANET), Broadcom (AVGO), Marvell (MRVL) – low-latency switching, optics and interconnects that keep tokens moving.
- Servers, Power & Cooling: Supermicro (SMCI), Vertiv (VRT) – high-density GPU servers plus power, cooling and racks that make them run 24/7.
- Foundry & Memory: TSMC (TSM), Micron (MU) – advanced nodes and HBM/DRAM that feed the accelerators.
- Design Tools (EDA): Cadence (CDNS) – software IP and tools that enable the next silicon cycles.
- Cloud & GPU Providers / Platforms: Microsoft (MSFT), Amazon (AMZN), Oracle (ORCL), CoreWeave (CRWV) – productize GPUs into APIs, managed services and agent platforms.
Thesis: every new user, agent, or workflow burns tokens → consumes compute → draws power/cooling → rides cloud delivery. This portfolio owns the rails of that loop—so revenue grows with how much AI is used, not just how much is built.
📈 𝐖𝐡𝐚𝐭 𝐈 𝐞𝐱𝐩𝐞𝐜𝐭 𝐧𝐞𝐱𝐭
- 2023–24: Model training & early rollouts
- 2025–28: Enterprise usage at scale (support, ops, logistics, HR, content)
- Every time an AI agent “thinks,” a token is processed, compute is consumed—and the stack monetizes.
That’s the thesis. That’s how I’m positioned.
✅ 𝐁𝐨𝐭𝐭𝐨𝐦 𝐥𝐢𝐧𝐞
I don’t just invest “in AI.” I invest in the picks & shovels of the inference economy.
➕ 𝐅𝐨𝐥𝐥𝐨𝐰 𝐦𝐲 𝐥𝐢𝐯𝐞 𝐦𝐨𝐯𝐞𝐬
- Copy my portfolio on eToro:
https://www.etoro.com/people/lordhumpe?utm_source=site&utm_medium=post&utm_campaign=copy_inference
- Transparent stats on Bullaware:
https://bullaware.com/etoro/Lordhumpe?utm_source=site&utm_medium=post&utm_campaign=bullaware_inference
—
John Maeland – AI Infrastructure & Inference Investing
Weekly: portfolio moves, earnings takeaways, real signals in inference & infra.
Subscribe on the top-right to get it in your inbox.