⚙️ 𝐋𝐋𝐌, 𝐈𝐍𝐅𝐄𝐑𝐄𝐍𝐂𝐄 & 𝐂𝐎𝐌𝐏𝐔𝐓𝐄 — 𝐖𝐡𝐚𝐭’𝐬 𝐖𝐡𝐚𝐭?

  • LLM (Large Language Model): The “brain” behind ChatGPT-like services, trained to predict the next token and generate text, code and (in multimodal variants) images.
  • Inference: When the model actually runs. Every response costs compute every single time.
  • Tokens: The unit of work (little chunks of text/data). More tokens → more compute → more cost (rough math in the sketch after this list).
  • Compute: Mostly GPUs + high-speed interconnect. It’s ongoing usage that scales this infrastructure, not just one-off training runs.
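
A rough back-of-envelope on that “more tokens → more compute → more cost” chain, in Python. The per-token prices and workload numbers are illustrative assumptions, not any provider’s actual pricing:

```python
# Back-of-envelope: how token volume turns into compute cost.
# All numbers below are illustrative assumptions, not quotes from any provider.

PRICE_PER_1M_INPUT_TOKENS = 1.00    # assumed $/1M input tokens
PRICE_PER_1M_OUTPUT_TOKENS = 4.00   # assumed $/1M output tokens

def daily_inference_cost(requests_per_day: int,
                         input_tokens_per_request: int,
                         output_tokens_per_request: int) -> float:
    """Rough daily spend for a single workload, in dollars."""
    input_cost = requests_per_day * input_tokens_per_request / 1e6 * PRICE_PER_1M_INPUT_TOKENS
    output_cost = requests_per_day * output_tokens_per_request / 1e6 * PRICE_PER_1M_OUTPUT_TOKENS
    return input_cost + output_cost

# Example: a support bot handling 100k chats/day, ~1,500 tokens in / 500 out per chat
print(f"${daily_inference_cost(100_000, 1_500, 500):,.2f} per day")
# -> $350.00 per day, every day the bot keeps running
```

Double the traffic or the response length and the bill doubles with it, every day the workload keeps running.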

🌩 𝐖𝐡𝐞𝐫𝐞 𝐢𝐭 𝐡𝐚𝐩𝐩𝐞𝐧𝐬: 𝐂𝐥𝐨𝐮𝐝 & 𝐃𝐚𝐭𝐚 𝐂𝐞𝐧𝐭𝐞𝐫𝐬

AI-optimized data centers (hyperscalers + specialists) need:

  • Parallel GPU compute
  • Liquid cooling for dense racks
  • Fast networking/optics for token flow
  • Low latency for real-time apps

Capex is pouring in to expand this stack. The short version: it’s a compute arms race.


🤖 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 = 24/7 𝐔𝐬𝐞 → 24/7 𝐃𝐞𝐦𝐚𝐧𝐝

We’re moving from occasional prompts to autonomous agents:

  • Customer support bots running all day
  • Background AI coders and assistants
  • HR/scheduling/analytics workflows

Agents keep the model running even when humans aren’t typing—usage multiplies. That’s where demand can 10× over the next few years as enterprises embed AI into daily operations.
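
To see how fast that multiplies, here is a hedged sketch comparing an occasional human user with an always-on agent. The token counts and activity rates are illustrative assumptions, not measurements:

```python
# Back-of-envelope: human prompts vs. an always-on agent.
# All usage figures are illustrative assumptions.

TOKENS_PER_INTERACTION = 2_000       # assumed in+out tokens per exchange

human_exchanges_per_day = 10         # someone prompting a chatbot now and then
agent_exchanges_per_day = 24 * 60    # an agent acting roughly once a minute, around the clock

human_tokens = human_exchanges_per_day * TOKENS_PER_INTERACTION
agent_tokens = agent_exchanges_per_day * TOKENS_PER_INTERACTION

print(f"Human: {human_tokens:>10,} tokens/day")            #     20,000
print(f"Agent: {agent_tokens:>10,} tokens/day")            #  2,880,000
print(f"Multiplier: ~{agent_tokens / human_tokens:.0f}x")  # ~144x
```

The exact multiplier depends entirely on how busy the agent is; the point is that always-on workloads dwarf occasional prompting.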


🔌 𝐏𝐨𝐰𝐞𝐫 𝐢𝐬 𝐭𝐡𝐞 𝐛𝐨𝐭𝐭𝐥𝐞𝐧𝐞𝐜𝐤

Inference is power-hungry:

  • GPUs draw ~700–1500W each
  • Billions of tokens/day = real electricity (rough math in the sketch after this list)
  • 24/7 agents create base-load requirements
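
A rough sketch of what that adds up to in electricity. The GPU draw, overhead factor and fleet size below are illustrative assumptions, not measured figures:

```python
# Back-of-envelope: electricity for a 24/7 inference fleet.
# GPU power draw, overhead factor and fleet size are illustrative assumptions.

GPU_POWER_KW = 1.0          # ~700–1500 W per accelerator; assume ~1 kW average
OVERHEAD_MULTIPLIER = 1.4   # assumed PUE-style overhead: cooling, networking, power delivery
FLEET_SIZE = 10_000         # hypothetical inference cluster

facility_mw = FLEET_SIZE * GPU_POWER_KW * OVERHEAD_MULTIPLIER / 1_000
annual_mwh = facility_mw * 24 * 365

print(f"Continuous draw: ~{facility_mw:.0f} MW")      # ~14 MW, around the clock
print(f"Annual energy:   ~{annual_mwh:,.0f} MWh/yr")  # ~122,640 MWh per year
```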

This is why you see:

  • Race toward direct liquid cooling
  • Data centers sited next to reliable power
  • Growing focus on efficiency per token

The AI boom is inseparable from an energy & cooling build-out.


💰 𝐖𝐡𝐨 𝐛𝐞𝐧𝐞𝐟𝐢𝐭𝐬 (𝐦𝐲 𝐥𝐞𝐧𝐬)

I invest across the inference stack so returns scale with usage, not one-off builds:

  • GPUs / Accelerators: Nvidia (NVDA) – the compute engine that turns tokens into results.
  • Networking & Interconnect: Arista (ANET), Broadcom (AVGO), Marvell (MRVL) – low-latency switching, optics and interconnects that keep tokens moving.
  • Servers, Power & Cooling: Supermicro (SMCI), Vertiv (VRT) – high-density GPU servers plus power, cooling and racks that make them run 24/7.
  • Foundry & Memory: TSMC (TSM), Micron (MU) – advanced nodes and HBM/DRAM that feed the accelerators.
  • Design Tools (EDA): Cadence (CDNS) – software IP and tools that enable the next silicon cycles.
  • Cloud & GPU Providers / Platforms: Microsoft (MSFT), Amazon (AMZN), Oracle (ORCL), CoreWeave (CRWV) – productize GPUs into APIs, managed services and agent platforms.

Thesis: every new user, agent, or workflow burns tokens → consumes compute → draws power/cooling → rides cloud delivery. This portfolio owns the rails of that loop—so revenue grows with how much AI is used, not just how much is built.


📈 𝐖𝐡𝐚𝐭 𝐈 𝐞𝐱𝐩𝐞𝐜𝐭 𝐧𝐞𝐱𝐭

  • 2023–24: Model training & early rollouts
  • 2025–28: Enterprise usage at scale (support, ops, logistics, HR, content)
  • Every time an AI agent “thinks,” a token is processed, compute is consumed—and the stack monetizes.

That’s the thesis. That’s how I’m positioned.


✅ 𝐁𝐨𝐭𝐭𝐨𝐦 𝐥𝐢𝐧𝐞

I don’t just invest “in AI.” I invest in the picks & shovels of the inference economy.


➕ Follow my live moves


John Maeland – AI Infrastructure & Inference Investing
Weekly: portfolio moves, earnings takeaways, real signals in inference & infra.
Subscribe on the top-right to get it in your inbox.