⚙️ 𝐋𝐋𝐌, 𝐈𝐍𝐅𝐄𝐑𝐄𝐍𝐂𝐄 & 𝐂𝐎𝐌𝐏𝐔𝐓𝐄 — 𝐖𝐡𝐚𝐭’𝐬 𝐖𝐡𝐚𝐭?
- LLM (Large Language Model): The “brain” behind ChatGPT-like services—trained to predict and generate text/code/images.
- Inference: When the model actually runs. Every response costs compute every single time.
- Tokens: The unit of work (little chunks of text/data). More tokens → more compute → more cost (quick cost sketch below).
- Compute: Mostly GPUs + high-speed interconnect. Ongoing usage, not just one-off training runs, is what scales the infrastructure.
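Here's that tokens → compute → cost chain as a back-of-the-envelope sketch in Python. Every number in it (per-token prices, request volume, token counts) is an assumption for illustration, not any provider's actual pricing:

```python
# Back-of-the-envelope: how token volume turns into an inference bill.
# All figures below are illustrative assumptions, not real provider pricing.

PRICE_PER_1M_INPUT_TOKENS = 2.50    # USD, assumed blended rate
PRICE_PER_1M_OUTPUT_TOKENS = 10.00  # USD, assumed blended rate

def monthly_inference_cost(requests_per_day: int,
                           input_tokens_per_request: int,
                           output_tokens_per_request: int,
                           days: int = 30) -> float:
    """Estimate a monthly token bill for a single workload."""
    input_tokens = requests_per_day * input_tokens_per_request * days
    output_tokens = requests_per_day * output_tokens_per_request * days
    return (input_tokens / 1e6 * PRICE_PER_1M_INPUT_TOKENS
            + output_tokens / 1e6 * PRICE_PER_1M_OUTPUT_TOKENS)

# Example: a support bot handling 50,000 requests/day,
# ~1,000 tokens in and ~500 tokens out per request.
print(f"${monthly_inference_cost(50_000, 1_000, 500):,.0f} / month")
# -> roughly $11,250/month for one workload; more tokens, bigger bill.
```

Multiply that across thousands of workloads and it's clear why inference, not training, becomes the recurring bill.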
🌩 𝐖𝐡𝐞𝐫𝐞 𝐢𝐭 𝐡𝐚𝐩𝐩𝐞𝐧𝐬: 𝐂𝐥𝐨𝐮𝐝 & 𝐃𝐚𝐭𝐚 𝐂𝐞𝐧𝐭𝐞𝐫𝐬
AI-optimized data centers (hyperscalers + specialists) need:
- Parallel GPU compute
- Liquid cooling for dense racks
- Fast networking/optics for token flow
- Low latency for real-time apps
Capex is pouring in to expand this stack. The short version: it’s a compute arms race.
🤖 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 = 24/7 𝐔𝐬𝐞 → 24/7 𝐃𝐞𝐦𝐚𝐧𝐝
We’re moving from occasional prompts to autonomous agents:
- Customer support bots running all day
- Background AI coders and assistants
- HR/scheduling/analytics workflows
Agents keep the model running even when humans aren’t typing—usage multiplies. That’s where demand can 10× over the next few years as enterprises embed AI into daily operations.
🔌 𝐏𝐨𝐰𝐞𝐫 𝐢𝐬 𝐭𝐡𝐞 𝐛𝐨𝐭𝐭𝐥𝐞𝐧𝐞𝐜𝐤
Inference is power-hungry:
- GPUs draw ~700–1500W each
- Billions of tokens/day = real electricity (rough math in the sketch below)
- 24/7 agents create base-load requirements
This is why you see:
- Race toward direct liquid cooling
- Data centers colocating near reliable power
- Growing focus on efficiency per token
The AI boom is inseparable from an energy & cooling build-out.
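Here is that rough power math, again with assumed numbers (per-GPU wattage, utilization, PUE) purely for illustration:

```python
# Rough sketch: electricity footprint of a 24/7 inference fleet.
# GPU wattage, utilization and PUE below are assumptions for illustration only.

def fleet_energy_mwh_per_day(num_gpus: int,
                             watts_per_gpu: float = 1_000.0,  # within the ~700-1500W range
                             utilization: float = 0.6,        # assumed average load
                             pue: float = 1.2) -> float:
    """Daily energy in MWh, including cooling/overhead via PUE."""
    it_power_watts = num_gpus * watts_per_gpu * utilization
    facility_power_watts = it_power_watts * pue
    return facility_power_watts * 24 / 1e6  # watts * hours -> MWh

# Example: 10,000 GPUs serving agents around the clock.
print(f"{fleet_energy_mwh_per_day(10_000):,.0f} MWh/day")
# -> ~173 MWh/day, i.e. a steady ~7 MW base load before any growth.
```

Even a mid-sized fleet behaves like an industrial base load, which is exactly why power and cooling sit on the critical path.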
💰 𝐖𝐡𝐨 𝐛𝐞𝐧𝐞𝐟𝐢𝐭𝐬 (𝐦𝐲 𝐥𝐞𝐧𝐬)
I invest across the inference stack so returns scale with usage, not one-off builds:
- GPUs / Accelerators: Nvidia (NVDA) – the compute engine that turns tokens into results.
- Networking & Interconnect: Arista (ANET), Broadcom (AVGO), Marvell (MRVL) – low-latency switching, optics and interconnects that keep tokens moving.
- Servers, Power & Cooling: Supermicro (SMCI), Vertiv (VRT) – high-density GPU servers plus power, cooling and racks that make them run 24/7.
- Foundry & Memory: TSMC (TSM), Micron (MU) – advanced nodes and HBM/DRAM that feed the accelerators.
- Design Tools (EDA): Cadence (CDNS) – software IP and tools that enable the next silicon cycles.
- Cloud & GPU Providers / Platforms: Microsoft (MSFT), Amazon (AMZN), Oracle (ORCL), CoreWeave (CRWV) – productize GPUs into APIs, managed services and agent platforms.
Thesis: every new user, agent, or workflow burns tokens → consumes compute → draws power/cooling → rides cloud delivery. This portfolio owns the rails of that loop—so revenue grows with how much AI is used, not just how much is built.
📈 𝐖𝐡𝐚𝐭 𝐈 𝐞𝐱𝐩𝐞𝐜𝐭 𝐧𝐞𝐱𝐭
- 2023–24: Model training & early rollouts
- 2025–28: Enterprise usage at scale (support, ops, logistics, HR, content)
- Every time an AI agent “thinks,” a token is processed, compute is consumed—and the stack monetizes.
That’s the thesis. That’s how I’m positioned.
✅ 𝐁𝐨𝐭𝐭𝐨𝐦 𝐥𝐢𝐧𝐞
I don’t just invest “in AI.” I invest in the picks & shovels of the inference economy.
➕ 𝐅𝐨𝐥𝐥𝐨𝐰 𝐦𝐲 𝐥𝐢𝐯𝐞 𝐦𝐨𝐯𝐞𝐬
- Copy my portfolio on eToro:
https://www.etoro.com/people/lordhumpe?utm_source=site&utm_medium=post&utm_campaign=copy_inference
- Transparent stats on Bullaware:
https://bullaware.com/etoro/Lordhumpe?utm_source=site&utm_medium=post&utm_campaign=bullaware_inference
—
John Maeland – AI Infrastructure & Inference Investing
Weekly: portfolio moves, earnings takeaways, real signals in inference & infra.
Subscribe on the top-right to get it in your inbox.