When does it make sense to run your own LLM inference infrastructure instead of paying per-token to third-party APIs like OpenAI or Anthropic? And how do you execute it once you’ve decided to?
I’m giving a talk and running a half-day hands-on workshop on the topic.
I’ll update this post with content from the talk and the workshop as I deliver each one. I’ll also write a concise summary post here once both are done.
The talk
Data Science Festival Big Birthday Bash 2026, 16th May 2026, London
Recording: coming soon…
The workshop
AI in Production 2026, 4th-5th June 2026, Newcastle upon Tyne
A hands-on afternoon workshop covering the decision framework for third-party vs self-hosted inference, applying it to some worked example LLM applications, and then deploying an inference server using current leading open-source technologies.
Slides: see the talk slides above for now. I’ll upload further workshop resources here as they go live.
I don’t know if the workshop will be recorded and made publicly available, but I’ll put a link here if it is.
If you’re thinking about self-hosting, or just starting to grapple with leveraging AI internally in your org, drop me an email and I’d be happy to talk!