Past Events - Fall 2025

Day 1


Zero Trust Day

Opening Remarks, KubeCrash Fall 2025

The key to good platform engineering is reliability, but you can't provide that stability if you have a toothpick in your Jenga tower. We'll talk about avoiding the migration trap, reducing tech debt in core systems, and multimodal approaches to creating a strong and stable platform your organization can rely on today and in the future.

Lisa Shissler Smith

Keynote: How Solid is Your Platform?

Communication barriers exclude millions of people from fully participating in everyday interactions. For the deaf and hard-of-hearing community, the absence of scalable, real-time sign language interpretation remains a persistent challenge. In this session, we will demonstrate a forward-looking AI-powered application that translates sign language into spoken language, deployed and orchestrated on Kubernetes. This application leverages generative AI (LxMs) to scale for multiple users, representing a step toward a future where communication is accessible to all.

Using the sign language translation use case, the session will demonstrate how Kubernetes is well-positioned to support AI workloads, how it optimizes cluster resources for video and language processing, and how it integrates seamlessly with generative AI use cases.

Rob Koch

Empowering Accessibility Through Kubernetes: The Future of Real-Time Sign Language Interpretation

As platform engineering continues to gain traction, ensuring the security of the platform is becoming increasingly critical. This panel will delve into the complexities of securing modern platforms, from the vulnerabilities inherent in the supply chain to the threats that emerge at runtime. Our expert panelists will share their insights on the latest security challenges, best practices, and strategies for protecting the platform, its components, and its users. The discussion will cover key topics such as securing dependencies, managing vulnerabilities, implementing robust access controls, and monitoring runtime threats. By exploring the intersection of platform engineering and security, this panel aims to provide a comprehensive understanding of the measures needed to safeguard the platform and ensure the trust and reliability of the services it delivers.

Julia Furst Morgado, Pronomita Dey, Nicol Daňková, Eddie Wassef

Panel: Securing the Platform: From Supply Chain to Runtime

The promise of containers is that they can "run anywhere," but for platform teams, the reality is often "escape anywhere" due to the shared host kernel. Standard containerization is not isolation, leaving your platform vulnerable to security breaches and performance degradation.

This session moves beyond the basics of containerization to provide a practical guide to workload isolation. We will dissect the spectrum of isolation technologies—from Linux primitives like namespaces, cgroups, and seccomp to stronger boundaries like microVMs (e.g., Firecracker) and confidential computing. We’ll also look at how Apple’s Containerization Framework has made strong container isolation possible in development environments.

Finally, we’ll dive into the next frontier: GPU isolation for AI workloads and discuss why sharing powerful hardware creates new, significant risks and how to architect a robust, efficient, and truly multi-tenant platform for the era of AI. You’ll come away with an understanding of the real-world tradeoffs of each approach for your platform's security, performance, and operational goals.
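As a tiny illustration of the Linux primitives the session covers (a sketch of ours, not material from the talk, and assuming a Linux host), a process can inspect the isolation mechanisms applied to it through `/proc`:

```python
# Illustrative sketch (assumes a Linux host): containers are assembled from
# kernel primitives such as namespaces, cgroups, and seccomp, and a process
# can see which of them apply to it by reading /proc.

def read_status_field(field):
    """Return one field from /proc/self/status, e.g. 'Seccomp' or 'Pid'."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return line.split(":", 1)[1].strip()
    return None

# Seccomp mode: 0 = disabled, 1 = strict, 2 = filter (the mode container
# runtimes use to restrict which system calls a workload may make).
print("Seccomp mode:", read_status_field("Seccomp"))

# Cgroup membership shows the resource-control group confining this process.
with open("/proc/self/cgroup") as f:
    print("Cgroups:", f.read().strip())
```

These are the same interfaces that runtimes and security tooling ultimately configure; a stronger boundary such as a microVM replaces this shared-kernel view entirely.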

Marina Moore

Practical Guide to Container Isolation

The "sidecar vs. sidecarless" debate in service meshes is an important one. It's an architectural choice that profoundly impacts how your mesh handles proxy sharing and multi-tenancy. While sidecars offer advantages like no application code changes, language independence, and clear operational models, they also present trade-offs such as potential resource overhead. Depending on what you put into the sidecar, they may turn into side-trucks. Understanding these nuances is key to selecting the right mesh for your environment.

In their session, Marino Wijay (Kong) and William Rizzo (Mirantis, Linkerd Ambassador) will walk you through the different service mesh architectures, their benefits, and their trade-offs. As you'll see, the core distinction lies in proxy sharing. Join this session to learn more and make the right service mesh decisions for your use case.

Marino Wijay and William Rizzo

To Sidecar or Not to Sidecar: A Practical Guide to Selecting Your Service Mesh

AI promises to transform how we build, run, and scale cloud native platforms—but for platform engineers, the real question is: what should you actually automate? With so many AI tools, services, and frameworks available, deciding where to start can be overwhelming. Should you use AI for incident response? Cost optimization? Developer experience? And what does success even look like?

Many agentic frameworks make it easy to build agentic applications, but taking them into production is a real challenge for platform engineering teams. Because agents operate on behalf of humans, what guardrails should be put in place so that they do not break existing systems, and so that they can recover gracefully when something does break?

This panel explores how platform engineers are integrating AI into their workflows today—what’s working, what’s not, and how they’re evaluating tradeoffs. We’ll unpack the hype, share end user stories and discuss how platform teams can strategically adopt AI without losing sight of long-term goals. If you're trying to figure out where AI fits in your platform strategy, this session is for you.

Kaslin Fields, Liana Anca Tomescu, Annie Talvasto, Arun Gupta

Panel: Navigating AI’s Role in Platform Engineering

Are you running AI or ML on Kubernetes? Is everything really working… or is it just running while silently failing behind the scenes?

AI and ML pipelines are being increasingly deployed in cloud-native environments, and people are paying a lot of attention to the mechanics of running them. But what comes after getting your huge training workloads running and your model queries properly load-balanced? There’s a whole class of operational challenges that’s not getting talked about very much: how do you know that everything is really working correctly?

Join us for a look into AI observability! In this session, we’ll talk about the metrics that are usually available and what they really mean, what you need to know to use them to spot problems, and how you can use modern cloud-native tooling for a clean AI-centric metrics pipeline. We’ll wrap up with a demo showing these techniques live in the wild so that attendees can walk away with useful, practical insights into the brave new AI world!
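To make the "silently failing" point concrete, here is a small sketch of ours (hypothetical numbers, not from the talk) showing why tail latency, not just the median, matters for inference metrics:

```python
# Illustrative sketch (hypothetical data): computing latency percentiles
# from per-request inference timings, the kind of signal an AI-centric
# metrics pipeline would expose.

def percentile(samples, p):
    """Nearest-rank percentile of a list of numbers (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical per-request inference latencies in milliseconds.
latencies_ms = [12, 15, 14, 13, 250, 16, 15, 14, 13, 240]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)

# A healthy-looking median can hide a long tail: here p50 is 14 ms
# while p95 is 250 ms, capturing the slow outliers.
print(f"p50={p50} ms, p95={p95} ms")
```

A dashboard that only charts the average would report this workload as healthy while a slice of users waits seconds for a response.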

Flynn, Chris Khanoyan

Seeing Clearly into Your AI/ML Black Box

Over the past two decades, DevOps has helped bridge the gap between development and operations, unlocking faster delivery cycles and more reliable software systems. But today, as enterprises scale to thousands of applications and teams, DevOps as we know it is reaching its limits. The future is not about asking every developer to become an infrastructure expert; it's about industrializing how software is built and operated.

Platform engineering is that industrialization moment. Just as the factory floor revolutionized manufacturing with assembly lines and specialized machinery, platform engineering provides the “factory floor” for modern software. Through opinionated platforms and curated golden paths, enterprises empower developers to self-serve infrastructure and deploy software securely, without needing to understand the complexities beneath. The platform becomes the assembly line: automated, standardized, and reliable.

In this talk, we’ll explore why platform engineering is replacing DevOps as the defining operating model of enterprise software. We’ll look at how golden paths, internal developer platforms, and self-service abstractions enable organizations to scale innovation without scaling cognitive load. And we’ll ask: what happens when software finally achieves its industrialization moment?

Luca Galante

Lightning Talk: The future (and industrialization moment) of DevOps

Running inference on streaming data is complex. Teams must connect to event sources, process streams, and scale inference with minimal overhead. Numaflow, a Kubernetes-native open source platform, simplifies this by allowing teams to do stream processing and inference in the same pipelines, natively on Kubernetes. It integrates with Kafka, Pulsar, SQS, and more to support diverse event sources, and it scales to handle high-throughput workloads. Numaflow (created by the creators of the Argo Project) is currently in use by large enterprises.

This talk shows how Numaflow enables ML engineers to achieve scalable stream processing and inference on streaming data without the infrastructure burden.
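As a conceptual sketch of ours (plain Python generators, not Numaflow's actual API), the shape the talk describes, stream processing and inference composed in one pipeline, looks like this:

```python
# Conceptual sketch (not Numaflow's API): stream processing and inference
# composed in a single pipeline, modeled with plain Python generators.
# Numaflow expresses the same shape as Kubernetes-native vertices
# connected by edges.

def source(events):
    """Stand-in for an event source such as Kafka, Pulsar, or SQS."""
    yield from events

def preprocess(stream):
    """Stream-processing stage: normalize each event."""
    for event in stream:
        yield event.strip().lower()

def infer(stream):
    """Inference stage: a toy 'model' that scores each event."""
    for event in stream:
        yield (event, len(event) % 2)  # hypothetical score

events = ["  Login  ", "Checkout", "  ERROR "]
results = list(infer(preprocess(source(events))))
print(results)
```

In the real platform, each stage runs as its own independently scaled vertex, so a slow inference step can fan out without blocking ingestion.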

Krithika Vijayakumar, Sri Harsha Yayi

Numaflow: Kubernetes-Native Platform for Inference on Streaming Data

Danielle Cook, Lisa-Marie Namphy

Closing Remarks, KubeCrash 2025
