Senior Cloud Dev Advocate @Google
Abdel Sghiouar is a Senior Cloud Developer Advocate @Google Cloud, a co-host of the Kubernetes Podcast by Google, and a CNCF Ambassador. His focus areas are GKE/Kubernetes, Service Mesh, and Serverless.
As Kubernetes takes on more complex batch and distributed ML workloads, the limits of default pod-by-pod scheduling become clear. Running these workloads efficiently often requires elaborate workarounds to manage resource contention.
In this session, Abdel and Lucy Sweet examine how Kubernetes scheduling primitives are adapting. We will cover the Workload and PodGroup APIs, explaining how they enable gang scheduling, a practical requirement for distributed jobs that rely on "all-or-nothing" placement to prevent resource deadlocks.
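To make the "all-or-nothing" idea concrete, here is a minimal toy sketch in Python. It does not use the Workload or PodGroup APIs or any Kubernetes client; the `Job` class, both scheduler functions, the job names, and the single shared CPU pool are hypothetical simplifications chosen only to show how pod-by-pod placement can leave two jobs partially placed and stuck, while gang placement admits one job in full.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pods: int         # number of pods that must all run together
    cpu_per_pod: int  # CPU units requested by each pod


def schedule_pod_by_pod(jobs, free_cpu):
    """Place pods one at a time, round-robin across jobs, with no group awareness."""
    placed = {job.name: 0 for job in jobs}
    progress = True
    while progress:
        progress = False
        for job in jobs:
            if placed[job.name] < job.pods and free_cpu >= job.cpu_per_pod:
                free_cpu -= job.cpu_per_pod
                placed[job.name] += 1
                progress = True
    return placed, free_cpu


def schedule_gang(jobs, free_cpu):
    """Admit a job only if all of its pods fit at once (all-or-nothing)."""
    placed = {job.name: 0 for job in jobs}
    for job in jobs:
        needed = job.pods * job.cpu_per_pod
        if needed <= free_cpu:
            free_cpu -= needed
            placed[job.name] = job.pods
    return placed, free_cpu


# Two hypothetical training jobs, each needing 4 pods of 2 CPU; only 12 CPU exist.
jobs = [
    Job("train-a", pods=4, cpu_per_pod=2),
    Job("train-b", pods=4, cpu_per_pod=2),
]

naive, naive_free = schedule_pod_by_pod(jobs, free_cpu=12)
gang, gang_free = schedule_gang(jobs, free_cpu=12)

print("pod-by-pod:", naive, "free CPU:", naive_free)
print("gang:      ", gang, "free CPU:", gang_free)
```

In this toy model, pod-by-pod placement ends with each job holding 3 of its 4 pods and no CPU left, so neither job can start. Gang placement runs train-a in full and leaves train-b waiting with 4 CPU still free.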
We will also look at lifecycle management via In-Place Pod Resizing. We will demonstrate how to dynamically adjust CPU and memory for running pods without triggering restarts, which is crucial for applications with slow initialization times.
Attendees will get a factual look at the current state of these features and how engineers at Google and Uber evaluate them for large-scale environments.