Shortly after migrating several Spark batch pipelines from on-premises infrastructure to Azure Kubernetes Service (AKS), we began seeing repeated executor out-of-memory (OOM) failures in one of our larger jobs. The ...
Pinterest Engineering has made its Apache Spark workloads more reliable, cutting out-of-memory (OOM) failures by 96% through a combination of improved observability, ...