Slack Outlines Four-Phase Journey to a Multi-Cloud AI Serving Platform
Slack's engineering blog details how its AI serving infrastructure evolved through four phases: self-managed Amazon SageMaker in an escrow VPC; Amazon Bedrock, adopted to reduce operational overhead; a hybrid model combining Bedrock Provisioned Throughput and On-Demand to handle 10× traffic swings; and finally a multi-cloud architecture that adds Google Cloud Vertex AI. The final setup introduced a provider-agnostic serving layer with secretless authentication, API normalization, unified observability, and intelligent routing based on latency and error metrics. Reported results include a roughly 10% quality improvement on complex reasoning tasks and a roughly 67% latency reduction for short prompts, alongside improved geographic failover and reduced dependency on a single provider.
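To make the idea of routing on latency and error metrics concrete, here is a minimal Python sketch. It is not Slack's implementation: the class names `ProviderStats` and `Router`, the 50-request rolling window, the 20% error-rate threshold, and the provider labels are all hypothetical choices made for illustration. The sketch keeps recent outcomes per provider, excludes providers whose error rate is above the threshold, and picks the remaining provider with the lowest average latency.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Iterable


@dataclass
class ProviderStats:
    """Rolling window of recent request outcomes for one provider (hypothetical)."""
    window: int = 50  # assumed window size, not from the source
    latencies_ms: Deque[float] = field(default_factory=deque)
    failures: Deque[bool] = field(default_factory=deque)

    def record(self, latency_ms: float, ok: bool) -> None:
        self.latencies_ms.append(latency_ms)
        self.failures.append(not ok)
        # Drop the oldest sample once the window is full.
        if len(self.latencies_ms) > self.window:
            self.latencies_ms.popleft()
            self.failures.popleft()

    def avg_latency_ms(self) -> float:
        # A provider with no samples reports 0.0, so it is tried early.
        if not self.latencies_ms:
            return 0.0
        return sum(self.latencies_ms) / len(self.latencies_ms)

    def error_rate(self) -> float:
        if not self.failures:
            return 0.0
        return sum(self.failures) / len(self.failures)


class Router:
    """Chooses a provider from recent latency and error metrics (illustrative only)."""

    def __init__(self, providers: Iterable[str], max_error_rate: float = 0.2) -> None:
        self.stats: Dict[str, ProviderStats] = {p: ProviderStats() for p in providers}
        self.max_error_rate = max_error_rate  # assumed threshold

    def record(self, provider: str, latency_ms: float, ok: bool) -> None:
        self.stats[provider].record(latency_ms, ok)

    def choose(self) -> str:
        healthy = [
            name for name, s in self.stats.items()
            if s.error_rate() <= self.max_error_rate
        ]
        if healthy:
            # Among healthy providers, prefer the lowest average latency.
            return min(healthy, key=lambda n: (self.stats[n].avg_latency_ms(), n))
        # If every provider is over the threshold, fall back to the least-failing one.
        return min(self.stats, key=lambda n: (self.stats[n].error_rate(), n))
```

In use, the caller would invoke `router.record(provider, latency_ms, ok)` after each request and `router.choose()` before the next one. A production router would likely also weigh cost, quota, region, and model capability, none of which this sketch attempts.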