Agenda

Session
Use Case

Automating Real-Time Pipelines at Scale in DoorDash

Time: May 8, 1:00 PM - 1:45 PM
Location: Imperial Ballroom

At DoorDash, we process 6 billion messages a day across a total of 2,500 real-time pipelines. Our pipelines ingest data from mobile devices and internal services, then stream-process it with Apache Flink and Kafka before writing to our data lake(s). We use Trino, Pinot, and other tools to query the data lake(s).

Along the way, we have built a rich set of automation tools to manage the lifecycle of these pipelines, from provisioning to clean-up. The lessons we have learned as our business and data needs scale guide how we improve both our pipelines and the related automation.

We hope to share what we have learned with the broader stream processing community.