At DoorDash we process 6 billion messages a day across more than 2,500 real-time pipelines. These pipelines ingest data from mobile devices and internal services, stream-process it with Apache Flink and Kafka, and write the results to our data lakes. We then query the data lakes with Trino, Pinot, and other tools.
Along the way, we have built a rich set of automation tools to manage the lifecycle of these pipelines, from provisioning to clean-up. The lessons learned as our business and data needs have scaled guide us in improving both the pipelines and the automation around them.
We hope to share these learnings with the broader stream-processing community.