Session
Use Case

Real-Time ML Model Monitoring with Datasketches and Apache Pinot at Uber

Time: May 8, 10:15 AM - 11:00 AM
Location: Imperial Ballroom

In this tech talk, we present an overview of how Uber uses Pinot as a data sketch store for ML model monitoring. We also highlight the power of Apache Pinot's aggregation system and the capabilities of its recently introduced data sketch functions.

Problem and solution: The machine learning life cycle operates on large-scale data spread across different storage systems, both online and offline. ML practitioners need to monitor and debug data quality and distribution for each dataset, and to cross-compare different data sources. However, it is challenging to build a monitoring application that works directly with raw data from different systems. As a solution, we profile those datasets and store the profiling results as data sketches inside Pinot. We rely on Pinot for scalable storage and low-latency queries over data sketches, which enables both continuous ML monitoring and an ad hoc, UI-based debugging experience. To make this possible, we integrated Apache DataSketches into Pinot, focusing in particular on the KLL, CPC, and Frequent Items sketches. This setup shows clear performance advantages over alternative storage solutions such as Druid and MySQL.
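To illustrate why sketches suit this kind of profiling, here is a minimal Python sketch of a mergeable frequent-items summary. It is a simplified Misra-Gries style summary used as a stand-in for the idea behind a Frequent Items sketch. It is not the Apache DataSketches implementation and not Uber's pipeline. The class name, the capacity, and the sample batches are all hypothetical. The point it shows is that each batch can be profiled independently into a small fixed-size summary, and the summaries can be merged later without revisiting the raw data.

```python
from collections import Counter


class FrequentItemsSummary:
    """Toy Misra-Gries style summary (hypothetical, for illustration only).

    Keeps at most `capacity` counters. Stored counts are lower bounds
    on the true frequencies of the retained items.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()

    def update(self, item, weight=1):
        # Count the item, then shrink if we exceed the counter budget.
        self.counts[item] += weight
        self._shrink()

    def merge(self, other):
        # Add the other summary's counters, then shrink back to capacity.
        self.counts.update(other.counts)
        self._shrink()

    def _shrink(self):
        if len(self.counts) <= self.capacity:
            return
        # Subtract the (capacity + 1)-th largest count from every counter
        # and drop counters that are no longer positive.
        cut = sorted(self.counts.values(), reverse=True)[self.capacity]
        self.counts = Counter(
            {k: v - cut for k, v in self.counts.items() if v > cut}
        )

    def top(self, n):
        return self.counts.most_common(n)


def profile(values, capacity=3):
    """Profile one batch of a categorical feature into a summary."""
    summary = FrequentItemsSummary(capacity)
    for value in values:
        summary.update(value)
    return summary


# Hypothetical batches of a categorical feature, profiled separately.
batch_a = ["US"] * 50 + ["IN"] * 30 + ["BR"] * 5 + ["FR"] * 2
batch_b = ["US"] * 40 + ["IN"] * 35 + ["MX"] * 4 + ["DE"] * 1

merged = profile(batch_a)
merged.merge(profile(batch_b))
print(merged.top(2))
```

In a setup like the one described, the per-batch summaries would be what gets stored, and a merge of this kind would run at query time. Quantile (KLL) and distinct-count (CPC) sketches follow the same profile-then-merge pattern with different internals.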

Join us for a deep dive into optimizing real-time analytics in ML applications with Apache Pinot.