This talk will introduce Quix Streams, an open-source Python library for data-intensive workloads on Kafka.
We will discuss the problems this library is designed to solve and how it was shaped by the challenges of building a Kafka-based solution for Formula 1 cars at McLaren, a solution that needed to process a firehose of sensor data arriving at thousands of samples per second. We’ll also explain why we decided to combine a Kafka API approach with a stream processing library and provide developers with a familiar Pandas DataFrame-like interface.
You’ll also see the library in action in a sentiment analysis demo, where we’ll calculate sentiment scores for incoming messages in a chat app in real time using the Hugging Face Transformers API. At the end, we will connect to the Twitter streaming API to send a high volume of data into the pipeline and simulate this use case at scale.
You’ll see how the library can simplify tasks such as:
– Subscribing to topics and deserializing incoming messages into table rows
– Running calculations on a rolling window of messages
– Using in-memory state to apply functions such as aggregation or filtering
– Automatically outputting the results of calculations into downstream topics
– Managing state without the hassle of checkpointing and queues
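To make the tasks above concrete, here is a minimal sketch in plain Python of what they involve: deserializing messages into rows, keeping a rolling window in memory, aggregating, filtering, and collecting results for a downstream topic. It deliberately does not use the Quix Streams API. The function name `rolling_mean_stream`, the `user` and `score` fields, and the in-memory `output_topic` list are all hypothetical stand-ins chosen for illustration, and no Kafka broker is involved.

```python
import json
from collections import deque


def rolling_mean_stream(raw_messages, window_size=2):
    """Yield one output row per input message, with a rolling mean of 'score'."""
    # In-memory state: only the most recent `window_size` scores are kept.
    window = deque(maxlen=window_size)
    for raw in raw_messages:
        row = json.loads(raw)  # deserialize the message bytes into a dict "row"
        window.append(row["score"])
        yield {
            "user": row["user"],
            "score": row["score"],
            "rolling_mean": sum(window) / len(window),
        }


# Hypothetical incoming chat messages, already scored for sentiment (-1 to 1).
incoming = [
    json.dumps({"user": user, "score": score}).encode()
    for user, score in [("ana", 1.0), ("bo", 0.0), ("ana", -1.0), ("cy", 0.5)]
]

# Filter on the aggregate and collect the results, standing in for a
# downstream topic.
output_topic = [
    row for row in rolling_mean_stream(incoming) if row["rolling_mean"] > 0
]
```

In a real pipeline the list of messages would be a consumer subscribed to a topic, and the window state would need to survive restarts, which is the part a stream processing library is meant to take off your hands.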
We also want to give everyone the opportunity to explore the library themselves. We’ll share the library’s GitHub repo and getting-started tutorial, and show attendees how to get the sentiment analysis demo up and running in their own environments.