Disseminate

George Theodorakis | Scabbard: Single-Node Fault-Tolerant Stream Processing | #12

Season 2, Ep. 2 • Monday, November 21, 2022

Summary (VLDB abstract):

Single-node multi-core stream processing engines (SPEs) can process hundreds of millions of tuples per second. Yet making them fault-tolerant with exactly-once semantics while retaining this performance is an open challenge: due to the limited I/O bandwidth of a single-node, it becomes infeasible to persist all stream data and operator state during execution. Instead, single-node SPEs rely on upstream distributed systems, such as Apache Kafka, to recover stream data after failure, necessitating complex cluster-based deployments. This lack of built-in fault-tolerance features has hindered the adoption of single-node SPEs. We describe Scabbard, the first single-node SPE that supports exactly-once fault-tolerance semantics despite limited local I/O bandwidth. Scabbard achieves this by integrating persistence operations with the query workload. Within the operator graph, Scabbard determines when to persist streams based on the selectivity of operators: by persisting streams after operators that discard data, it can substantially reduce the required I/O bandwidth. As part of the operator graph, Scabbard supports parallel persistence operations and uses markers to decide when to discard persisted data. The persisted data volume is further reduced using workload-specific compression: Scabbard monitors stream statistics and dynamically generates computationally efficient compression operators. Our experiments show that Scabbard can execute stream queries that process over 200 million tuples per second while recovering from failures with sub-second latencies.
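
To make the selectivity idea in the abstract concrete, here is a toy sketch in Python. It is not Scabbard's actual algorithm or API: the `Operator` class, the `persistence_point` function, the linear pipeline, and all the numbers are assumptions made up for illustration. It only shows why persisting a stream after operators that discard data needs less I/O bandwidth than persisting the raw input.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    """Hypothetical operator description (not a Scabbard type)."""
    name: str
    selectivity: float  # output volume / input volume; < 1 means data is discarded


def persistence_point(input_mb_per_s, operators):
    """Pick the cheapest place to persist the stream in a linear pipeline.

    Returns (index, rate). Index 0 means "persist the raw input";
    index i means "persist the output of operators[i - 1]".
    """
    rate = input_mb_per_s
    best_index, best_rate = 0, rate
    for i, op in enumerate(operators, start=1):
        rate *= op.selectivity  # data rate leaving this operator
        if rate < best_rate:
            best_index, best_rate = i, rate
    return best_index, best_rate


# Made-up pipeline: two operators that discard data, then one that expands it.
pipeline = [
    Operator("filter", 0.5),
    Operator("project", 0.5),
    Operator("window_join", 4.0),
]
index, rate = persistence_point(1000.0, pipeline)
print(f"persist after operator #{index} at {rate} MB/s")
```

In this made-up pipeline the stream is smallest after the second operator, so persisting there would need a quarter of the bandwidth of persisting the raw input. A real engine would also have to account for operator state, parallelism, and recovery cost, which this sketch ignores.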

Questions:

Can you start off by explaining what stream processing is and its common use cases?
How did you end up researching in this area?
What is Scabbard?
Can you explain the differences between single-node and distributed SPEs?
What are the advantages of single-node SPEs?
What are the pitfalls that have limited the adoption of single-node SPEs?
What were your design goals when developing Scabbard?
What is the key idea underpinning Scabbard?
In the paper you state there are three main contributions in Scabbard. Can you talk us through each one?
How did you implement Scabbard? Can you give an overview of the architecture?
What was your approach to evaluating Scabbard? What were the questions you were trying to answer?
What did you compare Scabbard against? What was the experimental set up?
What were the key results?
Are there any situations when Scabbard’s performance is sub-optimal? What are the limitations?
Is Scabbard publicly available?
As a software developer how do I interact with Scabbard?
What are the most interesting and perhaps unexpected lessons that you have learned while working on Scabbard?
Progress in research is non-linear. From the conception of the idea for Scabbard to the publication, were there things you tried that failed?
What do you have planned for future research with Scabbard?
Can you tell the listeners about your other research?
How do you approach idea generation and selecting projects?
What do you think is the biggest challenge in your research area now?
What’s the one key thing you want listeners to take away from your research?
