
Disseminate

Matthias Jasny | P4DB - The Case for In-Network OLTP | #10

Season 1, Ep. 10
Summary:

In this episode Matthias Jasny from TU Darmstadt talks about P4DB, a database that uses a programmable switch to accelerate OLTP workloads. The main idea of P4DB is to implement a transaction processing engine on top of a P4-programmable switch. The switch can thus act as an in-network accelerator, especially when it is used to store and process hot (contended) tuples. P4DB provides significant benefits compared to traditional DBMS architectures and can achieve speedups of up to 8x.
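As a purely conceptual illustration of the hot/cold split described above (not P4DB's actual implementation, which is written in P4 and runs on switch hardware), the idea can be sketched as follows: contended tuples live in a single switch-like in-network store that applies operations in arrival order, while cold tuples remain on server nodes. The names `SwitchStore`, `Node`, and `run_txn` are invented for this sketch.

```python
# Conceptual sketch of a hot/cold tuple split (illustrative only; the real
# system implements this logic in P4 on a programmable switch's pipeline).

class SwitchStore:
    """Models the switch: hot (contended) tuples are stored and updated
    in-network, in the order packets arrive through the pipeline."""
    def __init__(self, hot_tuples):
        self.tuples = dict(hot_tuples)

    def apply(self, key, delta):
        # One pipeline pass: read-modify-write on the stored value.
        self.tuples[key] += delta
        return self.tuples[key]


class Node:
    """Models a server node holding cold (rarely contended) tuples."""
    def __init__(self, cold_tuples):
        self.tuples = dict(cold_tuples)

    def apply(self, key, delta):
        self.tuples[key] += delta
        return self.tuples[key]


def run_txn(switch, node, ops):
    """Route each operation to the switch (hot tuple) or a node (cold tuple)."""
    results = {}
    for key, delta in ops:
        store = switch if key in switch.tuples else node
        results[key] = store.apply(key, delta)
    return results


switch = SwitchStore({"hot_account": 100})
node = Node({"cold_account": 50})
out = run_txn(switch, node, [("hot_account", -10), ("cold_account", 10)])
print(out)  # {'hot_account': 90, 'cold_account': 60}
```

The point of the sketch is only the routing decision: operations on contended data are handled at a single in-network point, so server nodes never contend on those tuples.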


Questions:

0:55: Can you set the scene for your research and describe the motivation behind P4DB? 

1:42: For listeners who may not be familiar with them, can you describe what exactly a programmable switch is?

3:55: What are the characteristics of OLTP workloads that make them a good fit for programmable switches?

5:33: Can you elaborate on the key idea of P4DB?

6:46: How do you go about mapping the execution of transactions to the architecture of a programmable switch?

10:13: Can you walk us through the lifecycle of a switch transaction?

11:04: How does P4DB determine the optimal tuple placement on the switch?

12:16: Is this allocation static or dynamic? Can the tuple order be changed at runtime?

12:55: What happens if a transaction needs to access tuples in a different order than that laid out on the switch?

14:11: Obviously you can't fit all the data on the switch, only the hot data. How does P4DB execute transactions that access some hot data and some cold data that is not on the switch?

16:04: How did you evaluate P4DB? What are the results?  

18:28: What was the magnitude of the speed up in the scenarios in which P4DB showed performance gains?

19:29: Are there any situations in which P4DB performs sub-optimally, and what are the workload characteristics of those situations?

20:36: How many tuples can you get on a switch? 

21:23: Where do you see your results being useful? Who will find them the most relevant? 

21:57: Across your time working on P4DB, what are the most interesting, perhaps unexpected, lessons that you learned?

22:39: That leads me into my next question: what were the things you tried while working on P4DB that failed? Can you give any words of advice to people who might work with programmable switches in the future?

23:24: What do you have planned for future research? 

24:24: Is P4DB publicly available?

24:53: What attracted you to this research area?

25:42: What’s the one key thing you want listeners to take away from your research and your work on P4DB?


Links:


    In this episode of Disseminate: The Computer Science Research Podcast, guest host Bogdan Stoica sits down with Ao Li and Rohan Padhye (Carnegie Mellon University) to discuss their OOPSLA 2025 paper: "Fray: An Efficient General-Purpose Concurrency Testing Platform for the JVM".We dive into:Why concurrency bugs remain so hard to catch -- even in "well-tested" Java projects.The design of Fray, a new concurrency testing platform that outperforms prior tools like JPF and rr.Real-world bugs discovered in Apache Kafka, Lucene, and Google Guava.The gap between academic research and industrial practice, and how Fray bridges it.What’s next for concurrency testing: debugging tools, distributed systems, and beyond.If you’re a Java developer, systems researcher, or just curious about how to make software more reliable, this conversation is packed with insights on the future of software testing.Links & Resources:- The Fray paper (OOPSLA 2025):- Fray on GitHub- Ao Li’s research - Rohan Padhye’s research Don’t forget to like, subscribe, and hit the 🔔 to stay updated on the latest episodes about cutting-edge computer science research.#Java #Concurrency #SoftwareTesting #Fray #OOPSLA2025 #Programming #Debugging #JVM #ComputerScience #ResearchPodcast