Disseminate

Kevin Gaffney | SQLite: Past, Present, and Future | #11

Season 2, Ep. 1
Summary:

In this episode, Kevin Gaffney tells us about SQLite, the most widely deployed database engine in existence. SQLite is found in nearly every smartphone, computer, web browser, television, and automobile. Several factors are likely responsible for its ubiquity, including its in-process design, standalone codebase, extensive test suite, and cross-platform file format. While it supports complex analytical queries, SQLite is primarily designed for fast online transaction processing (OLTP), employing row-oriented execution and a B-tree storage format. However, fueled by the rise of edge computing and data science, there is a growing need for efficient in-process online analytical processing (OLAP). DuckDB, a database engine nicknamed "the SQLite for analytics", has recently emerged to meet this demand. While DuckDB has shown strong performance on OLAP benchmarks, it is unclear how SQLite compares. Listen to the podcast to find out more about Kevin's work on identifying key bottlenecks in OLAP workloads and the optimizations he has helped develop.


Questions:
  • How did you end up researching databases? 
  • Can you describe what SQLite is? 
  • Can you give the listener an overview of SQLite’s architecture? 
  • How does SQLite provide ACID guarantees? 
  • How have hardware and workloads changed over SQLite's lifetime?
  • What challenges do these changes pose for SQLite?
  • In your paper, you subject SQLite to an extensive performance evaluation. What questions were you trying to answer?
  • What was the experimental setup? What benchmarks did you use?
  • How realistic are these workloads? How closely do they map to user studies?
  • What were the key results in your OLTP experiments?
  • You mentioned that delete performance was poor in the user study, did you observe why in the OLTP experiment?
  • Can you talk us through your OLAP experiment?
  • What were the key analytical data processing bottlenecks you found in SQLite?
  • What were your optimizations? How did they perform? 
  • What are the reasons for SQLite using dynamic programming?
  • Are your optimizations available in SQLite today? 
  • What were the findings in your blob I/O experiment? 
  • Progress in research is non-linear. From the conception of the idea for your paper to publication, were there things you tried that failed?
  • What do you have planned for future research? 
  • How do you think SQLite will evolve over the coming years? 
  • Can you tell the listeners about your other research?
  • What do you think is the biggest challenge in your research area now? 
  • What’s the one key thing you want listeners to take away from your research?


Links:

More episodes

  • 7. High Impact in Databases with... Ali Dasdan

    01:03:02||Season 7, Ep. 7
    In this High Impact episode we talk to Ali Dasdan, CTO at ZoomInfo. Tune in to hear Ali's story and learn about some of his most impactful work, such as "Map-Reduce-Merge".
    The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
    Materials mentioned on this episode:
    • Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters (SIGMOD'07)
    • The Art of Doing Science and Engineering: Learning to Learn, Richard Hamming
    • How to Solve It, George Polya
    • Systems Architecting: Creating & Building Complex Systems, Eberhardt Rechtin
    You can find Ali on: Twitter, LinkedIn
  • 17. Matt Perron | Analytical Workload Cost and Performance Stability With Elastic Pools | #57

    52:10||Season 6, Ep. 17
    In this episode, we dive deep into the complexities of managing analytical query workloads with our guest, Matt Perron. Matt explains how the rapid and unpredictable fluctuations in resource demands present a significant challenge for provisioning. Traditional methods often lead to either over-provisioning, resulting in excessive costs, or under-provisioning, which causes poor query latency during demand spikes. However, there's a promising solution on the horizon. Matt shares insights from recent research that showcases the viability of using cloud functions to dynamically match compute supply with workload demand without the need for prior resource provisioning. While effective for low query volumes, this approach becomes cost-prohibitive as query volumes increase, highlighting the need for a more balanced strategy.
    Matt introduces us to a novel strategy that combines the best of both worlds: the rapid scalability of cloud functions and the cost-effectiveness of virtual machines. This innovative approach leverages the fast but expensive cloud functions alongside slow-starting yet inexpensive virtual machines to provide elasticity without sacrificing cost efficiency. He elaborates on how their implementation, called Cackle, achieves consistent performance and cost savings across a wide range of workloads and conditions. Tune in to learn how Cackle avoids the pitfalls of traditional approaches, delivering stable query performance and minimizing costs even as demand fluctuates wildly.
    Links:
    • Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools (SIGMOD'24)
    • Matt's Homepage
  • 6. High Impact in Databases with... Andreas Kipf

    53:06||Season 7, Ep. 6
    In this High Impact episode we talk to Andreas Kipf about his work on "Learned Cardinalities". Andreas is the Professor of Data Systems at Technische Universität Nürnberg (UTN). Tune in to hear Andreas's story and learn about some of his most impactful work.
    The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
    Papers mentioned on this episode:
    • Learned Cardinalities: Estimating Correlated Joins with Deep Learning (CIDR'19)
    • The Case for Learned Index Structures (SIGMOD'18)
    • Adaptive Optimization of Very Large Join Queries (SIGMOD'18)
    You can find Andreas on: Twitter, LinkedIn, Google Scholar, Data Systems Lab @ UTN
  • 16. Marvin Wyrich & Justus Bogner | How Software Engineering Research Is Discussed on LinkedIn | #56

    47:53||Season 6, Ep. 16
    In this episode, we delve into the intersection of software engineering (SE) research and professional practice with experts Marvin Wyrich and Justus Bogner. As LinkedIn stands as the largest professional network globally, it serves as a critical platform for bridging the gap between SE researchers and practitioners. Marvin and Justus explore the dynamics of how research findings are shared and discussed on LinkedIn, providing both quantitative and qualitative insights into the effectiveness of these interactions. They reveal that a significant portion of SE research posts on LinkedIn are authored by individuals outside the original research team and that a majority of comments on these posts come from industry professionals, highlighting a vibrant but underutilized avenue for science communication.
    Our guests shed light on the current state of this metaphorical bridge, emphasizing the potential for LinkedIn to enhance collaboration and knowledge exchange between academia and industry. Despite the promising engagement from practitioners, the discussion reveals that only half of the SE research posts receive any comments, indicating room for improvement in fostering more interactive dialogues. Marvin and Justus offer practical advice for researchers to better engage with practitioners on LinkedIn and suggest strategies for making research dissemination more impactful. This episode provides valuable insights for anyone interested in leveraging social media for advancing software engineering knowledge and practice.
    Links:
    • ICSE'24 Paper
    • Marvin's Homepage
    • Justus's Homepage
  • 5. High Impact in Databases with... Joe Hellerstein

    52:56||Season 7, Ep. 5
    In this High Impact episode we talk to Joe Hellerstein. Joe is the Jim Gray Professor of Computer Science at UC Berkeley. Tune in to hear Joe's story and learn about some of his most impactful work.
    The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
  • 15. Harry Goldstein | Property-Based Testing | #55

    49:13||Season 6, Ep. 15
    In this episode, we chat with Harry Goldstein about Property-Based Testing (PBT). Harry shares insights from interviews with PBT users at Jane Street, highlighting PBT's strengths in testing complex code and boosting developer confidence. Harry also discusses the challenges of writing properties and generating random data, and the difficulties in assessing test effectiveness. He identifies key areas for future improvement, such as performance enhancements and better random input generation. This episode is essential for those interested in the latest developments in software testing and PBT's future.
    Links:
    • ICSE'24 Paper
    • Harry's website
    • X: @hgoldstein95
  • 4. High Impact in Databases with... Raghu Ramakrishnan

    23:56||Season 7, Ep. 4
    In this High Impact episode we talk to Raghu Ramakrishnan. Raghu is CTO for Data and a Technical Fellow at Microsoft. Tune in to hear Raghu's story and learn about some of his most impactful work.
    The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
  • 14. Gina Yuan | In-Network Assistance With Sidekick Protocols | #54

    55:25||Season 6, Ep. 14
    Join us as we chat with Gina Yuan about her pioneering work on sidekick protocols, designed to enhance the performance of encrypted transport protocols like QUIC and WebRTC. These protocols ensure privacy but limit in-network innovations. Gina explains how sidekick protocols allow intermediaries to assist endpoints without compromising encryption.
    Discover how Gina tackles the challenge of referencing opaque packets with her innovative quACK tool and learn about the real-world benefits, including improved Wi-Fi retransmissions, energy-saving proxy acknowledgments, and the PACUBIC congestion-control mechanism. This episode offers a glimpse into the future of network performance and security.
    Links:
    • NSDI'24 Paper
    • Gina's Homepage
    • Sidekick's GitHub Repo
  • 3. High Impact in Databases with... Moshe Vardi

    47:39||Season 7, Ep. 3
    Welcome to another episode of the High Impact series - today we talk with Moshe Vardi! Moshe is the Karen George Distinguished Service Professor in Computational Engineering at Rice University, where his research focuses on automated reasoning. Tune in to hear Moshe's story and learn about some of his most impactful work.
    The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open source temporal graph analytics engine for Python and Rust.
    You can find Moshe on X, LinkedIn, and Mastodon @vardi. Links to all his work can be found on his website here.