Episode 125: James Koppel discusses counterfactual inference and automated explanation
In this episode, James Koppel (MIT, James Koppel Coaching) joins me and Dominick Reo to talk about how we can write software to help identify the causes of disasters.
These days, there's often a tendency to think of software primarily as a venue for frivolous pleasures. Maybe there's a new app that's really good at hooking me up with videos of alpacas on skateboards, or making my mom look like a hot dog when she's video chatting with me, or helping me decide what flavor of cupcake I want delivered to my home, because gosh, I am just way too stressed right now to figure that out. Have you seen how few Retweets I'm getting? If we followed the lead of a lot of the popular rhetoric about the software industry, we might very well come away with the impression that tech exists solely to facilitate precious, self-involved time-wasting. And if that's right, then if it doesn't work from time to time, who really cares?
But in fact, software correctness is frequently a life-or-death matter. Computer software controls our medical life support systems, it manages our health care records, it navigates our airplanes, and it keeps track of our bank account balances. If the authors of the software used in any of those systems mess something up, the result can be, and often is, a plane crashing into a mountain, a life support system malfunctioning for no apparent reason, or some other tragedy.
James Koppel is here to tell us that software can do better. It can be designed ‘preventatively’ to avoid large classes of bugs in advance, and there are diagnostic techniques that can help pinpoint those bugs that cannot be ruled out in advance. In this episode, Koppel discusses some work he started in 2015 as a follow-up to Stanford's Cooperative Bug Isolation project, which provided a way to gather detailed diagnostics about the conditions under which programs fail or crash. But the problem he kept running into was that the diagnostic information offered too much correlation and not enough causation. If your analysis tells you that your app crashes whenever it tries to load a large image, that's ok, but it doesn't tell you what about the large image causes the crash, or what other kinds of large images would also cause a crash, or whether the crash is even a result of largeness rather than something more specific. Correlation information is a great start, but ultimately, it's of limited use when it comes to directly fixing the problem.
To deal with this, Koppel and his colleagues have turned in their more recent work to the analysis of counterfactuals and causation, an interesting point of collaboration between philosophers and computer scientists. Using a recent paradigm called probabilistic programming, they have identified a way to have a computer program run the clock back and simulate what would have happened, had some condition been different, to determine whether that condition is the cause of a bug. The project is still in its initial stages, but if it works, it promises to deliver major dividends in making the technology we rely on more reliable.
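For readers who want a feel for what "running the clock back" could look like, here is a minimal sketch in Python. It is not Koppel's system and it does not use a probabilistic programming language; it simply enumerates a toy model by hand. Everything in it is a made-up assumption for illustration: the hidden conditions `low_memory` and `corrupt_header`, their probabilities, and the rule in `crashes` that decides when the toy app fails. The idea is to keep only the hidden states that are consistent with the crash we actually saw, then replay those same hidden states with one condition changed (a small image instead of a large one).

```python
from itertools import product

# Hypothetical prior probabilities for hidden conditions we cannot observe directly.
PRIOR = {"low_memory": 0.3, "corrupt_header": 0.1}


def crashes(large_image, low_memory, corrupt_header):
    """Toy rule for when the app crashes (an assumption, not a real program)."""
    return (large_image and low_memory) or corrupt_header


def world_probability(world):
    """Prior probability of one assignment of the hidden conditions."""
    p = 1.0
    for name, value in world.items():
        p *= PRIOR[name] if value else 1.0 - PRIOR[name]
    return p


def counterfactual_crash_probability(observed_large, observed_crash, alt_large):
    """Probability the app would still crash had the image size been alt_large."""
    names = list(PRIOR)
    consistent = 0.0
    still_crashes = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Step 1: keep only hidden states that reproduce what was observed.
        if crashes(observed_large, **world) != observed_crash:
            continue
        p = world_probability(world)
        consistent += p
        # Steps 2 and 3: change the condition, replay with the same hidden state.
        if crashes(alt_large, **world):
            still_crashes += p
    if consistent == 0.0:
        raise ValueError("observation has zero probability under the model")
    return still_crashes / consistent


p = counterfactual_crash_probability(
    observed_large=True, observed_crash=True, alt_large=False
)
print(f"P(crash anyway with a small image) = {p:.3f}")
```

Under these invented numbers the answer comes out to roughly 0.27, meaning that in about three out of four crashes the large image really was the deciding factor, while in the rest the app would have crashed regardless. A purely correlational report ("crashes happen with large images") cannot draw that distinction.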
Tune in to hear more about this exciting new area of research!