Static Analysis for Data Scientists


Big data analytics has revolutionized the world of software development in the past decade. Every day, data scientists develop computer programs to gather, triage, and process data, in order to ultimately help us make data-driven decisions. As we rely more and more on such data-manipulating software, we become increasingly vulnerable to poor choices, wrong assumptions, or other (programming or technical) mistakes made during software development. Mistakes that do not cause software failures can have serious consequences, since they give no indication that something went wrong along the way. In safety-critical applications, such mistakes can be deadly. In this chapter, we will present ongoing work to develop an abstract interpretation-based static analysis framework for data scientists. In particular, we will focus on issues arising from unexpected data and describe the challenges involved in designing and developing a practical static analysis that infers necessary expectations on the data read and manipulated using Jupyter notebooks, an increasingly popular development environment among data scientists.

In Vincenzo Arceri, Agostino Cortesi, Pietro Ferrara, Martina Olliaro, “Challenges of Software Verification”, Intelligent Systems Reference Library, Volume 238, 2023