Nowadays, thanks to advances in machine learning and the availability of massive amounts of data, computer software plays an increasingly important role in assisting or even autonomously making decisions with far-reaching societal impact, e.g., in fields such as social welfare, criminal justice, and even health care.
As data science software becomes more widespread, we become increasingly vulnerable to programming errors. Errors that do not cause failures are particularly dangerous: code that produces an erroneous but plausible result gives no indication that something went wrong. In financial applications, flawed code can cause losses of billions of dollars[1]. In medical applications, programming errors can be deadly[2].
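To illustrate what an erroneous but plausible result can look like, here is a minimal hypothetical sketch in Python. The function names and the data are made up for illustration and are not taken from any of the cited incidents. The buggy version is meant to compute a weighted mean but never actually uses the weights, yet it runs without any failure and returns a believable number.

```python
def weighted_mean_buggy(scores, weights):
    """Intended to return the weighted mean of `scores`.

    Hypothetical bug: `weights` is iterated over but never used,
    so the function silently computes the plain mean instead.
    """
    total = 0.0
    for s, w in zip(scores, weights):
        total += s                  # bug: should be s * w
    return total / len(scores)      # bug: should divide by sum(weights)


def weighted_mean_fixed(scores, weights):
    """Weighted mean of `scores`, using every weight."""
    total = sum(s * w for s, w in zip(scores, weights))
    return total / sum(weights)


# Made-up example data.
scores = [4.0, 5.0, 6.0]
weights = [1.0, 1.0, 2.0]

buggy = weighted_mean_buggy(scores, weights)   # 5.0
fixed = weighted_mean_fixed(scores, weights)   # 5.25

print(f"buggy: {buggy}, fixed: {fixed}")
```

Both values look reasonable for scores between 4 and 6, so nothing at run time signals that the first one is wrong. The only symptom is that an input has no effect on the output, which is the kind of property that is hard to notice by testing alone.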
However, programming errors are not the only concern. A number of recent cases have evidenced the importance of ensuring data privacy[3] as well as software fairness[4]. Going forward, data science software will be subject to more and more legal regulations (e.g., the European General Data Protection Regulation adopted in 2016) as well as administrative audits.
The Lyra research project is a long-term research effort to enhance the understanding and reliability of data science software. It aims at developing new practical and accessible analyses and tools to reason about and provide rigorous guarantees of the behavior of data analytics, big data, machine learning, and deep learning applications. For this purpose, we are currently targeting Python, one of the most popular programming languages for data science. A prototype static analyzer is open-source and available on GitHub.
Completed Projects
- Serge Durand
Static Analysis by Abstract Interpretation of the ACAS Xu Neural Networks
M1 Internship, École Normale Supérieure, 2020
- Radwa Sherif Abdelbar
Automated Checking of Implicit Assumptions on Textual Data
Bachelor’s Thesis, ETH Zurich, SS 2018
- Lowis Engel
Usage Analysis of Data Stored in Map Data Structures
Bachelor’s Thesis, ETH Zurich, SS 2018
- Madelin Schumacher
Automated Generation of Data Quality Checks
Master’s Thesis, ETH Zurich, AS 2017
- Mostafa Hassan
Static Type Inference for Python
Bachelor’s Thesis, ETH Zurich, SS 2017
- Simon Wehrli
Static Program Analysis of Data Usage Properties
Master’s Thesis, ETH Zurich, SS 2017