Thanks to advances in machine learning and the availability of massive amounts of data, computer software plays an increasingly important role in assisting with, or even autonomously making, decisions with far-reaching societal impact in fields such as social welfare, criminal justice, and even health care.

As data science software becomes more widespread, we become increasingly vulnerable to programming errors. Errors that do not cause failures are particularly dangerous: code that produces an erroneous but plausible result gives no indication that something went wrong. In financial applications, flawed code can cause losses of billions of dollars[1]. In medical applications, programming errors can be deadly[2].
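As a minimal illustration of such a silent error, consider the following Python sketch. The function names and the 70/30 weighting are hypothetical and chosen only for this example. The buggy version never uses one of its inputs, yet it runs without any failure and returns a plausible-looking grade.

```python
def final_grade_buggy(exam: int, homework: int) -> float:
    """Intended: 70% exam + 30% homework (hypothetical weighting)."""
    # Bug (deliberate): `exam` is used twice, so `homework` is never read.
    return (70 * exam + 30 * exam) / 100


def final_grade(exam: int, homework: int) -> float:
    """Corrected version: both inputs contribute to the result."""
    return (70 * exam + 30 * homework) / 100


if __name__ == "__main__":
    # Both calls succeed and both results look like reasonable grades,
    # but only the second reflects the homework score.
    print(final_grade_buggy(80, 40))
    print(final_grade(80, 40))
```

Nothing at run time signals that the first result is wrong. Finding it requires noticing that the output does not depend on `homework` at all.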

However, programming errors are not the only concern. A number of recent cases have evidenced the importance of ensuring data privacy[3] as well as software fairness[4]. Going forward, data science software will be subject to more and more legal regulations (e.g., the European General Data Protection Regulation adopted in 2016) as well as administrative audits.


The Lyra research project is a long-term research effort to enhance the understanding and reliability of data science software. It aims at developing new practical and accessible analyses and tools to reason about, and provide rigorous guarantees on, the behavior of data analytics, big data, machine learning, and deep learning applications. For this purpose, we are currently targeting Python, one of the most popular programming languages for data science. A prototype static analyzer is open-source and available on GitHub.

Completed Projects


A Review of Formal Methods applied to Machine Learning. CoRR abs/2104.02466, 2021.


MaxSMT-Based Type Inference for Python 3. In CAV, 2018.



Static Analysis for Data Scientists
Friday, July 8, 2022 1:30 PM
Static Analysis for Data Scientists
Tuesday, June 14, 2022 1:30 PM
Static Analysis for Data Scientists
Friday, May 20, 2022 2:00 PM
Formal Methods for Robust Artificial Intelligence: State of the Art
Wednesday, January 13, 2021
Static Analysis for Data Science
Monday, November 2, 2020 10:00 AM
A Guided Tour of a Static Analyzer for Data Science Software
Monday, July 20, 2020 7:15 AM
Static Analysis of Data Science Software
Wednesday, October 9, 2019 2:00 PM
What Programs Want: Automatic Inference of Input Data Specifications
Tuesday, April 2, 2019 11:30 AM
An Abstract Interpretation Framework for Input Data Usage
Monday, October 2, 2017 5:00 PM
An Abstract Interpretation Framework for Input Data Usage
Tuesday, September 12, 2017 3:30 PM