Thanks to advances in machine learning and the availability of massive amounts of data, computer software increasingly assists with, or even autonomously makes, decisions that have far-reaching societal impact in fields such as social welfare, criminal justice, and health care.

As data science software becomes more widespread, we become increasingly vulnerable to programming errors. Errors that do not cause failures are particularly dangerous: code that produces an erroneous but plausible result gives no indication that something went wrong. In financial applications, flawed code can cause losses of billions of dollars[1]. In medical applications, programming errors can be deadly[2].
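As a minimal illustration of such a silent error, consider the following hypothetical Python sketch. It is not taken from the Lyra code base, and the function names and the pass threshold are assumptions made for this example. The buggy version never uses one of its inputs, yet it runs without any failure and returns a plausible answer.

```python
def passes_buggy(math_grade: float, science_grade: float) -> bool:
    """Intended: pass if the mean of both grades is at least 60."""
    # Bug: math_grade is used twice, so science_grade never affects the result.
    average = (math_grade + math_grade) / 2
    return average >= 60.0


def passes_fixed(math_grade: float, science_grade: float) -> bool:
    """Same check with both inputs actually used."""
    average = (math_grade + science_grade) / 2
    return average >= 60.0


if __name__ == "__main__":
    # No exception is raised in either case, but the two versions disagree.
    print(passes_buggy(90.0, 10.0))
    print(passes_fixed(90.0, 10.0))
```

Testing with inputs where both grades are similar would not reveal the difference, which is why an unused input of this kind can easily go unnoticed.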

However, programming errors are not the only concern. A number of recent cases have shown the importance of ensuring data privacy[3] as well as software fairness[4]. Going forward, data science software will be subject to an increasing number of legal regulations (e.g., the European General Data Protection Regulation adopted in 2016) as well as administrative audits.


The Lyra research project is a long-term research effort to enhance the understanding and reliability of data science software. It aims at developing new practical and accessible analyses and tools to reason about, and provide rigorous guarantees on, the behavior of data analytics, big data, machine learning, and deep learning applications. For this purpose, we are currently targeting Python, one of the most popular programming languages for data science. A prototype static analyzer is open-source and available on GitHub.

Completed Projects


Perfectly Parallel Fairness Certification of Neural Networks. CoRR abs/1912.02499, 2019.


MaxSMT-Based Type Inference for Python 3. In CAV, 2018.



A Static Analyzer for Data Science Software
Monday, July 20, 2020
Static Analysis of Data Science Software
Wednesday, October 9, 2019
What Programs Want: Automatic Inference of Input Data Specifications
Tuesday, April 2, 2019
An Abstract Interpretation Framework for Input Data Usage
Monday, October 2, 2017
An Abstract Interpretation Framework for Input Data Usage
Tuesday, September 12, 2017