Detecting Errors in MPI Applications with Deep Learning

Determining whether a parallel program behaves as expected in every execution is difficult because of non-determinism. Despite the growing number of verification tools, debugging parallel programs remains a significant challenge. The goal of the internship is to explore a new AI-assisted approach for analyzing parallel programs to detect bugs.

Description of the task

We have a large set of incorrect parallel codes labeled with different types of errors, including data races and deadlocks. The internship task is to build machine learning models that predict whether new, unseen codes contain errors.

In particular, the student will explore different types of input features and models. We will start by exploring LLVM Intermediate Representation (IR) embedding techniques that transform codes into vectors. The idea is that codes with similar behavior will be represented by similar vectors. Therefore, if a new code has a vector very similar to that of a code with a known error, we can expect the new code to have the same error.
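The similarity idea above can be sketched as a nearest-neighbor lookup. This is only an illustration, not the method the project will use: the three-dimensional vectors are made-up values standing in for real LLVM IR embeddings, the "correct" label is an assumed extra class, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_label(query, train_vectors, train_labels):
    """Return the label of the most similar labeled vector and its similarity."""
    scores = [cosine_similarity(query, v) for v in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best], scores[best]


# Made-up embeddings of already labeled codes (assumption: real embeddings
# would come from an LLVM IR embedding tool and have many more dimensions).
train_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]
train_labels = ["deadlock", "data race", "correct"]

# Made-up embedding of a new, unseen code.
new_code = [0.8, 0.2, 0.1]

label, score = predict_label(new_code, train_vectors, train_labels)
print(f"predicted: {label} (cosine similarity {score:.3f})")
```

A single nearest neighbor is the simplest possible choice; a trained classifier over the same vectors would normally replace it once enough labeled data is available.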

The student can also explore more experimental methods. For instance, the compiler's middle-end optimization process can compile the same code with many different optimizations, resulting in different intermediate representations of the same code. We also plan to explore whether recompiling codes in this way provides more data to feed the deep learning model and, in turn, improves prediction accuracy.
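As a rough sketch of this recompilation idea, the snippet below builds one clang command per optimization level, each emitting a textual LLVM IR file for the same source. It only prints the commands and does not run them. The file name, the output directory, the choice of levels, and the helper `ir_commands` are hypothetical; real MPI codes would additionally need the MPI include path passed through `extra_flags`.

```python
import shlex
from pathlib import Path

# Standard clang optimization levels; which ones are useful is an assumption.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os"]


def ir_commands(source, out_dir="ir", compiler="clang", extra_flags=()):
    """Build one command per optimization level that emits textual LLVM IR."""
    src = Path(source)
    commands = []
    for level in OPT_LEVELS:
        # One .ll file per variant, e.g. ir/example_deadlock-O2.ll
        out = Path(out_dir) / f"{src.stem}{level}.ll"
        commands.append(
            [compiler, level, "-S", "-emit-llvm", *extra_flags, str(src), "-o", str(out)]
        )
    return commands


# Hypothetical source file name, used for illustration only.
for cmd in ir_commands("example_deadlock.c"):
    print(shlex.join(cmd))
```

Each resulting IR file would keep the label of its source code, so one labeled code yields several training samples.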

The work will be conducted in collaboration with the University of Iowa, US. The student will therefore have the opportunity to discuss and present his/her work with the PhD students and advisors involved at both Inria and the University of Iowa.

Workflow and Goals

The student will work on extracting features and training models to predict errors. He/she is also strongly encouraged to attend the weekly group meetings and present his/her progress: this is an opportunity to integrate into the STORM team. Depending on the research results, the student may also take part in writing a research article for publication.

The goal of the internship is for the student to develop his/her knowledge of machine learning and feature engineering with LLVM IR, as well as his/her writing and presentation skills. The internship is also an opportunity to observe how academic research is conducted.

Keywords:

MPI, Machine learning, Verification

Prerequisites:

Candidates are expected to:

  • be able to make sense of large amounts of data
  • have experience with scripting languages (e.g., Python, shell). This is necessary to train models and analyze results.
  • be familiar with C/OpenMP/MPI programming: we are investigating different benchmarks.
  • have already used Linux. The project requires updating OS environment variables to test different execution configurations.

Contacts:

Mihail Popov: mihail.popov@inria.fr
Emmanuelle Saillard: emmanuelle.saillard@inria.fr

Internship location:

Inria Bordeaux Sud-Ouest, STORM team