PhD Position F/M: Experimentation with LLMs for Fortran Migration

Villeneuve-d'Ascq

Inria

€40,000 to €60,000 per year

Posted on 30 November

Job description

The job description below is in English

Contract type: Fixed-term contract (CDD)

Required degree level: Master's degree (Bac + 5) or equivalent

Role: PhD student

Desired level of experience: Up to 3 years

About the centre or functional department

The Inria University of Lille centre, created in 2008, employs 360 people, including 305 scientists in 15 research teams. Recognised for its strong involvement in the socio-economic development of the Hauts-de-France region, the Inria University of Lille centre maintains a close relationship with large companies and SMEs. By promoting synergies between researchers and industry, Inria participates in the transfer of skills and expertise in digital technologies and provides access to the best European and international research for the benefit of innovation and companies, particularly in the region. For more than 10 years, the Inria University of Lille centre has been located at the heart of Lille's university and scientific ecosystem, as well as at the heart of Frenchtech, with a technology showroom on Avenue de Bretagne in Lille, on the EuraTechnologies site of economic excellence dedicated to information and communication technologies (ICT).

Context and advantages of the position

This PhD will take place in the context of the Inria LLM4Code challenge (défi). LLM4Code is an ambitious project bringing together several Inria groups and external partners to build reliable and productive solutions based on large language models (LLMs).

Assignment

During the project, the PhD student will focus on assessing the feasibility of performing a software migration with LLMs in the specific context of a given niche technology for a given organization (specific domain, specific development culture).

Context

We are working with an industrial partner on a code transformation project that aims to migrate a code base written in Fortran 77 with proprietary extensions to modern Fortran. The project uses a model-driven approach in which the existing code is modeled, the model is "refactored", and modern Fortran code is then regenerated from it.

Challenges

The performance of LLMs is correlated with the quality of their training data. Most of that data comes from publicly available software artifacts, which can be of questionable quality, riddled with vulnerabilities, or biased. In addition, LLMs can produce varying outputs for identical prompts.

Generic LLMs are trained on millions of “documents”. For software engineering and code generation, specialized LLMs have been trained (for example, models hosted on Hugging Face or derived from Llama), but they are bound to have seen fewer Fortran examples, because fewer Fortran projects are available in common open-source repositories (like GitHub).

The project will need to evaluate how such imperfect LLMs can be used for migration, what the consequences are for the quality of the result, and what techniques (if any) can be used to improve these results.

Outcome

The project will propose a methodology for carrying out the code migration of a niche technology for a specific organization using LLMs.

More importantly, it will identify the key requirements of such a project, as well as its advantages and drawbacks compared with, for example, a deterministic model-based approach.

Bibliography

* Frank F. Xu, Uri Alon, Graham Neubig, and Vincent Josua Hellendoorn. 2022. “A systematic evaluation of large language models of code”. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming (MAPS 2022). Association for Computing Machinery, New York, NY, USA, 1–10. https://doi.org/10.1145/3520312.3534862

* Mahmood, Hina & Jilani, Atif & Rauf, Abdul. (2023). “Code Swarm: A Code Generation Tool Based on the Automatic Derivation of Transformation Rule Set”. International Journal of Software Engineering & Applications. 14. 1-11.

* Gustavo Pinto, Cleidson de Souza, João Batista Neto, Alberto de Souza, Tarcísio Gotto, and Edward Monteiro, “Lessons from Building CodeBuddy: A Contextualized AI Coding Assistant”, arXiv e-prints, 2023. doi:10.48550/arXiv.2311.18450.

Main activities

Responsibilities:

* Analysis and reverse engineering of existing codebases (leveraging the Software Heritage archive)
* Applying LLMs to the analysis of existing code, tests, and migration results
* Contributing to summarization and dissemination of results, writing scientific articles.

Skills

* Good foundation in Machine Learning and Software Engineering.
* Proficiency in object-oriented programming (OOP) is required (knowledge of the Pharo programming language is a plus).
* Excellent problem-solving abilities and a strong interest in research.
* Ability to work independently and collaboratively in a dynamic team.
* Good communication skills (English required, French is a plus)

Benefits

* Subsidized meals
* Partial reimbursement of public transport costs
* Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
* Possibility of teleworking and flexible organization of working hours
* Professional equipment available (videoconferencing, loan of computer equipment, etc.)
* Social, cultural and sports events and activities
* Access to vocational training
* Social security coverage

Remuneration

2100€ gross per month for the 1st and 2nd years

2190€ gross per month for the 3rd year

General information

* Theme/Domain: Architecture, languages and compilation; Software engineering (BAP E)
* City: Villeneuve d'Ascq
* Inria centre: Centre Inria de l'Université de Lille
* Desired start date: 2025-02-01
* Contract duration: 2 years, 11 months
* Application deadline: 2024-12-24

Note: Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.

Application instructions

Defence and security:
This position is likely to be assigned to a restricted-access zone (zone à régime restrictif, ZRR), as defined in Decree No. 2011-1425 on the protection of the nation's scientific and technical potential (PPST). Authorisation to access such a zone is granted by the head of the institution, following a favourable ministerial opinion, as defined in the order of 3 July 2012 relating to the PPST. An unfavourable ministerial opinion for a position assigned to a ZRR would result in the cancellation of the recruitment.

Recruitment policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.

Contacts

* Inria team: EVREF
* PhD supervisor:
Safina Larisa / larisa.safina@inria.fr

About Inria

Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 215 agile project teams, generally run jointly with academic partners, involve more than 3,900 scientists in meeting the challenges of digital technology, often at the interface with other disciplines. The institute draws on many talents in more than forty different professions. 900 research and innovation support staff contribute to the emergence and growth of scientific and entrepreneurial projects that have an impact on the world. Inria works with many companies and has supported the creation of more than 200 start-ups. The institute thus strives to meet the challenges of the digital transformation of science, society and the economy.
