Before texts from routine care can be used for clinical and research purposes, they must first be made machine-readable using natural language processing (NLP) software. This requires large amounts of annotated text from routine patient care. Annotated texts are documents enriched with systematic annotations, e.g. markings for diagnoses or medications. The annotations are manually reviewed by medical students and serve as a reference for further improving automatic annotation. Information structured in this way can be combined with existing data for analysis and statistical modelling.
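To make the idea of an annotated text more concrete, the following is a minimal sketch of one common way to represent annotations: as labelled character spans stored alongside the unchanged text. It is an illustration only, not the GeMTeX annotation format. The class `Annotation`, the helper functions, the labels "Diagnosis" and "Medication", and the example sentence are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A labelled span of text (hypothetical structure, not the GeMTeX schema)."""
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    label: str  # annotation category, e.g. "Diagnosis" or "Medication"

    def covered_text(self, text: str) -> str:
        """Return the part of the document that this annotation marks."""
        return text[self.start:self.end]


def annotate(text: str, phrase: str, label: str) -> Annotation:
    """Create an annotation for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), label)


def by_label(text: str, annotations: list[Annotation]) -> dict[str, list[str]]:
    """Group the annotated phrases by label to obtain structured information."""
    grouped: dict[str, list[str]] = {}
    for ann in annotations:
        grouped.setdefault(ann.label, []).append(ann.covered_text(text))
    return grouped


# Invented example sentence, not real patient data.
text = "Patient with type 2 diabetes was started on metformin."

annotations = [
    annotate(text, "type 2 diabetes", "Diagnosis"),
    annotate(text, "metformin", "Medication"),
]

structured = by_label(text, annotations)
print(structured)
```

Keeping the annotations separate from the text means the original document stays untouched, and a manually reviewed set of spans can later be compared with the spans an automatic annotator produces.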
The IT infrastructure that was built during the development and networking phase of the Medical Informatics Initiative (MII) between 2018 and 2022 makes it possible to open up clinical documents on a large scale and enrich them with systematic annotations. The MII methodology platform GeMTeX aims to address the two major bottlenecks of current language models: data accessibility and data annotation.
Large collection of German-language medical texts from patient care is being created
Within the framework of GeMTeX, six university medical centres in Munich, Leipzig, Essen, Berlin, Dresden and Erlangen are collecting documents from electronic patient files (ePA) with the consent of the patients. Using natural language processing, the documents are processed in compliance with data protection regulations and made available in anonymised form for joint use. This creates a valuable text collection for research and development.
In addition, GeMTeX will create a central technical and organisational structure to collect anonymised texts and process them for enrichment according to guidelines. The resulting text database can be used to train AI models and test their usefulness in everyday clinical practice.
The GeMTeX Methodology Platform was launched on 1 June 2023 and is funded by the German Federal Ministry of Education and Research (BMBF) with around seven million euros until 31 August 2026.
Further information:
https://www.smith.care/en/gemtex_mii/about-gemtex/
Interview with Christina Lohr and Luise Modersohn, research assistants in the GeMTeX project
Contact:
Project Lead
Prof. Dr. Martin Boeker
Network Coordinator
Head of the DIFUTURE Consortium
Professor of Medical Informatics
Technical University of Munich/University Hospital rechts der Isar
© Klinikum rechts der Isar, Technische Universität München
Prof. Dr. Markus Löffler
Deputy Network Coordinator
Head of the SMITH Consortium
Institute for Medical Informatics, Statistics and Epidemiology (IMISE)
Leipzig University
© Universitätsklinikum Hamburg-Eppendorf/Ronald Frommann
Project coordination:
Janina Kind
Administrative Project Management
SMITH-Office
Leipzig University
© UKL
Dr. Frank Meineke
Scientific Project Management/Technical management
Institute for Medical Informatics, Statistics and Epidemiology (IMISE)
Leipzig University
© Swen Reichhold
Luise Modersohn
Scientific Project Management/Lead Annotation
Institute for AI and Informatics in Medicine
Technical University of Munich/University Hospital rechts der Isar
© K. Czoppelt/Klinikum rechts der Isar
Christina Lohr
Scientific Project Management
Institute for Medical Informatics, Statistics and Epidemiology (IMISE)
Leipzig University
© Christina Lohr
Partners:
- Charité – University Hospital Berlin
- ID GmbH & Co. KGaA
- Technical University of Darmstadt
- Dresden University of Technology
- University Hospital Erlangen
- University Hospital Essen
- Averbis GmbH
- Hannover Medical School
- Heidelberg University Hospital
- German National Library of Medicine (ZB MED)
- Leipzig University
- University of Leipzig Medical Center
- Ludwig Maximilian University of Munich
- Technical University of Munich
- University of Münster
- Hasso Plattner Institute for Digital Engineering gGmbH
- Tübingen University Hospital
- Medical University of Graz (Associated Partner)