German Medical Text Corpus


Matthias Gietzelt

Project partners


The GeMTeX project is funded by the Federal Ministry of Education and Research as part of the national "Medical Informatics Initiative" with approx. 6.8 million euros, of which approx. 200,000 euros have been allocated to Hannover Medical School (MHH) (funding reference: 01ZZ2314J).


In everyday clinical practice, numerous texts are produced, such as doctors' letters and reports, which contain valuable information about the development, course, and treatment of a disease. These texts could be used by natural language processing (NLP) tools to assist doctors and researchers in their work. However, the full potential of clinical documents cannot be realised due to a lack of standardisation. The GeMTeX (German Medical Text Corpus) methodology platform aims to fill this gap and make medical texts from patient care available for research projects. The goal is to create the largest medical text corpus in the German language.

Hannover Medical School is focussing on the processing of molecular pathology findings, which contain many specialised technical terms, bioinformatic relationships, and domain-specific terminologies.