19.02.2025. With the core dataset, the university medical sites have defined the minimum set of data that the Data Integration Centers (DICs) of the Medical Informatics Initiative (MII) must maintain for all inpatients. The core dataset is organized into modules that specify the scope and standardization of the medical content. The Oncology extension module has now been published. We spoke about this release with Thomas Debertshäuser from the Berlin Institute of Health at Charité (BIH), who is responsible for the technical implementation, and Dr. Linda Gräßel from the University Medical Center Freiburg, who is responsible for clinical expertise in the core dataset team.
What does the Oncology Module of the core dataset include?
Thomas Debertshäuser: The Oncology Module enables the Data Integration Centers (DICs) to integrate oncology treatment data and make them usable in a standardized way. It is based on the current 2021 version of the oncological basic dataset (oBDS). The module includes a detailed description of the primary diagnosis and histology, information on the therapies administered (including complications and side effects), and cancer-specific parameters such as the TNM classification, staging over time, and tumor board recommendations. The DICs can extract these data from their existing, well-maintained tumor documentation systems and make them available for research.
Why was it important to expand the core dataset with this module? Are further extensions planned?
Thomas Debertshäuser: The module makes new oncology-related data elements searchable in distributed analyses via the Health Research Data Portal (FDPG). Previously, searches for cancer patients were only possible using specific diagnosis codes, such as ICD-10 or ICD-O codes. The Oncology Module adds cancer-specific data points that were not searchable before, such as the TNM classification or the intent of a chemotherapy.
In Q1 2025, we will focus on supporting the DICs with local implementation. For the 2026 version of the module, we plan to integrate organ-specific modules for breast, prostate, colorectal, and skin cancer, and to refine cancer-specific classifications such as the FIGO stages for gynecological tumors and the WHO classifications for leukemia and lymphoma used in hematology. In addition, the Molecular Tumor Board (MTB) module, which builds on the Oncology Module, is currently in development. Of course, we also take community feedback into account and follow all developments closely.
How was the Oncology Module developed, and who was involved?
Thomas Debertshäuser: Oncology was identified early on as a relevant but non-mandatory module. Initial discussions on how best to implement it began as early as 2019. A major step was the decision to base it on the oBDS, the dataset used by the cancer registries. One challenge, however, was the necessary integration with other modules (diagnosis, procedures, and medication), which have only become stable and mature in recent years.
The technical profiling was carried out mainly in early 2024, led by my colleagues and me at BIH, with support from other MII sites. In April and May 2024, we held a public comment phase and gathered feedback from key stakeholders, including cancer registries and the German Consortium for Translational Cancer Research (DKTK). Close collaboration with the BMBF-funded cross-site project PM4Onco was particularly valuable, as Freiburg was working on the Molecular Tumor Board (MTB) core dataset at the same time. Engaging stakeholders early paid off: in total, we received more than 100 comments, which shows how relevant the topic is, and we were very pleased with the level of engagement.
Dr. Gräßel, for what research questions can the Oncology Module be used?
Dr. Linda Gräßel: The module is useful for any research question that requires clinical data on oncology patients. One example is real-world data analyses, such as studies of treatment effectiveness outside of clinical trials. Because clinical trials are conducted under controlled conditions, toxicity, dose reductions, and drug efficacy may differ in real-world settings. Trials also frequently have strict inclusion criteria that exclude certain patient groups. Only standardized and comprehensive clinical datasets allow a meaningful scientific evaluation of such aspects, especially across multiple treatment centers. The Oncology Module of the core dataset will play a crucial role in enabling such analyses on a larger scale.
The module is also essential for basic research. Imagine conducting a study on a specific disease and receiving biological samples from the participating centers. The dataset makes it possible to link these biomaterials to clinical data, for example to determine whether a sample was taken at initial diagnosis or which treatments had already been administered. Of course, each project must comply with the applicable ethical and data protection requirements.
What added value does the MII core dataset provide for medical research?
Dr. Linda Gräßel: Thanks to its standardized formats and interoperability, the core dataset can be applied broadly. This facilitates cross-institutional research projects, especially in personalized medicine, where any single site treats only a small number of patients with identical characteristics. Multicenter analyses with harmonized clinical data significantly increase the statistical power of studies.
Standardized datasets are also a key prerequisite for artificial intelligence (AI), which will play an increasing role in medical research and, eventually, in clinical decision support systems (DSS). The PM4Onco project is driving these developments, which is why it considers the availability of clinical data through the Oncology Module and the future MTB module so important and actively supports it.