dataquieR → ODM (XLSX Converter)

This repository contains a small Python script that converts an XLSX file in “dataquieR format” (the metadata that controls automated data quality checks in the R package dataquieR) into one or more CDISC ODM XML files.

The goal is to transform the metadata table (variables, labels, value labels/codelists, missing lists, …) into a valid ODM structure (StudyEventDef, FormDef, ItemGroupDef, ItemDef, CodeList, …).

What the script does

1) Reads the XLSX

The first sheet is treated as the main metadata sheet (variables).
All remaining sheets are treated as lookup sheets (e.g. missing list tables).
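
The sheet roles described above can be sketched as follows. This is a minimal illustration, not the script's own code: `split_sheets` is a hypothetical helper, and it assumes the workbook has already been loaded into an ordered mapping of sheet name to table (pandas `read_excel(path, sheet_name=None)`, for example, returns such a dict).

```python
def split_sheets(sheets):
    """Split an ordered {sheet_name: table} mapping into main sheet and lookups.

    Hypothetical helper: the first sheet is treated as the variable metadata,
    every later sheet is kept as a lookup table keyed by its sheet name.
    """
    if not sheets:
        raise ValueError("workbook contains no sheets")
    names = list(sheets)
    main_name = names[0]
    lookups = {name: sheets[name] for name in names[1:]}
    return main_name, sheets[main_name], lookups
```

Keeping the lookups keyed by sheet name makes it straightforward to resolve a MISSING_LIST_TABLE value later.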

2) Builds the ODM structure

The output ODM has the following structure:

- Study (OID = file basename)
- MetaDataVersion (OID = MDV.1)
- One StudyEventDef per generated “group”
- Inside each study event:
  - one FormDef per form
  - one ItemGroupDef per form (1:1 mapping)
  - multiple ItemDef (one per variable/row)
  - CodeList elements for value labels (including merged missing lists)
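
A rough sketch of such a skeleton, built with the standard library, might look like the following. It is not the script's implementation: `build_skeleton`, the OID prefixes, and the fixed `DataType="text"` are assumptions made for illustration. In ODM 1.3 the definitions are siblings under MetaDataVersion and are linked through FormRef, ItemGroupRef, and ItemRef, so the sketch assumes the nesting is expressed that way. The namespace, GlobalVariables, and CodeList elements are omitted, so the result is not schema-complete.

```python
import xml.etree.ElementTree as ET


def build_skeleton(study_oid, events):
    """Build a bare ODM tree from {event: {form: [variable names]}}.

    All OID prefixes (SE., F., IG., I.) are invented for this sketch.
    """
    odm = ET.Element("ODM")
    study = ET.SubElement(odm, "Study", OID=study_oid)
    mdv = ET.SubElement(study, "MetaDataVersion", OID="MDV.1", Name="MDV.1")

    # Flatten once so each definition type can be emitted in its own pass.
    forms = [(e, f, vs) for e, fs in events.items() for f, vs in fs.items()]

    for event, event_forms in events.items():
        sed = ET.SubElement(mdv, "StudyEventDef", OID=f"SE.{event}",
                            Name=event, Repeating="No", Type="Common")
        for form in event_forms:
            ET.SubElement(sed, "FormRef", FormOID=f"F.{event}.{form}",
                          Mandatory="No")

    for event, form, _ in forms:
        fd = ET.SubElement(mdv, "FormDef", OID=f"F.{event}.{form}",
                           Name=form, Repeating="No")
        # One ItemGroupDef per form (1:1 mapping).
        ET.SubElement(fd, "ItemGroupRef", ItemGroupOID=f"IG.{event}.{form}",
                      Mandatory="No")

    for event, form, variables in forms:
        igd = ET.SubElement(mdv, "ItemGroupDef", OID=f"IG.{event}.{form}",
                            Name=form, Repeating="No")
        for var in variables:
            ET.SubElement(igd, "ItemRef", ItemOID=f"I.{var}", Mandatory="No")

    for _, _, variables in forms:
        for var in variables:
            ET.SubElement(mdv, "ItemDef", OID=f"I.{var}", Name=var,
                          DataType="text")
    return odm
```

Calling `ET.tostring(build_skeleton("demo", {"Baseline": {"Lab": ["hb"]}}), encoding="unicode")` serializes the tree for inspection.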

3) Grouping (StudyEvent / FormDef)

Rows are grouped into a 2-level structure:

- StudyEvent key
  - by default derived from HIERARCHY
  - if column DCE is present and not empty, it overrides the StudyEvent key
- Form key
  - by default derived from HIERARCHY
  - if column STUDY_SEGMENT is present and not empty, it overrides the Form key

So conceptually:

DCE → StudyEvent (if present)
STUDY_SEGMENT → Form (if present)
otherwise: derived from HIERARCHY
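
The override rules can be sketched as a small function. How the keys are derived from HIERARCHY is not specified here, so the sketch makes loud assumptions: HIERARCHY is taken to be a `|`-separated path whose first level yields the StudyEvent key and whose second level yields the Form key, and `"DEFAULT"` is a made-up fallback. `resolve_keys` and `is_blank` are hypothetical names.

```python
def is_blank(value):
    """True for None, NaN floats, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return str(value).strip() == ""


def resolve_keys(row, sep="|"):
    """Return (study_event_key, form_key) for one metadata row (a dict)."""
    hierarchy = row.get("HIERARCHY")
    # ASSUMPTION: HIERARCHY is a separator-delimited path of levels.
    levels = [] if is_blank(hierarchy) else [p.strip() for p in str(hierarchy).split(sep)]

    event_key = levels[0] if len(levels) > 0 and levels[0] else "DEFAULT"
    form_key = levels[1] if len(levels) > 1 and levels[1] else event_key

    # Explicit columns override the HIERARCHY-derived keys when filled in.
    if not is_blank(row.get("DCE")):
        event_key = str(row["DCE"]).strip()
    if not is_blank(row.get("STUDY_SEGMENT")):
        form_key = str(row["STUDY_SEGMENT"]).strip()
    return event_key, form_key
```

Rows that share the same pair of keys would then end up in the same form of the same study event.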

4) Codelists / VALUE_LABELS

VALUE_LABELS (English) and VALUE_LABELS_DE (German) are parsed into dictionaries.
Identical codelists are deduplicated: they are only written once and referenced from all corresponding variables.
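
Parsing and deduplication might look roughly like this. The cell syntax is an assumption (`code = label` pairs separated by `|`, as in `0 = no | 1 = yes`), and `parse_value_labels`, `assign_codelists`, and the `CL.n` OIDs are hypothetical. Only one language is handled; a fuller version would presumably include the German labels in the comparison key as well.

```python
def parse_value_labels(cell, item_sep="|", kv_sep="="):
    """Parse a VALUE_LABELS cell into an ordered {code: label} dict."""
    labels = {}
    if cell is None:
        return labels
    for part in str(cell).split(item_sep):
        if kv_sep not in part:
            continue  # skip fragments that are not "code = label"
        code, label = part.split(kv_sep, 1)
        labels[code.strip()] = label.strip()
    return labels


def assign_codelists(variables):
    """Map {variable: labels} to shared codelists.

    Returns (refs, codelists): refs maps each variable to a codelist OID,
    codelists maps each OID to its labels. Identical label dicts share one OID.
    """
    oid_by_content = {}
    codelists = {}
    refs = {}
    for var, labels in variables.items():
        if not labels:
            continue  # variables without value labels get no codelist
        key = tuple(labels.items())  # order-sensitive identity of the codelist
        if key not in oid_by_content:
            oid = f"CL.{len(codelists) + 1}"
            oid_by_content[key] = oid
            codelists[oid] = labels
        refs[var] = oid_by_content[key]
    return refs, codelists
```

Using the item tuple as the key means two codelists count as identical only if they list the same codes and labels in the same order.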

5) Missing lists (MISSING_LIST_TABLE)

If a row has a value in MISSING_LIST_TABLE, the script attaches the missing list codes to the variable’s final codelist.
Missing list tables are taken from the additional sheet whose name matches that value.
Missing codes are appended and marked with an alias:
- Alias Context="ORIGIN_CODELIST" Name="<sheet>"
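
At the data level, the merge could be sketched as below. `merge_missing_codes` and the entry layout are hypothetical, and the collision rule (a regular value label wins over a missing code with the same value) is an assumption, not documented behavior.

```python
def merge_missing_codes(value_labels, missing_rows, sheet_name):
    """Append missing codes to a variable's value labels.

    value_labels: {code: label} with string codes.
    missing_rows: iterable of dicts with CODE_VALUE and CODE_LABEL.
    Each entry records its origin so an Alias with
    Context="ORIGIN_CODELIST" can be written for the appended codes.
    """
    entries = [{"code": c, "label": l, "origin": None}
               for c, l in value_labels.items()]
    seen = set(value_labels)
    for row in missing_rows:
        code = str(row["CODE_VALUE"]).strip()
        if code in seen:
            continue  # ASSUMPTION: existing codes are not overwritten
        seen.add(code)
        entries.append({
            "code": code,
            "label": str(row["CODE_LABEL"]).strip(),
            "origin": sheet_name,  # becomes the Alias Name
        })
    return entries
```

Entries with a non-empty `origin` are the ones that would carry the ORIGIN_CODELIST alias in the output.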

6) Splitting into multiple ODM files

To avoid huge ODM files, the script can split output automatically:

- Default behavior (without --force_single_odm):
  - if any generated output group exceeds ~5700 variables, it will be split further
  - splitting logic uses HIERARCHY-based repartitioning/chunking
- With --force_single_odm:
  - everything is written into a single ODM output (even if very large)
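
The threshold behavior can be illustrated with plain fixed-size chunking. This is a deliberate simplification: the script is described as repartitioning along HIERARCHY, which this sketch does not attempt. `chunk_variables`, `plan_outputs`, the `.partN` labels, and the `"all"` label are hypothetical.

```python
def chunk_variables(variables, max_per_file=5700):
    """Cut a list of variables into consecutive chunks of at most max_per_file."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be positive")
    return [variables[i:i + max_per_file]
            for i in range(0, len(variables), max_per_file)]


def plan_outputs(groups, force_single_odm=False, max_per_file=5700):
    """Turn {group name: [variables]} into a list of (output label, variables)."""
    if force_single_odm:
        # Everything goes into one output, regardless of size.
        merged = [v for variables in groups.values() for v in variables]
        return [("all", merged)]
    plan = []
    for name, variables in groups.items():
        chunks = chunk_variables(variables, max_per_file)
        if len(chunks) <= 1:
            plan.append((name, variables))  # small enough, keep as one file
        else:
            for idx, chunk in enumerate(chunks, start=1):
                plan.append((f"{name}.part{idx}", chunk))
    return plan
```

With a limit of 5, a group of 12 variables would be planned as three outputs of 5, 5, and 2 variables.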

Input expectations (XLSX)

Required / commonly used columns

The script expects a “dataquieR-like” metadata table and typically uses these columns:

VARNAMES or VAR_NAMES (variable name)
HIERARCHY
STUDY_SEGMENT (optional, affects forms)
DCE (optional, affects study events)
LABEL, LABEL_DE
NOTE, NOTE_DE
DATA_TYPE
VALUE_LABELS, VALUE_LABELS_DE
MISSING_LIST_TABLE (optional)
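
Because the variable-name column may appear under either spelling, a lookup along these lines can resolve it up front. `find_varname_column` is a hypothetical helper, and preferring VAR_NAMES when both columns exist is an assumption.

```python
def find_varname_column(columns):
    """Return the name of the variable-name column present in `columns`."""
    for candidate in ("VAR_NAMES", "VARNAMES"):  # ASSUMPTION: this priority order
        if candidate in columns:
            return candidate
    raise KeyError("neither VAR_NAMES nor VARNAMES found in the metadata sheet")
```

Failing early with a clear message is usually easier to debug than a missing-column error deep inside the conversion.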

Missing list tables (other sheets)

If MISSING_LIST_TABLE references a sheet name, that sheet should usually contain:

CODE_VALUE
CODE_LABEL
and optionally additional columns (they will be written as <Alias Context="..." Name="..."/>)
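
One way a missing-list row could be rendered as a CodeListItem is sketched below. The mapping of an extra column to `Context=<column name>` and `Name=<cell value>` is an assumption, since only the Alias shape is documented above. `missing_row_to_item` is a hypothetical helper, and empty cells are skipped.

```python
import xml.etree.ElementTree as ET


def missing_row_to_item(row, sheet_name):
    """Render one missing-list row (a dict) as a CodeListItem element."""
    item = ET.Element("CodeListItem", CodedValue=str(row["CODE_VALUE"]))
    decode = ET.SubElement(item, "Decode")
    text = ET.SubElement(decode, "TranslatedText")
    text.text = str(row["CODE_LABEL"])

    # Mark where the code came from.
    ET.SubElement(item, "Alias", Context="ORIGIN_CODELIST", Name=sheet_name)

    # ASSUMPTION: each remaining non-empty column becomes one Alias.
    for column, value in row.items():
        if column in ("CODE_VALUE", "CODE_LABEL"):
            continue
        if value is None or str(value).strip() == "":
            continue
        ET.SubElement(item, "Alias", Context=column, Name=str(value))
    return item
```

For a row with an extra column such as `CODE_CLASS` (used here only as example data), the item would carry two Alias children: the origin marker and one for that column.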

Installation

Linux/macOS

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

Windows

py -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt

Usage

Linux/macOS

python3 dataquieR2ODM.py /path/to/your/file.xlsx

Windows

python dataquieR2ODM.py "C:\path\to\your\file.xlsx"

Force a single ODM

python3 dataquieR2ODM.py /path/to/your/file.xlsx --force_single_odm
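
For orientation, a command line of this shape can be parsed with `argparse` roughly as follows. This is a guess at the interface based on the commands above, not the script's actual parser; `build_parser` and the argument name `xlsx_path` are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for: dataquieR2ODM.py <xlsx> [--force_single_odm]."""
    parser = argparse.ArgumentParser(
        description="Convert a dataquieR XLSX file to CDISC ODM XML.")
    parser.add_argument("xlsx_path",
                        help="path to the dataquieR metadata workbook")
    parser.add_argument("--force_single_odm", action="store_true",
                        help="write one ODM file even if it is very large")
    return parser
```

With `action="store_true"`, the flag defaults to `False`, which matches the automatic splitting described as the default behavior.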

Output

Output is written to ../output/ relative to the script location. File naming: Study__.xml
