Tutorial: End-to-End Neuro-Oncology Preprocessing

This tutorial walks through a complete OncoPrep workflow — from raw DICOMs to preprocessed derivatives and tumor segmentation — using a single subject.

Prerequisites

pip install oncoprep

You will also need:

  • A directory of DICOM files for one or more subjects

  • Docker installed (only needed for the Docker ensemble segmentation backend in Step 3)

  • ~4 GB free disk space for intermediate files

Step 1 — DICOM to BIDS conversion

Organize your raw data so each subject has its own directory:

raw_dicoms/
└── 001/
    ├── T1_MPRAGE_SAG_P2_1_0_ISO_0032/
    ├── T1_MPRAGE_SAG_P2_1_0_ISO_POST_0071/
    ├── T2_SPC_DA-FL_SAG_P2_1_0_0012/
    └── COR_FLAIR_0103/

Run the conversion:

oncoprep-convert raw_dicoms/ bids_output/ --subject 001

The output is a valid BIDS dataset:

bids_output/
├── dataset_description.json
└── sub-001/
    └── anat/
        ├── sub-001_T1w.nii.gz
        ├── sub-001_T1w.json
        ├── sub-001_ce-gadolinium_T1w.nii.gz
        ├── sub-001_T2w.nii.gz
        └── sub-001_FLAIR.nii.gz
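Before moving on, it can help to confirm that the conversion produced every file shown in the tree above. The following is a minimal sketch, not part of OncoPrep: the helper name `missing_bids_files` is hypothetical, and the expected file list is copied from the example tree, so adjust it if your protocol yields different modalities.

```python
from pathlib import Path

# Suffixes copied from the example tree above (an assumption about your protocol).
EXPECTED_SUFFIXES = [
    "T1w.nii.gz",
    "T1w.json",
    "ce-gadolinium_T1w.nii.gz",
    "T2w.nii.gz",
    "FLAIR.nii.gz",
]


def missing_bids_files(bids_root, subject):
    """Return the names of expected files that are absent for one subject."""
    root = Path(bids_root)
    anat = root / f"sub-{subject}" / "anat"
    expected = [anat / f"sub-{subject}_{suffix}" for suffix in EXPECTED_SUFFIXES]
    missing = [path.name for path in expected if not path.is_file()]
    if not (root / "dataset_description.json").is_file():
        missing.append("dataset_description.json")
    return missing


# An empty list means every file from the example tree is present.
print(missing_bids_files("bids_output", "001"))
```

This only checks file presence; it does not replace a full BIDS validator.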

Tip

For batch conversion of many subjects, run without the --subject flag:

oncoprep-convert raw_dicoms/ bids_output/

Step 2 — Preprocessing

Run the anatomical preprocessing pipeline:

oncoprep bids_output/ derivatives/ participant \
  --participant-label 001 \
  --nprocs 4

This will:

  1. Validate the BIDS dataset structure

  2. Conform images to 1 mm isotropic resolution

  3. Register T1ce, T2w, and FLAIR to the T1w reference image

  4. Skull-strip using ANTs brain extraction

  5. Normalize to MNI152NLin2009cAsym template space

  6. Write all outputs as BIDS derivatives

When segmentation is enabled (Step 3 below), template normalization is deferred until after the tumor mask is available. The dilated whole-tumor mask is used as a cost-function exclusion region (-x) for ANTs SyN, preventing pathological tissue from distorting the warp.
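The exclusion-mask idea can be illustrated with a toy example. This sketch is not OncoPrep's implementation: it represents the tumor as a set of voxel coordinates, dilates it with a 6-connected neighborhood (the connectivity and the number of iterations are assumptions), and assigns zero metric weight to every voxel inside the dilated region. The real pipeline operates on NIfTI volumes and hands the mask to ANTs.

```python
# 6-connected neighborhood: the face neighbors of a voxel (an assumption).
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dilate(voxels, shape, iterations=1):
    """Grow a set of (x, y, z) voxels by one face neighbor per iteration."""
    current = set(voxels)
    for _ in range(iterations):
        grown = set(current)
        for x, y, z in current:
            for dx, dy, dz in OFFSETS:
                neighbor = (x + dx, y + dy, z + dz)
                # Keep only neighbors that lie inside the image grid.
                if all(0 <= c < s for c, s in zip(neighbor, shape)):
                    grown.add(neighbor)
        current = grown
    return current


def metric_weight(voxel, excluded):
    """Voxels in the exclusion region contribute nothing to the cost function."""
    return 0.0 if voxel in excluded else 1.0


shape = (5, 5, 5)
tumor = {(2, 2, 2)}
excluded = dilate(tumor, shape, iterations=1)
print(len(excluded), metric_weight((2, 2, 3), excluded), metric_weight((0, 0, 0), excluded))
```

Dilating before excluding gives the registration a safety margin, so tissue immediately around the lesion does not drive the warp either.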

Note

Processing a single subject typically takes 15–30 minutes on 4 cores.

Choosing a skull-stripping backend

OncoPrep supports three backends:

# ANTs brain extraction (default)
oncoprep ... --skull-strip-backend ants

# HD-BET (GPU-accelerated, requires pip install "oncoprep[hd-bet]")
oncoprep ... --skull-strip-backend hdbet

# FreeSurfer SynthStrip
oncoprep ... --skull-strip-backend synthstrip

Choosing a registration backend

# ANTs SyN (default, slower, more accurate)
oncoprep ... --registration-backend ants

# PICSL Greedy (faster)
oncoprep ... --registration-backend greedy
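When comparing backends from a script, it may be convenient to assemble the command line programmatically. The sketch below uses only flags and values documented in this tutorial; the helper `build_oncoprep_cmd` is hypothetical and not part of the OncoPrep API.

```python
import shlex

SKULL_STRIP_BACKENDS = {"ants", "hdbet", "synthstrip"}
REGISTRATION_BACKENDS = {"ants", "greedy"}


def build_oncoprep_cmd(bids_dir, out_dir, label,
                       skull_strip="ants", registration="ants", nprocs=None):
    """Return an argv list for a participant-level run (hypothetical helper)."""
    if skull_strip not in SKULL_STRIP_BACKENDS:
        raise ValueError(f"unknown skull-strip backend: {skull_strip}")
    if registration not in REGISTRATION_BACKENDS:
        raise ValueError(f"unknown registration backend: {registration}")
    cmd = [
        "oncoprep", str(bids_dir), str(out_dir), "participant",
        "--participant-label", label,
        "--skull-strip-backend", skull_strip,
        "--registration-backend", registration,
    ]
    if nprocs is not None:
        cmd += ["--nprocs", str(nprocs)]
    return cmd


cmd = build_oncoprep_cmd("bids_output/", "derivatives/", "001",
                         skull_strip="synthstrip", registration="greedy", nprocs=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems with paths that contain spaces.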

Step 3 — Tumor segmentation

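OncoPrep ships two segmentation backends. The default model, selected with --default-seg, is nnInteractive, a zero-shot 3D promptable foundation model that requires no Docker containers, only a ~400 MB model checkpoint that downloads automatically on first use. The alternative is a Docker ensemble of BraTS models, which runs when --run-segmentation is given without --default-seg.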

nnInteractive (default, CPU or GPU)

oncoprep bids_output/ derivatives/ participant \
  --participant-label 001 \
  --run-segmentation --default-seg

nnInteractive (Isensee et al., 2025; arXiv:2503.08373) was trained on 120+ diverse 3D datasets and performs zero-shot inference on glioma MRI; it has never seen BraTS training data. OncoPrep fully automates the prompting step by deriving seed points from multi-modal intensity anomalies (see Segmentation for the algorithm). It runs on CUDA, Apple Silicon (MPS), or CPU.

Docker ensemble (GPU required)

oncoprep bids_output/ derivatives/ participant \
  --participant-label 001 \
  --run-segmentation

This runs all 14 BraTS models and fuses their predictions by majority voting. It requires a CUDA-capable GPU and takes roughly 30–60 minutes.
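Majority voting can be sketched as follows. This is an illustration under stated assumptions rather than OncoPrep's fusion code: each model's prediction is a flat list of per-voxel labels, and ties are broken in favor of the lowest label, which is a choice made here for determinism and is not documented by the tutorial.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-voxel labels from several models into one label map."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    n_voxels = len(predictions[0])
    if any(len(p) != n_voxels for p in predictions):
        raise ValueError("all predictions must cover the same voxels")
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Tie-break (assumption): pick the lowest label among the most voted.
        fused.append(min(label for label, n in counts.items() if n == top))
    return fused


model_a = [0, 1, 2, 3]
model_b = [0, 1, 2, 0]
model_c = [0, 2, 2, 3]
fused = majority_vote([model_a, model_b, model_c])
print(fused)
```

With three models, a label wins a voxel as soon as two models agree on it.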

The output is a discrete segmentation label map:

derivatives/oncoprep/sub-001/anat/sub-001_desc-tumor_dseg.nii.gz

With BraTS labels:

Label  Region
1      Necrotic Core (NCR)
2      Peritumoral Edema (ED)
3      Enhancing Tumor (ET)
4      Resection Cavity (RC)

Step 4 — Radiomics (optional)

Extract quantitative imaging features from the segmentation:

pip install "oncoprep[radiomics]"

oncoprep bids_output/ derivatives/ participant \
  --participant-label 001 \
  --run-radiomics --default-seg

--run-radiomics implies --run-segmentation, so you don’t need both flags.

Output:

derivatives/oncoprep/sub-001/anat/sub-001_desc-radiomics_features.json

The JSON contains features for each tumor region (NCR, ED, ET, WT, TC) across feature classes (shape, first-order, GLCM, GLRLM, GLSZM, GLDM, NGTDM).
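The composite regions WT (whole tumor) and TC (tumor core) are unions of the labels listed in Step 3. The sketch below assumes the conventional BraTS definitions (WT = NCR + ED + ET, TC = NCR + ET) and leaves the resection cavity (label 4) out of both; the tutorial does not state how OncoPrep treats label 4, so treat that as an assumption.

```python
# Label sets per region; WT and TC follow the usual BraTS convention (assumed).
REGIONS = {
    "NCR": {1},
    "ED": {2},
    "ET": {3},
    "WT": {1, 2, 3},
    "TC": {1, 3},
}


def region_voxel_counts(labels):
    """Count the voxels that belong to each region in a flat label map."""
    return {
        name: sum(1 for value in labels if value in members)
        for name, members in REGIONS.items()
    }


labels = [0, 0, 1, 2, 2, 3, 4]  # toy label map, 0 = background
counts = region_voxel_counts(labels)
print(counts)
```

At 1 mm isotropic resolution each voxel is 1 mm³, so these counts can be read directly as volumes in mm³.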

Step 5 — Group-Level ComBat Harmonization (multi-site studies)

If your study includes subjects scanned on different scanners or at different sites, radiomics features will contain systematic batch effects. OncoPrep’s group-level ComBat harmonization removes these effects while preserving biological covariates.
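To build intuition for what harmonization does, here is a deliberately simplified location-scale adjustment for a single feature. It is not ComBat: ComBat additionally regresses out covariates and shrinks the per-batch estimates with empirical Bayes. The function name and the toy values are illustrative only.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def location_scale_harmonize(values, batches):
    """Align each batch to the grand mean and the pooled within-batch spread."""
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    batch_sd = {b: pstdev(vs) for b, vs in by_batch.items()}
    if any(sd == 0 for sd in batch_sd.values()):
        raise ValueError("every batch needs non-constant values")
    grand_mean = mean(values)
    residuals = [(v - batch_mean[b]) ** 2 for v, b in zip(values, batches)]
    pooled_sd = sqrt(mean(residuals))
    return [
        (v - batch_mean[b]) / batch_sd[b] * pooled_sd + grand_mean
        for v, b in zip(values, batches)
    ]


values = [10.0, 12.0, 14.0, 20.0, 24.0, 28.0]
batches = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(values, batches)
print([round(v, 3) for v in harmonized])
```

After the adjustment both batches share the same mean and spread. Note that this naive version would also erase real biological differences between sites, which is why ComBat models covariates such as age and sex explicitly.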

Prerequisites

  • Participant-level radiomics must be complete for all subjects (at least 3 subjects across at least 2 scanner batches)

  • Install the radiomics extras: pip install "oncoprep[radiomics]"

Quick start: auto-generate batch labels

The simplest approach lets OncoPrep derive scanner batch labels from the BIDS JSON sidecars (Manufacturer, ManufacturerModelName, MagneticFieldStrength — fields that survive anonymization):

# Step 1: Run participant-level radiomics for all subjects
for subj in 001 002 003 004 005 006; do
  oncoprep bids_output/ derivatives/ participant \
    --participant-label $subj \
    --run-radiomics --default-seg
done

# Step 2: Run group-level ComBat harmonization
oncoprep bids_output/ derivatives/ group \
  --generate-combat-batch

This generates:

  • derivatives/oncoprep/combat_batch.csv — the auto-generated batch CSV

  • derivatives/oncoprep/sub-XXX/anat/sub-XXX_desc-radiomicsCombat_features.json — harmonized features for each subject

  • derivatives/oncoprep/group_combat_report.html — summary report
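Deriving a batch label from a sidecar can be sketched as below. The three field names are the ones listed above; the label format (fields joined by underscores, field strength suffixed with T) and the function name are assumptions for illustration, and OncoPrep's generated labels may look different.

```python
import json


def batch_label(sidecar):
    """Combine scanner fields from a BIDS JSON sidecar into one batch label."""
    manufacturer = str(sidecar.get("Manufacturer", "unknown")).replace(" ", "")
    model = str(sidecar.get("ManufacturerModelName", "unknown")).replace(" ", "")
    field = sidecar.get("MagneticFieldStrength")
    # Format 3 or 3.0 as "3T" and 1.5 as "1.5T".
    strength = f"{float(field):g}T" if field is not None else "unknownT"
    return "_".join([manufacturer, model, strength])


sidecar = json.loads(
    '{"Manufacturer": "Siemens", "ManufacturerModelName": "Prisma", '
    '"MagneticFieldStrength": 3}'
)
label = batch_label(sidecar)
print(label)
```

Subjects whose sidecars yield the same label would fall into the same scanner batch.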

Custom batch CSV with biological covariates

For more control, provide your own batch CSV with age and sex:

subject_id,batch,age,sex
sub-001,SiteA_Prisma_3T,45,M
sub-002,SiteA_Prisma_3T,52,F
sub-003,SiteB_SIGNA_15T,60,M
sub-004,SiteB_SIGNA_15T,38,F
sub-005,SiteA_Prisma_3T,71,M
sub-006,SiteB_SIGNA_15T,44,F

Pass the file to the group-level run with --combat-batch:

oncoprep bids_output/ derivatives/ group \
  --combat-batch /path/to/site_labels.csv

Age is treated as a continuous covariate and sex as categorical — both are preserved by ComBat (their variance is not removed).
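A hand-written batch CSV is easy to get wrong, so a quick check before the group run can save time. This sketch validates the column names used in the example above and the prerequisite of at least 3 subjects across at least 2 batches; the helper `check_batch_csv` is hypothetical and OncoPrep may perform its own, stricter validation.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = {"subject_id", "batch"}


def check_batch_csv(text):
    """Validate a batch CSV and return the number of rows per batch."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    per_batch = Counter(row["batch"] for row in rows)
    if len(rows) < 3 or len(per_batch) < 2:
        raise ValueError("need at least 3 subjects across at least 2 batches")
    return dict(per_batch)


example = """subject_id,batch,age,sex
sub-001,SiteA_Prisma_3T,45,M
sub-002,SiteA_Prisma_3T,52,F
sub-003,SiteB_SIGNA_15T,60,M
sub-004,SiteB_SIGNA_15T,38,F
sub-005,SiteA_Prisma_3T,71,M
sub-006,SiteB_SIGNA_15T,44,F
"""
per_batch = check_batch_csv(example)
print(per_batch)
```

To check a file on disk, read it first, for example `check_batch_csv(Path("site_labels.csv").read_text())`.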

Longitudinal datasets

For multi-session studies, OncoPrep detects longitudinal data automatically and treats each subject-session pair as a separate observation. Use observation IDs (subject plus session) in the batch CSV:

subject_id,batch,age,sex
sub-001_ses-01,SiteA_Prisma_3T,45,M
sub-001_ses-02,SiteA_Prisma_3T,46,M
sub-002_ses-01,SiteB_SIGNA_15T,52,F
sub-002_ses-02,SiteB_SIGNA_15T,53,F

Or let auto-generation handle it — --generate-combat-batch emits one row per subject × session automatically.
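Observation IDs of the form shown above can be split back into subject and session, for example to count unique subjects before a run. This is a small illustrative helper, not an OncoPrep function.

```python
def split_observation_id(obs_id):
    """Split 'sub-001_ses-01' into ('sub-001', 'ses-01'); session may be None."""
    subject, separator, session = obs_id.partition("_ses-")
    return subject, (f"ses-{session}" if separator else None)


observations = [
    "sub-001_ses-01",
    "sub-001_ses-02",
    "sub-002_ses-01",
    "sub-002_ses-02",
]
unique_subjects = sorted({split_observation_id(o)[0] for o in observations})
print(len(observations), "observations from", len(unique_subjects), "subjects")
```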

Inspect the report

Open the generated HTML report in your browser:

open derivatives/oncoprep/group_combat_report.html   # macOS; on Linux use xdg-open

The report shows:

  • Number of observations and scanner batches

  • Mean variance change (negative = batch variance reduced)

  • Batch distribution table

  • Longitudinal mode (if detected) with unique subject count

  • List of all harmonized output files

Python API

from pathlib import Path
from oncoprep.workflows.group import run_group_analysis

retcode = run_group_analysis(
    output_dir=Path("derivatives"),
    bids_dir=Path("bids_output"),
    generate_batch_csv=True,       # auto-generate from BIDS sidecars
    combat_parametric=True,        # parametric empirical Bayes (default)
)
assert retcode == 0

For full details, see Group-Level ComBat Harmonization.

Step 6 — Quality control with MRIQC (temporarily disabled)

Note: MRIQC integration is temporarily disabled in this release. The --run-qc flag is accepted but ignored. This section is preserved for reference and will be updated when MRIQC support is re-enabled.

Step 7 — Reports

Generate an HTML quality-assurance report:

oncoprep bids_output/ derivatives/ participant \
  --participant-label 001 --reports-only

Open the generated sub-001.html in your browser for a summary of preprocessing steps, registration quality, and segmentation overlays.

Using the Python API

For scripting and integration, you can build workflows directly:

from pathlib import Path
from oncoprep.workflows.base import init_oncoprep_wf

wf = init_oncoprep_wf(
    bids_dir=Path("bids_output"),
    output_dir=Path("derivatives"),
    subject_session_list=[("001", None)],
    work_dir=Path("work"),
    run_segmentation=True,
    default_seg=True,
)

# Run with 4 parallel processes
wf.run(plugin="MultiProc", plugin_args={"n_procs": 4})

Running individual workflows

You can also run sub-workflows in isolation:

from oncoprep.workflows.anatomical import init_anat_preproc_wf

anat_wf = init_anat_preproc_wf(
    bids_dir="/path/to/bids",
    output_dir="/path/to/derivatives",
    omp_nthreads=4,
    skull_strip_backend="ants",
)
anat_wf.run()

The radiomics sub-workflow can be built the same way:

from oncoprep.workflows.radiomics import init_anat_radiomics_wf

radio_wf = init_anat_radiomics_wf(
    output_dir="/path/to/derivatives",
    extract_shape=True,
    extract_firstorder=True,
    extract_glcm=True,
    extract_glrlm=False,
    extract_glszm=False,
    extract_gldm=False,
    extract_ngtdm=False,
)
radio_wf.run()

End-to-End Reference

The following sections document every feature available through the oncoprep CLI, from minimal invocations to fully-loaded commands combining all pipeline stages.

Minimal invocation (preprocessing only)

oncoprep /path/to/bids /path/to/derivatives participant \
  --participant-label 001

Feature-by-feature breakdown

BIDS filtering

Select specific participants, sessions, or apply custom pybids filters:

# Multiple participants
oncoprep /data/bids /data/out participant \
  --participant-label 001 002 003

# Specific sessions
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --session-label 01 02

# Custom pybids filter file (JSON)
oncoprep /data/bids /data/out participant \
  --bids-filter-file my_filters.json

# Session-wise independent processing (each session treated separately)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --subject-anatomical-reference sessionwise

Example my_filters.json:

{
  "t1w": {"datatype": "anat", "suffix": "T1w", "extension": ".nii.gz"},
  "flair": {"datatype": "anat", "suffix": "FLAIR", "extension": ".nii.gz"}
}
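If the filters are generated by a script, writing the file with the json module avoids syntax slips such as trailing commas. The sketch below reproduces the example above; the helper `write_filters` is hypothetical, and it only checks that every entry is a JSON object, not that pybids accepts the entities.

```python
import json
import tempfile
from pathlib import Path

filters = {
    "t1w": {"datatype": "anat", "suffix": "T1w", "extension": ".nii.gz"},
    "flair": {"datatype": "anat", "suffix": "FLAIR", "extension": ".nii.gz"},
}


def write_filters(path, filters):
    """Write a filter mapping to disk as indented JSON and return the path."""
    for name, query in filters.items():
        if not isinstance(query, dict):
            raise TypeError(f"filter {name!r} must map entity names to values")
    path = Path(path)
    path.write_text(json.dumps(filters, indent=2) + "\n")
    return path


# Round-trip through a temporary directory to confirm the file parses back.
with tempfile.TemporaryDirectory() as tmp:
    written = write_filters(Path(tmp) / "my_filters.json", filters)
    reloaded = json.loads(written.read_text())
print(reloaded == filters)
```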

Skull-stripping options

Three backends and multiple control modes:

# ANTs brain extraction (default)
oncoprep ... --skull-strip-backend ants

# HD-BET GPU-accelerated (requires pip install "oncoprep[hd-bet]")
oncoprep ... --skull-strip-backend hdbet

# FreeSurfer SynthStrip
oncoprep ... --skull-strip-backend synthstrip

# Force skull-stripping even on pre-stripped inputs
oncoprep ... --skull-strip-mode force

# Skip skull-stripping entirely
oncoprep ... --skull-strip-mode skip

# Fixed seed for deterministic reproduction (combine with --omp-nthreads 1)
oncoprep ... --skull-strip-fixed-seed --omp-nthreads 1

# Use a different skull-stripping template
oncoprep ... --skull-strip-template OASIS30ANTs

Registration options

# ANTs SyN registration (default, more accurate)
oncoprep ... --registration-backend ants

# PICSL Greedy (faster)
oncoprep ... --registration-backend greedy

# Set the output template space (MNI152NLin2009cAsym is the default)
oncoprep ... --output-spaces MNI152NLin2009cAsym

# Longitudinal mode (builds unbiased within-subject template)
oncoprep ... --longitudinal

Segmentation options

# Full multi-model ensemble (GPU required, 14 BraTS models)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation

# Single default model (CPU-friendly, faster)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation --default-seg

# Custom model path
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation --seg-model-path /path/to/model

# Force CPU-only (disable GPU)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation --no-gpu

# Use pre-cached model images (.sif or .tar)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation \
  --seg-cache-dir /path/to/seg_cache

# Choose container runtime explicitly
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-segmentation \
  --container-runtime singularity

Radiomics feature extraction

# Run radiomics (implies --run-segmentation automatically)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-radiomics

# Radiomics with single default model (CPU, fast)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --run-radiomics --default-seg

Quality control with MRIQC (temporarily disabled)

Note: MRIQC integration is temporarily disabled in this release. The --run-qc flag is accepted but ignored.

Privacy (defacing)

# Remove facial features from anatomical images
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --deface

TemplateFlow & offline mode

# Pre-fetch templates on a login node (for HPC)
oncoprep --fetch-templates \
  --templateflow-home /scratch/templateflow \
  --output-spaces MNI152NLin2009cAsym \
  --skull-strip-template OASIS30ANTs

# Run on a compute node without internet
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --templateflow-home /scratch/templateflow \
  --offline
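Before submitting an offline job, it can be worth checking that the template cache is actually populated. The sketch below assumes the usual TemplateFlow layout, in which each template lives in a `tpl-<name>` directory under the cache root; the helper name is hypothetical.

```python
import os
from pathlib import Path


def missing_templates(templateflow_home, templates):
    """Return the templates with no populated tpl-<name> directory."""
    home = Path(templateflow_home)
    missing = []
    for name in templates:
        directory = home / f"tpl-{name}"
        # A template counts as present only if its directory is non-empty.
        if not (directory.is_dir() and any(directory.iterdir())):
            missing.append(name)
    return missing


home = os.environ.get("TEMPLATEFLOW_HOME", "/scratch/templateflow")
print(missing_templates(home, ["MNI152NLin2009cAsym", "OASIS30ANTs"]))
```

An empty list suggests the pre-fetch step completed for both templates used in the commands above.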

Performance tuning

# Parallel processing
oncoprep ... --nprocs 12 --omp-nthreads 4

# Memory limit
oncoprep ... --mem-gb 48

# Low-memory mode (trades disk I/O for RAM)
oncoprep ... --low-mem

# Custom Nipype plugin (YAML config for SGE, PBS, SLURM)
oncoprep ... --use-plugin my_plugin.yml

# Resource monitoring (memory + CPU tracking)
oncoprep ... --resource-monitor

# Debug verbosity (-v = verbose, -vv = more, -vvv = debug)
oncoprep ... -vvv
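The interaction between --nprocs, --omp-nthreads, and --mem-gb is easier to reason about with a little arithmetic. The sketch below assumes Nipype MultiProc-style scheduling, where --nprocs is the total CPU budget and each multithreaded node reserves --omp-nthreads of it; the tutorial does not spell this out, so treat the numbers as a rough planning aid only.

```python
def concurrency_plan(nprocs, omp_nthreads, mem_gb=None):
    """Estimate how many multithreaded nodes can run at the same time."""
    if nprocs < 1 or omp_nthreads < 1:
        raise ValueError("nprocs and omp_nthreads must be positive")
    threads = min(omp_nthreads, nprocs)
    concurrent = nprocs // threads
    plan = {"omp_nthreads": threads, "concurrent_multithreaded_nodes": concurrent}
    if mem_gb is not None:
        # Even split of the memory budget across concurrent nodes.
        plan["mem_gb_per_node"] = mem_gb / concurrent
    return plan


plan = concurrency_plan(nprocs=12, omp_nthreads=4, mem_gb=48)
print(plan)
```

Under these assumptions, the settings shown above (12 processes, 4 threads each, 48 GB) leave about 16 GB for each of three concurrent multithreaded nodes.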

Reports & debugging

# Generate only HTML reports (skip processing)
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --reports-only

# Include logs from a previous failed run in reports
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --reports-only --run-uuid 20260210-143022_abc123

# Export workflow graph as SVG
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --write-graph

# Stop immediately on first crash (for debugging)
oncoprep ... --stop-on-first-crash

# Generate boilerplate methods text only
oncoprep /data/bids /data/out participant \
  --participant-label 001 \
  --boilerplate

# Skip re-running existing derivatives
oncoprep ... --fast-track

Docker invocations

Basic preprocessing in Docker

docker run --platform linux/amd64 --rm \
  -v /path/to/bids:/data/bids:ro \
  -v /path/to/output:/data/output \
  -v /path/to/work:/data/work \
  nko11/oncoprep:latest \
  /data/bids /data/output participant \
  --participant-label 001 \
  --work-dir /data/work

Docker with all features (GPU + segmentation + radiomics)

docker run --rm --gpus all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /path/to/bids:/data/bids:ro \
  -v /path/to/output:/data/output \
  -v /path/to/work:/data/work \
  nko11/oncoprep:latest \
  /data/bids /data/output participant \
  --participant-label 001 002 \
  --session-label 01 \
  --run-segmentation \
  --run-radiomics \
  --deface \
  --nprocs 8 \
  --mem-gb 32 \
  --work-dir /data/work \
  -vv

HPC / Singularity invocations

Output structure (all features enabled)

derivatives/
├── oncoprep/
│   ├── dataset_description.json
│   ├── logs/
│   │   └── CITATION.md
│   └── sub-001/
│       └── ses-01/
│           └── anat/
│               ├── sub-001_ses-01_desc-preproc_T1w.nii.gz
│               ├── sub-001_ses-01_desc-preproc_T1w.json
│               ├── sub-001_ses-01_desc-preproc_T1ce.nii.gz
│               ├── sub-001_ses-01_desc-preproc_T2w.nii.gz
│               ├── sub-001_ses-01_desc-preproc_FLAIR.nii.gz
│               ├── sub-001_ses-01_desc-brain_mask.nii.gz
│               ├── sub-001_ses-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz
│               ├── sub-001_ses-01_desc-tumor_dseg.nii.gz
│               ├── sub-001_ses-01_desc-radiomics_features.json
│               └── sub-001_ses-01_desc-defaced_T1w.nii.gz
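Derivative file names follow the BIDS key-value pattern, so they can be parsed with plain string handling when a full BIDS library is not needed. The helper below is an illustrative sketch that handles only the two extensions appearing in the tree above.

```python
def parse_entities(filename):
    """Split a BIDS-style file name into its entities and suffix."""
    stem = filename
    for extension in (".nii.gz", ".json"):
        if stem.endswith(extension):
            stem = stem[: -len(extension)]
            break
    # Everything before the last underscore is a key-value pair; the rest is the suffix.
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


name = "sub-001_ses-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz"
entities = parse_entities(name)
print(entities)
```

For anything beyond quick scripts, querying the derivatives with pybids is likely the more robust option.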

Quick reference: all CLI flags

Flag                            Purpose                                       Default
--participant-label             Subject IDs                                   All subjects
--session-label                 Session IDs                                   All sessions
--bids-filter-file              Custom pybids filters                         None
--subject-anatomical-reference  first-lex / unbiased / sessionwise            first-lex
--output-spaces                 Template space(s)                             MNI152NLin2009cAsym
--skull-strip-template          Skull-strip atlas                             OASIS30ANTs
--skull-strip-backend           ants / hdbet / synthstrip                     ants
--skull-strip-mode              auto / skip / force                           auto
--skull-strip-fixed-seed        Deterministic seed                            Off
--registration-backend          ants / greedy                                 ants
--longitudinal                  Multi-session template                        Off
--deface                        Remove facial features                        Off
--run-segmentation              Enable tumor segmentation                     Off
--default-seg                   Single model (CPU)                            Off
--seg-model-path                Custom model path                             None
--no-gpu                        Force CPU-only                                Off
--container-runtime             auto / docker / singularity / apptainer       auto
--seg-cache-dir                 Pre-downloaded model cache                    Auto
--run-radiomics                 Feature extraction                            Off
--combat-batch CSV              Custom batch CSV for group-level ComBat       None
--combat-parametric             Parametric empirical Bayes for ComBat         True
--combat-nonparametric          Non-parametric ComBat                         Off
--generate-combat-batch         Auto-generate batch CSV from BIDS             Off
--run-qc                        MRIQC quality control (temporarily disabled)  Off
--nprocs                        CPU count                                     All available
--omp-nthreads                  Threads per process                           Auto
--mem-gb                        Memory limit (GB)                             Unlimited
--low-mem                       Trade disk for memory                         Off
--use-plugin                    Custom Nipype plugin YAML                     MultiProc
--work-dir                      Working directory                             ./work
--templateflow-home             Template cache path                           $TEMPLATEFLOW_HOME
--offline                       Disable network access                        Off
--fast-track                    Reuse existing derivatives                    Off
--reports-only                  Skip processing                               Off
--run-uuid                      Previous run UUID for reports                 None
--write-graph                   Export DAG as SVG                             Off
--boilerplate                   Generate methods text                         Off
--stop-on-first-crash           Abort on error                                Off
--resource-monitor              Track CPU/memory                              Off
--notrack                       Disable telemetry                             Off
--sloppy                        Low-quality (testing only)                    Off
-v / -vv / -vvv                 Verbosity level                               Standard

Troubleshooting

Docker “permission denied”

Make sure your user is in the docker group or run with sudo:

sudo usermod -aG docker $USER
# Log out and back in

“Illegal instruction” on Apple Silicon

The Docker image targets linux/amd64. On ARM Macs, AVX instructions may fail under Rosetta emulation. Use the native pip install for local development:

pip install -e ".[dev]"

Out of memory during segmentation

Reduce parallel processes or increase available memory:

oncoprep ... --nprocs 2 --mem-gb 8 --low-mem