Tutorial: End-to-End Neuro-Oncology Preprocessing¶
This tutorial walks through a complete OncoPrep workflow — from raw DICOMs to preprocessed derivatives and tumor segmentation — using a single subject.
Prerequisites¶
pip install oncoprep
You will also need:
A directory of DICOM files for one or more subjects
Docker installed (needed only for the Docker ensemble segmentation backend in Step 3)
~4 GB free disk space for intermediate files
Step 1 — DICOM to BIDS conversion¶
Organize your raw data so each subject has its own directory:
raw_dicoms/
└── 001/
    ├── T1_MPRAGE_SAG_P2_1_0_ISO_0032/
    ├── T1_MPRAGE_SAG_P2_1_0_ISO_POST_0071/
    ├── T2_SPC_DA-FL_SAG_P2_1_0_0012/
    └── COR_FLAIR_0103/
Run the conversion:
oncoprep-convert raw_dicoms/ bids_output/ --subject 001
The output is a valid BIDS dataset:
bids_output/
├── dataset_description.json
└── sub-001/
    └── anat/
        ├── sub-001_T1w.nii.gz
        ├── sub-001_T1w.json
        ├── sub-001_ce-gadolinium_T1w.nii.gz
        ├── sub-001_T2w.nii.gz
        └── sub-001_FLAIR.nii.gz
Tip
For batch conversion of many subjects, run without the --subject flag:
oncoprep-convert raw_dicoms/ bids_output/
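After conversion it can help to confirm that the expected anatomical files exist before starting preprocessing. The following is a minimal sketch, not part of OncoPrep: the helper name `missing_anat_files` is hypothetical, and the list of expected files is simply copied from the example tree in this step.

```python
from pathlib import Path

# Expected anatomical files per subject, copied from the example tree above.
EXPECTED_SUFFIXES = (
    "T1w.nii.gz",
    "ce-gadolinium_T1w.nii.gz",
    "T2w.nii.gz",
    "FLAIR.nii.gz",
)


def missing_anat_files(bids_dir, subject):
    """Return the expected anat files that are absent for one subject."""
    anat_dir = Path(bids_dir) / f"sub-{subject}" / "anat"
    expected = [f"sub-{subject}_{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if not (anat_dir / name).is_file()]


missing = missing_anat_files("bids_output", "001")
print("missing:", missing or "none")
```

An empty result means all four modalities from the example were written; a full BIDS validator is still the authoritative check.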
Step 2 — Preprocessing¶
Run the anatomical preprocessing pipeline:
oncoprep bids_output/ derivatives/ participant \
--participant-label 001 \
--nprocs 4
This will:
Validate the BIDS dataset structure
Conform images to 1 mm isotropic resolution
Register T1ce, T2w, and FLAIR to the T1w reference image
Skull-strip using ANTs brain extraction
Normalize to MNI152NLin2009cAsym template space
Write all outputs as BIDS derivatives
When segmentation is enabled (Step 3 below), template normalization is
deferred until after the tumor mask is available. The dilated
whole-tumor mask is used as a cost-function exclusion region (-x) for
ANTs SyN, preventing pathological tissue from distorting the warp.
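To make the masking idea concrete, here is a toy sketch on a small 2D grid. It is an illustration only: the real pipeline works on 3D NIfTI volumes, the function names are hypothetical, and the inversion step assumes that the registration mask marks voxels that contribute to the similarity metric (so the tumor is excluded by zeroing it out). How OncoPrep builds the mask internally is not shown in this tutorial.

```python
def dilate(mask, iterations=1):
    """Binary dilation of a 2D 0/1 grid with a 4-connected neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    for _ in range(iterations):
        grown = [row[:] for row in mask]
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            grown[rr][cc] = 1
        mask = grown
    return mask


def inclusion_mask(tumor_mask, iterations=1):
    """Voxels the metric may use: everything outside the dilated tumor."""
    dilated = dilate(tumor_mask, iterations)
    return [[1 - value for value in row] for row in dilated]


tumor = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for row in inclusion_mask(tumor):
    print(row)
```

Dilating before inverting adds a safety margin, so voxels at the tumor boundary also stay out of the cost function.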
Note
Processing a single subject typically takes 15–30 minutes on 4 cores.
Choosing a skull-stripping backend¶
OncoPrep supports three backends:
# ANTs brain extraction (default)
oncoprep ... --skull-strip-backend ants
# HD-BET (GPU-accelerated, requires pip install "oncoprep[hd-bet]")
oncoprep ... --skull-strip-backend hdbet
# FreeSurfer SynthStrip
oncoprep ... --skull-strip-backend synthstrip
Choosing a registration backend¶
# ANTs SyN (default, slower, more accurate)
oncoprep ... --registration-backend ants
# PICSL Greedy (faster)
oncoprep ... --registration-backend greedy
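When scripting runs, the backend choices can be validated before the command is launched. The sketch below assembles the CLI call as an argument list; `build_oncoprep_cmd` is a hypothetical helper, and it uses only flags and backend names that appear in this tutorial.

```python
import shlex

SKULL_STRIP_BACKENDS = {"ants", "hdbet", "synthstrip"}
REGISTRATION_BACKENDS = {"ants", "greedy"}


def build_oncoprep_cmd(bids_dir, out_dir, subject,
                       skull_strip_backend="ants",
                       registration_backend="ants",
                       nprocs=4):
    """Assemble a participant-level oncoprep command as an argument list."""
    if skull_strip_backend not in SKULL_STRIP_BACKENDS:
        raise ValueError(f"unknown skull-strip backend: {skull_strip_backend}")
    if registration_backend not in REGISTRATION_BACKENDS:
        raise ValueError(f"unknown registration backend: {registration_backend}")
    return [
        "oncoprep", str(bids_dir), str(out_dir), "participant",
        "--participant-label", subject,
        "--skull-strip-backend", skull_strip_backend,
        "--registration-backend", registration_backend,
        "--nprocs", str(nprocs),
    ]


cmd = build_oncoprep_cmd("bids_output", "derivatives", "001",
                         skull_strip_backend="synthstrip",
                         registration_backend="greedy")
print(shlex.join(cmd))
# To execute the command: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.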
Step 3 — Tumor segmentation¶
OncoPrep ships two segmentation backends. The default
(--default-seg) uses nnInteractive, a zero-shot 3D promptable foundation
model that requires no Docker containers — just a ~400 MB model checkpoint that
downloads automatically on first use.
nnInteractive (default, CPU or GPU)¶
oncoprep bids_output/ derivatives/ participant \
--participant-label 001 \
--run-segmentation --default-seg
nnInteractive (Isensee et al., 2025; arXiv:2503.08373) was trained on 120+ diverse 3D datasets and performs zero-shot inference on glioma MRI; it has never seen BraTS training data. OncoPrep fully automates the prompting step by deriving seed points from multi-modal intensity anomalies (see Segmentation for the algorithm). It runs on CUDA, Apple Silicon (MPS), or CPU.
Docker ensemble (GPU required)¶
oncoprep bids_output/ derivatives/ participant \
--participant-label 001 \
--run-segmentation
This runs all 14 BraTS models and fuses their predictions by majority voting. It requires a CUDA-capable GPU and takes roughly 30–60 minutes.
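Majority voting can be sketched in a few lines. This is a generic illustration on flattened label maps, not OncoPrep's fusion code; in particular, the tie-breaking rule (lowest label wins) is an assumption made here for determinism.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-voxel labels from several models; ties go to the lowest label."""
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(min(label for label, n in counts.items() if n == top))
    return fused


# Three hypothetical models, five voxels each (flattened label maps).
model_a = [0, 1, 2, 3, 3]
model_b = [0, 1, 2, 2, 0]
model_c = [0, 2, 2, 3, 1]
print(majority_vote([model_a, model_b, model_c]))  # [0, 1, 2, 3, 0]
```

With 14 models the same logic applies; only the number of votes per voxel changes.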
The output is a discrete segmentation label map:
derivatives/oncoprep/sub-001/anat/sub-001_desc-tumor_dseg.nii.gz
With BraTS labels:
| Label | Region |
|---|---|
| 1 | Necrotic Core (NCR) |
| 2 | Peritumoral Edema (ED) |
| 3 | Enhancing Tumor (ET) |
| 4 | Resection Cavity (RC) |
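The radiomics step later refers to composite regions (WT, TC) that are not labels in the table above. The sketch below shows how such regions are usually derived from a label map. The composite definitions follow the standard BraTS convention and are an assumption here; this tutorial does not state how OncoPrep treats the resection cavity when forming them.

```python
# Label values from the table above.
NCR, ED, ET, RC = 1, 2, 3, 4

# Assumed composite definitions (standard BraTS convention):
# whole tumor = NCR + ED + ET, tumor core = NCR + ET.
COMPOSITES = {
    "WT": {NCR, ED, ET},
    "TC": {NCR, ET},
}


def region_mask(labels, region):
    """Binary mask (0/1) for a composite region over a flat list of labels."""
    members = COMPOSITES[region]
    return [1 if value in members else 0 for value in labels]


voxels = [0, 1, 2, 3, 4]
print(region_mask(voxels, "WT"))  # [0, 1, 1, 1, 0]
print(region_mask(voxels, "TC"))  # [0, 1, 0, 1, 0]
```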
Step 4 — Radiomics (optional)¶
Extract quantitative imaging features from the segmentation:
pip install "oncoprep[radiomics]"
oncoprep bids_output/ derivatives/ participant \
--participant-label 001 \
--run-radiomics --default-seg
--run-radiomics implies --run-segmentation, so you don’t need both flags.
Output:
derivatives/oncoprep/sub-001/anat/sub-001_desc-radiomics_features.json
The JSON contains features for each tumor region (NCR, ED, ET, WT, TC) across feature classes (shape, first-order, GLCM, GLRLM, GLSZM, GLDM, NGTDM).
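A quick way to inspect the output is to count features per region and feature class. The sketch below assumes a nested layout of the form {region: {feature_class: {feature: value}}}. That layout is a guess based on the description above, not a documented schema, so adjust the traversal to match the actual file.

```python
import json
from pathlib import Path


def summarize_features(json_path):
    """Count features per region and feature class.

    Assumes a nested layout {region: {feature_class: {feature: value}}};
    the real file may be organized differently.
    """
    data = json.loads(Path(json_path).read_text())
    summary = {}
    for region, classes in data.items():
        summary[region] = {name: len(feats) for name, feats in classes.items()}
    return summary


# Example call (path taken from the output shown above):
# summarize_features(
#     "derivatives/oncoprep/sub-001/anat/sub-001_desc-radiomics_features.json"
# )
```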
Step 5 — Group-Level ComBat Harmonization (multi-site studies)¶
If your study includes subjects scanned on different scanners or at different sites, radiomics features will contain systematic batch effects. OncoPrep’s group-level ComBat harmonization removes these effects while preserving biological covariates.
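The core idea behind removing batch effects is a per-batch location and scale adjustment. The sketch below shows only that idea, using plain per-batch standardization. It is deliberately not ComBat: there is no empirical Bayes shrinkage and no covariate model, so it would also remove real biological differences between sites. The function name and toy data are hypothetical.

```python
from statistics import mean, pstdev


def location_scale_harmonize(values, batches):
    """Align each batch's mean and spread to the pooled mean and spread.

    Plain per-batch standardization, not full ComBat: no empirical Bayes
    shrinkage and no preservation of biological covariates.
    """
    grand_mean = mean(values)
    grand_sd = pstdev(values)
    out = [0.0] * len(values)
    for batch in set(batches):
        idx = [i for i, b in enumerate(batches) if b == batch]
        batch_vals = [values[i] for i in idx]
        b_mean = mean(batch_vals)
        b_sd = pstdev(batch_vals) or 1.0  # guard against zero spread
        for i in idx:
            out[i] = (values[i] - b_mean) / b_sd * grand_sd + grand_mean
    return out


# One radiomics feature measured at two hypothetical sites.
feature = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
site = ["A", "A", "A", "B", "B", "B"]
harmonized = location_scale_harmonize(feature, site)
print([round(v, 2) for v in harmonized])
```

After the adjustment both sites share the same mean, which is exactly why covariates such as age and sex need to be modelled explicitly in the real method.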
Prerequisites¶
Participant-level radiomics must be complete for all subjects (at least 3 subjects across at least 2 scanner batches)
Install the radiomics extras:
pip install "oncoprep[radiomics]"
Quick start: auto-generate batch labels¶
The simplest approach lets OncoPrep derive scanner batch labels from the BIDS JSON sidecars (Manufacturer, ManufacturerModelName, MagneticFieldStrength — fields that survive anonymization):
# Step 1: Run participant-level radiomics for all subjects
for subj in 001 002 003 004 005 006; do
    oncoprep bids_output/ derivatives/ participant \
        --participant-label "$subj" \
        --run-radiomics --default-seg
done

# Step 2: Run group-level ComBat harmonization
oncoprep bids_output/ derivatives/ group \
    --generate-combat-batch
This generates:
derivatives/oncoprep/combat_batch.csv: the auto-generated batch CSV
derivatives/oncoprep/sub-XXX/anat/sub-XXX_desc-radiomicsCombat_features.json: harmonized features for each subject
derivatives/oncoprep/group_combat_report.html: summary report
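Deriving a batch label from sidecar metadata could look roughly like the sketch below. The three field names come from the text above; the label format (fields joined with underscores, spaces replaced by hyphens) and the function names are hypothetical, and OncoPrep's own labels may be formatted differently.

```python
import json
from pathlib import Path


def batch_label(sidecar):
    """Build a batch label from scanner fields in a BIDS sidecar dict."""
    parts = [
        str(sidecar.get("Manufacturer", "unknown")),
        str(sidecar.get("ManufacturerModelName", "unknown")),
        f"{sidecar.get('MagneticFieldStrength', 'unknown')}T",
    ]
    # Replace spaces so the label is safe to use as a CSV value.
    return "_".join(part.replace(" ", "-") for part in parts)


def batch_label_from_file(json_path):
    """Read a sidecar JSON from disk and return its batch label."""
    return batch_label(json.loads(Path(json_path).read_text()))


print(batch_label({
    "Manufacturer": "Siemens",
    "ManufacturerModelName": "Prisma",
    "MagneticFieldStrength": 3,
}))  # Siemens_Prisma_3T
```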
Custom batch CSV with biological covariates¶
For more control, provide your own batch CSV with age and sex:
subject_id,batch,age,sex
sub-001,SiteA_Prisma_3T,45,M
sub-002,SiteA_Prisma_3T,52,F
sub-003,SiteB_SIGNA_15T,60,M
sub-004,SiteB_SIGNA_15T,38,F
sub-005,SiteA_Prisma_3T,71,M
sub-006,SiteB_SIGNA_15T,44,F
oncoprep bids_output/ derivatives/ group \
--combat-batch /path/to/site_labels.csv
Age is treated as a continuous covariate and sex as categorical — both are preserved by ComBat (their variance is not removed).
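Before launching the group stage, a custom batch CSV can be checked against the prerequisites listed earlier (at least 3 subjects across at least 2 batches). This is a sketch using only the standard library; `check_batch_csv` is a hypothetical helper, and treating `subject_id` and `batch` as the required columns is an assumption based on the example file.

```python
import csv
import io

REQUIRED_COLUMNS = {"subject_id", "batch"}


def check_batch_csv(text, min_subjects=3, min_batches=2):
    """Return a list of problems found in a ComBat batch CSV (empty if none)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = set(rows[0].keys()) if rows else set()
    missing = REQUIRED_COLUMNS - columns
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    subjects = {row["subject_id"] for row in rows}
    batches = {row["batch"] for row in rows}
    if len(subjects) < min_subjects:
        problems.append(
            f"need at least {min_subjects} subjects, found {len(subjects)}")
    if len(batches) < min_batches:
        problems.append(
            f"need at least {min_batches} batches, found {len(batches)}")
    return problems


example = """subject_id,batch,age,sex
sub-001,SiteA_Prisma_3T,45,M
sub-002,SiteA_Prisma_3T,52,F
sub-003,SiteB_SIGNA_15T,60,M
"""
print(check_batch_csv(example))  # []
```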
Longitudinal datasets¶
For multi-session studies, OncoPrep automatically detects longitudinal data and handles it correctly. Use observation IDs in the batch CSV:
subject_id,batch,age,sex
sub-001_ses-01,SiteA_Prisma_3T,45,M
sub-001_ses-02,SiteA_Prisma_3T,46,M
sub-002_ses-01,SiteB_SIGNA_15T,52,F
sub-002_ses-02,SiteB_SIGNA_15T,53,F
Or let auto-generation handle it — --generate-combat-batch emits one
row per subject × session automatically.
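The observation IDs above combine a subject and a session label. The sketch below shows one way to split them and to tell whether a batch CSV is longitudinal. It is an illustration only; how OncoPrep itself detects longitudinal data is not described here, and the helper names are hypothetical.

```python
import re

OBSERVATION_RE = re.compile(r"^(sub-[A-Za-z0-9]+)(?:_(ses-[A-Za-z0-9]+))?$")


def split_observation(observation_id):
    """Split 'sub-001_ses-01' into ('sub-001', 'ses-01'); session may be None."""
    match = OBSERVATION_RE.match(observation_id)
    if match is None:
        raise ValueError(f"not a valid observation ID: {observation_id}")
    return match.group(1), match.group(2)


def is_longitudinal(observation_ids):
    """True if any subject contributes more than one observation."""
    subjects = [split_observation(obs)[0] for obs in observation_ids]
    return len(set(subjects)) < len(subjects)


ids = ["sub-001_ses-01", "sub-001_ses-02", "sub-002_ses-01", "sub-002_ses-02"]
print(is_longitudinal(ids))  # True
print(len({split_observation(obs)[0] for obs in ids}))  # 2
```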
Inspect the report¶
Open the generated HTML report in your browser:
open derivatives/oncoprep/group_combat_report.html
The report shows:
Number of observations and scanner batches
Mean variance change (negative = batch variance reduced)
Batch distribution table
Longitudinal mode (if detected) with unique subject count
List of all harmonized output files
Python API¶
from pathlib import Path

from oncoprep.workflows.group import run_group_analysis

retcode = run_group_analysis(
    output_dir=Path("derivatives"),
    bids_dir=Path("bids_output"),
    generate_batch_csv=True,  # auto-generate from BIDS sidecars
    combat_parametric=True,  # parametric empirical Bayes (default)
)
assert retcode == 0
For full details, see Group-Level ComBat Harmonization.
Step 6 — Quality control with MRIQC (temporarily disabled)¶
Note: MRIQC integration is temporarily disabled in this release. The
--run-qc flag is accepted but ignored. This section is preserved for reference and will be updated when MRIQC support is re-enabled.
Step 7 — Reports¶
Generate an HTML quality-assurance report:
oncoprep bids_output/ derivatives/ participant \
--participant-label 001 --reports-only
Open the generated sub-001.html in your browser for a summary of
preprocessing steps, registration quality, and segmentation overlays.
Using the Python API¶
For scripting and integration, you can build workflows directly:
from pathlib import Path

from oncoprep.workflows.base import init_oncoprep_wf

wf = init_oncoprep_wf(
    bids_dir=Path("bids_output"),
    output_dir=Path("derivatives"),
    subject_session_list=[("001", None)],
    work_dir=Path("work"),
    run_segmentation=True,
    default_seg=True,
)

# Run with 4 parallel processes
wf.run(plugin="MultiProc", plugin_args={"n_procs": 4})
Running individual workflows¶
You can also run sub-workflows in isolation:
from oncoprep.workflows.anatomical import init_anat_preproc_wf

anat_wf = init_anat_preproc_wf(
    bids_dir="/path/to/bids",
    output_dir="/path/to/derivatives",
    omp_nthreads=4,
    skull_strip_backend="ants",
)
anat_wf.run()

from oncoprep.workflows.radiomics import init_anat_radiomics_wf

radio_wf = init_anat_radiomics_wf(
    output_dir="/path/to/derivatives",
    extract_shape=True,
    extract_firstorder=True,
    extract_glcm=True,
    extract_glrlm=False,
    extract_glszm=False,
    extract_gldm=False,
    extract_ngtdm=False,
)
radio_wf.run()
End-to-End Reference¶
The following sections document every feature available through the
oncoprep CLI, from minimal invocations to fully-loaded commands
combining all pipeline stages.
Minimal invocation (preprocessing only)¶
oncoprep /path/to/bids /path/to/derivatives participant \
--participant-label 001
Full-featured invocation (all features enabled)¶
This command runs the complete pipeline with every major feature flag:
oncoprep /path/to/bids /path/to/derivatives participant \
--participant-label 001 002 \
--session-label 01 02 \
--bids-filter-file /path/to/filters.json \
--subject-anatomical-reference first-lex \
--output-spaces MNI152NLin2009cAsym \
--skull-strip-template OASIS30ANTs \
--skull-strip-backend ants \
--skull-strip-mode auto \
--skull-strip-fixed-seed \
--registration-backend ants \
--longitudinal \
--deface \
--run-segmentation \
--run-radiomics \
--container-runtime auto \
--seg-cache-dir /path/to/seg_cache \
--templateflow-home /path/to/templateflow \
--work-dir /path/to/work \
--nprocs 8 \
--omp-nthreads 4 \
--mem-gb 32 \
--resource-monitor \
--stop-on-first-crash \
--write-graph \
-vvv
Feature-by-feature breakdown¶
BIDS filtering¶
Select specific participants, sessions, or apply custom pybids filters:
# Multiple participants
oncoprep /data/bids /data/out participant \
--participant-label 001 002 003
# Specific sessions
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--session-label 01 02
# Custom pybids filter file (JSON)
oncoprep /data/bids /data/out participant \
--bids-filter-file my_filters.json
# Session-wise independent processing (each session treated separately)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--subject-anatomical-reference sessionwise
Example my_filters.json:
{
    "t1w": {"datatype": "anat", "suffix": "T1w", "extension": ".nii.gz"},
    "flair": {"datatype": "anat", "suffix": "FLAIR", "extension": ".nii.gz"}
}
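A filter file like the one above can also be generated from a script, which avoids JSON syntax slips when many modalities are involved. This is a small sketch; `build_filters` is a hypothetical helper, and it reproduces only the keys shown in the example (whether OncoPrep accepts other filter names is not stated here).

```python
import json


def build_filters(suffixes):
    """Build a filter dict keyed by lowercase suffix, as in the example above."""
    return {
        suffix.lower(): {
            "datatype": "anat",
            "suffix": suffix,
            "extension": ".nii.gz",
        }
        for suffix in suffixes
    }


filters = build_filters(["T1w", "FLAIR"])
print(json.dumps(filters, indent=4))
# To save: open("my_filters.json", "w").write(json.dumps(filters, indent=4))
```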
Skull-stripping options¶
Three backends and multiple control modes:
# ANTs brain extraction (default)
oncoprep ... --skull-strip-backend ants
# HD-BET GPU-accelerated (requires pip install "oncoprep[hd-bet]")
oncoprep ... --skull-strip-backend hdbet
# FreeSurfer SynthStrip
oncoprep ... --skull-strip-backend synthstrip
# Force skull-stripping even on pre-stripped inputs
oncoprep ... --skull-strip-mode force
# Skip skull-stripping entirely
oncoprep ... --skull-strip-mode skip
# Fixed seed for deterministic reproduction (combine with --omp-nthreads 1)
oncoprep ... --skull-strip-fixed-seed --omp-nthreads 1
# Use a different skull-stripping template
oncoprep ... --skull-strip-template OASIS30ANTs
Registration options¶
# ANTs SyN registration (default, more accurate)
oncoprep ... --registration-backend ants
# PICSL Greedy (faster)
oncoprep ... --registration-backend greedy
# Register to a custom output space
oncoprep ... --output-spaces MNI152NLin2009cAsym
# Longitudinal mode (builds unbiased within-subject template)
oncoprep ... --longitudinal
Segmentation options¶
# Full multi-model ensemble (GPU required, 14 BraTS models)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation
# Single default model (CPU-friendly, faster)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation --default-seg
# Custom model path
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation --seg-model-path /path/to/model
# Force CPU-only (disable GPU)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation --no-gpu
# Use pre-cached model images (.sif or .tar)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation \
--seg-cache-dir /path/to/seg_cache
# Choose container runtime explicitly
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-segmentation \
--container-runtime singularity
Radiomics feature extraction¶
# Run radiomics (implies --run-segmentation automatically)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-radiomics
# Radiomics with single default model (CPU, fast)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--run-radiomics --default-seg
Quality control with MRIQC (temporarily disabled)¶
Note: MRIQC integration is temporarily disabled in this release. The
--run-qc flag is accepted but ignored.
Privacy (defacing)¶
# Remove facial features from anatomical images
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--deface
TemplateFlow & offline mode¶
# Pre-fetch templates on a login node (for HPC)
oncoprep --fetch-templates \
--templateflow-home /scratch/templateflow \
--output-spaces MNI152NLin2009cAsym \
--skull-strip-template OASIS30ANTs
# Run on a compute node without internet
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--templateflow-home /scratch/templateflow \
--offline
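Before submitting an offline job, it is worth confirming that the pre-fetched templates are actually in the cache. The sketch below checks for the tpl-<name> directories that TemplateFlow uses. It is only a presence check under that layout assumption: a directory can exist and still be incomplete, and `missing_templates` is a hypothetical helper.

```python
from pathlib import Path


def missing_templates(templateflow_home, templates):
    """Return template names with no tpl-<name> directory in the cache."""
    home = Path(templateflow_home)
    return [name for name in templates if not (home / f"tpl-{name}").is_dir()]


needed = ["MNI152NLin2009cAsym", "OASIS30ANTs"]
print(missing_templates("/scratch/templateflow", needed))
```

An empty list suggests the cache is populated for both the output space and the skull-stripping template used in the commands above.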
Performance tuning¶
# Parallel processing
oncoprep ... --nprocs 12 --omp-nthreads 4
# Memory limit
oncoprep ... --mem-gb 48
# Low-memory mode (trades disk I/O for RAM)
oncoprep ... --low-mem
# Custom Nipype plugin (YAML config for SGE, PBS, SLURM)
oncoprep ... --use-plugin my_plugin.yml
# Resource monitoring (memory + CPU tracking)
oncoprep ... --resource-monitor
# Debug verbosity (-v = verbose, -vv = more, -vvv = debug)
oncoprep ... -vvv
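The two parallelism flags interact: with a fixed CPU budget, more threads per task means fewer tasks can run side by side. The sketch below estimates that trade-off. It assumes the usual Nipype MultiProc behaviour, where multi-threaded tasks are scheduled so that their thread counts fit within the total process budget; `suggest_parallelism` is a hypothetical helper, not an OncoPrep function.

```python
import os


def suggest_parallelism(total_cpus=None, omp_nthreads=4):
    """Pick --nprocs/--omp-nthreads and estimate concurrent threaded tasks."""
    total = total_cpus or os.cpu_count() or 1
    threads = max(1, min(omp_nthreads, total))
    return {
        "nprocs": total,
        "omp_nthreads": threads,
        "max_concurrent_threaded_tasks": total // threads,
    }


print(suggest_parallelism(total_cpus=12, omp_nthreads=4))
# {'nprocs': 12, 'omp_nthreads': 4, 'max_concurrent_threaded_tasks': 3}
```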
Reports & debugging¶
# Generate only HTML reports (skip processing)
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--reports-only
# Include logs from a previous failed run in reports
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--reports-only --run-uuid 20260210-143022_abc123
# Export workflow graph as SVG
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--write-graph
# Stop immediately on first crash (for debugging)
oncoprep ... --stop-on-first-crash
# Generate boilerplate methods text only
oncoprep /data/bids /data/out participant \
--participant-label 001 \
--boilerplate
# Skip re-running existing derivatives
oncoprep ... --fast-track
Docker invocations¶
Basic preprocessing in Docker¶
docker run --platform linux/amd64 --rm \
-v /path/to/bids:/data/bids:ro \
-v /path/to/output:/data/output \
-v /path/to/work:/data/work \
nko11/oncoprep:latest \
/data/bids /data/output participant \
--participant-label 001
Docker with all features (GPU + segmentation + radiomics)¶
docker run --rm --gpus all \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /path/to/bids:/data/bids:ro \
-v /path/to/output:/data/output \
-v /path/to/work:/data/work \
nko11/oncoprep:latest \
/data/bids /data/output participant \
--participant-label 001 002 \
--session-label 01 \
--run-segmentation \
--run-radiomics \
--deface \
--nprocs 8 \
--mem-gb 32 \
--work-dir /data/work \
-vv
HPC / Singularity invocations¶
Full-featured Singularity run (PBS)¶
#!/bin/bash
#PBS -l ncpus=12,mem=48GB,walltime=04:00:00,jobfs=100GB
#PBS -l storage=gdata/$PROJECT+scratch/$PROJECT
#PBS -l ngpus=1
#PBS -l wd
module load singularity
SEG_CACHE=/scratch/$PROJECT/$USER/seg_cache
TF_HOME=/scratch/$PROJECT/$USER/templateflow
singularity run --nv \
--bind $SEG_CACHE:/seg_cache \
--bind $TF_HOME:/templateflow \
--bind /scratch/$PROJECT/$USER/bids:/data/bids:ro \
--bind /scratch/$PROJECT/$USER/derivatives:/data/output \
--bind $PBS_JOBFS:/work \
/scratch/$PROJECT/$USER/oncoprep.sif \
/data/bids /data/output participant \
--participant-label 001 \
--run-segmentation \
--run-radiomics \
--deface \
--container-runtime singularity \
--seg-cache-dir /seg_cache \
--templateflow-home /templateflow \
--offline \
--work-dir /work \
--nprocs $PBS_NCPUS \
--mem-gb 48 \
--omp-nthreads 4 \
--stop-on-first-crash \
-vv
Full-featured Singularity run (SLURM)¶
#!/bin/bash
#SBATCH --job-name=oncoprep
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --mem=48G
#SBATCH --time=04:00:00
#SBATCH --gres=gpu:1
module load singularity
SEG_CACHE=/scratch/$USER/seg_cache
TF_HOME=/scratch/$USER/templateflow
singularity run --nv \
--bind $SEG_CACHE:/seg_cache \
--bind $TF_HOME:/templateflow \
--bind /scratch/$USER/bids:/data/bids:ro \
--bind /scratch/$USER/derivatives:/data/output \
--bind $TMPDIR:/work \
/scratch/$USER/oncoprep.sif \
/data/bids /data/output participant \
--participant-label 001 \
--run-segmentation \
--run-radiomics \
--deface \
--container-runtime singularity \
--seg-cache-dir /seg_cache \
--templateflow-home /templateflow \
--offline \
--work-dir /work \
--nprocs $SLURM_CPUS_PER_TASK \
--mem-gb 48 \
--omp-nthreads 4 \
-vv
Python API: full-featured invocation¶
from pathlib import Path

from oncoprep.workflows.base import init_oncoprep_wf

wf = init_oncoprep_wf(
    bids_dir=Path("/data/bids"),
    output_dir=Path("/data/derivatives"),
    subject_session_list=[("001", ["01", "02"]), ("002", None)],
    work_dir=Path("/data/work"),
    run_uuid="20260211-120000_manual",
    omp_nthreads=4,
    nprocs=12,
    mem_gb=48,
    skull_strip_template="OASIS30ANTs",
    skull_strip_fixed_seed=True,
    skull_strip_mode="auto",
    skull_strip_backend="ants",
    registration_backend="ants",
    longitudinal=False,
    output_spaces=["MNI152NLin2009cAsym"],
    use_gpu=True,
    deface=True,
    run_segmentation=True,
    run_radiomics=True,
    default_seg=False,
    seg_model_path=None,
    sloppy=False,
    container_runtime="auto",
    seg_cache_dir=Path("/data/seg_cache"),
)
wf.run(plugin="MultiProc", plugin_args={"n_procs": 12})
Output structure (all features enabled)¶
derivatives/
├── oncoprep/
│   ├── dataset_description.json
│   ├── logs/
│   │   └── CITATION.md
│   └── sub-001/
│       └── ses-01/
│           └── anat/
│               ├── sub-001_ses-01_desc-preproc_T1w.nii.gz
│               ├── sub-001_ses-01_desc-preproc_T1w.json
│               ├── sub-001_ses-01_desc-preproc_T1ce.nii.gz
│               ├── sub-001_ses-01_desc-preproc_T2w.nii.gz
│               ├── sub-001_ses-01_desc-preproc_FLAIR.nii.gz
│               ├── sub-001_ses-01_desc-brain_mask.nii.gz
│               ├── sub-001_ses-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz
│               ├── sub-001_ses-01_desc-tumor_dseg.nii.gz
│               ├── sub-001_ses-01_desc-radiomics_features.json
│               └── sub-001_ses-01_desc-defaced_T1w.nii.gz
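Downstream scripts often need to pick derivatives by entity (subject, session, space, description). The sketch below splits a BIDS-style filename into its key-value entities, suffix, and extension. It is a simplified parser for names like those in the tree above, not a replacement for a BIDS library, and `parse_bids_name` is a hypothetical helper.

```python
def parse_bids_name(filename):
    """Split a BIDS-style filename into entities, suffix, and extension."""
    stem, _, extension = filename.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, "." + extension


name = "sub-001_ses-01_space-MNI152NLin2009cAsym_desc-preproc_T1w.nii.gz"
entities, suffix, ext = parse_bids_name(name)
print(entities)
# {'sub': '001', 'ses': '01', 'space': 'MNI152NLin2009cAsym', 'desc': 'preproc'}
print(suffix, ext)  # T1w .nii.gz
```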
Quick reference: all CLI flags¶
| Flag | Purpose | Default |
|---|---|---|
| --participant-label | Subject IDs | All subjects |
| --session-label | Session IDs | All sessions |
| --bids-filter-file | Custom pybids filters | None |
| | | |
| --output-spaces | Template space(s) | |
| --skull-strip-template | Skull-strip atlas | |
| | | |
| | | |
| --skull-strip-fixed-seed | Deterministic seed | Off |
| | | |
| --longitudinal | Multi-session template | Off |
| --deface | Remove facial features | Off |
| --run-segmentation | Enable tumor segmentation | Off |
| --default-seg | Single model (CPU) | Off |
| --seg-model-path | Custom model path | None |
| --no-gpu | Force CPU-only | Off |
| | | |
| --seg-cache-dir | Pre-downloaded model cache | Auto |
| --run-radiomics | Feature extraction | Off |
| --combat-batch | Custom batch CSV for group-level ComBat | None |
| | Parametric empirical Bayes for ComBat | |
| | Non-parametric ComBat | Off |
| --generate-combat-batch | Auto-generate batch CSV from BIDS | Off |
| --run-qc | MRIQC quality control (temporarily disabled) | Off |
| --nprocs | CPU count | All available |
| --omp-nthreads | Threads per process | Auto |
| --mem-gb | Memory limit (GB) | Unlimited |
| --low-mem | Trade disk for memory | Off |
| --use-plugin | Custom Nipype plugin YAML | MultiProc |
| --work-dir | Working directory | |
| --templateflow-home | Template cache path | |
| --offline | Disable network access | Off |
| --fast-track | Reuse existing derivatives | Off |
| --reports-only | Skip processing | Off |
| --run-uuid | Previous run UUID for reports | None |
| --write-graph | Export DAG as SVG | Off |
| --boilerplate | Generate methods text | Off |
| --stop-on-first-crash | Abort on error | Off |
| --resource-monitor | Track CPU/memory | Off |
| | Disable telemetry | Off |
| | Low-quality (testing only) | Off |
| -v, -vv, -vvv | Verbosity level | Standard |
Troubleshooting¶
Docker “permission denied”¶
Make sure your user is in the docker group or run with sudo:
sudo usermod -aG docker $USER
# Log out and back in
“Illegal instruction” on Apple Silicon¶
The Docker image targets linux/amd64. On ARM Macs, AVX instructions may
fail under Rosetta emulation. Use the native pip install for local
development:
pip install -e ".[dev]"
Out of memory during segmentation¶
Reduce parallel processes or increase available memory:
oncoprep ... --nprocs 2 --mem-gb 8 --low-mem