Convert any neuroimaging dataset to BIDS format using LLM-powered intelligence
Uses LLM for semantic understanding of your dataset structure
Comprehensive workflow from data ingestion to BIDS validation
Works with any naming convention - no hardcoded rules

    my_dataset/
    ├── Beijing_sub82352/
    │   ├── anat_mprage/
    │   │   ├── scan.dcm
    │   │   └── scan_002.dcm
    │   └── func_rest/
    │       ├── fmri_001.dcm
    │       └── fmri_002.dcm
    ├── Cambridge_sub06272/
    │   ├── anat_mprage/
    │   │   └── scan.dcm
    │   └── func_rest/
    │       └── fmri_001.dcm
    ├── Beijing_sub19283/
    │   └── ...
    └── README.txt

Issues:

    bids_dataset/
    ├── dataset_description.json
    ├── README.md
    ├── participants.tsv
    ├── sub-Beijing82352/
    │   ├── anat/
    │   │   └── sub-Beijing82352_T1w.nii.gz
    │   └── func/
    │       └── sub-Beijing82352_task-rest_bold.nii.gz
    ├── sub-Cambridge06272/
    │   ├── anat/
    │   │   └── sub-Cambridge06272_T1w.nii.gz
    │   └── func/
    │       └── sub-Cambridge06272_task-rest_bold.nii.gz
    ├── sub-Beijing19283/
    │   └── ...
    └── README.txt

BIDS v1.10.0, NIfTI format
Improvements:
[1/9] Ingesting data...
Run the complete conversion in one command. The pipeline automatically executes all 9 stages from data ingestion to BIDS validation.
Organize your neuroimaging data in a single directory:

    my_dataset/
    ├── subject1/
    │   ├── anat/
    │   │   ├── T1w_001.dcm
    │   │   └── ...
    │   └── func/
    │       ├── fmri_001.dcm
    │       └── ...
    ├── subject2/
    ├── README.txt      # Optional
    └── protocol.pdf    # Optional

Execute the full pipeline with a single command:
⚠️ Required: The --describe parameter is mandatory. Provide a clear description of your dataset to help the AI understand its structure.
💡 Highly Recommended: Add --nsubjects N if you know the number of subjects for much faster and more accurate processing.
Watch the 9-stage pipeline execute:
Your standardized dataset is ready:

    outputs/run1/bids_compatible/
    ├── dataset_description.json
    ├── README.md
    ├── participants.tsv
    ├── sub-001/
    │   ├── anat/sub-001_T1w.nii.gz
    │   └── func/sub-001_task-rest_bold.nii.gz
    └── ...

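Output names such as `sub-001_task-rest_bold.nii.gz` follow the BIDS pattern of `key-value` entities joined by underscores, followed by a suffix and an extension. The sketch below shows one way to split such a name when inspecting the output directory. The helper `parse_bids_name` is hypothetical and is not part of Auto-BIDSify.

```python
from pathlib import Path


def parse_bids_name(filename):
    """Split a BIDS-style filename into (entities, suffix, extension).

    Hypothetical helper for inspecting output files; not part of Auto-BIDSify.
    """
    name = Path(filename).name
    # Everything from the first dot onward is the extension (e.g. ".nii.gz").
    stem, dot, ext = name.partition(".")
    parts = stem.split("_")
    # The last underscore-separated chunk is the suffix (e.g. "T1w", "bold").
    suffix = parts[-1]
    entities = {}
    for chunk in parts[:-1]:
        key, sep, value = chunk.partition("-")
        if not sep:
            raise ValueError(f"not a key-value entity: {chunk!r}")
        entities[key] = value
    return entities, suffix, dot + ext


print(parse_bids_name("sub-001_task-rest_bold.nii.gz"))
# ({'sub': '001', 'task': 'rest'}, 'bold', '.nii.gz')
```

A check like this can help spot files whose subject label does not match the directory they were written to.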
Execute each stage individually for fine-grained control. Perfect for debugging, customization, or learning the pipeline workflow. The pipeline consists of 7 stages (or 8 for mixed-modality datasets).
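For scripted runs, the individual stages can be driven from Python. The following is a minimal sketch, assuming `cli.py` is in the current working directory and using only the stage names and options documented in this guide. The functions `stage_commands` and `run_stages` are hypothetical helpers, and the `classify` stage is included only when `mixed_modality` is set, since it can be skipped for single-modality datasets.

```python
import subprocess
import sys


def stage_commands(input_dir, output_dir, model, mixed_modality=False):
    """Build the per-stage command lines (hypothetical helper)."""
    base = [sys.executable, "cli.py"]  # assumes cli.py is in the working directory
    stages = [
        ["ingest", "--input", input_dir, "--output", output_dir],
        ["evidence", "--output", output_dir],
    ]
    if mixed_modality:
        # Modality classification is only needed for mixed datasets.
        stages.append(["classify", "--output", output_dir])
    stages += [
        ["trio", "--output", output_dir, "--model", model],
        ["plan", "--output", output_dir, "--model", model],
        ["execute", "--output", output_dir],
        ["validate", "--output", output_dir],
    ]
    return [base + stage for stage in stages]


def run_stages(commands, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing stage.
            subprocess.run(cmd, check=True)


run_stages(stage_commands("my_data/", "outputs/run1/", "gpt-4o"))
```

The dry-run default only prints the commands, which makes it easy to review them before calling `run_stages(..., dry_run=False)`.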
Copy and organize input files into the staging area:
Output: _staging/extracted/
Files copied, directory structure preserved
Created file: ingest_info.json
Contains file inventory and basic statistics
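To make the ingest step concrete, here is a rough sketch of copying a dataset into a staging area while preserving its directory structure, then writing a small inventory. It is an illustration, not the tool's implementation: the function `ingest`, the inventory keys (`n_files`, `by_extension`, `files`) and the location of `ingest_info.json` directly under the output directory are all assumptions.

```python
import json
import shutil
from collections import Counter
from pathlib import Path


def ingest(input_dir, output_dir):
    """Copy input files into a staging area and write a simple inventory.

    Illustrative only: the real ingest_info.json layout may differ.
    """
    staging = Path(output_dir) / "_staging" / "extracted"
    # copytree keeps the relative directory structure of the input.
    shutil.copytree(Path(input_dir), staging, dirs_exist_ok=True)

    files = sorted(
        p.relative_to(staging).as_posix()
        for p in staging.rglob("*")
        if p.is_file()
    )
    info = {
        "n_files": len(files),
        # Count files per extension; files without one are grouped under "<none>".
        "by_extension": dict(Counter(Path(f).suffix.lower() or "<none>" for f in files)),
        "files": files,
    }
    (Path(output_dir) / "ingest_info.json").write_text(json.dumps(info, indent=2))
    return info
```

Calling `ingest("my_dataset/", "outputs/run1/")` would leave the original data untouched, because every later step reads from the copy.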
Analyze dataset structure and extract metadata from all sources:
Output: evidence_bundle.json
Comprehensive dataset analysis including:
Classify files by modality using LLM (skip for single-modality datasets):
Output: classification_plan.json
Separates files into modality pools:
- mri_pool/ - MRI files
- nirs_pool/ - fNIRS files
- unknown/ - Ambiguous files

Generate the three required BIDS metadata files:

    # Generate all three files
    python cli.py trio --output outputs/run1/ --model gpt-4o

    # Or generate individually
    python cli.py trio --output outputs/run1/ --model gpt-4o --trio dataset_description
    python cli.py trio --output outputs/run1/ --model gpt-4o --trio readme
    python cli.py trio --output outputs/run1/ --model gpt-4o --trio participants

Output files:
- dataset_description.json - Dataset metadata (Name, BIDSVersion, License, Authors)
- README.md - Comprehensive dataset documentation
- participants.tsv - Subject demographics (participant_id, sex, age, group)

Generated using LLM with evidence from documents and DICOM headers
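Because these files are generated by an LLM, a quick sanity check before moving on can be worthwhile. The sketch below tests the two fields that the BIDS specification requires in `dataset_description.json` (`Name` and `BIDSVersion`) and that `participants.tsv` starts with a `participant_id` column. Both function names are hypothetical and not part of Auto-BIDSify.

```python
import csv
import io
import json

# Fields the BIDS specification marks as required in dataset_description.json.
REQUIRED_DESCRIPTION_KEYS = ("Name", "BIDSVersion")


def missing_description_keys(description_text):
    """Return the required keys absent from a dataset_description.json string."""
    data = json.loads(description_text)
    return [key for key in REQUIRED_DESCRIPTION_KEYS if key not in data]


def participants_header_ok(tsv_text):
    """Check that the first column of participants.tsv is participant_id."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return bool(header) and header[0] == "participant_id"


description = '{"Name": "Example study", "BIDSVersion": "1.10.0"}'
print(missing_description_keys(description))  # []
print(participants_header_ok("participant_id\tsex\tage\nsub-001\tF\t31\n"))  # True
```

This does not replace the validation stage; it only catches the most basic omissions early.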
Create detailed file-by-file conversion plan:
Output: BIDSPlan.yaml
Contains comprehensive conversion instructions:
Execute the conversion plan and build BIDS structure:
Output: bids_compatible/ directory
Performs the following operations:
Created files: conversion_log.json, BIDSManifest.yaml
Validate BIDS compliance and report issues:
Validation Checks:
Uses official bids-validator if installed, otherwise performs internal checks
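The fallback behaviour described here can be sketched as follows. This is an assumption-laden illustration rather than the tool's code: it looks for a `bids-validator` executable on the `PATH`, and the "internal checks" shown (presence of `dataset_description.json` and at least one `sub-*` directory) are hypothetical stand-ins for whatever Auto-BIDSify actually checks.

```python
import shutil
import subprocess
from pathlib import Path


def internal_checks(bids_dir):
    """Very small stand-in for internal checks (hypothetical)."""
    root = Path(bids_dir)
    problems = []
    if not (root / "dataset_description.json").is_file():
        problems.append("missing dataset_description.json")
    if not any(p.is_dir() and p.name.startswith("sub-") for p in root.iterdir()):
        problems.append("no sub-* directories found")
    return problems


def validate(bids_dir):
    """Prefer the external bids-validator; otherwise fall back to internal checks."""
    exe = shutil.which("bids-validator")
    if exe:
        result = subprocess.run([exe, str(bids_dir)], capture_output=True, text=True)
        return {
            "validator": "bids-validator",
            "ok": result.returncode == 0,
            "report": result.stdout,
        }
    problems = internal_checks(bids_dir)
    return {"validator": "internal", "ok": not problems, "report": "\n".join(problems)}
```

Calling `validate("outputs/run1/bids_compatible")` returns a small dictionary that records which validator ran, so results from the two paths are not confused.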
Install Auto-BIDSify from PyPI or GitHub:
📦 Via pip (Recommended):
🔗 From GitHub:
💻 For development:

    git clone https://github.com/yiyiliu-rose/autobidsify.git
    cd autobidsify
    pip install -e .

Requirements: Python 3.8+, pip
If you want to use OpenAI models (like GPT-4o), set your OpenAI API key as an environment variable:
💡 Note: This step is only required if you plan to use OpenAI's LLM models. You can skip this if using other supported models.
Linux/Mac:
Windows (CMD):
Windows (PowerShell):
Or add to ~/.bashrc for persistence:
🔑 Get your API key: OpenAI Platform
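Before starting a long run with an OpenAI model, it can help to confirm that the key is visible to the Python process. The sketch below assumes the variable is named `OPENAI_API_KEY`, which is the name the official OpenAI client reads. Whether Auto-BIDSify reads the same variable is an assumption here, and `openai_key_configured` is a hypothetical helper.

```python
import os


def openai_key_configured(env=None):
    """Return True if a non-empty OPENAI_API_KEY is present.

    The variable name is an assumption based on the OpenAI client default.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not openai_key_configured():
    print("OPENAI_API_KEY is not set; OpenAI models will not be usable.")
```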
Once installed, you can use Auto-BIDSify in two ways:
Run the complete conversion in one command - perfect for most users
Execute stages individually for debugging and customization
💡 Tip: Start with the End-to-End Pipeline for quick results. Switch to Step-by-Step Mode if you need more control.
Run complete end-to-end pipeline
| Option | Status | Description | Example |
|---|---|---|---|
| --input | ✓ Required | Input dataset directory | my_data/ |
| --output | ✓ Required | Output directory | outputs/run1/ |
| --describe | ✓ Required | Dataset description (mandatory) | "fMRI study" |
| --nsubjects | ⭐ Highly Recommended | Number of subjects (improves accuracy) | 10 |
| --model | | LLM model (default: qwen) | gpt-4o |
| --modality | | Data type: auto/mri/nirs/mixed | mri |
| --id-strategy | | Subject ID: auto/numeric/semantic | auto |
- python cli.py ingest --input INPUT --output OUTPUT
- python cli.py evidence --output OUTPUT
- python cli.py classify --output OUTPUT
- python cli.py trio --output OUTPUT --model MODEL [--trio TYPE]
- python cli.py plan --output OUTPUT --model MODEL
- python cli.py execute --output OUTPUT
- python cli.py validate --output OUTPUT

Real-world fNIRS dataset from Harvard Dataverse investigating resting-state connectivity in tinnitus patients
Authors: San Juan J, Hu X-S, Issa M, Bisconti S, Kovelman I, Kileny P
Published: PLoS ONE 2017, 12(6): e0179150
DOI: 10.7910/DVN/ZNZZBV
Modality: fNIRS (Homer3 .nirs)
Subjects: 13 participants
License: Public Domain (PD)

    harvard_dataverse/
    ├── BZZ003.nirs
    ├── BZZ004.nirs
    ├── BZZ005.nirs
    ├── BZZ007.nirs
    ├── BZZ008.nirs
    ├── ...
    └── BZZ028.nirs

13 files, flat structure
Issues:

    tinnitus_fnirs_rsfc/
    ├── dataset_description.json
    ├── README.md
    ├── participants.tsv
    ├── sub-03/
    │   └── nirs/sub-03_task-passive-listening_nirs.snirf
    ├── sub-04/
    │   └── nirs/sub-04_task-passive-listening_nirs.snirf
    ├── sub-05/
    │   └── nirs/sub-05_task-passive-listening_nirs.snirf
    ├── ...
    └── sub-28/
        └── nirs/sub-28_task-passive-listening_nirs.snirf

13 subjects, BIDS v1.10.0, SNIRF format
Improvements:
{
"Name": "Replication Data for: fNIRS RSFC in Tinnitus",
"BIDSVersion": "1.10.0",
"DatasetType": "raw",
"License": "PD",
"Authors": ["San Juan J", "Hu X-S", "Issa M", "Bisconti S", "Kovelman I", "Kileny P"]
}

| participant_id | original_id |
|---|---|
| sub-03 | BZZ003 |
| sub-04 | BZZ004 |
| sub-05 | BZZ005 |
| ... (10 more) | |

    # README for BIDS Dataset: fNIRS RSFC in Tinnitus

    ## Overview
    This dataset contains replication data for investigating tinnitus effects on resting state functional connectivity using fNIRS.
    ...

    ## Dataset Description
    - **Title**: Replication Data for: fNIRS RSFC in Tinnitus
    - **Authors**: San Juan J, Hu X-S, Issa M, et al.
    - **DOI**: 10.7910/DVN/ZNZZBV
    - **Journal**: PLoS ONE 12(6): e0179150 (2017)
    - **Subjects**: 13 participants
    ...

    ## Data Acquisition
    fNIRS measuring brain activity during resting state in tinnitus patients.
    ...

    ## References
    San Juan J, et al. (2017) PLoS ONE 12(6): e0179150
    ...

Large-scale multi-site MRI dataset from the Cambridge Centre for Ageing and Neuroscience
Study: Cambridge Centre for Ageing and Neuroscience
Sites: Beijing, Cambridge, ...
Source: CamCAN Open Data
Modality: MRI (DICOM → NIfTI)
Subjects: 3,763 participants (multi-site cohort)
Scans: T1-weighted anatomy + resting-state fMRI
License: CC-BY-4.0

    my_dataset/
    ├── Beijing_sub82352/
    │   ├── anat_mprage/scan.dcm
    │   └── func_rest/fmri.dcm
    ├── Cambridge_sub06272/
    ├── Beijing_sub19283/
    └── ...

3,763 subjects, DICOM format
Issues:

    bids_compatible/
    ├── dataset_description.json
    ├── README.md
    ├── participants.tsv
    ├── sub-Beijing82352/
    │   ├── anat/sub-Beijing82352_T1w.nii.gz
    │   └── func/sub-Beijing82352_task-rest_bold.nii.gz
    ├── sub-Cambridge06272/
    ├── sub-Beijing19283/
    └── ...

3,763 subjects, NIfTI format
Improvements:
{
"Name": "CamCAN Multi-Site Study",
"BIDSVersion": "1.10.0",
"License": "CC-BY-4.0",
"Authors": ["Research Team"]
}

| participant_id | site | original_id |
|---|---|---|
| sub-Beijing82352 | Beijing | Beijing_sub82352 |
| sub-Cambridge06272 | Cambridge | Cambridge_sub06272 |
| sub-Beijing19283 | Beijing | Beijing_sub19283 |
| ... (3,760 more) | | |
💡 ID Strategy: Original naming was site_subXXXXX. Since different sites could have overlapping IDs, Auto-BIDSify renamed to sub-siteXXXXX for global uniqueness.
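The renaming rule in this note can be expressed as a small deterministic function. The sketch below is only an illustration of the `site_subXXXXX` to `sub-siteXXXXX` mapping: Auto-BIDSify derives the mapping with an LLM, and the regular expression and the helper `semantic_subject_id` are assumptions made for this example.

```python
import re

# Matches names such as "Beijing_sub82352": a site name, "_sub", then digits.
_SITE_SUBJECT = re.compile(r"^(?P<site>[A-Za-z]+)_sub(?P<number>\d+)$")


def semantic_subject_id(original_name):
    """Map 'site_subXXXXX' to 'sub-siteXXXXX' (hypothetical helper)."""
    match = _SITE_SUBJECT.match(original_name)
    if match is None:
        raise ValueError(f"unexpected directory name: {original_name!r}")
    # Keeping the site in the label avoids clashes between sites that
    # happen to reuse the same subject number.
    return f"sub-{match['site']}{match['number']}"


print(semantic_subject_id("Beijing_sub82352"))    # sub-Beijing82352
print(semantic_subject_id("Cambridge_sub06272"))  # sub-Cambridge06272
```

Two sites that share a number, for example `Beijing_sub00001` and `Cambridge_sub00001`, still receive distinct labels under this rule.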

    # Cambridge Centre for Ageing and Neuroscience (CamCAN) Dataset

    ## Overview
    The CamCAN project investigates how individuals maintain cognitive abilities with age, integrating epidemiological, cognitive, and neuroimaging data across five phases.
    ...

    ## Dataset Description
    Multi-phase neuroimaging study with ~700 participants:
    - **Phase 1**: Demographics, health, cognitive data (~2700 adults, 2010-2012)
    - **Phase 2**: Detailed cognitive, MRI, MEG data (CC700: 2011-2013)
    - **Phase 3**: Repeat MRI/MEG scans (CC280: 2012-2014)
    ...

    ## Data Acquisition
    **MRI Modalities**: T1, T2, DWI, resting-state fMRI, task fMRI
    **Imaging Parameters**:
    - T1 MPRAGE: TR 2250ms, TE 2.99ms, 1mm isotropic
    - fMRI: TR 1970ms, TE 30ms, 3×3×4.44mm
    - DWI: 30 directions, b=0,1000,2000
    ...

    ## File Organization
    Organized as BIDS repositories by modality:
    - Phase 2 Arm 1 (CC700) Raw MRI/MEG
    - Phase 2 Arm 2 (Frail) Raw MRI/MEG
    - Phase 3 (CC280) Raw MRI/MEG
    ...

    ## Usage Notes
    Non-commercial research only. Proper acknowledgment required.
    ...

    ## References
    Shafto et al. (2014) BMC Neurology 14(204) doi: 10.1186/s12883-014-0204-1
    ...

| Feature | fNIRS | MRI |
|---|---|---|
| Input | Homer3 .nirs | DICOM |
| Output | .snirf (BIDS) | NIfTI |
| Subjects | 13 (non-consecutive) | 3,763 (multi-site) |
| ID Strategy | Numeric (sub-03, sub-04...) | Semantic (sub-Beijing82352...) |
| Conversion | Re-organization | DICOM→NIfTI |
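The numeric strategy in the table (for example `BZZ003` becoming `sub-03`) can be approximated by extracting the trailing number from each original ID and zero-padding it to a common width. This is a sketch under that assumption, not the tool's actual logic, and `numeric_subject_ids` is a hypothetical helper.

```python
import re


def numeric_subject_ids(original_names):
    """Map IDs like 'BZZ003' to 'sub-03' using the last run of digits.

    Hypothetical helper; padding width follows the largest number seen.
    """
    numbers = {}
    for name in original_names:
        digit_runs = re.findall(r"\d+", name)
        if not digit_runs:
            raise ValueError(f"no number found in {name!r}")
        numbers[name] = int(digit_runs[-1])

    if len(set(numbers.values())) != len(numbers):
        raise ValueError("two original IDs map to the same number")

    # Pad to at least two digits so that labels sort consistently.
    width = max(2, len(str(max(numbers.values()))))
    return {name: f"sub-{number:0{width}d}" for name, number in numbers.items()}


print(numeric_subject_ids(["BZZ003", "BZZ004", "BZZ028"]))
# {'BZZ003': 'sub-03', 'BZZ004': 'sub-04', 'BZZ028': 'sub-28'}
```

Gaps in the original numbering are preserved, which matches the non-consecutive subject labels in the fNIRS example.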