DataLad datasets: Advanced features#
Dicom sorting#
Because DataLad automatically unpacks the DICOM tar archives, a cloned visit dataset will have a structure similar to the following:
$ datalad tree
[DS~0]
├── P001234_P001234/
│ └── incoming/
│ └── 3T/
│ └── 2_ABC_DEF/
│ └── P001234/
└── icf/
The P001234
directory contains DICOM files in a flat
hierarchy, and the icf
directory contains the original tar
archive.
The dataset exposes a few select DICOM header fields (with information
describing its corresponding DICOM series) as git-annex metadata.
You can preview available metadata for a selected file with git annex
metadata
(subset shown):
$ git annex metadata <file name>
metadata <file name>
Modality=MR
ProtocolName=ep2d_diff_dir98_AP
PulseSequenceName=*epse2d1_140
SeriesDescription=ep2d_diff_dir98_AP
SeriesNumber=10
These metadata can be used to organize DICOM files according to a
logical structure. For example, grouping by all available (as
specified with *
) protocol names and series numbers:
$ git annex view "protocolname=*" "seriesnumber=*"
view (searching...)
Switched to branch 'views/protocolname=_;seriesnumber=_'
ok
❱ datalad tree
[DS~0]
(...)
├── t1_mprage_0.9mm/
│ └── 6/
├── t2w_space_0.9mm/
│ └── 7/
└── tfMRI_tapping
└── 4/
└── 5/
The view can be filtered, e.g. to only show anatomical (T1 or T2,
as specified with t[12]
) sequences:
$ git annex vfilter "ProtocolName=t[12]*"
$ datalad tree
[DS~0]
├── t1_mprage_0.9mm/
│ └── 6/
└── t2w_space_0.9mm/
└── 7/
Order of the components can be inverted using vcycle:
$ git annex vcycle
$ datalad tree
[DS~0]
├── 6/
│ └── t1_mprage_0.9mm/
└── 7/
└── t2w_space_0.9mm/
Previous views, and the starting branch, can be restored with vpop:
$ git annex vpop
As the operations only create views (and the annexed data organization remains the same), these operations are very fast.
For more information, refer to the git-annex-view documentation.