Access data with DataLad
This section describes accessing the ICF data by cloning DataLad datasets which have already been created and made available, most likely on local infrastructure. Dataset generation is described in the previous section, Generate DataLad datasets.
This workflow uses DataLad with the DataLad-Next extension (see DataLad requirements). DataLad datasets index data in their original (ICF) location. Obtaining data hosted in the ICF store requires access credentials for a given study, issued by the ICF. DataLad acts only as client software. See Manage DataLad credentials for details.
Clone & get
If a visit dataset has been prepared and placed in an accessible location, it can be cloned with DataLad from a URL containing the following components:
- a set of configuration parameters, always constant
- store base URL (e.g., file:///data/group/groupname/local_dicom_store) [1]
- study ID (e.g., my-study)
- visit ID (e.g., P000123)
- a file name suffix / template, _{{annex_key}} (verbatim), always constant
The pattern for the URL is:
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=<store base URL>/<study ID>/<visit ID>_{{annex_key}}'
Given the example values above, the pattern expands to:
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
A full datalad clone command, here for a store located under /tmp and a clone target named my_clone, could then look like this:
datalad clone 'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///tmp/local_dicom_store/my-study/P000123_{{annex_key}}' my_clone
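As a minimal sketch, the URL pattern above can be assembled from its components in a small shell function. The function name icf_clone_url and its argument order are illustrative assumptions, not part of DataLad or the ICF tooling; only the URL layout is taken from the pattern above.

```shell
# Hypothetical helper (not part of DataLad): build the datalad-annex:: clone
# URL from store base URL, study ID, and visit ID.
icf_clone_url() {
    store_base_url=${1%/}   # drop a single trailing slash, if present
    study_id=$2
    visit_id=$3
    # Constant configuration parameters, followed by the location of the
    # visit and the verbatim _{{annex_key}} suffix.
    printf '%s%s/%s/%s_{{annex_key}}\n' \
        'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=' \
        "$store_base_url" "$study_id" "$visit_id"
}

icf_clone_url file:///data/group/groupname/local_dicom_store my-study P000123
```

The printed URL can then be passed to the clone command, for example datalad clone "$(icf_clone_url file:///data/group/groupname/local_dicom_store my-study P000123)" my_clone.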
Note
The clone command will not fail if the datalad-annex:: URL points to a nonexistent target. If you see the following warning:
[WARNING] You appear to have cloned an empty repository.
[WARNING] Cloned /path/to/my_clone but could not find a branch with commits
it is likely that the provided URL is mistyped or otherwise incorrect.
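Because the clone command succeeds either way, a script may want to check explicitly whether the clone contains any commits. One possible check, sketched here with plain Git, tests whether HEAD resolves to a commit. The function name clone_has_commits is a hypothetical helper, and my_clone is assumed to be the clone directory from the example above.

```shell
# Hypothetical helper: succeed only if the repository at the given path has a
# HEAD that resolves to a commit (an empty clone has none).
clone_has_commits() {
    git -C "$1" rev-parse --verify --quiet HEAD >/dev/null 2>&1
}

if clone_has_commits my_clone; then
    echo "my_clone contains commits"
else
    echo "my_clone is empty or missing; check the clone URL" >&2
fi
```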
Note
The URL is arguably a bit clunky. A convenience shortcut can be provided via the configuration item datalad.clone.url-substitute.<label> and a substitution rule based on regular expressions. For example, clone URLs can be shortened to require only the store base URL (here, file:///data/group/groupname/local_dicom_store), study ID, and visit ID (<store base URL>/<study-ID>/<visit-ID>) with the following configuration, which uses inm-icf as the label:
git config --global datalad.clone.url-substitute.inm-icf ',^file:///data/group/groupname/local_dicom_store/([^/]+)/(.*)$,datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/\1/\2_{{annex_key}}'
This configuration allows DataLad to take any URL of the form file:///data/group/groupname/local_dicom_store/<study-ID>/<visit-ID> and assemble the required datalad-annex::... URL on its own, so that a clone call shortens to datalad clone file:///data/group/groupname/local_dicom_store/my-study/P000123.
You are free to adjust this configuration to your needs and preferences.
Further documentation on it can be found in the DataLad Docs.
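When adjusting the rule, it can help to preview what a pattern and replacement produce before storing them in the Git configuration. The sketch below emulates the substitution with sed -E purely for illustration; it is not how DataLad applies the rule, and it assumes a sed that supports -E. Unlike in the git config value, every & in the replacement is escaped as \& here, because sed treats a bare & as the matched text. The function name preview_substitution is hypothetical.

```shell
# Preview of the substitution rule using sed (illustration only; DataLad
# applies the configured rule itself when cloning).
store='file:///data/group/groupname/local_dicom_store'
params='datalad-annex::?type=external\&externaltype=uncurl\&encryption=none\&url='

# Hypothetical helper: print the URL that the rule would produce for $1.
preview_substitution() {
    printf '%s\n' "$1" |
        sed -E "s,^${store}/([^/]+)/(.*)\$,${params}${store}/\\1/\\2_{{annex_key}},"
}

preview_substitution "$store/my-study/P000123"
```

A URL that does not match the pattern passes through unchanged, which mirrors the intent of the rule: only URLs below the store base URL are rewritten.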
Cloning retrieves a lightweight dataset that does not (yet) contain file content. File content can be retrieved with datalad get; DataLad handles downloading and unpacking the tar file.
Take a look at the section DataLad datasets: Advanced features to learn about useful
convenience features DataLad adds on top of this.
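Retrieving the file content of a clone can be sketched as follows. The helper name get_visit_content is hypothetical, and my_clone is assumed to be the clone directory from the example above; the helper only wraps datalad get . run from inside the dataset.

```shell
# Hypothetical helper: fetch all file content of the dataset at the given path.
get_visit_content() {
    # The subshell keeps the caller's working directory unchanged.
    (cd "$1" && datalad get .)
}
```

For example, get_visit_content my_clone retrieves everything in the clone. Running datalad get with a path inside the dataset instead of . limits retrieval to that path.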
Footnotes