The lack of computational reproducibility threatens data science in several domains. In particular, different operating systems have been shown to lead to different analysis results. This study aims to identify and quantify the effect of the operating system on neuroimaging analysis pipelines. We developed a framework to evaluate the reproducibility of neuroimaging pipelines across operating systems. The framework leverages software containerization and system-call interception to record results provenance without instrumenting the pipelines. A tool (Repro-tools) was developed to compare results obtained in different conditions. We used our framework to evaluate the effect of the operating system on results produced by pipelines from the Human Connectome Project (HCP), a large open-data initiative to study the human brain. In particular, we focused on pre-processing pipelines for anatomical and functional data, namely PreFreeSurfer, FreeSurfer, PostFreeSurfer and fMRIVolume. We used data from 5 subjects released by the HCP. Results highlight substantial differences in the output of the HCP pipelines obtained in two versions of Linux (CentOS6 and CentOS7). Inter-OS differences corresponding to normalized root mean square errors of up to 0.27 were observed, which corresponds to visually important differences. We provide visualizations of the most important differences for various pipeline steps. No meaningful inter-run differences were observed, which shows that the inter-OS differences do not originate from the use of pseudorandom numbers or silent crashes of the pipelines. We hypothesize that the observed inter-OS differences come from numerical instabilities in the pipelines, triggered by rounding and truncation differences that originate in updates of mathematical libraries across systems. An apparent solution to this issue is to freeze the execution environment using, for instance, software containers. However, this would only mask the instabilities, which should ultimately be corrected in the pipelines.
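To illustrate the comparison metric, here is a minimal sketch of a normalized root mean square error computation. It assumes normalization by the intensity range of the reference image, which is one common convention; Repro-tools may use a different normalization.

```python
import numpy as np

def nrmse(reference, test):
    """RMSE between two arrays, normalized by the intensity range of the
    reference (one common convention; other normalizations exist)."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    rmse = np.sqrt(np.mean((reference - test) ** 2))
    intensity_range = reference.max() - reference.min()
    return rmse / intensity_range if intensity_range else 0.0
```

Under this convention, an NRMSE of 0.27 means the typical voxel-wise deviation is about a quarter of the image's dynamic range, which is consistent with the visually important differences reported above.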
Splitting and merging ultra-high resolution 3D images is a requirement for parallel or distributed processing. Naive algorithms to split and merge 3D blocks from ultra-high resolution images perform very poorly, due to the number of seeks required to reconstruct spatially-adjacent blocks from linear data organizations on disk. The current solution to this problem is to use file formats that preserve spatial proximity on disk, but this comes with additional complexity. We introduce a new algorithm, multiple reads/writes, to split and merge ultra-high resolution 3D images efficiently from simple file formats. The multiple reads/writes algorithm writes contiguously to the reconstructed image, which leads to substantial performance improvements over existing algorithms. We parallelize our algorithm using multi-threading, which further improves performance for data stored on a Hadoop cluster. We also show that on-the-fly lossless compression with the lz4 algorithm reduces the split and merge time further.
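The seek problem the abstract refers to can be illustrated with a toy in-memory split and merge of a 3D array into blocks. This is a simplified sketch, not the paper's algorithm: the actual algorithms operate on files, where each non-contiguous block write translates into disk seeks that the multiple reads/writes strategy avoids by writing contiguous runs.

```python
import numpy as np

def split(img, block_shape):
    """Split a 3D array into blocks of block_shape (assumes exact division)."""
    bx, by, bz = block_shape
    X, Y, Z = img.shape
    return {
        (x, y, z): img[x:x + bx, y:y + by, z:z + bz].copy()
        for x in range(0, X, bx)
        for y in range(0, Y, by)
        for z in range(0, Z, bz)
    }

def merge(blocks, shape):
    """Reassemble blocks into a full image. In memory every write is cheap;
    on disk, writing each block at its own offset causes many seeks."""
    out = np.empty(shape, dtype=next(iter(blocks.values())).dtype)
    for (x, y, z), block in blocks.items():
        bx, by, bz = block.shape
        out[x:x + bx, y:y + by, z:z + bz] = block
    return out
```

In the on-disk setting, each block spans many non-adjacent byte ranges of the linearized image file, which is why reordering I/O into contiguous writes pays off.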
We present Boutiques, a system to automatically publish, integrate and execute applications across computational platforms. Boutiques applications are installed through software containers described in a rich and flexible JSON language. A set of core tools facilitates the construction, validation, import, execution, and publishing of applications. Boutiques is currently supported by several distinct virtual research platforms, and it has been used to describe dozens of applications in the neuroinformatics domain. We expect Boutiques to improve the quality of application integration in computational platforms, to reduce redundancy of effort, to contribute to computational reproducibility, and to foster Open Science.
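To give a flavor of the JSON language, here is an abbreviated, hypothetical descriptor built in Python. The tool name, command line, and input identifiers are invented for illustration, and the actual Boutiques schema defines more required fields (e.g., versioning information) and stricter types.

```python
import json

# Hypothetical, abbreviated descriptor for illustration only; the real
# Boutiques JSON schema requires additional fields and validation.
descriptor = {
    "name": "example-tool",
    "description": "Smooths a NIfTI image (illustrative only).",
    "command-line": "smooth [INPUT] [FWHM]",
    "inputs": [
        {"id": "input", "type": "File", "value-key": "[INPUT]"},
        {"id": "fwhm", "type": "Number", "value-key": "[FWHM]", "optional": True},
    ],
}
print(json.dumps(descriptor, indent=2))
```

The value-key placeholders in the command-line template are substituted with concrete argument values at execution time, which is what lets a platform invoke a containerized tool without tool-specific integration code.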
Valérie Hayot-Sasson participated in the Neurostorm hackathon in Woods Hole.
To be presented at IEEE Big Data 2017: Sequential algorithms to split and merge ultra-high resolution 3D images, Valérie Hayot-Sasson, Yongping Gao, Yuhong Yan, Tristan Glatard.
Towards a Sustainable Digital Society: From Clouds to Connected Objects
Organized at Concordia as part of the Entretiens Jacques Cartier event. The aim of the symposium is to foster interactions between researchers working on different perspectives in communication networks, terminal equipment, connected objects and data centers, who share a common concern: designing the digital society of tomorrow from a sustainable development perspective.
Valérie Hayot-Sasson and Lalet Scaria have been selected to participate in Neurohackweek 2017, congratulations!
Lalet Scaria, Greg Kiar, Valérie Hayot-Sasson and Tristan Glatard participate in the Coding Sprint organized by the Stanford Center for Reproducible Neuroscience.
Lalet Scaria presented an abstract at Neuroinformatics 2017: Reproducibility of Human Connectome Project pipelines across operating systems.
New pre-print available: Sequential algorithms to split and merge ultra-high resolution 3D images.