NeuroJSON.io serves human-readable, searchable neuroimaging datasets using the universally accessible JSON format and URL-based RESTful APIs. It is built upon highly scalable document-store NoSQL database technologies, specifically the open-source Apache CouchDB engine, which can handle millions of datasets without major performance penalties. It provides fine-grained data search capabilities that allow users to find, preview, and recombine complex data records from public datasets before download.
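As a rough illustration of what URL-based access to a CouchDB-backed store can look like, the sketch below builds REST URLs and decodes a JSON response. The server address, database name, and dataset ID are placeholders chosen for illustration, not confirmed NeuroJSON.io endpoints; only the path layout (/{db}/{docid} and /{db}/_all_docs) comes from the standard CouchDB HTTP API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Placeholder server address (an assumption, not a documented endpoint).
BASE_URL = "https://example.org:5984"


def doc_url(db, doc_id, base=BASE_URL):
    """URL of one JSON document, following CouchDB's /{db}/{docid} layout."""
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def list_url(db, limit=10, skip=0, base=BASE_URL):
    """URL listing document IDs in a database via CouchDB's _all_docs."""
    query = urlencode({"limit": limit, "skip": skip})
    return f"{base}/{quote(db, safe='')}/_all_docs?{query}"


def fetch_json(url, timeout=30):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # "demo_db" and "dataset_001" are hypothetical names.
    print(doc_url("demo_db", "dataset_001"))
    print(list_url("demo_db", limit=5))
```

Because every record is addressed by a plain URL, the same request can be issued from a browser, a command-line HTTP client, or any language with an HTTP library; calling fetch_json on one of these URLs would return the document as ordinary Python dictionaries and lists.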
The NeuroJSON data-sharing framework offers several key benefits: fine-grained findability and searchability, human-readability, long-term reusability of complex datasets, and programmable, universal data access via REST APIs.
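To make "programmable" searching concrete, here is a minimal sketch of a field-level query against a CouchDB database using its Mango query endpoint (POST /{db}/_find). Whether a given public server exposes this endpoint is an assumption, and the server address, database name, and field names (modality, subject_count) are hypothetical.

```python
import json
from urllib.request import Request


def build_find_request(base, db, selector, fields=None, limit=25):
    """Prepare a POST /{db}/_find request carrying a Mango selector."""
    body = {"selector": selector, "limit": limit}
    if fields:
        # Restrict the response to the listed fields only.
        body["fields"] = fields
    return Request(
        url=f"{base}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: records tagged "fmri" with at least 20 subjects.
req = build_find_request(
    "http://localhost:5984",
    "demo_db",
    {"modality": "fmri", "subject_count": {"$gte": 20}},
    fields=["_id", "modality"],
)
```

Passing req to urllib.request.urlopen would send the query; a CouchDB server answers with a JSON object whose docs array holds the matching records.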
The data-sharing models at NeuroJSON explicitly focus on extracting and processing human-understandable metadata associated with complex neuroimaging datasets. Our dataset-to-JSON converter and CouchDB views-based automated pipeline for metadata extraction, standardization, and aggregation allow us to build a "search engine" for public neuroimaging datasets across multiple consortia and databases. The ability to search datasets across databases is unique and is essential for accommodating the rapid future growth of neuroimaging data.
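For readers unfamiliar with CouchDB views, the following sketch shows the general shape of a view-based metadata aggregation: a design document holding a JavaScript map function plus a built-in reduce, and the URL used to query it. This is not the NeuroJSON pipeline itself; the design document name, view name, and metadata fields (modality, subject_count) are hypothetical.

```python
import json
from urllib.parse import quote, urlencode

# JavaScript map function, stored as a string inside the design document.
# The field names "modality" and "subject_count" are hypothetical.
MAP_BY_MODALITY = """
function (doc) {
  if (doc.modality) {
    emit(doc.modality, doc.subject_count || 0);
  }
}
""".strip()


def build_design_doc(name="metadata"):
    """Return a CouchDB design document with one map/reduce view."""
    return {
        "_id": f"_design/{name}",
        "views": {
            "by_modality": {
                "map": MAP_BY_MODALITY,
                "reduce": "_sum",  # built-in CouchDB reduce function
            }
        },
    }


def view_url(base, db, ddoc, view, key=None, group=False):
    """Build a view query URL; CouchDB expects the key to be JSON-encoded."""
    params = {}
    if key is not None:
        params["key"] = json.dumps(key)
    if group:
        params["group"] = "true"
    path = "/".join(quote(part, safe="") for part in (db, "_design", ddoc, "_view", view))
    url = f"{base}/{path}"
    return f"{url}?{urlencode(params)}" if params else url
```

Once such a design document is stored in a database, CouchDB builds and incrementally updates the view index, so a query with group=true would return one row per modality with the summed subject count, without rescanning every document.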
Open scientific data must be human-understandable. This is one of the key promises that the NeuroJSON project makes, and one that it urges the broad research community to uphold. Human-understandable data is the only way, in our opinion, to ensure that valuable research data can be reused in the future and is not jeopardized by the constant upgrading and phasing-out of domain-specific data file formats.
Human-understandability of data is especially meaningful against the current backdrop of rapid advances in machine intelligence. Many emerging artificial-intelligence (AI) and machine-learning models have shown the ability to grasp complex relationships and reorganize language-representable information in ways that resemble human intelligence. By reinforcing the human-understandability of scientific data, we open a window for powerful AI models to understand, process, and transform complex datasets for tasks that were not possible in the past.
NeuroJSON's data-sharing framework is extremely scalable. Unlike many file-based data-sharing models, its lightweight, metadata-focused approach and highly efficient NoSQL database allow NeuroJSON.io to store, index, and serve potentially unlimited numbers of databases, organizations, and repositories, each with millions of individual datasets or subjects, using few resources. Apache CouchDB not only delivers industry-proven performance and scalability, but is also open-source and widely accessible.