NeuroJSON guiding principles

The long-term vision of the NeuroJSON project and its software ecosystem is built upon the following guiding principles:

1. Open data MUST be provided in a "source-code" data format
2. Open data MUST be human-understandable
It is widely understood that open-source software must guarantee "Freedom 1" - the freedom to study how the software works. This freedom is secured by demanding access to the source code, which must be human-understandable. Although the FAIR principles are widely recognized as guidance for sharing open data, it is far less recognized that open data must likewise be provided in a "source-code" format so that the data can be fully understood and reused by future users. It is our vision that JSON and our JSON-based JData annotations can fill this need and serve as the universal "source-code format" for all open-data sharing. Adopting and demanding such a free-data source format is crucial to ensure that data remain reusable in the future, as the sketch below illustrates.
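As a minimal sketch of this idea, the snippet below builds a small, self-describing array record using JData-style annotation keywords (`_ArrayType_`, `_ArraySize_`, `_ArrayData_`; see the JData specification for the authoritative definitions) and serializes it with Python's standard `json` module:

```python
import json

# A small 2-D array stored in a self-describing, human-readable form
# using JData-style annotation keywords (names per the JData spec;
# consult the spec itself for authoritative semantics).
array_record = {
    "measurement": {
        "_ArrayType_": "double",                       # element type travels with the data
        "_ArraySize_": [2, 3],                         # dimensions travel with the data
        "_ArrayData_": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # row-major flattened values
    }
}

# The serialized text is the "source code" of the data: any JSON parser,
# now or decades from now, can read it without a format-specific library.
print(json.dumps(array_record, indent=2))
```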
3. Diverse data formats harm open science and open data
Diverse data formats, often developed ad hoc by software developers or targeted at application-specific needs, are a liability to the long-term viability of the data they store. Most data formats are not self-contained: their utility relies on the proper implementation of separate parser libraries and on specifications external to the data files. When a parser is upgraded or retired, or the external specification advances to a new version, data stored in the legacy format become unusable unless parsers for the older format continue to be maintained, which adds cost and complexity. The scientific community should consider adopting standardized container formats to store valuable data assets without relying on application-specific data formats.
4. Small is beautiful
This is one of the well-recognized tenets of the "Unix philosophy", and the same philosophy should apply to open-data sharing. Conventional data sharing via large, all-in-one zipped packages prevents users from reading, searching, understanding, and selecting the data inside the package unless they download it in full. This is inefficient and does not scale. Extracting small, human-readable metadata from complex datasets is the key to enabling rapid access, lightweight data manipulation, interoperation, and scalable data processing, as the sketch below illustrates.
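A hedged sketch of this principle: publish a compact, human-readable index of a large dataset so users can search and select before downloading anything bulky. The field names below are illustrative, not part of any NeuroJSON specification:

```python
import json

# Illustrative, compact metadata index for a large dataset; a few
# kilobytes of JSON describe tens of gigabytes of binary payload.
dataset_index = {
    "name": "example-dataset",            # hypothetical dataset name
    "modality": "MRI",
    "subjects": 120,
    "total_size_bytes": 52_428_800_000,
    "files": [
        {"path": "sub-01/anat.nii.gz", "size_bytes": 8_388_608},
        {"path": "sub-01/dwi.nii.gz",  "size_bytes": 268_435_456}
    ]
}

# Users read, search, and filter this record first; only the selected
# binary files need to be fetched afterwards.
print(json.dumps(dataset_index, indent=2))
```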
5. Metadata is your data!
In the US legal system (as in most of the world), raw measurement data are considered "facts" and are therefore not copyrightable; experimenters are thus NOT in a position to attach a license to the raw data, because they do not own a copyright in them. However, the metadata and annotations that experimenters create to organize their measurement datasets and make them understandable are, in many cases though not all, copyrightable. In a sense, the metadata/annotation portion of a dataset is the only part that experimenters can "own" and decide how to disseminate. Building a data-dissemination system that specifically indexes, distributes, and handles human-understandable metadata not only makes data dissemination extremely compact and efficient (see the "Small is beautiful" principle above), it also readily provides the scalability (growing easily to large datasets) and findability/searchability that the FAIR principles demand.
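The sketch below illustrates the separation this principle implies: a lightweight, licensable metadata record carries the human-written annotations, while the raw "facts" stay in an external file referenced by a JData-style `_DataLink_` entry (the URL and field names here are illustrative assumptions):

```python
import json

# The annotation layer -- the part the experimenters can "own" and
# license -- is kept small and human-readable; the raw measurements
# live elsewhere and are referenced, not embedded.
metadata_record = {
    "license": "CC-BY-4.0",  # applies to the annotations, not the raw facts
    "description": "Resting-state scan, session 1 (illustrative)",
    "rawdata": {
        # JData-style external link; the URL is a placeholder
        "_DataLink_": "https://example.org/data/sub-01_rest.nii.gz"
    }
}

print(json.dumps(metadata_record, indent=2))
```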