JSON-Mmap Specification - Memory-Mapped JSON

JSONmmap

Memory-mapped file I/O for JSON and Binary JSON: read and update individual records in large data files without parsing or rewriting the whole file


📋 JSON-Mmap Format Synopsis

Structure: array of arrays
[ ["path", [locator]], ["meta", value], ... ]

Path string
$        root
$.key    child
$[0]     array index

Locator vector (ws₁ = whitespace bytes before the value, ws₂ = after)
[start, len]
[start, len, ws₁]
[start, len, ws₁, ws₂]

Byte: 1.......10........20
Data: {"name": "Andy"}
Mmap: ["$.name", [10, 6, 1]]
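To see how a locator maps onto the data, the Python sketch below slices the synopsis example with `[10, 6, 1]`. `read_locator` is a hypothetical helper name; it assumes the data is already in memory as bytes and, as stated above, that `start` is 1-indexed while the whitespace fields are optional.

```python
def read_locator(data: bytes, locator: list) -> dict:
    """Return the value bytes and the whitespace around them.

    Hypothetical helper: locator = [start, len, ws_before, ws_after],
    with a 1-indexed start and optional whitespace counts.
    """
    start, length = locator[0], locator[1]
    ws_before = locator[2] if len(locator) > 2 else 0
    ws_after = locator[3] if len(locator) > 3 else 0
    begin = start - 1  # convert the 1-indexed start to a 0-indexed offset
    return {
        "value": data[begin:begin + length],
        "before": data[begin - ws_before:begin],
        "after": data[begin + length:begin + length + ws_after],
    }


data = b'{"name": "Andy"}'
parts = read_locator(data, [10, 6, 1])
print(parts["value"])   # b'"Andy"'
print(parts["before"])  # b' '
```

The length covers the quotes, so the slice is the complete JSON string `"Andy"` and can be passed straight to a JSON decoder.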

⚡ Fast Random Access

Use byte offsets to seek directly to any data record without parsing the entire file.

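// C example: read an integer at a known position
// (start is the 1-indexed byte offset taken from the value's locator)
FILE *fp = fopen("data.json", "rb");
int value = 0;
if (fp != NULL) {
    fseek(fp, start - 1, SEEK_SET);  // locator offsets are 1-indexed
    fscanf(fp, "%d", &value);        // parse the integer stored there
    fclose(fp);
}

# Python example: read the raw bytes of one record
with open('data.json', 'rb') as f:
    f.seek(start - 1)        # locator offsets are 1-indexed
    chunk = f.read(length)   # exactly the record's bytes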
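A self-contained version of the same seek-and-read pattern, which also decodes the record with Python's standard `json` module. `read_record` is a hypothetical helper name, and the sketch assumes the record at the locator is a complete JSON value.

```python
import json
import os
import tempfile


def read_record(path: str, start: int, length: int):
    """Seek to a 1-indexed byte offset and decode only that record."""
    with open(path, "rb") as f:
        f.seek(start - 1)           # locator offsets are 1-indexed
        chunk = f.read(length)      # read just the record's bytes
    return json.loads(chunk)        # json.loads accepts UTF-8 bytes


# Usage: write a small file and read "$.name" through its locator [10, 6, 1]
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"name": "Andy"}')
print(read_record(tmp.name, 10, 6))  # Andy
os.remove(tmp.name)
```

Only `length` bytes are read and decoded, no matter how large the rest of the file is.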

✏️ In-Place Updates

Modify values on disk using available whitespace without rewriting the entire file.

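// Available space for the new value:
int max_bytes = length + ws_before + ws_after;

// Format the new value (buf must hold at least max_bytes + 1 bytes)
int len = snprintf(buf, max_bytes + 1, "%d", newval);
if (len >= 0 && len <= max_bytes) {
    // Pad remaining space with whitespace, then overwrite in place
    memset(buf + len, ' ', max_bytes - len);
    fseek(fp, start - ws_before - 1, SEEK_SET);  // fp opened with "r+b"
    fwrite(buf, max_bytes, 1, fp);
}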
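The same in-place update can be sketched in Python for any JSON value, not only integers. `update_in_place` is a hypothetical helper; it assumes a locator of the form `[start, len, ws_before, ws_after]` and opens the file in `r+b` mode so that nothing is truncated.

```python
import json
import os
import tempfile


def update_in_place(path: str, locator: list, new_value) -> bool:
    """Overwrite a value using its own bytes plus the whitespace around it.

    Sketch only: returns False, leaving the file untouched, if the
    serialized value is longer than length + ws_before + ws_after.
    """
    start, length = locator[0], locator[1]
    ws_before = locator[2] if len(locator) > 2 else 0
    ws_after = locator[3] if len(locator) > 3 else 0
    max_bytes = length + ws_before + ws_after
    encoded = json.dumps(new_value).encode("utf-8")
    if len(encoded) > max_bytes:
        return False
    padded = encoded.ljust(max_bytes, b" ")   # pad the rest with spaces
    with open(path, "r+b") as f:              # read/write without truncating
        f.seek(start - ws_before - 1)         # start of the leading whitespace
        f.write(padded)
    return True


# Usage: "7" sits at byte 10 with 2 spaces before it and 1 after it
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"age":  7 , "x": 1}')
print(update_in_place(tmp.name, [10, 1, 2, 1], 42))   # True
with open(tmp.name, "rb") as f:
    print(f.read())                                   # b'{"age":42  , "x": 1}'
os.remove(tmp.name)
```

Like the C snippet, this left-aligns the new value at the start of the leading whitespace, so the value's start and whitespace counts change afterwards and its mmap entry would need refreshing.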

🪶 Lightweight Format

A JSON-Mmap table is written in the same format as your data, so no additional parser is needed.

// JSON-Mmap is just JSON arrays!
[
    ["MmapVersion", "0.5"],
    ["$",      [1, 256]],
    ["$.data", [15, 200, 2]]
]

// Parse with any JSON library
const mmap = JSON.parse(mmap_string);
const [path, [start, len]] = mmap[1];  // path = "$", start = 1, len = 256

📍 Flexible Storage

Store mmap inline with data, embedded in metadata, or as a separate file.

// Inline direct (header + data)
[["$",[1,80]],...]  {"actual":"data"}

// Standalone file pair
data.json       ← your data
data.json.jmmap ← mmap table

// Embedded in metadata object
{"_info_": {"mmap": [["$",...]]}}
{"data": "..."}
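Both inline layouts place several JSON values one after another in a single file. A minimal Python sketch for separating them uses `json.JSONDecoder.raw_decode`, which parses one value and reports where it ended. It assumes the mmap header (or the metadata object that holds it) is the first root value; `iter_roots` is a hypothetical name, and the `_info_` key simply follows the example above.

```python
import json


def iter_roots(text: str):
    """Yield each root value of a concatenated-JSON text in order ($0, $1, ...)."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # skip whitespace between root values (space, \n, \r, \t)
        while idx < len(text) and text[idx] in " \n\r\t":
            idx += 1
        if idx == len(text):
            return
        value, idx = decoder.raw_decode(text, idx)  # parse one value, get its end
        yield value


inline_direct = '[["MmapVersion", "0.5"]] {"actual": "data"}'
embedded = '{"_info_": {"mmap": [["MmapVersion", "0.5"]]}}\n{"data": "..."}'

header, data = list(iter_roots(inline_direct))
print(header)                  # [['MmapVersion', '0.5']]

info, data = list(iter_roots(embedded))
print(info["_info_"]["mmap"])  # [['MmapVersion', '0.5']]
```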

JSON-Mmap defines a lightweight mapping table for fast disk-mapped file I/O. Read or update specific data records without parsing the entire file. Spec Version 0.5 (Draft 1)


Structure

JSON-Mmap Syntax

A JSON-Mmap is an array of arrays. Each sub-array pairs either a path string with a locator vector, or a metadata key with its value.

[
    ["MmapVersion", "0.5"],        // metadata entry
    ["$",           [1, 80]],      // root object locator
    ["$.name",      [12, 6, 2]],   // path with locator
    ["$.data",      [33, 47, 1]],  // nested data locator
    ...
]

Path Strings

Strings starting with $ are JSON-Path style references pointing to specific data records in the associated file.

Metadata Keys

Strings NOT starting with $ are metadata keys storing auxiliary information about the mmap.

Locator Vectors

Arrays of integers containing byte offset, length, and optional whitespace information for fast seeking.
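The `$` rule above is all a reader needs to tell the two kinds of entries apart. A short Python sketch, with `split_mmap` as a hypothetical helper name:

```python
import json


def split_mmap(mmap: list):
    """Separate a JSON-Mmap table into locators and metadata.

    Entries whose first element starts with "$" are path strings mapped to
    locator vectors; every other entry is a metadata key and its value.
    """
    locators, metadata = {}, {}
    for key, value in mmap:
        if key.startswith("$"):
            locators[key] = value
        else:
            metadata[key] = value
    return locators, metadata


table = json.loads('[["MmapVersion", "0.5"], ["$", [1, 80]], ["$.name", [12, 6, 2]]]')
locators, metadata = split_mmap(table)
print(metadata["MmapVersion"])   # 0.5
print(locators["$.name"])        # [12, 6, 2]
```

Keeping locators in a dictionary keyed by path gives constant-time lookup once the table is loaded.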

JSON-Path Style

Path String Notation

Path strings follow JSON-Path syntax to reference specific data records. The path must always start with $ representing the root object.

Notation    Meaning                                               Example
$           Root object (same as $0)                              $ → entire document
$i          i-th root object (0-indexed) in concatenated JSON     $1 → second root object
.           Child of the object to the left                       $.name → name field
[i]         i-th element of an array (0-indexed)                  $.arr[0] → first element
.['key']    Named child (use when the key contains ., [ or ])     $.['a.b'] → key "a.b"

Path Examples

// Sample JSON document
{
    "name": "Andy",
    "schedule": {
        "Monday": [8, 12],
        "Friday": {"AM": 9, "PM": [14.5, 15.5]}
    }
}

// Path references:
$.name                   → "Andy"
$.schedule.Monday        → [8, 12]
$.schedule.Monday[0]     → 8
$.schedule.Friday.PM[1]  → 15.5
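The notation table can be turned into a small path resolver. The Python sketch below uses one regular-expression alternative per notation. `parse_path` and `resolve` are hypothetical names, and the sketch assumes quoted keys in `.['key']` contain no single quote.

```python
import re

_ROOT = re.compile(r"\$(\d*)")                       # "$" or "$i"
# .['key']  |  .key  |  [i]
_TOKEN = re.compile(r"\.\['([^']*)'\]|\.([^.\[\]]+)|\[(\d+)\]")


def parse_path(path: str):
    """Split a path string into (root_index, steps)."""
    m = _ROOT.match(path)
    if not m:
        raise ValueError("path must start with $")
    root = int(m.group(1) or 0)                      # "$" is the same as "$0"
    steps, pos = [], m.end()
    while pos < len(path):
        t = _TOKEN.match(path, pos)
        if not t:
            raise ValueError(f"bad path syntax at {pos}: {path!r}")
        quoted, name, index = t.groups()
        if index is not None:
            steps.append(int(index))                 # array element
        else:
            steps.append(quoted if quoted is not None else name)  # object key
        pos = t.end()
    return root, steps


def resolve(roots: list, path: str):
    """Look up a path in a list of already-parsed root objects."""
    root, steps = parse_path(path)
    node = roots[root]
    for step in steps:
        node = node[step]
    return node


doc = {"name": "Andy",
       "schedule": {"Monday": [8, 12],
                    "Friday": {"AM": 9, "PM": [14.5, 15.5]}}}
print(resolve([doc], "$.schedule.Monday[0]"))     # 8
print(resolve([doc], "$.schedule.Friday.PM[1]"))  # 15.5
print(parse_path("$.['a.b']"))                    # (0, ['a.b'])
```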

Byte Positions

Locator Vector Format

The locator vector contains byte-level information for fast seeking and in-place updates.

[<start>, <length>, <ws-before>, <ws-after>, ...]
Index  Name       Description
0      start      Byte position (1-indexed) of the first significant character
1      length     Byte length from the first to the last significant character, inclusive
2      ws-before  (Optional) Number of whitespace bytes immediately before the value
3      ws-after   (Optional) Number of whitespace bytes immediately after the value

Visualized Example

Byte: 1         11        21        31        41        51
Data: {"name" :  "Andy" , "schedule": { "Mon": [ 10 , 14]   }}
                 ^----^               ^---------------------^
                 "Andy"               schedule object
[
    ["$",                [1, 56]],
    ["$.name",           [12, 6, 2]],    // start=12, len=6, ws-before=2
    ["$.schedule",       [33, 23, 1]],
    ["$.schedule.Mon[1]",[49, 2, 1]]
]
💡 In-Place Updates: Max writable bytes = length + ws-before + ws-after. Whitespace can be consumed when writing larger values!
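Locators can be generated with a single pass over the text. The Python sketch below walks the document, uses `json.JSONDecoder.raw_decode` to find where each string, number, or literal ends, and records `[start, len, ws-before, ws-after]` for every value. `build_mmap` is a hypothetical name; the sketch assumes well-formed JSON and works on `str` indices, which equal byte offsets only for pure-ASCII text. The input string is the example document with the whitespace implied by the locators listed above.

```python
import json

WS = " \n\r\t"                  # whitespace characters JSON allows between tokens
_decoder = json.JSONDecoder()


def build_mmap(text: str) -> list:
    """Return [path, [start, len, ws_before, ws_after]] entries for every value."""
    table = []

    def skip_ws(i):
        while i < len(text) and text[i] in WS:
            i += 1
        return i

    def count_ws(i, step):
        # count whitespace characters walking from index i in direction step
        n = 0
        while 0 <= i < len(text) and text[i] in WS:
            n += 1
            i += step
        return n

    def walk(i, path):
        # i is the index of the value's first significant character
        if text[i] in "{[":
            end = container(i, path)
        else:
            _, end = _decoder.raw_decode(text, i)   # string, number, literal
        table.append([path, [i + 1, end - i, count_ws(i - 1, -1), count_ws(end, 1)]])
        return end

    def container(i, path):
        close = "}" if text[i] == "{" else "]"
        i, n = skip_ws(i + 1), 0
        while text[i] != close:
            if close == "}":
                key, i = _decoder.raw_decode(text, i)   # member name
                i = skip_ws(i) + 1                      # step over ':'
                special = any(c in key for c in ".[]")
                child = f"{path}.['{key}']" if special else f"{path}.{key}"
            else:
                child = f"{path}[{n}]"
            i = skip_ws(walk(skip_ws(i), child))
            if text[i] == ",":
                i = skip_ws(i + 1)
            n += 1
        return i + 1                                    # just past the closing bracket

    walk(skip_ws(0), "$")
    table.sort(key=lambda entry: entry[1][0])           # parents before children
    return table


text = '{"name" :  "Andy" , "schedule": { "Mon": [ 10 , 14]   }}'
for path, locator in build_mmap(text):
    print(path, locator)
```

For this text the leading fields of `$`, `$.name`, `$.schedule` and `$.schedule.Mon[1]` come out as `[1, 56]`, `[12, 6, 2]`, `[33, 23, 1]` and `[49, 2, 1]`. The generated `$.name` entry also carries ws-after = 1, so at most 6 + 2 + 1 = 9 bytes can be written there in place.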

Flexibility

Storage Options

JSON-Mmap tables can be stored in three different ways depending on your use case.

📎 Inline Direct

Mmap as header, data follows immediately

[
  ["$", [1,80]],
  ...
]                ← mmap ends
{"data": ...}    ← data starts

📦 Inline Embedded

Mmap nested inside a container object

{
  "_DataInfo_": {
    "mmap": [["$",[...]],...]
  }
}
{"data": ...}

📄 Standalone File

Separate .jmmap file alongside data

// data.json.jmmap
[
  ["ReferenceFileName","data.json"],
  ["$", [...]],
  ...
]
📁 File Extensions: Use .jmmap for JSON-based mmap files, .bmmap for BJData-based mmap files.


Applications

Use Cases

JSON-Mmap speeds up file I/O on large JSON and binary JSON documents by letting programs read, update, or relocate individual values directly.

🔍 Fast Read Access

Use fseek() to jump directly to data without parsing

// Returns 1 and stores the integer in *val on success
int read_value(FILE *fp, long start, int *val) {
  fseek(fp, start-1, SEEK_SET);      // locator start is 1-indexed
  return fscanf(fp, "%d", val) == 1;
}

✏️ In-Place Update

Overwrite values using available whitespace

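// maxlen = len + ws_before + ws_after; buf holds maxlen + 1 bytes
int n = snprintf(buf, maxlen + 1, "%d", newval);
if (n >= 0 && n <= maxlen) {
  memset(buf + n, ' ', maxlen - n);  // pad remaining with spaces
  fseek(fp, start-ws_before-1, SEEK_SET);
  fwrite(buf, maxlen, 1, fp);
}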

🔗 Reference Pointers

When a value grows too large for its slot, use a path reference

// Original location too small?
// Write a "$1" pointer in place, then append the data as root object $1
{"big_array": "$1"}
[1,2,3,...huge array...]
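A Python sketch of this fallback, assuming the file holds concatenated JSON so that `"$n"` names the (n+1)-th root object, as in the path notation table. `count_roots` and `store_value` are hypothetical helpers. Adding a locator for the new root object to the mmap table is left out.

```python
import json
import os
import tempfile

WS = " \n\r\t"


def count_roots(text: str) -> int:
    """Count the root values in a concatenated-JSON text."""
    decoder, idx, n = json.JSONDecoder(), 0, 0
    while True:
        while idx < len(text) and text[idx] in WS:
            idx += 1
        if idx == len(text):
            return n
        _, idx = decoder.raw_decode(text, idx)
        n += 1


def store_value(path: str, locator: list, new_value):
    """Write new_value in place if it fits; otherwise append it as a new root
    object and write a "$<n>" reference string in its old place.

    Returns the reference string, or None when the value was written in place.
    """
    start, length = locator[0], locator[1]
    ws_before = locator[2] if len(locator) > 2 else 0
    ws_after = locator[3] if len(locator) > 3 else 0
    room = length + ws_before + ws_after
    encoded = json.dumps(new_value).encode("utf-8")
    pointer = None
    with open(path, "r+b") as f:
        if len(encoded) > room:
            n = count_roots(f.read().decode("utf-8"))   # index of the next root
            pointer = f"${n}"
            ref = json.dumps(pointer).encode("utf-8")
            if len(ref) > room:
                raise ValueError("no room even for a reference string")
            f.seek(0, os.SEEK_END)
            f.write(b"\n" + encoded)                    # append as root $n
            encoded = ref
        f.seek(start - ws_before - 1)
        f.write(encoded.ljust(room, b" "))              # pad with spaces
    return pointer


with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(b'{"big_array": [1, 2]}')
print(store_value(tmp.name, [15, 6, 1], list(range(1000))))  # $1
os.remove(tmp.name)
```

The reference check runs before anything is appended, so a slot too small even for `"$n"` leaves the file unchanged.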
🧠 Perfect For: Medical imaging (DICOM), scientific datasets, database backends, log files, configuration files, and any large hierarchical JSON data.

Learn More

Resources & Links

Explore the full specification and related NeuroJSON projects.

Metadata Keys Reference

Key                      Description
"MmapVersion"            Specification version (e.g., "0.5")
"ReferenceFileName"      Associated data file name
"ReferenceFileURI"       Associated data file URL
"ReferenceFileBytes"     Byte size of the referenced file
"ReferenceFileSHA256"    SHA-256 hash of the referenced file
"MmapByteLength"         Total byte length of the mmap table
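Before trusting the offsets in a standalone mmap, a reader can check that the data file still matches the reference metadata. The Python sketch assumes `ReferenceFileBytes` is stored as an integer and `ReferenceFileSHA256` as a hexadecimal digest string. `check_reference` is a hypothetical helper, and keys missing from the table are simply not checked.

```python
import hashlib
import os


def check_reference(mmap: list, data_path: str) -> bool:
    """Compare a data file with the ReferenceFile* metadata of an mmap table."""
    meta = {key: value for key, value in mmap if not key.startswith("$")}
    if "ReferenceFileBytes" in meta:
        if os.path.getsize(data_path) != meta["ReferenceFileBytes"]:
            return False                      # file grew or shrank
    if "ReferenceFileSHA256" in meta:
        h = hashlib.sha256()
        with open(data_path, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):  # hash in 64 KiB chunks
                h.update(block)
        if h.hexdigest() != meta["ReferenceFileSHA256"].lower():
            return False                      # content changed
    return True
```

The size check is cheap and runs first; hashing is done in fixed-size chunks so large files never need to fit in memory.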

Version: 0.5 (Draft 1)

Status: Under Development

License: Apache 2.0

JSON Mmap: .jmmap

Binary Mmap: .bmmap

Whitespace: space, \n, \r, \t

Copyright © 2022 Qianqian Fang <q.fang at neu.edu>
Part of the NeuroJSON Project
