# Introducing zarr-matlab: Zarr v3, Natively in MATLAB
Modern scientific data lives in the cloud, and the format it increasingly lives in is Zarr: chunked, compressed N-dimensional arrays designed so you can read the piece you need instead of downloading the file you don't. Zarr has first-class implementations in Python, Rust, C++, Julia, and JavaScript. MATLAB — still the daily driver for a huge share of working scientists — has been the conspicuous gap: MathWorks' built-in support covers only the older v2 format.
Today we're releasing zarr-matlab, a free, open-source, pure-MATLAB implementation of the Zarr v3 specification.
## What it does
```matlab
% Create a compressed, chunked array, then index it like any MATLAB matrix
z = zarr.create("weather.zarr", [720 1440], "single", ...
    Path="temp", ChunkShape=[180 360], ...
    Codecs={zarr.codecs.ZstdCodec(5)}, FillValue=single(NaN));
mydata = rand(720, 1440, "single");   % placeholder data for illustration
z(:, :) = mydata;
block = z(1:100, end-99:end);         % reads only the chunks it touches

% Open data written by zarr-python, xarray, or any v3 implementation
g = zarr.open("https://example.org/climate.zarr");   % read-only over HTTP
tile = g.item("precipitation").read([1 1], [512 512]);
```
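The comment "reads only the chunks it touches" is the core idea of chunked storage. As a rough illustration, this pure-MATLAB sketch (not zarr-matlab's internal code) works out which chunk keys the `z(1:100, end-99:end)` read needs. It assumes MATLAB's dimension order maps directly onto the Zarr shape and that the array uses the v3 default chunk key encoding with a `/` separator; the library may order axes differently to reconcile MATLAB's column-major layout with Zarr's row-major default.

```matlab
% Sketch: which chunks does a 1-based selection touch?
% Assumption: MATLAB dims map 1:1 onto the Zarr shape (no axis reversal).
shape      = [720 1440];              % array shape from the example above
chunkShape = [180 360];               % chunk shape from the example above
rowSel = 1:100;                       % z(1:100, ...)
colSel = shape(2)-99:shape(2);        % z(..., end-99:end)

% Convert 1-based element indices to 0-based chunk indices along each axis
rowChunks = unique(floor((rowSel - 1) / chunkShape(1)));
colChunks = unique(floor((colSel - 1) / chunkShape(2)));

% Build keys with the v3 default encoding: "c" plus "/"-separated indices
[R, C] = ndgrid(rowChunks, colChunks);
keys = arrayfun(@(r, c) sprintf('c/%d/%d', r, c), R(:), C(:), ...
                'UniformOutput', false);
disp(keys)
```

Of the 16 chunks in the 720 × 1440 array, this selection falls inside just one, `c/0/3` (relative to the array's own directory), so only that object would need to be fetched and decompressed.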
It implements the full v3 specification:
- Every data type: integers, floats (including float16), complex, variable-length strings, and datetime64 — stored losslessly
- Real compression: zstd, blosc (with shuffle/bitshuffle), gzip, and crc32c checksums, with prebuilt binaries for Linux, Windows, and Apple Silicon bundled in the File Exchange installer
- Sharding: thousands of small chunks stored inside each cloud object, read individually via byte-range requests
- Cloud-friendly stores: local directories, zip archives, in-memory, and read-only HTTP(S) — plus consolidated metadata, so opening a big hierarchy costs one request instead of hundreds
- MATLAB-native ergonomics: ordinary 1-based indexing with `end` and logical masks, `disp`/`tree` displays, and a `.mltbx` you can double-click
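Sharding is what makes per-chunk reads from a single cloud object possible. In the Zarr v3 sharding codec, each shard carries an index holding one `[offset nbytes]` pair of uint64 values per inner chunk, in row-major order, with both values set to 2^64 - 1 for an inner chunk that was never written. A reader fetches that index (at the start or end of the shard, depending on the `index_location` setting) and then turns one entry into an HTTP `Range` request. The sketch below covers only that last step; the 2 × 3 inner-chunk grid and its offsets are made up for illustration, and this is not zarr-matlab's own code.

```matlab
% Sketch: turn a decoded shard index into an HTTP byte-range request.
% Assumes the index was already fetched and decoded into an N-by-2 uint64
% matrix of [offset nbytes] rows, row-major over the inner-chunk grid.
EMPTY = intmax('uint64');             % 2^64 - 1 marks an unwritten chunk
chunksPerShard = [2 3];               % hypothetical 2 x 3 inner-chunk grid
index = uint64([  0 100;   100 250;   350  80; ...
                430 120; EMPTY EMPTY; 550  60]);
innerChunk = [1 0];                   % 0-based inner-chunk coordinates

% Row-major linear position: the last axis varies fastest (+1 for MATLAB)
row = innerChunk(1) * chunksPerShard(2) + innerChunk(2) + 1;
offset = index(row, 1);
nbytes = index(row, 2);

if offset == EMPTY && nbytes == EMPTY
    rangeHeader = '';                 % nothing stored: use the fill value
else
    % HTTP ranges are inclusive, hence the -1 on the end byte
    rangeHeader = sprintf('bytes=%d-%d', offset, offset + nbytes - 1);
end
disp(rangeHeader)
```

With that header, a client can ask the server for just those 120 bytes rather than the whole shard object.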
## Interoperability is the contract
The reason to store data in Zarr is that your collaborators, and your future self, can read it from any language. So zarr-matlab treats compatibility with zarr-python as its design contract: on every commit, CI has zarr-python write a 43-case test matrix covering every data type, codec, and sharding layout, checks that MATLAB reads every byte correctly, and then checks that zarr-python can read everything MATLAB writes. The documentation's examples are executed by the test suite as well.
## Where this is going
zarr-matlab is a foundation as much as a product. Two projects are already building on it:
- hdmf-zarr-matlab implements the hdmf-zarr conventions for links and object references — the path to reading and writing NWB files stored as Zarr from MATLAB.
- matzarr attacks a problem every MATLAB lab with cloud data has: reading `.mat` files from object storage without downloading them. A `.mat` v7.3 file is secretly HDF5; matzarr indexes its chunk byte-ranges once into a small virtual-Zarr sidecar, after which MATLAB can open the file's entire structure in one request and stream exactly the slices it needs (cell arrays, structs, and all).
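The claim that a v7.3 `.mat` file is secretly HDF5 is easy to check yourself. This MATLAB sketch (independent of matzarr, whose internals aren't shown here) saves a throwaway v7.3 file and looks for the 8-byte HDF5 format signature at byte offset 512, just past the 512-byte user block where MATLAB keeps its text header. It needs MATLAB itself, since `-v7.3` is a MATLAB `save` option.

```matlab
% Sketch: confirm that a v7.3 MAT-file is an HDF5 file underneath.
% MATLAB reserves a 512-byte user block for its text header, so the HDF5
% signature should begin at byte offset 512.
x = magic(4);                         % any variable will do
fname = [tempname '.mat'];            % throwaway file for the demo
save(fname, 'x', '-v7.3');

hdf5Signature = uint8([137 72 68 70 13 10 26 10]);   % "\211HDF\r\n\032\n"

fid = fopen(fname, 'r');
fseek(fid, 512, 'bof');               % skip the 512-byte user block
sig = fread(fid, 8, '*uint8')';       % read 8 bytes as a uint8 row vector
fclose(fid);
delete(fname);                        % clean up the demo file

isHDF5 = isequal(sig, hdf5Signature);
fprintf('HDF5 signature at offset 512: %d\n', isHDF5);
```

Because everything after that offset is ordinary HDF5, a tool that understands the HDF5 layout can record where each dataset's chunks live and later fetch them by byte range, which is the kind of index the matzarr sidecar is described as holding.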
If you work with array data in MATLAB — neurophysiology, imaging, climate, anything — give it a try:
- Install: File Exchange or GitHub releases
- Docs: catalystneuro.github.io/zarr-matlab
- Issues and interop reports: github.com/catalystneuro/zarr-matlab
CatalystNeuro builds open-source software for neuroscience data standardization and reuse. If your lab needs help adopting cloud-native data workflows — in Python or MATLAB — get in touch.
