---
title: "Technical Features"
subtitle: Additional information on technical features and their maturity levels
---

# Mature Support

**Current mature implementations of virtual stores share the following characteristics:**

- Virtual stores are used to provide alternate views (defined below) into the data. These views can range from an entire collection, to subsets of a collection, to a subset of a granule.
- Consistent spatial and temporal resolution across each variable (i.e., array) within a virtual store
- Uniform chunk grid shape and compression scheme across each variable within a virtual store

::: {.callout-note}
We define the term "view" in a virtual store context to mean one of the different ways the same dataset can be represented. For example, a view could represent the entire dataset as a single cube, or it could aggregate only a subset of the dataset. Which views make sense for a particular dataset depends on its characteristics and the intended use cases; see [Virtual Stores at NASA](nasa-applications.qmd) for some real-world examples.
:::
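
To make the idea of a view concrete, the sketch below models a virtual store as a plain mapping from chunk keys to byte ranges in archival files, and builds three views by selecting different subsets of those references. This is a toy model for illustration only: `ChunkRef`, `make_view`, the file names, and the byte offsets are all hypothetical and are not part of VirtualiZarr or Zarr.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Pointer to one chunk's bytes inside an archival file (hypothetical model)."""
    path: str
    offset: int
    length: int


# Toy manifest for one variable. Keys are (time, lat_chunk, lon_chunk) indices.
# Each time step lives in its own granule file; paths and byte ranges are made up.
manifest = {
    (t, i, j): ChunkRef(
        path=f"granule_{t:03d}.nc",
        offset=4096 + 1000 * (2 * i + j),
        length=1000,
    )
    for t in range(4)
    for i in range(2)
    for j in range(2)
}


def make_view(manifest, keep):
    """Build a view by keeping only the chunk references whose key passes `keep`."""
    return {key: ref for key, ref in manifest.items() if keep(key)}


# View 1: the entire collection as one cube.
collection_view = make_view(manifest, lambda key: True)

# View 2: a subset of the collection (the first two granules).
subset_view = make_view(manifest, lambda key: key[0] < 2)

# View 3: a subset of a single granule (one row of chunks from the first granule).
granule_subset_view = make_view(manifest, lambda key: key[0] == 0 and key[1] == 0)

print(len(collection_view), len(subset_view), len(granule_subset_view))
```

No data is copied in any of the three views; each one is only a different selection of references into the same underlying files.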

## Supported data formats

Data format support is limited to formats with VirtualiZarr parser implementations. Parsers maintained by VirtualiZarr include:

* [HDFParser](https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/parsers/hdf/hdf.py)
* [Kerchunk (Parquet or JSON)](https://github.com/zarr-developers/VirtualiZarr/tree/main/virtualizarr/parsers/kerchunk)
* [DMRPPParser](https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/parsers/dmrpp.py)
* [FITSParser](https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/parsers/fits.py)
* [NetCDF3Parser](https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/parsers/netcdf3.py)
* [ZarrParser](https://github.com/zarr-developers/VirtualiZarr/blob/main/virtualizarr/parsers/zarr.py)

Parsers maintained outside of VirtualiZarr include:

* [virtual-tiff](https://virtual-tiff.readthedocs.io/en/latest/)
* [hrrr-parser](https://github.com/virtual-zarr/hrrr-parser)

Custom parsers for additional formats can be implemented using the VirtualiZarr [CustomParsers](https://virtualizarr.readthedocs.io/en/stable/custom_parsers.html) guide.

# Developing Support

## Non-uniform grids

Some NASA collections present additional complexity due to having varying compression schemes and chunk grids across a logical array (usually across granules). Currently, Zarr assumes a consistent chunk shape and compression across the entire array.

::: {.callout-note}
Support for both varying chunk shapes and compression schemes is in active development and should be available by summer 2026. For variable-length grids, see https://github.com/zarr-developers/zarr-python/pull/3802.
:::
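
As a rough illustration of the constraint described above, the following sketch compares the chunk shape and compression of each granule against the first granule and reports the ones that differ. It assumes the per-granule encodings have already been extracted by some other means; `GranuleEncoding`, `find_encoding_mismatches`, and the sample values are hypothetical and do not correspond to any VirtualiZarr or Zarr API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GranuleEncoding:
    """Encoding of one variable in one granule (hypothetical record)."""
    granule: str
    chunk_shape: tuple
    compressor: str


def find_encoding_mismatches(encodings):
    """Return names of granules whose chunk shape or compressor differs from the first granule."""
    if not encodings:
        return []
    reference = (encodings[0].chunk_shape, encodings[0].compressor)
    return [
        enc.granule
        for enc in encodings[1:]
        if (enc.chunk_shape, enc.compressor) != reference
    ]


# Made-up encodings for the same variable across four granules.
encodings = [
    GranuleEncoding("granule_000.nc", (1, 180, 360), "zlib"),
    GranuleEncoding("granule_001.nc", (1, 180, 360), "zlib"),
    GranuleEncoding("granule_002.nc", (1, 90, 360), "zlib"),   # different chunk shape
    GranuleEncoding("granule_003.nc", (1, 180, 360), "zstd"),  # different compression
]

mismatches = find_encoding_mismatches(encodings)
print(mismatches)
```

An empty result would indicate that the granules can be combined into a single logical array under the current uniform-encoding assumption; any listed granule falls into the developing-support case.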


# No plans for support

* There are no plans to support formats outside of the "Supported data formats" list above at this time.