LTDP SAFE

Compressed data file

The Open Archival Information System (OAIS) reference model specifies that products have to be accompanied by their representation information. In SAFE, this representation information is currently provided by the package's data file together with its schema in the SDF language.

This requirement imposes strong restrictions on the product formats and excludes archiving data in compressed formats such as NetCDF or JPEG-2000.

For the new SAFE standard specification, the use of compressed data for archival (and under which conditions) shall be analysed, bearing in mind that any approach should be conformant to the OAIS specification.


Re: Compressed data file

Note that NetCDF can be created in compressed and uncompressed forms (although that does not change the fundamental issue).

Compressed data file

So according to you, an uncompressed netCDF would fit the OAIS requirement? What is the fundamental issue you refer to?


Re: Compressed data file

By the fundamental issue I meant that SAFE and the SDF schema can't currently describe compressed data formats, which I think is the main issue here (at least that's what I understood from the SAFE public review meeting).

My comment was just to highlight that it may be possible to describe NetCDF if it is not compressed.


Re: Re: Compressed data file

Yes, the main issue is the use of compressed formats, since a schema cannot be created for them.


Re: Compressed data file

For uncompressed netCDF, the main problem is that the structure of the file is not fixed: the dimensions and the number of data matrices depend on the application domain. Even if the structure of the metadata part can be fixed at product type level, the composition of the data part, and in particular the sizes of the data matrices, depends on the specific product. Since there are no delimiters marking the beginning or end of a data array, creating a schema that describes the whole product is not an easy task, if not impossible. Uncompressed netCDF therefore presents a problem similar to that of compressed formats.
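
To illustrate the point about missing delimiters, here is a minimal Python sketch. It uses a toy binary layout that I made up for this post (it is NOT the real netCDF format, and `pack_product` and `array_offsets` are hypothetical helpers): the header stores the dimensions, and the arrays follow back to back. The position of each array can only be computed from the header of that specific product, which is why a schema with fixed offsets cannot describe every product of the same type.

```python
import struct

def pack_product(arrays):
    """Pack 2-D float32 arrays into a toy, delimiter-free layout.

    Layout (hypothetical, not netCDF): uint32 array count, then
    (uint32 rows, uint32 cols) per array, then all values back to back.
    """
    header = struct.pack("<I", len(arrays))
    body = b""
    for a in arrays:
        rows, cols = len(a), len(a[0])
        header += struct.pack("<II", rows, cols)
        for row in a:
            body += struct.pack(f"<{cols}f", *row)
    return header + body

def array_offsets(blob):
    """Return the byte offset of each array, derived from the header only."""
    (n,) = struct.unpack_from("<I", blob, 0)
    dims = [struct.unpack_from("<II", blob, 4 + 8 * i) for i in range(n)]
    offset = 4 + 8 * n  # data starts right after the header
    offsets = []
    for rows, cols in dims:
        offsets.append(offset)
        offset += rows * cols * 4  # float32 takes 4 bytes
    return offsets

# Two products of the same "type" (two arrays each) but different sizes.
small = pack_product([[[1.0, 2.0]], [[3.0]]])
large = pack_product([[[1.0, 2.0], [3.0, 4.0]], [[5.0]]])
print(array_offsets(small))  # second array starts at byte 28
print(array_offsets(large))  # second array starts at byte 36
```

The second array sits at a different byte offset in each product, even though both products have the same header structure.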


Re: Compressed data file

I think you could somehow (the SAFE Manifest seems to be very flexible) add a compressed data file together with metadata describing the (de-)compression algorithm and the compressed file. The decompressed data could then have representation information as currently intended.
However, in the long term, the implementation of the compression algorithm seems to be the problem.
Now imagine a small virtual machine which hosts the decompression software. The VM could be exported to the Open Virtualization Format (OVF), which has been an ISO standard since last November (ISO/IEC 17203:2011). The OVF package could be added to a SAFE product as a dataObject, with the XML schema of the OVF descriptor as representation info.
The exported VM should be smaller than the compression savings, but this is probably not an issue, especially in an archive where many products reference the same VM.
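
As a rough way to check the size argument, the following Python sketch computes how many products have to share one VM image before the compression savings outweigh the size of the image. The function name and all byte counts are hypothetical placeholders, not measurements.

```python
def break_even_products(vm_bytes, uncompressed_bytes, compressed_bytes):
    """Smallest number of products for which total savings >= VM size.

    All sizes are in bytes. Assumes every product has the same
    uncompressed and compressed size (a simplification).
    """
    saved_per_product = uncompressed_bytes - compressed_bytes
    if saved_per_product <= 0:
        raise ValueError("compression does not save any space")
    # Ceiling division using integers only.
    return -(-vm_bytes // saved_per_product)

# Hypothetical numbers: a 2 GB VM image, 800 MB products shrinking to 300 MB.
print(break_even_products(2_000_000_000, 800_000_000, 300_000_000))
```

With these made-up numbers the VM pays for itself after a handful of products, which is in line with the expectation that the VM size is not an issue in a large archive.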

I think that this could be OAIS compliant and relatively safe for popular guest platforms.
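
For the first part of the suggestion (a compressed data file plus metadata describing the decompression algorithm), a minimal Python sketch could look like this. The metadata keys and the function names are hypothetical and are not taken from the SAFE Manifest schema; zlib merely stands in for whatever compression algorithm would actually be used.

```python
import hashlib
import json
import zlib

def wrap_compressed(payload):
    """Compress a payload and build a metadata record describing it."""
    compressed = zlib.compress(payload, 9)
    meta = {
        # Hypothetical keys, for illustration only.
        "algorithm": "DEFLATE in zlib container (RFC 1950 / RFC 1951)",
        "compressedSize": len(compressed),
        "uncompressedSize": len(payload),
        "uncompressedSha256": hashlib.sha256(payload).hexdigest(),
    }
    return compressed, meta

def unwrap_and_verify(compressed, meta):
    """Decompress and check the result against the recorded checksum."""
    data = zlib.decompress(compressed)
    if hashlib.sha256(data).hexdigest() != meta["uncompressedSha256"]:
        raise ValueError("decompressed data does not match recorded checksum")
    return data

payload = b"\x00" * 1000  # stand-in for a product data file
compressed, meta = wrap_compressed(payload)
print(json.dumps(meta, indent=2))
restored = unwrap_and_verify(compressed, meta)
```

The checksum of the decompressed data lets a future reader confirm that whatever decompression implementation is available then (for example one hosted in the archived VM) reproduces the original bytes.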



Re: Compressed data file

> The Open Archival Information System (OAIS) reference model specifies that products have to be accompanied by their representation information. In SAFE, this representation information is currently provided by the package's data file together with its schema in the SDF language.
>
> This requirement imposes strong restrictions on the product formats and excludes archiving data in compressed formats such as NetCDF or JPEG-2000.
>
> For the new SAFE standard specification, the use of compressed data for archival (and under which conditions) shall be analysed, bearing in mind that any approach should be conformant to the OAIS specification.

KSAT would not generally recommend compressing data before storing it in the archive. The compression algorithm introduces another potential point of failure if it is no longer known when the data has to be unpacked in the future. As storage media are getting cheaper and new storage technology evolves, there may be less need for file compression in the future.


