
Representation of Corrupted data using DFDL

During the PDR-C collocation meeting it was agreed to analyse the possibility of using DFDL to identify corrupted elements in a SAFE product (see PDR-C Action 03 included in the PDR-C Review Report).

The attached document “Corrupted Data Handling using DFDL Trade-Off” (PDGS-SAFE-GMV-TN-12/0183) provides the analysis and conclusions reached on this topic.

All your comments will be appreciated.

Adrián Sanz (GMV)
LTDP SAFE Project Manager


Re: Representation of Corrupted data using DFDL

I am a bit disappointed with the outcome of this trade-off; I expected a bit more information about the capabilities of DFDL annotations.

On the other hand, there is something in the document that I don't understand and that causes some concern. You say that with DFDL you are not able to target specific time-based "locations" of a SAFE product, as can be done with SAFE 1.3. This is strange because, at least judging by the namespaces, this would seem to be a SAFE feature, not an SDF feature. Why would replacing SDF with DFDL have any impact on this? If this were true, it would be an argument against changing to DFDL, because we would be losing interesting functionality. Are you saying that in SAFE 2.0 we no longer have the capability of referring to missing/corrupted information?

DFDL and SDF annotations should in theory enable you to do almost anything in terms of representation, so this is unconvincing.

Remember that the idea behind the trade-off was to see if DFDL annotations could provide a more elegant solution to the issue.

Relevant RIDs:

ESA_PANEL-171 (B. Buckl)
ESA_PANEL-78 (S. Zinke)


Re: Re: Representation of Corrupted data using DFDL

Thank you for your comments, Paulo.

Please find our replies below:

''> On the other hand, there is something in the document that I don't understand and that causes some concern. You say that with DFDL you are not able to target specific time-based "locations" of a SAFE product, as can be done with SAFE 1.3. This is strange because, at least judging by the namespaces, this would seem to be a SAFE feature, not an SDF feature. Why would replacing SDF with DFDL have any impact on this?''
In SAFE 1.3, quality information is provided using SAFE metadata, not SDF. Replacing SDF with DFDL is a separate subject and does not have a direct impact on how data quality information is used in SAFE.
What we mentioned in the trade-off is that DFDL cannot specify data locations using time-based intervals, as is currently done in SAFE 1.3 using metadata (this is not an SDF feature).

The recommendation that comes out of this trade-off is that, in SAFE 2.0, data quality information should be provided as it currently is in SAFE 1.3; therefore, we are not losing the capability of referring to missing/corrupted information.


''> If this were true, it would be an argument against changing to DFDL, because we would be losing interesting functionality. Are you saying that in SAFE 2.0 we no longer have the capability of referring to missing/corrupted information?
> DFDL and SDF annotations should in theory enable you to do almost anything in terms of representation, so this is unconvincing.''

As mentioned above, the capability of referring to missing/corrupted information will be kept in SAFE 2.0 by using metadata, not DFDL.

In any case, attached is a table with the mapping between SDF and DFDL functions, so you can see that we are not losing functionality.

''> Remember that the idea behind the trade-off was to see if DFDL annotations could provide a more elegant solution to the issue.''

The objective of the trade-off was to analyse the possibility of using annotations provided by the DFDL schemas to point to the corrupted elements (considering that the DFDL schemas are defined at product-type level). We did not find any way of doing so, and this is what we tried to reflect in the trade-off.

Regards

Adrian Sanz (GMV)
LTDP SAFE Project Manager



Re: Representation of Corrupted data using DFDL

I am not sure if I am missing a point here, but in my understanding DFDL describes the structure of a binary file, irrespective of its actual content. As such, I see DFDL as describing the product type, not the product instance. If this is true, then, IMO, DFDL cannot describe corrupted or missing elements, as those are properties of the instance, not the type. The only exception would be a certain structure within a binary file where you would always find corrupted or missing elements, but this seems to make no sense to me.
I understand that the way to describe missing or corrupt elements is through SAFE and the manifest.
My RID -78 was about using XPath, and the answer provided has clarified this sufficiently, IMO.
From Action A03 I understood that the analysis should check whether annotations from the DFDL can be used for locating corrupted elements. I still believe it could be done, not by using DFDL alone but by using a combination of DFDL and the manifest (SAFE).
--
Stephan


Re: Re: Representation of Corrupted data using DFDL

Dear Stephan,

Regarding your last post please find some comments inline.

> I am not sure if I am missing a point here, but in my understanding DFDL describes the structure of a binary file, irrespective of its actual content. As such, I see DFDL as describing the product type, not the product instance. If this is true, then, IMO, DFDL cannot describe corrupted or missing elements, as those are properties of the instance, not the type. (...)
> I understand that the way to describe missing or corrupt elements is through SAFE and the manifest.

This is also our opinion. DFDL should describe the file structure and not provide any other information about the file content. The conclusion from the analysis in this trade-off was that meta information should continue to be described through SAFE.



> The only exception would be a certain structure within a binary file where you would always find corrupted or missing elements, but this seems to make no sense to me.

> My RID -78 was about using XPath, and the answer provided has clarified this sufficiently, IMO.
> From Action A03 I understood that the analysis should check whether annotations from the DFDL can be used for locating corrupted elements. I still believe it could be done, not by using DFDL alone but by using a combination of DFDL and the manifest (SAFE).

Regarding this last paragraph, we discussed it among ourselves and we are not clear about what you mean.

We came up with three different approaches to address this issue:

A - Provide metadata information (SAFE) concerning the location of corrupted elements within a DFDL specification using XPath.

  • This option is similar to how it is currently handled in version 1.3 of SAFE.

B - Provide in the DFDL schema information identifying the possible location of corrupted data.

  • This option does not identify the exact element containing corrupted data, nor does it indicate whether any corrupted data is present. It only specifies the area where corrupted data would be located, if any exists.

C - Provide, in the DFDL element with the corrupted data, a reference to the manifest/metadata file containing the specification and location of the existing corrupted data.
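
As a rough illustration of approach A only, the sketch below keeps the quality information outside the DFDL schema and points into a parsed product instance with an XPath expression. Everything in it is assumed for the example: the infoset layout, the element names (`product`, `record`, `value`), the `QUALITY` structure and the `resolve_corrupted` helper are hypothetical and are not taken from SAFE, the manifest schema or the trade-off document. It uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset, as a DFDL parser might produce for ONE product instance.
INFOSET = """
<product>
  <record><time>2012-01-01T00:00:00</time><value>17</value></record>
  <record><time>2012-01-01T00:00:01</time><value>-1</value></record>
  <record><time>2012-01-01T00:00:02</time><value>19</value></record>
</product>
"""

# Hypothetical per-instance quality metadata (the role played by SAFE metadata).
# The DFDL schema itself stays at product-type level and carries none of this.
QUALITY = [
    {"status": "corrupted", "location": "./record[2]/value"},
]


def resolve_corrupted(infoset_xml, quality):
    """Return (status, location, text) for every element the metadata points at."""
    root = ET.fromstring(infoset_xml)
    hits = []
    for entry in quality:
        # findall() evaluates the XPath relative to the root element.
        for elem in root.findall(entry["location"]):
            hits.append((entry["status"], entry["location"], elem.text))
    return hits


if __name__ == "__main__":
    for status, location, text in resolve_corrupted(INFOSET, QUALITY):
        print(status, location, text)
```

Additional attributes on each metadata entry (for example a time interval or a reason code) could be added next to `location` if XPath alone proves insufficient; that is a design assumption, not something the trade-off specifies.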

Options B and C depend on what DFDL provides to handle corrupted data. DFDL, being a specification language, does not provide many options for specifying metadata.

For example, it can handle missing information because that has a direct impact on the binary file structure. Even in this case, missing data is identified not by providing metadata but by failing to find a specific signature in the data, or the data itself. From the DFDL definition document:

Definition 'missing element':
(…)an element is missing
- IF an initiator is defined AND dfdl:emptyValueDelimiterPolicy is 'initiator' or 'both' but the initiator is not found in the data stream.
- OR the content region in the data stream is empty.
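
To make the point concrete, the two quoted conditions can be transcribed as a small predicate. This is only a sketch of the excerpt as quoted above: the part elided by "(…)" is not reproduced, and the function name and parameters are hypothetical, not part of any DFDL processor API. It shows that "missing" is decided from what is found in the data stream, not from separate metadata.

```python
def is_missing(initiator, empty_value_delimiter_policy, initiator_found, content):
    """Transcription of the two quoted conditions only (hypothetical helper).

    initiator: the initiator defined in the schema, or None if none is defined.
    empty_value_delimiter_policy: value of dfdl:emptyValueDelimiterPolicy.
    initiator_found: whether the initiator was found in the data stream.
    content: bytes of the element's content region in the data stream.
    """
    initiator_rule = (
        initiator is not None
        and empty_value_delimiter_policy in ("initiator", "both")
        and not initiator_found
    )
    empty_content_rule = len(content) == 0
    return initiator_rule or empty_content_rule
```

Note that nothing here can flag an element that is present but corrupted: a wrong value with an intact initiator and non-empty content passes both checks.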

--

Miguel Vieira (GMV)
LTDP SAFE Project Team


Re: Re: Re: Representation of Corrupted data using DFDL

Dear Miguel,
I was referring to approach A that you mentioned, which is similar to what is currently described. I would encourage you, though, to use the explanation you gave in reply to my RID to update the documentation for clarity. I am still not sure that XPath alone would be enough to address the location within the DFDL; I could imagine that additional attributes for specifying the location of the missing/corrupted elements might be needed. But that might only be because I haven't seen real-life examples yet.
Regards
--
Stephan


