
LTDP SAFE


Representation language trade-off results

According to the SRR outcomes, a new representation language is expected to be adopted for SAFE in order to cover some of the format needs, mainly standardisation and open-source availability (among others), so as to assure long-term data preservation.

The attached document “Representation Information Language Trade-off” (PDGS-SAFE-GMV-TN-12-0070) provides the analysis and conclusions reached on this topic. It concludes that DFDL is the most suitable candidate for use in SAFE and recommends that it replace the current implementation based on SDF.

This discussion topic and the analysis included in the attached document are intended to reach a consensus before the PDR-C, because the final solution may imply a change in the format design that has to be considered for the SAFE Core Specification update.

All your comments will be appreciated.


Adrián Sanz (GMV)
LTDP SAFE Project Manager


Re: Representation language trade-off results

Detailed feedback has been provided by Stephan Zinke (EUMETSAT Panel member) as input for the discussion on this topic.

The attached document provides some comments on the trade-off written by GMV.



Re: Representation language trade-off results

Here are our comments on the issues identified in Stephan's memorandum:


Stephan's Comment
General: I was under the assumption that as well a solution “design from scratch” should be evaluated, but I might be mistaken.
GMV's Answer
According to the outcomes presented in the “Representation information in XML trade-off” (PDGS-SAFE-GMV-TN-12-0066), there is no need to design any language from scratch. This is why we decided not to include any reference to this possibility in this trade-off.


Stephan's Comment
Section 3.2.2, p10: “the 128 first characters of the Latin Alphabet No. 1”. It is assumed ISO-8859-1 character set is meant, can you confirm?
GMV's Answer
Yes, it is ISO-8859-1. This standard is also mentioned in the example described in appendix B.2.


Stephan's Comment
Section 3.3.1, p11: “it is a dialect of XML”. I’m not sure what is meant by this. Isn’t BinX a specialisation and extension of XML?
GMV's Answer
You’re right; this sentence is probably not very clear.

A BinX document is a simple XML document based on the BinX schema (binx.xsd) that includes the following tags:
<?xml version="1.0"?>
<binx version="1.0">
<definitions>
</definitions>
<dataset>
</dataset>
</binx>
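
Since such a document is ordinary XML, a generic XML parser can read it without any BinX-specific tooling. The following is a minimal sketch in Python, assuming only the skeleton listed above (no namespace and no validation against binx.xsd); the helper name top_level_sections is hypothetical and not part of any BinX library.

```python
import xml.etree.ElementTree as ET

# Skeleton copied from the post above; element names are taken from it as-is.
BINX_SKELETON = """<?xml version="1.0"?>
<binx version="1.0">
  <definitions>
  </definitions>
  <dataset>
  </dataset>
</binx>"""


def top_level_sections(xml_text):
    """Return the root tag and the tags of its direct children (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]


root_tag, sections = top_level_sections(BINX_SKELETON)
print(root_tag, sections)
```

This only shows that the document is well-formed XML with the two expected sections; it says nothing about whether the content complies with binx.xsd.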


Stephan's Comment
Section 3.3.2, p12: “the typeDef mechanism cannot be parameterised. If additional parameterised type definitions are necessary, the user will have to extend the schema itself”. Is that something bad, allowing a user to enhance?
GMV's Answer
Agreed. This is not bad; the document should be corrected.
BinX, like DFDL, was designed to allow user extensions. However, DFDL provides a particularly strong basis for this by explicitly separating the description of the structure of the data from its meaning.

The BinX specification is no longer maintained (at least for the time being) and its work has evolved into DFDL. This should probably be mentioned in the conclusions of the trade-off.

Stephan's Comment
Section 3.3.3, p13: “and a separate text file describing the data schema using the BinX language”. So actually this is an XML file which complies with binx.xsd.
GMV's Answer
Yes, it is an XML document (compliant with binx.xsd), but in fact it acts as a pseudo XML Schema (written in the BinX language) that describes the structure of the binary file using the predefined types included in binx.xsd.

I can tell you in advance that this XML document does not allow value restrictions (e.g. range definitions for some data types), so it does not seem feasible to validate the binary file with it.
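
To illustrate what a value restriction would add, here is a minimal sketch in Python of a range check on decoded binary values. It is independent of BinX and DFDL: the data layout (big-endian signed 16-bit integers), the allowed range and the helper name check_range are all assumptions made for the example.

```python
import struct


def check_range(raw, fmt, lo, hi):
    """Decode `raw` as repeated `fmt` items and return the values outside [lo, hi]."""
    values = [item[0] for item in struct.iter_unpack(fmt, raw)]
    return [v for v in values if not lo <= v <= hi]


# Three big-endian signed 16-bit integers; the last one breaks the assumed range 0..500.
raw = struct.pack(">3h", 10, 250, 999)
print(check_range(raw, ">h", 0, 500))
```

A description language without range definitions can tell a reader how to decode the three integers, but not that 999 is invalid; that check would have to live elsewhere.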


Stephan's Comment
Section 5, EAST: Pro “Open source standard language” vs. Con “Existing tool suite is not open source”. Isn’t this a contradiction?
GMV's Answer
No, it isn’t. The language specification is open (CCSDS provides the language specification but not the tools). The existing tools (or at least the most complete set of tools) have been developed by CNES and are available only as pre-compiled applications. The source code of these tools is not available to the community.


Stephan's Comment
Section 5, SDF: Pro “Open source language” vs. Con “Proprietary language specification”. Isn’t this a contradiction?
GMV's Answer
No, it isn’t. The language specification has been defined by a private company but is publicly available right now (open source). However, the license terms for using this language are not very clear, so using this format means accepting some risk for the future, as this company could claim rights over the use of the language (somewhat parallel to what happened with the GIF format).


Stephan's Comment
Section 5: According to the definition, zero points means “Language is not compliant with SAFE needs”. Thus, any language assessment which has a zero anywhere, would need to be excluded at once. Thus, the given “mean” values don’t really help.
GMV's Answer
You’re probably right. In any case, this consideration does not affect the final result, as DFDL scores points in all categories. However, we will take your comment into account for the final version of this trade-off.
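
The point about zero scores can be shown with a small sketch in Python. The candidate names and scores below are purely hypothetical and are not taken from the trade-off document; the sketch only assumes that a zero in any criterion means "not compliant with SAFE needs".

```python
from statistics import mean

# Hypothetical scores per criterion (0 = "not compliant with SAFE needs").
SCORES = {
    "lang_a": [2, 2, 2, 2],
    "lang_b": [3, 3, 0, 3],
}


def rank(scores):
    """Exclude any candidate with a zero, then sort the rest by mean score."""
    eligible = {name: mean(vals) for name, vals in scores.items() if 0 not in vals}
    return sorted(eligible.items(), key=lambda item: item[1], reverse=True)


print(rank(SCORES))            # only candidates without a zero remain
print(mean(SCORES["lang_b"]))  # plain mean, which hides the zero
```

In this made-up case the plain mean would place lang_b ahead of lang_a even though lang_b fails one criterion outright, which is why an exclusion step before averaging gives a more meaningful ranking.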


Re: Re: Representation language trade-off results

Thanks for the feedback on my comments, GMV. Some further comments below:

> Stephan's Comment
> General: I was under the assumption that as well a solution “design from scratch” should be evaluated, but I might be mistaken.
> GMV's Answer
> According to the outcomes presented in the “Representation information in XML trade-off” (PDGS-SAFE-GMV-TN-12-0066), there is no need to design any language from scratch. This is why we decided not to include any reference to this possibility in this trade-off.

I don't understand then the conclusion reached based on what has been presented in PDGS-SAFE-GMV-TN-12-0066, namely, why there is no reason to design a language from scratch, and PDGS-SAFE-GMV-TN-12-0066 describes another issue than PDGS-SAFE-GMV-TN-12-0070. This new one could still be XML based. Maybe the conclusion reached in 0066 should be rather mentioned in 0070?


> Stephan's Comment
> Section 5, EAST: Pro “Open source standard language” vs. Con “Existing tool suite is not open source”. Isn’t this a contradiction?
> GMV's Answer
> No, it isn’t. The language specification is open (CCSDS provides the language specification but not the tools). The existing tools (or at least the most complete set of tools) have been developed by CNES and are available only as pre-compiled applications. The source code of these tools is not available to the community.

In the past I used "Open Source" only for software and its source code. One could probably argue that a language could be open source as well. Maybe this is just a question of definition. So, fine with me.



Thanks,
Stephan Zinke, for EUMETSAT



Re: Representation language trade-off results

Hello Stephan, I don't really understand your comment:


> I don't understand then the conclusion reached based on what has been presented in PDGS-SAFE-GMV-TN-12-0066, namely, why there is no reason to design a language from scratch, and PDGS-SAFE-GMV-TN-12-0066 describes another issue than PDGS-SAFE-GMV-TN-12-0070. This new one could still be XML based. Maybe the conclusion reached in 0066 should be rather mentioned in 0070?

The conclusion of PDGS-SAFE-GMV-TN-12-0066 is that there is no need to develop a new language from scratch, but rather to reuse an existing one (SDF or DFDL). PDGS-SAFE-GMV-TN-12-0070 supports the idea of using DFDL... so I don't see any contradiction between them. I agree that the conclusions from 0066 should be included in 0070, but this seems far removed from your comments.

Could you clarify this, please?

Many thanks


Adrián Sanz (GMV)
LTDP SAFE Project Manager


Re: Re: Representation language trade-off results

Dear Adrian,
If I am not mistaken, the topic of -0066 was to "analyse the feasibility of using an alternative approach to the representation language, based on instances of a common XML schema, instead of representation of the binary data with a single XML schema".
I am therefore surprised that the analysis you carried out leads you to conclusions which were not an objective of the analysis and which are not supported by the analysis itself.
The question of which language to use is the topic of -0070. IMO, the development of a new language from scratch should have been included there as one option to look at, i.e. compared with the options presented in -0070 (SDF, DFDL, BinX, etc.).
In -0066 I'm missing the supporting argument(s) for why it is enough to "reuse an existing one (SDF or DFDL)". If that were the case, BinX and EAST would not have needed looking at in -0070, would they? :-)
What I understand from -0066 is that you believe that "therefore a “pure” XML Schema should require a very general high level schema providing very little added-value to the final validation of the whole product."

"...To validate the binary/text file (Step 2) a semantic validation would be still needed".
This is true. But I believe it is true as well for representation information languages expressed as XML schemata. In that sense I don't see much of a difference.

As said previously, however, the results of -0070 indicate that DFDL is the suitable replacement for SDF, and DFDL is based on XML schemata. Thus, the analysis in -0066 is kind of superseded...

Regards
--
Stephan


Re: Re: Re: Representation language trade-off results

I share Stephan's first comment that besides the four presented possibilities - DFDL, SDF, BinX and EAST - the idea for this trade-off was to have a fifth one consisting of a new language to be designed from scratch, even if the comparison would not be very fair because it would be based on hypothetical (not actual) characteristics of that new language. I propose that it is included in the trade-off, even if dismissed briefly upfront.

I also share the view that this issue is different from the one in trade-off 0066. The two issues are related but still independent enough to be looked at separately. Trade-off 0066 resulted from a RID/idea from Dominic Lowe and was "only" about whether instead of having new schemas for each new product type, we could have a very generic schema able to represent all possible product types (quite challenging and, I agree, of dubious added value because it would have to be really generic) and then instances of this generic XML schema for each new product type.

Anyway, the real issue is then whether the decision of having 1 schema + M instances instead of N schemas influences trade-off 0070 and in particular the need for a new language to be designed. I don't think so.

We should also not forget the comment that Bernard Buckl from DLR made during the SRR, that there are probably strong (unavoidable) reasons for languages like DFDL and SDF to use annotations to be able to represent binary data. I think this is the really interesting discussion, but it is more relevant to trade-off 0066.

Regarding this trade-off, I had a few comments of my own:

- I am slightly concerned about the immaturity of DFDL and in particular of the Daffodil parser. I'm sure we will see updates to both in the coming months and this is not optimal considering the upcoming SAFE developments.
- Maybe the trade-off could consider other parameters in the analysis, such as "maturity", "complexity" and "limitations". DFDL, for example, would probably score low on maturity, EAST would score low on complexity (i.e. it's complex), and BinX would score low on limitations (it has several).
- Stephan also mentioned this. I found it curious that BinX uses a master schema and then representation information consists of XML instances/documents. This is exactly the subject of trade-off 0066.

About the conclusion of the trade-off, no doubt about changing to DFDL as it stands.

Paulo



Re: Representation language trade-off results

Dear Adrian and Hector,

DFDL seems to be the best decision for this trade-off and also for us. The second option, EAST, is less intuitive and will require expert programmers, apart from being based on Ada and not on XML (by the way, Ada should be preserved too). However, DFDL's low level of consolidation and maturity is worrying in the short term. Too many changes and upgrades to DFDL would impact (and probably delay) SAFE in a relevant way.

Maria


