GPOD SMOS L2SM Processor User Manual
The GPOD SMOS Level 2 Soil Moisture (SMOSL2SM) Service provides time-driven and data-driven scheduled processing of SMOS Level 1C products into Level 2 Soil Moisture products. The service can also be used for on-demand processing, provided reference auxiliary data is available in the SMOS Catalogue and on storage for the day prior to the day you wish to process.
Setting up the Soil Moisture Service the first time:
Install the SMOS Catalogue:
The Soil Moisture Service relies on a dedicated catalogue. The SMOS catalogue provides three custom fields for use with SMOS products. These are found in the smos.dataset table, i.e.:
· polarisationmode: contains a "Full" flag if the MIR_SCS_1C (ocean) or MIR_SCL_1C (land) product is in full polarisation mode.
· isPartOf: a custom field which tags each output auxiliary file and each output MIR_SMUDP2 and MIR_SMDAP2 product as belonging to a single processing run. This field supports the chained processing mode of the Soil Moisture output products, where the auxiliary files produced for one day of processing serve as input to the next day's processing. It also supports the start of a processing run, inasmuch as "isPartOf=reference" is used to tag pre-processed auxiliary files which can be used for the first day's processing. The SMOS L2 SM G-POD service looks for auxiliary files bearing "isPartOf=reference" for the day prior to the first day of processing in order to start off the chain.
· processingversion: a field which serves to identify a given set of parameters for a single processing run.
Further, the SMOS catalogue does not use the extended tables of the standard CAS catalogue.
The SMOS Catalogue can be installed in an instance of postgres which is running other CAS databases, or it can be installed in a separate instance of CAS. Once you have chosen the instance from which you will run your SMOS database, retrieve the smos catalogue installation file from:
Then unpack smoscatalogue.tgz with the command:
tar xvfz smoscatalogue.tgz
Make sure you have access to the postgres database instance. Then:
psql --host <hostname.CAS.postgres.server> [--port <port>] --username <CatalogueAdmin|root|superuser*> --file init_smos.sql
The database username you choose here will be the owner of the smos database as created by the script. You may wish to change this later with the SQL command:
psql -U <CatalogueAdmin|root|superuser*> [--port <port>] smos -c "ALTER DATABASE smos OWNER TO <new_owner>"
To finalise the smos database, you should create a privileged user for smos administration in psql with the postgres command:
You should then change ownership and add grants for the new smosadmin user. A sample script, set_smos_grants.sh, is provided in the installation package. Examine the script and modify if necessary before running it.
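A minimal sketch of such a command follows; the role name smosadmin matches the grants script above, but the password is a placeholder and the exact privileges should be taken from set_smos_grants.sh rather than from this fragment:

```sql
-- Sketch only: password is a placeholder; adjust privileges to your schema.
CREATE ROLE smosadmin WITH LOGIN PASSWORD 'changeme';
```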
Populate the SMOS Catalogue:
Before inserting dataset records into the SMOS catalogue, the series records for the SMOS products and auxiliary files must be inserted. There are two methods for doing this:
1. Use the template, smoscat_series.rdf-template, contained in the installation package. Copy the template to smoscat_series.rdf and edit it to replace the %%SMOSHTTPBASEURL%% placeholders with the base address of your new SMOS catalogue. This base address can be one of two styles:
http://__%%SMOSHTTPBASEURL%%__/MIR_OSUDP2/rdf could become http://__myold.cathost/smoscat/smos__/MIR_OSUDP2/rdf
if you will be installing the SMOS catalogue in a server that is already running an instance of CAS, or if you plan to use a dedicated hostname and wish to hide the schema ("smos"); or
http://__%%SMOSHTTPBASEURL%%__/MIR_OSUDP2/rdf could become for example http://__mynew.cathost/catalogue__/MIR_OSUDP2/rdf
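The placeholder substitution can be scripted with sed. The following self-contained demo writes a one-line stand-in template (your real template contains many such lines) and substitutes an example base address, which is not a real host:

```shell
# Demo: replace the %%SMOSHTTPBASEURL%% placeholder with sed.
# The line written below is a stand-in for the real template file,
# and mynew.cathost/catalogue is an example base address.
printf 'http://%%%%SMOSHTTPBASEURL%%%%/MIR_OSUDP2/rdf\n' > smoscat_series.rdf-template
sed 's|%%SMOSHTTPBASEURL%%|mynew.cathost/catalogue|g' smoscat_series.rdf-template > smoscat_series.rdf
cat smoscat_series.rdf
```

The same one-liner, pointed at the real template, produces a smoscat_series.rdf ready for the insert step below.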
Then, insert the smoscat_series.rdf into your catalogue with the command:
tellCASto insert series into http://__mynew.cathost/catalogue__/[schema|schema]/rdf from smoscat_series.rdf
or the curl equivalent:
curl -X PUT --data @smoscat_series.rdf http://__mynew.cathost/catalogue__/[schema|schema]/rdf
2. Insert series definitions URL-to-URL from http://smos.terradue.com/catalogue/rdf
With this method, you can use the tellCASto Perl script to insert series definitions into your new catalogue from another catalogue. For example:
tellCASto insert series into http://__mynew.cathost/catalogue__/[schema|schema]/rdf from "http://smos.terradue.com/catalogue?count=80"
would insert all 80 SMOS series registered in smos.terradue.com into your new SMOS catalogue.
You can select those series you wish to insert with the ?count=N or ?q=expression operators. Refer to the CAS User Guide for examples. For example, if you wanted to insert just the MIR_SCL_1C series, you could execute the command:
tellCASto insert series into http://ify-ce03.terradue.com/catalogue/gpod/rdf from "http://smos.terradue.com/catalogue/rdf?q=MIR_SCL_1C"
Remember to put the smos.terradue.com URL in double quotes if your URL contains '?', so as to protect it from the shell.
In case of errors, series definitions can only be deleted with the regular postgres client, psql. Changes can always be made either by using psql or by modifying and re-submitting the RDF file.
Once series are inserted, you can populate the catalogue with dataset records.
A sample script for populating the SMOS Catalogue, zipNmvSMOSdata, is provided in the installation package for this purpose. It is meant to be run where the data are stored uncompressed, as it analyses the HDR file and stats the data for size. It also uses the package smosfootprint, provided by ESRIN, to extract the product footprints. The smosfootprint scripts and jar file are included in the installation package in the bin subdirectory alongside zipNmvSMOSdata; if you want to generate footprint records during catalogue inserts, make sure smosfootprint is always installed in the same directory as zipNmvSMOSdata.
To use the zipNmvSMOSdata script, just provide as parameters:
- the top-level directory where data are temporarily placed,
- the top-level directory where the data are to be stored,
- and finally the catalogue URL to insert records into
zipNmvSMOSdata from /EO_DATA/toregister/for_soilmoisture/DGG_REF/DGG_CURRENT_REF/ to /EO_DATA/ and insert into http://smos.terradue.com/catalogue
Run the script with no parameters for usage. zipNmvSMOSdata writes logs to $HOME/tmp and is meant primarily for use in the crontab.
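A typical crontab entry might look like the following; the schedule, install path and directories are examples, not prescriptions:

```
# m  h  dom mon dow  command (run the registration sweep nightly at 02:30)
30 2 * * * /usr/local/bin/zipNmvSMOSdata from /EO_DATA/toregister/for_soilmoisture/DGG_REF/DGG_CURRENT_REF/ to /EO_DATA/ and insert into http://smos.terradue.com/catalogue
```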
Hardware and software requirements for nodes that will run the GPOD implementation of the Soil Moisture service:
The nodes running soil moisture jobs should have the following hardware characteristics:
64-bit CPU required, 3.0 GHz recommended.
12GB of free RAM per soil moisture job to be run
Adequate disk space: up to 100 MB can be used in temporary and storage space per job.
libxml2
xsltproc
Perl modules XML::DOM, XML::Parser, XML::Parser::Expat and dependencies (available from http://www.cpan.org/)
IDL
In addition, to save transmission time for the large SMOS third-party libraries, the third-party libraries used may be installed directly on the nodes in the directory /home/shared/utils/eo/share/smos/third_party_libraries.
HTTPD: Finally, the SMOS database requires an Apache web daemon that will serve as a front end to the database. An httpd-SMOS.conf-template file is included in the package. You must make a copy of the template file:
cp httpd-SMOS.conf-template httpd-SMOS.conf
then customize the httpd-SMOS.conf file, replacing the %% placeholders with values appropriate for your system. Then include the httpd-SMOS.conf file in your main httpd.conf file and restart your Apache daemon.
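The Include step might look like this in your main configuration; the paths are assumptions and vary by distribution:

```
# At the end of /etc/httpd/conf/httpd.conf (location varies by system):
Include /etc/httpd/conf.d/httpd-SMOS.conf
```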
Install the service:
The service is installed on your portal web server. If you have installed the latest release of the G-POD portal, you will find the SMOSL2SM service already in place, under the gpod root/services directory. If not, you can download the smos_l2sm.tgz package from http://maps.terradue.com/downloads/
To set up the service for the first time, you may either run the SQL script, create_or_update_service.sql, included in the installation package:
mysql -u <username> -p portal_databasename < create_or_update_service.sql
or you can add the service and its related data series using the Administrator Control Panel:
Log on to the GPOD website with an administrator account
Choose Admin>Control Panel in the top menubar.
Click on the Services button in the Control Panel
Once the list of Services is displayed, scroll down to the bottom of the page and click on the Create New button at lower left.
Fill in each field of the Create New Service form with appropriate values, as suggested in the table below.
Click on the Create button at bottom page to add the new service to your GPOD portal database.
Again, choose Admin>Control Panel in the top menubar to return to the Control Panel.
Click on the Dataset Series button in the Control Panel.
Once the list of series is displayed, scroll down to the bottom of the page and click on the Create New button at lower left.
Fill in each field of the Create New Series form with appropriate values, as suggested in the table below.
Click on the Create button at bottom page to add the new series to your GPOD portal database.
Now, click on Admin>Control Panel again at the top menubar to load the Control Panel, click on the Services button, search for SMOSL2SM, and load the service definition page. Scroll down to the fields Default input dataset series and Supported input dataset series, select the SMOS MIRAS Level 1C Land Products from the select list to set these for the new service, and click the Modify button at bottom page.
Using the SMOS Soil Moisture service
To set up time-driven scheduled processing:
Log in to the GPOD portal with a user account which is enabled for scheduled processing, click on
Once the SMOS L2 SM Scheduler page is displayed, manually enter $(EXECDATE) and $(EXECDATE)+1D in the two fields under the panel.
Enter a suitable caption for the scheduled task in the Task Caption field.
Choose a appropriate for SMOS processing.
Choose a appropriate for SMOS processing (64-bit architecture).
Tick under the Scheduler Parameters pane.
Enter your scheduler caption in under the Scheduler Parameters pane.
Enter 100 in the field.
Enter 1 in the field.
Click in the Validity Start Field and use the pop-up calendar to set your first day of processing as start date.
Click in the Validity Stop Field and use the pop-up calendar to set 2010-11-02T23:59:59 as stop date.
Enter 1D in the field.
Under the pane, set the <Campaign Start Date:> to the first day you want to process.
You must have the reference auxiliary files for the day prior to your Campaign Start Date available on storage and in the SMOS Catalogue. The series names of these auxiliary files are:
· AUX_DGGTLV (Tau Low Vegetation)
· AUX_DGGTFO (Tau Forest)
· AUX_DGGROU (Surface Roughness)
· AUX_DGGRFI (Radio Frequency Interference)
· AUX_ECMWF (ECMWF parameters)
All reference auxiliary file records must bear the value "reference" in the isPartOf field.
Choose the processor version desired in the pulldown menu.
Enter three digits, for example 009, in the <Output File Version:> field. This file version number will identify your results, so that you can easily match them with your set of configuration parameters.
The <Catalogue Base URL for onlineResource> field is used by the Soil Moisture task as it inserts processed results into the SMOS catalogue. It uses the URL that you insert here as a base prefix, and follows this base with the SERIESNAME/YYYY/MM structure.
For example, if you configured as <Catalogue Base URL for onlineResource>:
The actual onlineResource tag for the product SM_TEST_MIR_SMUDP2_20101101T154551_20101101T163950_400_013_1.tgz would be (URL below split up for readability):
In other words, <Catalogue Base URL for onlineResource> is the download URL for the finished product. It should correspond to the publish server, inasmuch as the finished product location on your publish server should be reachable via the download URL.
For example, the publish server corresponding to the <Catalogue Base URL for onlineResource> gsiftp://elba.terradue.com/EO_DATA is: sftp://elba.terradue.com/EO_DATA (hostname elba.terradue.com, protocol sftp, path /EO_DATA).
If you wish to use a custom <Moisture Map Configuration File>, a custom <Processor Configuration file for Full Mode> and/or a custom <Processor Configuration file for Dual Mode>, you should paste the contents of these configuration files into the appropriate text fields.
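The SERIESNAME/YYYY/MM convention can be sketched as a small shell fragment. The base URL below is a placeholder (not a real catalogue), and the underscore-field positions are inferred from the example product filename above:

```shell
# Sketch: compose an onlineResource URL following the SERIESNAME/YYYY/MM
# convention. base is a placeholder; field positions are inferred from
# the example SMOS product filename.
base="http://mynew.cathost/download"
product="SM_TEST_MIR_SMUDP2_20101101T154551_20101101T163950_400_013_1.tgz"

series=$(echo "$product" | cut -d_ -f3-4)   # underscore fields 3-4, e.g. MIR_SMUDP2
start=$(echo "$product" | cut -d_ -f5)      # sensing start, e.g. 20101101T154551
yyyy=$(echo "$start" | cut -c1-4)           # year from the sensing start
mm=$(echo "$start" | cut -c5-6)             # month from the sensing start

url="$base/$series/$yyyy/$mm/$product"
echo "$url"
```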
It is a good idea to save the configuration file server-side for future use and reference by clicking on the pencil button:
which activates a “Save” dialogue. Click on the icon of the floppy disk to save your configuration file server-side.
Click the [Create] button. Click on [Schedulers] in the top menu to view the list of schedulers. The newly-created Scheduled Task will appear in the list.
The Scheduled Task should start within the polling interval for the IfyAgent set by the G-POD Administrator. If the background process IfyAgent is not running, the scheduler may be started by clicking the Advance button at top left of the page.
Once the scheduled task is started, you can list its individual tasks by clicking on its name in the list.
In turn, clicking on each task name will show its jobs, if still running, or its results if completed.
To set up an on-demand task:
On-demand processing is occasionally necessary for testing purposes, in case of a core processor crash or other type of failure, or for reprocessing Level 1C products singly or in groups for whatever reason. Two techniques are available for processing Soil Moisture products on demand:
· The use of the Campaign Start field in the service’s webpage;
· The use of an unlabelled input field in the service’s webpage.
These two fields allow you to set up what auxiliary data you will be pulling in from the previous day to process your Level 1C files. To use the Campaign Start, you should first be sure that reference data is available in the catalogue for the day prior to the day you will be processing.
An example query to find reference auxiliary files would be:
The query above will yield the auxiliary files you need if you were processing L1C files starting on 2010-11-01. The tag dct:isPartOf in the catalogue results will bear the value “reference”. You would then insert the date-time 2010-11-01T00:00:00 in the field Campaign Start.
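As an illustration only, a query of roughly this shape could be used to look for roughness reference files, combining the per-series /rdf endpoint with the ?q= operator shown earlier; the exact parameters depend on your CAS version and are an assumption here:

```
http://smos.terradue.com/catalogue/AUX_DGGROU/rdf?q=reference
```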
An example query to find auxiliary files produced by a previously-run scheduler would be:
where FV032_PV400_World_Validation is the caption of the scheduler in which this task’s results will be integrated. The query above will yield the auxiliary files you need if you were processing any of the L1C files starting on 2010-11-03 for which 2010-11-02 has already been processed by scheduler FV032_PV400_World_Validation. The dct:isPartOf tag bears the portal hostname plus the scheduler caption as its value. It is assumed that you keep the parameters of the task (Region of Interest, processor number, file version number, configuration files, publish URLs) identical to the parameters of FV032_PV400_World_Validation.
Just insert the scheduler identifier, i.e. the value of the dct:isPartOf tag:
in the unlabelled field (debug_scheduler) at the bottom of the service, or include it in the page URL, e.g.
http://portal-dev.gpod.terradue.com/services/SMOSL2SM/?startdate=2010-11-02T01:46:18&stopdate=2010-11-02T02:40:17&caption=FV025&debug_scheduler=http://portal-dev.gpod.terradue.com/schedulers/FV025World&filever=025
Using this technique, it is possible to reprocess single files of a completed scheduler.