G-POD User Manual
The Earth Observation Grid Processing-on-Demand environment (G-POD) integrates high-speed connectivity, distributed processing resources and large volumes of data to provide science and industrial partners with a shared data processing platform fostering the development, validation and operations of new earth observation applications.
The Grid Processing On Demand user interface is a web-based mechanism for setting up and submitting processing jobs to G-POD clusters.
Objectives of the Document
This document is the user manual of the G-POD internet portal. It covers:
- Overall organisation of the portal, providing an outline description of each portal page and access point.
- User registration guidelines and logging in.
- The description of the generic methods and user actions offered to trigger the processing jobs, monitor them and get back their results.
- Service: A service is a pre-defined set of programs designed to retrieve and process data in a certain way.
- Task: A task is a particular run of a service once launched. It comprises one or more jobs.
- Job: The individual steps in the service, running from data retrieval to publication of results to a storage server.
- Files: The input data products.
- AOI or Area of Interest: The geographical region for which data products will be retrieved.
- Series: A type of data product, such as ENVISAT MERIS Level 2 Full Resolution (MER_FR__2P), NOAA AVHRR (AV_SST_MEDP), etc. Series are standard data products offered by ESA and present in the G-POD catalogue. See http://earth.esa.int/dataproducts for more information.
- Dataset: The actual product, representing a data file or files (depending on format) retrieved from G-POD storage.
- Computing Element or CE: A cluster, consisting of a master node, which identifies the CE, and its working nodes. Different CEs usually have different characteristics such as a particular operating system or library installations.
Table of Acronyms
|ASAR||Advanced Synthetic Aperture Radar (Envisat)|
|EOLI||Earthnet On-Line Interactive|
|ESA||European Space Agency|
|G-POD||Grid Processing On Demand|
|MERIS||Medium Resolution Imaging Spectrometer (Envisat)|
|MUIS||Multi-mission User Information Service|
|URL||Uniform Resource Locator|
G-POD HOME PAGE OVERVIEW
The G-POD home page is accessible from the internet at the following URL: http://gpod.eo.esa.int
Access to G-POD
Access to G-POD requires a UM-SSO account (or, alternatively, a G-POD username and password), as well as a certificate which will be used as a proxy when running your tasks. These can be obtained by contacting the G-POD administrator at eogridadmin at esa.int
The G-POD user registration procedure involves several steps, which are described in detail in the annex to this document:
- Requesting a UM-SSO account (if UM-SSO is available for G-POD)
- Creating a G-POD account (linked to the UM-SSO account, if UM-SSO is available) and a digital certificate for accessing the G-POD
- Setting up your MyProxy certificate
- Logging in to G-POD
G-POD user authentication may be based on associating a digital certificate with a G-POD user account; accessing G-POD may therefore also require the creation of a certificate.
If you are new to digital certificates and you want to know more about how they work, you can find more information by doing an internet search for these terms:
- Public Key Infrastructure
- Asymmetric Key Cryptography
Organisation of the G-POD Portal
The G-POD home page layout comprises buttons for access to various sections of G-POD and logging in, and generally includes a banner at top as well as news and other content, which may change from time to time.
Figure 1 - G-POD home controls
- A searchable listing of services available on the portal.
- The area where you can set up, modify, visualise and download results of your tasks.
- A searchable listing of the products available for processing on grid clusters.
- An online version of the user manual.
- My Profile
- An area where you can set and modify your username, password, email address and other personal information.
The UM-SSO login works in the same way as for other EO web applications that use the Single Sign-on mechanism. The G-POD web portal relies on the identity information from the Identity Provider, which takes care of user authentication. If you are not yet logged in to G-POD, the G-POD homepage shows the following button on the right of the top banner.
When you press the Sign in button, one of the following three things happens:
- If you are not yet logged in on UM-SSO: you are redirected to the UM-SSO login page where you have to enter your UM-SSO username and password (see figure 1a).
- If you are logged in on UM-SSO and a G-POD account is linked to your UM-SSO account: you are immediately granted access to the protected parts of the G-POD portal.
- If you are logged in on UM-SSO but do not have a G-POD account linked to your UM-SSO account: you are redirected to the registration page where you have to enter additional information for your G-POD user profile.
The following applies only if UM-SSO is not available: Enter your G-POD username and password and press the Login button (as seen in figure 1b).
Once successfully logged in, your username is displayed (see Figure 2) together with information about your available resource credits (see description in the section "Resource Credits").
Figure 2 - User information after login
To launch processes on the Grids, you need a valid proxy with at least three hours of lifetime remaining. If the "Proxy expired" message appears, a new proxy must be uploaded to the MyProxy server (see the section "MyProxy Server"). To refresh the proxy lifetime information after updating it, simply click on it.
To log out from the system, click on the "Logout" button in the Login pane.
To start working with G-POD, call up the Services page by clicking on the "Services" button.
The Services page presents a paginated list of the services available, sometimes grouped by category, with a brief description of each. You can search for all or part of a string in the service name or description by using the Search input box and Search button at the top of the page, and set the number of items to display per page using the "Items per page" select list. Click on the "... more" link to reveal more search options; you may also search by rating, class, and category if these are available.
Figure 3 - Services search pane
Simply click on a service's name or icon to load the service.
The service page itself will differ greatly from service to service, but all usually have the following common elements:
- a map, showing the portion of the earth relevant to the service (usually the world, but a service may also show just a particular region, such as a polar region). A set of map tools is provided at top left of the map: a pan tool, an information tool, two zoom tools, back and forward tools to quickly access previous map displays, and a selection tool. The map is used to select the Area of Interest.
- a set of four input boxes which display the geographical coordinates of the map selection, and which may also be used as input for the Area of Interest. If the map and the Geographical Selection input boxes are not present, this means that the service uses a pre-defined Area of Interest which is not selectable.
- start date and stop date input boxes, along with calendar overlays for ease of start and stop date selection;
- a Dataset Series select list, which allows selection of a product type (or combinations of products and auxiliary files). If no Series select list is present, this means that the service uses one series only which is thus not selectable.
- a Query button, which launches a catalogue query using the Area of Interest, Dataset Series, and start and stop dates chosen. The Query button is usually situated in the service page close to the Files area.
- the Files area, which displays query results in a select list or table.
Figure 4 - Elements of a service page
The Workspace is an area where you will find the tasks created, submitted or completed on G-POD. It consists of a list of tasks flagged by status which can be modified using the buttons below the list. A set of controls at top lets you filter results by status ("All", "Created", "Pending", "Active", "Paused", "Failed", "Completed", "Incomplete", "Deleted").
Figure 5 - Example Workspace
To delete, abort or restart several tasks simultaneously, select them by using their checkboxes (at left in Figure 5 above) and click on the appropriate button at bottom. By clicking on a task name in the workspace, job details can be viewed and specific actions can be triggered on an individual task. Details of the methods and actions available for task submission, control and management follow.
Figure 6 - Example task in the workspace
The "Jobs Information" button below the task pane will display specific information about each of the individual jobs comprising the task. An individual job display area consists of buttons to call up specific parts of a job: its Details, Parameters, Processing Nodes, and Results.
- Details provides information on the state of the job on the nodes.
- Parameters provides a list of input parameter names and their values which originated during job setup on the service page.
- Processing Nodes provides a list of the working nodes and the logs they produced as they processed the job.
- Results contains information on the output from the nodes.
Figure 7 - Example job display in the workspace
Full tasks can be submitted (sent for processing to the grid engine), deleted, copied, cloned, re-created, re-submitted, and aborted. If a task has been created but not submitted, some modifications can be made directly in the task view of the job, such as a change of the owner, caption, compression type or publication server. In this case, to apply changes, the Modify button should be pressed. If a job failed or was aborted, it can thus be modified and resubmitted. If a task is copied, a new task is created with identical parameters, ready for submission. If a task is cloned, a new task is created with identical parameters, but not ready for submission: the service page of the task is presented for modification of the task's parameters.
When a task is completed, clicking on its name will display results at top of the task edit page.
Figure 8 - Example completed task display in the workspace
The catalogue control calls up a query and list mechanism allowing you to view the series (product types and datasets) available to you for processing your jobs. A search for datasets (product) for a given series starts with the selection of the series from a pulldown list.
Figure 9 - Selection of a series
A catalogue query page is called once you have selected your series. A geographical region of interest, start and stop dates, access protocol, uid (the actual product name), processing centre, acquisition station, track number and free text fields provide ample ways of selecting products from the catalogue.
Figure 10 - Product search using the catalogue query page.
Scheduled services are a way of submitting jobs in bulk, unattended, to G-POD clusters. Per se a scheduled service is identical to an ordinary service, with the addition of a mechanism to set up the automatic launch of the task. When calling a service with scheduling enabled, an additional pane appears in the service page, allowing you to insert parameters defining how the scheduled task will run, as shown in the figure below.
Principally, there are two ways of choosing data for automatic processing using the scheduler.
Time-driven scheduling permits cyclic, repeat processing of selected groups of input products. It takes a given number of input products between start and end dates which are expressed in terms of a range: $(EXECDATE)+/-offset to $(EXECDATE)+/-offset. The time-driven scheduler selects input products on the basis of this range and submits them in batches to the configured service, to be run by the grid engine. The Time Interval parameter acts as a step function, determining how the next batch of products is to be selected. This effectively creates a "moving window" of processing: if a time interval of 1D (one day) is defined, with an $(EXECDATE) range covering 10 days, groups of 10 products will be made up, starting at date D; the next execution of the scheduler will set up a group of 10 products starting at date D+1. The interval can also be negative, which sets up a reverse range, last product to first product (see below). The Validity Dates in the scheduler input parameters define a date selection range for the entire run of products to be processed. The Minimum and Maximum input files per task are not used for time-driven scheduling. This type of scheduler is suitable, for example, for daily or weekly composites, or any task where input products must be grouped for processing by time interval.
Figure 11 - Time-driven scheduler example.
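The moving window described above can be illustrated with a short Python sketch. It is not the G-POD scheduler itself: the function name, its parameters, and the assumption that the execution date steps from the validity start date towards the validity stop date are all hypothetical, chosen only to show how the offsets and the time interval interact.

```python
from datetime import date, timedelta


def time_driven_windows(validity_start, validity_stop,
                        start_offset_days, end_offset_days, interval_days):
    """Yield one (window_start, window_end) pair per scheduler execution.

    Assumption: the execution date steps from validity_start towards
    validity_stop in increments of interval_days (negative = backwards).
    """
    if interval_days == 0:
        raise ValueError("interval_days must be non-zero")
    step = timedelta(days=interval_days)
    exec_date = validity_start
    while True:
        # For a negative interval the run moves backwards in time.
        in_range = (exec_date <= validity_stop if interval_days > 0
                    else exec_date >= validity_stop)
        if not in_range:
            break
        yield (exec_date + timedelta(days=start_offset_days),
               exec_date + timedelta(days=end_offset_days))
        exec_date += step


# A 10-day range stepped by one day: D to D+9, then D+1 to D+10, and so on.
for start, end in time_driven_windows(date(2010, 1, 1), date(2010, 1, 3), 0, 9, 1):
    print(start, "to", end)
```

With a negative interval and the validity dates swapped, the same function walks the windows from the latest date back to the earliest.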
Data-driven scheduling, on the other hand, permits continuous processing of products based on the time they are inserted into the catalogue (modification time). In this case, the Time Interval parameter is not used; the Minimum Number of Input Files per Task and Maximum Number of Input Files per Task define the limits of how many input products will be sent in each batch: the scheduled task will not run until the catalogue contains entries for at least the minimum number of input files.
Figure 12 - Data-driven scheduler example.
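As a rough illustration of the minimum and maximum limits, the following Python sketch selects the next batch from a list of waiting products. The function name, the (name, modification_time) tuple layout and the oldest-first ordering are assumptions made for this example; the manual does not state how G-POD orders products within a batch.

```python
def next_batch(pending, min_files, max_files):
    """Return the products for the next task, or [] if too few are waiting.

    pending is a list of (product_name, modification_time) tuples.
    Assumption: products are taken oldest modification time first.
    """
    if min_files < 1 or max_files < min_files:
        raise ValueError("need 1 <= min_files <= max_files")
    if len(pending) < min_files:
        return []  # the task does not start yet
    ordered = sorted(pending, key=lambda product: product[1])
    return ordered[:max_files]  # selection is closed at max_files


waiting = [("prod_c", 3), ("prod_a", 1), ("prod_b", 2)]
print(next_batch(waiting, min_files=2, max_files=2))
```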
To set up a scheduled service, the required fields are:
- Enabled: Check to enable and start the scheduled processing.
- Name: A short name for the scheduled service.
- Caption: A longer caption describing the scheduled service (not to be confused with the Task caption).
- Validity Start Date: Data products grouped for processing will have a start date after this date (or before if the Time Interval is negative).
- Validity Stop Date: Data products grouped for processing will have an end date prior to this date. If the Time interval is negative, this date must precede the Start Date, otherwise it must be a later date.
- Time interval: the grouping interval for data between two runs, for time-driven scheduling. It is defined as days (d), weeks (w), months (m) or years (y), e.g. "10d" or "1m". The interval can also be negative in order to process the data at the beginning of the defined period and to move backwards in time. In that case, the Stop Date must be prior to the Start Execution Date.
- User: Name of the user on behalf of which the service will run.
- Minimum number of input files per task: For data-driven scheduling, this many input files must be available before the task starts.
- Maximum number of input files per task: For data-driven scheduling, input file selection is closed and the task starts once this many input files are available.
|Parameter||Time-driven scheduling||Data-driven scheduling|
|Validity start and end||Range of task input product start/stop dates||Range of modification times for selected products in the catalogue|
|Minimum input files per task||N/A||Task starts when this many input files are available|
|Maximum input files per task||N/A||Input file selection is closed when this many input files are reached|
|Time interval||Difference between task N start date and task N+1 start date||N/A|
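The interval notation and the rule linking its sign to the validity dates can be checked with a small Python sketch. The exact syntax accepted by the portal is not specified here, so the pattern below (an optional sign, a whole number, and one of d, w, m or y in either case) is an assumption, as are the function names.

```python
import re

# Assumed syntax: optional sign, whole number, unit letter (d, w, m or y).
_INTERVAL = re.compile(r"([+-]?\d+)([dwmy])", re.IGNORECASE)
_UNITS = {"d": "days", "w": "weeks", "m": "months", "y": "years"}


def parse_interval(text):
    """Split an interval such as '10d' or '-1m' into (count, unit name)."""
    match = _INTERVAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid time interval: {text!r}")
    count = int(match.group(1))
    if count == 0:
        raise ValueError("the interval must not be zero")
    return count, _UNITS[match.group(2).lower()]


def validity_dates_consistent(start, stop, count):
    """A negative interval needs stop before start; otherwise stop after start."""
    return stop < start if count < 0 else stop > start


print(parse_interval("10d"))
print(parse_interval("-1m"))
```

The second function accepts any comparable date values, for example `datetime.date` objects.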
The web portal also provides the interface defined by the OGC Web Processing Service (WPS) specification. The following request types are supported:
- GET requests (key/value pairs in URL query string)
- POST requests (containing XML documents)
- SOAP requests
To use the feature, do the following:
1. Create your own configuration file that rules the Web Processing Service. Use the file sites/gpod2/config/application.wps.xml.tmpl as a template. Add correct information in the <log>, <debug> and <alert> elements and adjust the contact information in <wpsSettings>.
2. In the Control Panel, create a new external application. Make sure that the name is "wps" and that the value for the configuration file is the local file path of the configuration file created in step 1.
3. The WPS interface can now be accessed at the URL http://<base_url>/wps/. For the usage of GET, POST and SOAP web service requests, refer to the OGC WPS implementation specification, available at http://portal.opengeospatial.org/modules/admin/license_agreement.php?suppressHeaders=0&access_license_id=3&target=http://portal.opengeospatial.org/files/?artifact_id=13149&version=1&format=pdf&format=pdf&version=1
The capabilities document can be obtained with this HTTP GET request: http://<base_url>/wps/?Request=GetCapabilities&service=WPS&AcceptVersions=1.0.0
The description of a process (these are defined in the configuration file) can be obtained with this HTTP GET request: http://<base_url>/wps/?Request=DescribeProcess&service=WPS&version=1.0.0&Identifier=MosaicCOM-simple
Note that "Execute" requests require authentication, which is currently available only over HTTPS.
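The two GET requests above can be assembled programmatically. The sketch below uses only the Python standard library; `BASE_URL` is a placeholder for your own portal address and `wps_url` is a hypothetical helper name. The block only builds the URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Placeholder: replace with the base URL of your own portal installation.
BASE_URL = "http://portal.example.org"


def wps_url(base_url, request, **params):
    """Build a WPS key/value GET request URL under <base_url>/wps/."""
    query = {"Request": request, "service": "WPS", **params}
    return f"{base_url.rstrip('/')}/wps/?{urlencode(query)}"


capabilities_url = wps_url(BASE_URL, "GetCapabilities", AcceptVersions="1.0.0")
describe_url = wps_url(BASE_URL, "DescribeProcess",
                       version="1.0.0", Identifier="MosaicCOM-simple")

print(capabilities_url)
print(describe_url)
```

The resulting strings can be opened in a browser or passed to any HTTP client.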
My Profile - User Preferences Configuration
This section explains how to set up your user profile on G-POD. The My Profile menu is divided into the My Account area and the My Publish Servers area.
User And Authentication information
Figure 14 shows the User Information section. The fields are explained below.
Figure 14 - User Information section in the "My Account" configuration page
First Name and Last Name: this information is displayed on the header banner.
E-Mail: the email provided is used to notify the user when the G-POD administrator has to communicate important information (e.g. Certification Authority certificate renewal, system downtimes, etc.).
Username: the username has been provided by the G-POD administrator and should not be changed.
Password: the password provided by the G-POD administrator should be changed the first time you log on. This is done by providing the password twice in the Password and Enter Password Again fields.
Affiliation: The name of your organisation.
Country: Your country.
Language: The language for G-POD menus and messages.
Time zone: Your time zone.
Once all fields are filled, click the Update button to save the information.
Proxy certificate Information
Please refer to the annex for procedure and details.
Adding a new publish server (other than the G-POD portal) is done using the bottom section of the My Profile control (see Figure 15). The transfer protocol can be selected from a list of supported protocols. According to the selected protocol, further transfer parameters need to be defined in specific fields:
- Server: the hostname of the destination server
- User: the username on the destination server (not required for GridFTP)
- Password: the password for the specified user on the destination server (not required for GridFTP and SCP)
- Path: the absolute or relative path on the destination server
- Options: select "Without SessionID in path" if you do not wish the result files to be put in a folder named after the session ID.
Note that the use of GridFTP requires, instead of a username and a password, a valid proxy certificate at the time of the file transfer. SCP requires only a username.
The following table gives examples of the possible settings:
|Protocol||Server||User||Password||Path||Options||URL to results folder|
|sftp||mysftpserver.com||sftpuser||***||mydir||Without session ID in path||sftp://sftpuser@mysftpserver.com/mydir/|
|gridftp||mygridserver.com||Not required||-||mybox||Without session ID in path||gridftp://mygridserver.com/mybox/|
|scp||myserver.com||scpuser||Not required||mydir||Without session ID in path||scp://scpuser@myserver.com/mydir/|
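The relation between the fields and the results folder URL can be sketched in Python. This assumes the common `protocol://user@server/path/` layout and, when the session ID is kept in the path, that it is appended as the last folder; both points, like the function name `results_url`, are assumptions rather than documented portal behaviour.

```python
def results_url(protocol, server, path, user=None, session_id=None):
    """Compose a results folder URL from the publish server fields.

    Assumptions: the user (if any) precedes the server as user@server,
    and the session ID, when used, becomes the last folder of the path.
    """
    authority = f"{user}@{server}" if user else server
    folders = [part for part in (path.strip("/"), session_id) if part]
    folder_path = "/".join(folders)
    suffix = f"{folder_path}/" if folder_path else ""
    return f"{protocol}://{authority}/{suffix}"


# GridFTP needs no username, so only the server and path appear.
print(results_url("gridftp", "mygridserver.com", "mybox"))
print(results_url("sftp", "mysftpserver.com", "mydir", user="sftpuser"))
```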
A new server is defined after the Add button is clicked.
You may modify the settings of an already defined publish server by selecting it from the list (see Figure 15), changing the values in the individual fields and pressing the Update button. To delete a server, select it from the list and click the Delete button.
Figure 15 - Adding a new publish server
Figure 16 - Editing a publish server
Managing G-POD Tasks
User Resources
Users are given resource credits, shown next to the proxy lifetime in the banner area (see Figure 2). Resource credits are a measure of the amount of workload users can put on the system. Resource credits are consumed as tasks are launched and are returned as they finish. If your resources are insufficient, the system proposes to place your task for automatic deferred submission.
Tasks are commonly launched with a nominal priority setting corresponding to an optimized trade-off between the task's execution speed and the amount of parallel system resources it consumes. This default task priority may be altered using the drop-down menu displayed at the top of the submission window. The higher the priority assigned, the higher the number of parallel processing nodes allocated for the task, such that high-priority submissions usually result in shorter overall processing time when the task jobs implement multi-processing.
The priority levels available to G-POD users at a given time are configured and tuned by the G-POD administrator to optimize the load on the GRID and therefore its performance and shared availability to the G-POD user community. The task priority settings will directly affect the number of resource units consumed for the task; low-priority tasks will commonly consume fewer resources while high-priority tasks will consume more.
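The way resource credits are consumed and returned can be pictured with a toy Python model. It is only an illustration: the class, its method names and the rule that deferred tasks start automatically in submission order are assumptions, and the cost of a task (which in G-POD depends on its priority) is simply supplied by the caller.

```python
class CreditAccount:
    """Toy model of per-user resource credits (not the G-POD implementation)."""

    def __init__(self, credits):
        self.available = credits
        self.running = {}   # task name -> credits held while the task runs
        self.deferred = []  # (task name, cost) pairs waiting for credits

    def submit(self, task, cost):
        """Start the task if credits suffice, otherwise defer it."""
        if cost <= self.available:
            self.available -= cost
            self.running[task] = cost
            return "submitted"
        self.deferred.append((task, cost))
        return "deferred"

    def finish(self, task):
        """Return the task's credits, then start any deferred tasks that fit."""
        self.available += self.running.pop(task)
        still_waiting = []
        for name, cost in self.deferred:
            if cost <= self.available:
                self.available -= cost
                self.running[name] = cost
            else:
                still_waiting.append((name, cost))
        self.deferred = still_waiting


account = CreditAccount(10)
print(account.submit("task-a", 6))  # enough credits: starts at once
print(account.submit("task-b", 6))  # only 4 credits left: deferred
account.finish("task-a")            # credits returned, task-b can start
print(sorted(account.running))
```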
G-POD AND GRID REGISTRATION
Procedure for Account and Certificate Request
- Send an email to G-POD Administration (email@example.com) requesting an account and including the following details: name, surname, organisation or company, ESA/EO project affiliation (optional), and the list of services you are interested in using.
- The EO Grid Administrator validates the request and creates a user account on G-POD with rights to access the requested services plus the standard G-POD Demo services.
- You receive your G-POD account details. For good security practice, you should change your password the first time you log in to the Portal.
If you are interested in other services, you can send an email to the G-POD Team (firstname.lastname@example.org) requesting the full list of available services or access to a given service.
G-POD detailed services user manuals
More detailed user manuals for particular G-POD services are accessible from the Services User Manuals page.
For any questions, don't hesitate to email the G-POD administrator eo-gpod at esa.int