cigri_v3_users_documentation [2013/11/06 21:26] (current)
bzizou created
 +====== ​ CiGri User Documentation ​ ======
 +
 +; Authors
 +> Bruno Bzeznik, Ghislain Charrier
 +; Contact
 +> [[mailto:​cigri-devel@lists.gforge.inria.fr|cigri-devel@lists.gforge.inria.fr]]
 +; Organization
 +> LIG laboratory
 +; Address
 +> Laboratoire d'​Informatique de Grenoble Bat. ENSIMAG - antenne de Montbonnot ZIRST 51, avenue Jean Kuntzmann 38330 MONTBONNOT SAINT MARTIN
 +; Status
 +> Testing
 +; Copyright
 +> Licensed under the GNU General Public License
 +
 +; Abstract
 +> Cigri is a tool for multiparametric job submission on a lightweight computing grid. It is built to run over a set of clusters managed by the OAR resource and job management system.
 +
 +; Dedication
 +> For users.
 +
 +
 +-----
 +
 +=====  Cigri Tour  =====
 +
 +====  General Presentation ​ ====
 +
 +Cigri is a campaign management tool. It is designed to run on top of multiple clusters, each managed by a batch scheduler.
 +
 +====  Campaigns ​ ====
 +
 +A campaign is a set of jobs that have to be executed. In our context, all the jobs in a campaign are similar: they run the same executable, each time with different parameters. A typical Monte-Carlo campaign using a seed for its random generator could be schematized by:
 +
 +<code># Run the same program once per seed value
 +for i in $(seq 0 1000000); do
 +  ./program.exe "$i"
 +done</code>
 +====  Cigri Features ​ ====
 +
 +Cigri's features include, among others:
 +
 +  * Management of multiple campaigns
 +  * Multiple users
 +  * Different campaign types
 +  * Automatic resubmission
 +
 +TODO
 +
 +====  Campaign types  ====
 +
 +Cigri distinguishes 4 different types of campaigns:
 +
 +  * **Normal** campaigns: with this type of campaign, Cigri submits regular jobs to the batch schedulers. Normal campaigns are the best for users because the jobs are guaranteed to get the requested time. However, because the primary role of Cigri is to use idle resources with minimum impact on other users, this type of campaign will most likely require authorization from the admins.
 +  * **Best-effort** campaigns: this type of campaign submits jobs to the batch scheduler in best-effort mode. This means that when resources are needed by a non-best-effort job, the campaign job is killed and has to be resubmitted later. This type of campaign can take advantage of idle resources without disturbing the platform. However, because jobs are likely to be killed, it is better if jobs are small or checkpointable.
 +  * **Semi-best-effort** campaigns: the semi-best-effort campaign is a mix of the two previous policies. During the day, jobs are submitted in best-effort mode, and during the night, normal submissions are used. This ensures that job execution progresses during the night.
 +  * **Nightly** campaigns: for some kinds of jobs (long and parallel ones, for example), trying to execute them in best-effort mode serves no purpose, as they get killed most of the time and resources are just wasted. For such jobs, it is better to use only normal submissions during the night, leaving resources to the other users of the platform during the day.
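
The four policies above can be summarised as a small decision function. This is only an illustrative sketch: the function names and the night window (20:00 to 08:00) are assumptions, since the actual day/night boundaries are a platform matter and are not given in this document.

```python
from typing import Optional

# Hypothetical night window; the real boundaries depend on the platform.
NIGHT_START = 20  # 20:00
NIGHT_END = 8     # 08:00


def is_night(hour: int) -> bool:
    """Return True if the hour (0-23) falls in the assumed night window."""
    return hour >= NIGHT_START or hour < NIGHT_END


def submission_mode(campaign_type: str, hour: int) -> Optional[str]:
    """Return "normal", "best-effort", or None when nothing is submitted."""
    if campaign_type == "normal":
        return "normal"
    if campaign_type == "best-effort":
        return "best-effort"
    if campaign_type == "semi-best-effort":
        # Best-effort during the day, normal submissions at night.
        return "normal" if is_night(hour) else "best-effort"
    if campaign_type == "nightly":
        # Normal submissions at night only; nothing runs during the day.
        return "normal" if is_night(hour) else None
    raise ValueError(f"unknown campaign type: {campaign_type}")
```

For instance, `submission_mode("nightly", 14)` yields `None` under these assumptions, because a nightly campaign does not submit anything in the afternoon.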
 +
 +=====  Job Description Language (JDL)  =====
 +
 +To describe a campaign, we use a Job Description Language (JDL). The JDL is based on JSON <ref>See [[http://www.json.org|http://www.json.org]] for more information about JSON.</ref>.
 +
 +The JDL has 2 main parts:
 +
 +  - The global settings
 +  - The cluster settings
 +
 +**Emphasized** values correspond to the default.
 +
 +Attributes followed by a "​*"​ are mandatory.
 +
 +====  Global Settings ​ ====
 +
 +  * name*: Name of the campaign
 +  * clusters*: list of the clusters where the campaign should run. See the Cluster Settings section.
 +  * param_file: path to the file containing all the parameters to run for the campaign
 +  * nb_jobs: number of jobs
 +  * params: array of parameters
 +  * jobs_type:
 +    * **normal**: jobs using the param_file or nb_jobs
 +    * desktop_computing: jobs always launched with the same parameters
 +  * Any field described in the Cluster Settings section
 +
 +====  Cluster Settings ​ ====
 +
 +Any setting in this section can also be defined in the global section, where it acts as the value for all clusters.
 +
 +  * type: Values other than best-effort may require approval from platform admins
 +    * **best-effort**:​ jobs are executed day and night as best-effort
 +    * semi-best-effort:​ jobs are executed as best-effort during the day and as normal submissions during the night
 +    * nightly: jobs are only executed as normal submissions during the night
 +    * normal: jobs are executed as normal submissions during the day and the night
 +  * walltime: maximum duration of the jobs
 +    * **Default** defined in Cigri configuration file
 +  * exec_file*: script to execute
 +  * exec_directory: path to the execution directory
 +    * **Default**:​ $HOME
 +  * resources: resources requested from the underlying batch scheduler (-l in OAR)
 +    * **Default**: /<resource_unit>=1. Resource_unit is defined per cluster and can therefore differ between 2 clusters. Users should set this field.
 +  * properties: properties passed to OAR to select resources
 +  * prologue: commands that are executed before the first job on each cluster
 +  * epilogue: commands that are executed at the end of a campaign
 +  * prologue_walltime:​ specific walltime for the prologue
 +  * epilogue_walltime:​ specific walltime for the epilogue
 +  * output_gathering_method:​ method to use to gather results in a single place
 +    * **None**
 +    * iRods: files will be put in iRods at the end of the execution
 +    * collector: a collector runs regularly to gather files
 +    * scp: a simple scp will be done on the output files after the completion
 +  * output_file:​ file or directory to save
 +  * output_destination:​ some server (not used with iRods) where output files will be gathered
 +  * dimensional_grouping: allows several jobs to be executed in parallel in a single submission, if possible
 +    * true
 +    * **false**
 +  * temporal_grouping: allows several jobs to be executed one after the other in a single submission. The number of jobs is computed automatically by Cigri
 +    * **true**
 +    * false
 +  * checkpointing_type:​
 +    * **None**
 +    * BLCR
 +    * ...
 +  * test_mode: when test_mode is enabled, only one job per active cluster is submitted, in normal mode, even if best-effort is enabled. The jobs of such a campaign are also executed before those of other campaigns. This allows a campaign to be tested without sending all the jobs and with less waiting.
 +    * true
 +    * **false**
 +  * max_jobs: limits the number of jobs submitted for the current campaign on the cluster. This is useful when, for example, your jobs do a lot of I/O and could crash distributed filesystems if too many occurrences run at once.
 +    * **None**
 +    * <​integer>​
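
As a way to read the mandatory markers above, the following sketch checks a JDL (loaded as a Python dictionary) for missing mandatory attributes. It is not Cigri's own validation code: the function name `missing_mandatory` is hypothetical, and the sketch assumes that a globally defined `exec_file` satisfies the requirement for every cluster.

```python
def missing_mandatory(jdl: dict) -> list:
    """List the mandatory attributes that are absent from a JDL dictionary."""
    # name and clusters are mandatory global settings.
    missing = [key for key in ("name", "clusters") if key not in jdl]
    # exec_file is a mandatory cluster setting; a global value is assumed
    # to apply to every cluster that does not define its own.
    for cluster, settings in jdl.get("clusters", {}).items():
        if "exec_file" not in settings and "exec_file" not in jdl:
            missing.append(f"clusters.{cluster}.exec_file")
    return missing
```

An empty list means that all attributes marked with "*" are present; each entry otherwise names the missing attribute.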
 +
 +====  Example of JDL  ====
 +
 +Here is an example of a JDL file described in JSON:
 +
 +<code>{
 +  "name": "Some campaign",
 +  "nb_jobs": 2,
 +  "resources": "nodes=1",
 +  "exec_file": "$HOME/script.sh",
 +  "output_gathering_method": "scp",
 +  "output_destination": "my.dataserver.fr",
 +  "clusters": {
 +    "tchernobyl": { },
 +    "my.other_cluster.fr": { },
 +    "fukushima": {
 +      "exec_file": "$HOME/path/script"
 +    }
 +  }
 +}</code>
 +
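
In the example, cluster settings such as `resources` are given once at the global level, while `fukushima` overrides `exec_file`. The sketch below shows one way this inheritance could be resolved. The function name `expand_jdl` is hypothetical, and the rule that a cluster-level value takes precedence over the global one is an assumption drawn from the example, not a description of Cigri's internal code.

```python
import json

# Attributes listed under Global Settings; everything else is treated
# as a cluster setting that can be inherited.
GLOBAL_ONLY = {"name", "clusters", "param_file", "nb_jobs", "params", "jobs_type"}


def expand_jdl(jdl: dict) -> dict:
    """Copy global cluster settings into every cluster that lacks them."""
    defaults = {k: v for k, v in jdl.items() if k not in GLOBAL_ONLY}
    expanded = {k: v for k, v in jdl.items() if k in GLOBAL_ONLY}
    expanded["clusters"] = {
        # Cluster-level values win over the global defaults (assumption).
        cluster: {**defaults, **settings}
        for cluster, settings in jdl.get("clusters", {}).items()
    }
    return expanded


campaign = json.loads("""
{
  "name": "Some campaign",
  "nb_jobs": 2,
  "resources": "nodes=1",
  "exec_file": "$HOME/script.sh",
  "output_gathering_method": "scp",
  "output_destination": "my.dataserver.fr",
  "clusters": {
    "tchernobyl": { },
    "my.other_cluster.fr": { },
    "fukushima": { "exec_file": "$HOME/path/script" }
  }
}
""")

print(json.dumps(expand_jdl(campaign)["clusters"], indent=2))
```

With this reading, `tchernobyl` and `my.other_cluster.fr` run `$HOME/script.sh`, while `fukushima` runs `$HOME/path/script`.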
 +=====  Client tools  =====
 +
 +This chapter describes the client tools available to users for interacting with the grid. Most of the CLI tools (gridsub, gridstat, gridevents, gridnotify, ...) print a minimal help message with the -h option.
 +
 +====  gridsub ​ ====
 +
 +====  gridstat ​ ====
 +
 +====  gridnotify ​ ====
 +
 +====  gridevents ​ ====
 +
 +=====  REST API  =====
 +
 +Cigri offers a REST API accessible through HTTP.
 +
 +====  URLs  ====
 +
 +<​table>​
 +<​thead>​
 +<tr class="​header">​
 +<th align="left">HTTP request</th>
 +<th align="​left">​URL</​th>​
 +<th align="​left">​Purpose</​th>​
 +</tr>
 +</​thead>​
 +<​tbody>​
 +<tr class="​odd">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/</​td>​
 +<td align="​left">​List the available links</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​clusters</​td>​
 +<td align="​left">​List all clusters available in Cigri</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​clusters/<​cluster_id></​td>​
 +<td align="​left">​Get details on a specific cluster</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns</​td>​
 +<td align="left">List all running campaigns</td>
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns/<​campaign_id></​td>​
 +<td align="​left">​Get details on a specific campaign</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns/<​campaign_id>/​jdl</​td>​
 +<td align="​left">​Get the expanded JDL of a campaign</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns/<​campaign_id>/​jobs</​td>​
 +<td align="left">List all jobs of a specific campaign (see the API options section)</td>
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns/<​campaign_id>/​jobs/<​job_id></​td>​
 +<td align="​left">​Get details of a specific job of a specific campaign</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​POST</​td>​
 +<td align="​left">/​campaigns</​td>​
 +<td align="​left">​Submit a new campaign</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​PUT</​td>​
 +<td align="​left">/​campaigns/<​campaign_id></​td>​
 +<td align="​left">​Update a campaign (status, name)</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​DELETE</​td>​
 +<td align="​left">/​campaigns/<​campaign_id></​td>​
 +<td align="​left">​Delete a campaign</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​campaigns/<​campaign_id>/​events</​td>​
 +<td align="​left">​List the open events for the given campaign</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​DELETE</​td>​
 +<td align="​left">/​campaigns/<​campaign_id>/​events</​td>​
 +<td align="​left">​Fix (close) all the events for the given campaign</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​notifications</​td>​
 +<td align="​left">​List notification subscriptions for the current user</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​POST</​td>​
 +<td align="​left">/​notifications/​mail</​td>​
 +<td align="​left">​Subscribe to the mail notification service</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​POST</​td>​
 +<td align="​left">/​notifications/​jabber</​td>​
 +<td align="​left">​Subscribe to the jabber notification service</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​DELETE</​td>​
 +<td align="​left">/​notifications/<​mail|jabber></​td>​
 +<td align="​left">​Unsubscribe from a notification service</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​events/<​id></​td>​
 +<td align="​left">​Get a specific event</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​DELETE</​td>​
 +<td align="​left">/​events/<​id></​td>​
 +<td align="​left">​Fix (close) a specific event</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​DELETE</​td>​
 +<td align="​left">/​events/<​id>?​resubmit</​td>​
 +<td align="​left">​Fix (close) a specific event and resubmit the job</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​gridusage</​td>​
 +<td align="​left">​Get the current usage state of the grid</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​GET</​td>​
 +<td align="​left">/​gridusage?​from=<​date>&​to=<​date></​td>​
 +<td align="​left">​Get usage stats between two dates (unix timestamps)</​td>​
 +</tr>
 +</​tbody>​
 +</​table>​
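
The last row of the table takes two unix timestamps. As a small sketch of how such a URL could be built from dates, the helper below uses only the Python standard library; the function name `gridusage_url` is hypothetical, and the base URL is whatever host and port your Cigri API listens on.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def gridusage_url(base_url: str, start: datetime, end: datetime) -> str:
    """Build a /gridusage URL for the period between two aware datetimes."""
    query = urlencode({
        "from": int(start.timestamp()),  # unix timestamp, in seconds
        "to": int(end.timestamp()),
    })
    return f"{base_url.rstrip('/')}/gridusage?{query}"


# Example: usage statistics for 6 November 2013 (UTC).
print(gridusage_url(
    "http://api-host:8080",
    datetime(2013, 11, 6, tzinfo=timezone.utc),
    datetime(2013, 11, 7, tzinfo=timezone.utc),
))
```

Using timezone-aware datetimes avoids the timestamps depending on the local timezone of the machine that builds the URL.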
 +
 +====  Accessing the API  ====
 +
 +Getting the links available on the server:
 +
 +<​code>​$ curl http://​api-host:​port
 +{"links":[[{"href":"/","rel":"self"},{"href":"/campaigns","title":"campaigns","rel":"campaigns"},{"href":"/clusters","title":"clusters","rel":"clusters"}]]}</code>
 +When posting a campaign, the JSON containing the ID of the submitted campaign is returned:
 +
 +<​code>​$ curl -X POST http://​api-host:​port/​campaigns -d '​{"​name":"​n",​ "​nb_jobs":​0,"​clusters":​{"​fukushima":​{"​exec_file":""​}}}'​
 +{"id":"585","links":[[{"href":"/campaigns/585","rel":"self"},{"href":"/campaigns","rel":"parent"}]]}</code>
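
The same submission can be scripted. The following is a minimal sketch using Python's standard `urllib`; the helper names `build_submit_request` and `campaign_id` are hypothetical, and the only response field relied upon is the `id` shown in the answer above. Like `curl -d`, it sends the JDL as the raw request body without setting a specific content type.

```python
import json
import urllib.request


def build_submit_request(base_url: str, jdl: dict) -> urllib.request.Request:
    """Prepare a POST /campaigns request carrying the JDL as its body."""
    body = json.dumps(jdl).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/campaigns", data=body, method="POST"
    )


def campaign_id(response_body: str) -> str:
    """Extract the campaign ID from the JSON returned by the server."""
    return json.loads(response_body)["id"]


# Usage against a live server (not executed here):
# request = build_submit_request("http://api-host:8080", jdl)
# with urllib.request.urlopen(request) as response:
#     print(campaign_id(response.read().decode("utf-8")))
```

Separating the construction of the request from sending it makes the request easy to inspect before anything reaches the grid.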
 +====  Return codes  ====
 +
 +Each request made through the API returns a status code in the HTTP response header. The codes are listed here:
 +
 +<​table>​
 +<​thead>​
 +<tr class="​header">​
 +<th align="​left">​Code</​th>​
 +<th align="left">HTTP request</th>
 +<th align="​left">​Meaning</​th>​
 +</tr>
 +</​thead>​
 +<​tbody>​
 +<tr class="​odd">​
 +<td align="​left">​200</​td>​
 +<td align="​left">​GET</​td>​
 +<td align="​left">​Request successful: everything went well :​)</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​201</​td>​
 +<td align="​left">​POST</​td>​
 +<td align="​left">​Resource created: the campaign has been submitted</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​202</​td>​
 +<td align="​left">​PUT,​ DELETE</​td>​
 +<td align="​left">​Accepted:​ modifications done</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​400</​td>​
 +<td align="​left">​POST,​ PUT</​td>​
 +<td align="​left">​Bad request: see the body of the answer for details</​td>​
 +</tr>
 +<tr class="​odd">​
 +<td align="​left">​403</​td>​
 +<td align="​left">​POST,​ PUT, DELETE</​td>​
 +<td align="​left">​Forbidden:​ see response for details</​td>​
 +</tr>
 +<tr class="​even">​
 +<td align="​left">​404</​td>​
 +<td align="​left">​GET,​ POST, PUT, DELETE</​td>​
 +<td align="​left">​Page not found: the URL does not exist</​td>​
 +</tr>
 +</​tbody>​
 +</​table>​
 +
 +Examples:
 +
 +<​code>​$ curl -i http://​api-host:​port
 +  HTTP/1.1 200 OK 
 +$ curl -i -X DELETE http://​api-host:​port/​campaigns/​1
 +  HTTP/1.1 403 Forbidden ​
 +$ curl -i -X POST http://​api-host:​port/​campaigns -d '​{"​name":"​n",​ "​nb_jobs":​2,"​clusters":​{"​cluster1":​{"​exec_file":"​toto.sh"​}}}'​
 +  HTTP/1.1 201 Created</code>
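
A script wrapping these calls typically needs to tell success from failure. The sketch below encodes the return code table as a lookup and parses a status line such as the ones printed by `curl -i`. All names in it are hypothetical; codes not listed in the table are reported as such rather than guessed.

```python
# Meanings taken from the return code table of this section.
STATUS_MEANINGS = {
    200: "Request successful",
    201: "Resource created",
    202: "Accepted",
    400: "Bad request",
    403: "Forbidden",
    404: "Page not found",
}


def status_from_line(status_line: str) -> int:
    """Extract the numeric code from a line like 'HTTP/1.1 403 Forbidden'."""
    return int(status_line.split()[1])


def is_success(code: int) -> bool:
    """Return True for the codes the table associates with success."""
    return code in (200, 201, 202)


def describe_status(code: int) -> str:
    """Return the documented meaning of a code, if there is one."""
    return STATUS_MEANINGS.get(code, "Code not listed in the table")
```

For the 400 and 403 codes, the table says the body of the response carries the details, so a real client would print that body rather than only the code.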
 +====  API options ​ ====
 +
 +Options that can be passed in the URL, with their default value in parentheses:
 +
 +<ul>
 +<li><p>**pretty** (false): displays the returned JSON in a more readable (but larger) format. Pretty printing is off only when the option is omitted or set to false; any other value turns it on:</p>
 +<pre>
 +$ curl http://api-host:port?pretty          => pretty print on
 +$ curl http://api-host:port?pretty=true     => pretty print on
 +$ curl http://api-host:port?pretty=whatever => pretty print on
 +$ curl http://api-host:port                 => pretty print off
 +$ curl http://api-host:port?pretty=false    => pretty print off
 +</pre></li>
 +<li><p>**limit** (100) and **offset** (0): some resources may contain many items, so only a subset of them is returned:</p>
 +<pre>
 +$ curl http://api-host:port/campaigns/<campaign_id>/jobs                    => display the first 100 jobs
 +$ curl http://api-host:port/campaigns/<campaign_id>/jobs?limit=23           => display the first 23 jobs
 +$ curl http://api-host:port/campaigns/<campaign_id>/jobs?limit=12&offset=50 => display 12 jobs, starting at offset 50
 +</pre></li></ul>
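
When these options are set from a script, building the query string by hand is error-prone (in a shell, for example, an unquoted `&` ends the command). The sketch below builds the jobs URL with Python's standard `urlencode`; the function name `jobs_url` is hypothetical, and options left unset are omitted so that the server defaults apply.

```python
from urllib.parse import urlencode


def jobs_url(base_url, campaign_id, limit=None, offset=None, pretty=False):
    """Build the URL listing the jobs of a campaign, with optional paging."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    if pretty:
        # Any value other than "false" enables pretty printing.
        params["pretty"] = "true"
    url = f"{base_url.rstrip('/')}/campaigns/{campaign_id}/jobs"
    return f"{url}?{urlencode(params)}" if params else url


# Walk through a campaign's jobs page by page, 100 jobs at a time.
for page in range(3):
    print(jobs_url("http://api-host:8080", 585, limit=100, offset=page * 100))
```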
 +
 +<​references />
  