Torque, a variant of PBS, is installed on all the nodes. The Torque Users Guide contains information on how to submit jobs.
There is one queue, and all jobs are submitted to it. However, the maximum time that a job is permitted to use a node is restricted. This limit is called wall clock time, or just wall time, and it is measured from when a job starts to when it completes. The restrictions are as follows:
- Several nodes are reserved for jobs requesting 24 hours or less of wall time.
- One node is restricted to jobs running 12 hours or less.
- The rest are restricted to 5 days (120 hours).
Note: Depending on current job requirements, there may be some variation to the above. To see the current configuration, log in to the cluster and type "pbshosts". For each node, this shows the number of cores, the amount of memory, the maximum wall clock time, and the property to use if you want to request a specific type of node. The command "pbshosts -f" shows this information only for the nodes that have free processors.
Jobs requesting 12 hours or less may be scheduled on any of the nodes. Short jobs, such as these, can be used to fill in scheduling gaps created by reserving processors for parallel jobs.
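The wall time limits above can be read as a simple rule: a job may run on any node whose limit is at least the wall time the job requests. The sketch below illustrates that rule only. The class names "short", "medium" and "long" are hypothetical labels, not the node properties reported by "pbshosts", and the actual limits may differ from the ones listed here.

```python
# Hypothetical node classes based on the limits listed above.
# The live configuration comes from "pbshosts" and may differ.
NODE_WALLTIME_LIMITS_HOURS = {
    "short": 12,    # the single node limited to 12 hours
    "medium": 24,   # nodes reserved for jobs of 24 hours or less
    "long": 120,    # remaining nodes, limited to 5 days
}


def eligible_node_classes(requested_hours):
    """Return the node classes whose limit covers the requested wall time."""
    return [
        name
        for name, limit in NODE_WALLTIME_LIMITS_HOURS.items()
        if requested_hours <= limit
    ]


print(eligible_node_classes(8))    # ['short', 'medium', 'long']
print(eligible_node_classes(18))   # ['medium', 'long']
print(eligible_node_classes(100))  # ['long']
print(eligible_node_classes(200))  # []
```

A job asking for more than 120 hours matches no node class in this sketch, which corresponds to the case where a special request is needed.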
Our goal is to accommodate longer-running jobs while still getting shorter jobs scheduled promptly. If you have difficulty getting your jobs scheduled, please contact Barry Schaudt (email@example.com). There are many things we can do. If it is a one-time request (for example, you need to run for more than 5 days), we will try to accommodate it. If it is something you will do frequently, the Cluster Policy Committee will look at the queue structure.
We use several factors in scheduling jobs: the number of processors requested, the time submitted, and the group's current fairshare usage. This means that jobs waiting to run are not necessarily prioritized in the order in which they were submitted.
Fairshare is one factor in the priority of jobs waiting to run. Each group is assigned a fairshare target, which corresponds to a percentage of the system used over a period of time. Currently, each group has the same fairshare target. When a group exceeds its fairshare target, all other factors being equal, that group's jobs will have a lower scheduling priority than jobs from groups that have not exceeded their target.
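As a rough illustration of the fairshare idea, the sketch below compares a group's recent usage with its target and turns the difference into a priority adjustment. The linear formula, the weight, and the example numbers are assumptions made for illustration. The scheduler's actual calculation is not described here.

```python
def fairshare_adjustment(usage_percent, target_percent, weight=1.0):
    """Return a priority adjustment for a group.

    Negative when the group has used more than its target, positive when
    it has used less. The linear form is an assumption, not the
    scheduler's documented formula.
    """
    return weight * (target_percent - usage_percent)


# Hypothetical groups sharing the same target, as in the current policy.
target = 25.0
recent_usage = {"group_a": 35.0, "group_b": 10.0}

for group, usage in recent_usage.items():
    print(group, fairshare_adjustment(usage, target))
# group_a -10.0
# group_b 15.0
```

With all other factors equal, a job from group_a would rank below a job from group_b in this sketch because group_a is over its target.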
Another factor used in scheduling jobs is the number of processors requested. This matters because a parallel job must reserve processors until it has accumulated enough of them to run. As other jobs finish, the freed processors can sit idle until enough are available for the parallel job to start. Small and/or short jobs can run on these idle but reserved processors. Scheduling small and/or short jobs on unused but reserved processors is called backfill scheduling. It is easier and more efficient to give parallel jobs a high priority and schedule smaller jobs around them than to schedule small jobs first.
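The backfill decision can be sketched as a two-part check: the small job must fit in the processors that are currently idle, and it must finish before the waiting parallel job is expected to start. The second condition is the usual backfill rule and is assumed here, since the text does not spell it out. The function and parameter names are hypothetical.

```python
def can_backfill(job_procs, job_walltime_hours, idle_procs, hours_until_reserved_start):
    """Return True if a job can run on idle but reserved processors.

    The job must fit in the idle processors and must end before the
    parallel job holding the reservation is expected to start.
    """
    fits = job_procs <= idle_procs
    finishes_in_time = job_walltime_hours <= hours_until_reserved_start
    return fits and finishes_in_time


# Example: 6 processors are idle, and the reserving parallel job is
# expected to start in 4 hours.
print(can_backfill(2, 3, 6, 4))   # True: fits and finishes in time
print(can_backfill(2, 10, 6, 4))  # False: would still be running at the start
print(can_backfill(8, 1, 6, 4))   # False: needs more processors than are idle
```

This is why an accurate, short wall time request helps a job get scheduled sooner: the shorter the request, the more gaps it can fill.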
The last factor used is the time the job was submitted. Because we want every job to run eventually, a job's priority increases the longer it waits in the queue.
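To show how the three factors might interact, the sketch below combines them into a single score with a weighted sum and sorts waiting jobs by that score. The weights, the weighted-sum form, and the job names are all hypothetical. They are chosen only to show that submission order alone does not determine run order.

```python
# Hypothetical weights; the real scheduler's values are not given here.
PROC_WEIGHT = 10.0       # favors jobs requesting more processors
WAIT_WEIGHT = 1.0        # priority grows with hours spent waiting
FAIRSHARE_WEIGHT = 2.0   # scales (target - usage) in percentage points


def job_priority(procs, hours_waiting, fairshare_delta):
    """Combine the three scheduling factors into one illustrative score."""
    return (
        PROC_WEIGHT * procs
        + WAIT_WEIGHT * hours_waiting
        + FAIRSHARE_WEIGHT * fairshare_delta
    )


waiting_jobs = [
    # (name, processors, hours waiting, fairshare target minus usage)
    ("serial_early", 1, 20.0, 0.0),      # score 30.0
    ("parallel_late", 16, 1.0, 0.0),     # score 161.0
    ("serial_overuse", 1, 20.0, -10.0),  # score 10.0
]

ordered = sorted(
    waiting_jobs,
    key=lambda job: job_priority(job[1], job[2], job[3]),
    reverse=True,
)
print([job[0] for job in ordered])
# ['parallel_late', 'serial_early', 'serial_overuse']
```

In this example the parallel job submitted last ranks first, and of the two serial jobs that have waited equally long, the one from the group over its fairshare target ranks lower.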
How to Submit Jobs
See the PBS (Torque) User Guide.
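As a starting point before reading the guide, the sketch below builds the text of a minimal Torque job script that requests processors and wall time through standard "#PBS" directives. The job name, processor count, and program path are placeholders, and the helper functions are hypothetical. Check the User Guide for the options that apply on this cluster.

```python
def format_walltime(hours):
    """Format whole hours as the HH:MM:SS string used by '-l walltime='."""
    return f"{hours:02d}:00:00"


def make_job_script(job_name, procs, walltime_hours, command):
    """Return the text of a minimal single-node Torque job script."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                                  # job name
        f"#PBS -l nodes=1:ppn={procs}",                         # one node, procs cores
        f"#PBS -l walltime={format_walltime(walltime_hours)}",  # wall time limit
        "cd $PBS_O_WORKDIR",                                    # directory qsub was run from
        command,
    ]
    return "\n".join(lines) + "\n"


# Placeholder values: a one-processor job requesting 12 hours.
print(make_job_script("example", 1, 12, "./my_program"))
```

Saving the printed text to a file such as job.pbs and running "qsub job.pbs" submits it to the queue. Requesting 12 hours or less keeps the job eligible for every node, as described above.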