Introduction
The job instance statistics file contains details about each job instance, such as its name, the site on which it ran, and its runtime.
Jobs Statistics File Content
The jobs file contains the following information about each job in an individual workflow:
Job - the name of the job instance.
Site - the site where the job instance ran.
CondorQTime(sec.) - the time between submission by DAGMan and the remote Grid submission. It is an estimate of the time spent in the Condor queue on the submit node. The value is calculated as [GRID_SUBMIT/GLOBUS_SUBMIT/EXECUTE - SUBMIT], using the earliest of the GRID_SUBMIT, GLOBUS_SUBMIT, and EXECUTE events. The information is obtained from the jobstate table.
Resource(sec.) - the time between the remote Grid submission and the start of remote execution. It is an estimate of the time the job spent in the remote queue. The value is calculated as [EXECUTE - GRID_SUBMIT/GLOBUS_SUBMIT]. The information is obtained from the jobstate table.
Runtime(sec.) - the time spent on the resource as seen by Condor DAGMan. It is always greater than or equal to Kickstart. The value is obtained from local_duration in the job_instance table.
Kickstart(sec.) - the actual duration of the job in seconds on the remote compute node. The value is obtained from remote_duration in the invocation table; for clustered jobs it is the sum over all the tasks.
Multiplier-Factor - the multiplier factor from the user-provided profile, used to multiply the Kickstart time on the remote node. This value is stored in the job_instance table and defaults to 1.
Kickstart_mult(sec.) - the Kickstart time multiplied by the Multiplier-Factor.
Remote-CPU-Time(sec.) - the sum of utime and stime from the Kickstart invocation record. This value is obtained from remote_cpu_time in the invocation table.
Seqexec(sec.) - the time taken to complete a clustered job. This value is obtained from cluster_duration in the job_instance table.
Seqexec-Delay(sec.) - the difference between the time taken to complete a clustered job and the sum of the Kickstart times of all its individual tasks. This value is calculated as cluster_duration in the job_instance table minus the sum of the corresponding tasks' remote_duration values in the invocation table.
Exitcode - the exitcode of the job. For clustered jobs, it is the highest exitcode found across all the invocation records.
Hostname - the name of the host where the job instance ran.
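The field definitions above can be condensed into a small Python sketch that derives each timing column from a job instance's events and invocation records. The `Invocation` and `JobInstance` classes and the `job_statistics` function are hypothetical, not part of any Pegasus API; the sketch assumes the jobstate timestamps (one per state) and the invocation rows have already been loaded into memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Invocation:
    # Hypothetical stand-in for one row of the invocation table.
    remote_duration: float  # Kickstart-reported task duration (sec.)
    remote_cpu_time: float  # utime + stime of the task (sec.)
    exitcode: int


@dataclass
class JobInstance:
    # Hypothetical stand-in for a job_instance row plus its jobstate events.
    events: Dict[str, float]  # jobstate state -> timestamp (sec.); SUBMIT assumed present
    local_duration: float  # Runtime as seen by DAGMan (sec.)
    multiplier_factor: float = 1.0  # defaults to 1, as in job_instance
    cluster_duration: Optional[float] = None  # set only for clustered jobs
    invocations: List[Invocation] = field(default_factory=list)


def job_statistics(ji: JobInstance) -> dict:
    ev = ji.events
    grid_times = [ev[s] for s in ("GRID_SUBMIT", "GLOBUS_SUBMIT") if s in ev]
    grid_submit = min(grid_times) if grid_times else None
    # CondorQTime: earliest of GRID_SUBMIT/GLOBUS_SUBMIT/EXECUTE minus SUBMIT.
    remote_times = grid_times + ([ev["EXECUTE"]] if "EXECUTE" in ev else [])
    condor_q_time = min(remote_times) - ev["SUBMIT"] if remote_times else None
    # Resource: EXECUTE minus GRID_SUBMIT/GLOBUS_SUBMIT (None if either is missing).
    resource = (ev["EXECUTE"] - grid_submit
                if grid_submit is not None and "EXECUTE" in ev else None)
    kickstart = sum(i.remote_duration for i in ji.invocations)
    seqexec = ji.cluster_duration
    return {
        "condor_q_time": condor_q_time,
        "resource_delay": resource,
        "runtime": ji.local_duration,
        "kickstart": kickstart,
        "kickstart_mult": sum(i.remote_duration * ji.multiplier_factor
                              for i in ji.invocations),
        "remote_cpu_time": sum(i.remote_cpu_time for i in ji.invocations),
        "seqexec": seqexec,
        # Seqexec-Delay: cluster duration minus the summed task Kickstart times.
        "seqexec_delay": seqexec - kickstart if seqexec is not None else None,
        # Exitcode: highest exitcode across the invocation records.
        "exitcode": max((i.exitcode for i in ji.invocations), default=None),
    }


# Hypothetical clustered job with two tasks and a multiplier factor of 2.
example = JobInstance(
    events={"SUBMIT": 100.0, "GRID_SUBMIT": 104.0, "EXECUTE": 110.0},
    local_duration=60.0,
    multiplier_factor=2.0,
    cluster_duration=50.0,
    invocations=[Invocation(20.0, 15.0, 0), Invocation(25.0, 22.0, 1)],
)
print(job_statistics(example))
```

For this example, CondorQTime is 4 seconds, Resource is 6 seconds, and the cluster spent 5 seconds beyond the summed task Kickstart times. A purely local job with no GRID_SUBMIT or GLOBUS_SUBMIT event gets no Resource value, mirroring the NULL the SQL subtraction would produce.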
The following diagram shows the job states and the delays between them.
Queries
The following queries retrieve information about the jobs in a workflow.
Original 3.1 Query for Job Statistics
```sql
-- API method name: get_job_statistics
SELECT jb.job_id,
       jb_inst.job_instance_id,
       jb_inst.job_submit_seq,
       jb.exec_job_id AS job_name,
       jb_inst.site AS site,
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id
            AND (state = 'GRID_SUBMIT' OR state = 'GLOBUS_SUBMIT' OR state = 'EXECUTE'))
        - (SELECT timestamp FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id AND state = 'SUBMIT')
       ) AS condor_q_time,
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id AND state = 'EXECUTE')
        - (SELECT timestamp FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id
              AND (state = 'GRID_SUBMIT' OR state = 'GLOBUS_SUBMIT'))
       ) AS resource_delay,
       jb_inst.local_duration AS runtime,
       (SELECT sum(remote_duration) FROM invocation AS invoc
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0
         GROUP BY job_instance_id) AS kickstart,
       ((SELECT timestamp FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id
            AND state = 'POST_SCRIPT_TERMINATED')
        - (SELECT max(timestamp) FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id
              AND (state = 'POST_SCRIPT_STARTED' OR state = 'JOB_TERMINATED'))
       ) AS post_time,
       jb_inst.cluster_duration AS seqexec
FROM job AS jb, job_instance AS jb_inst
WHERE jb_inst.job_id = jb.job_id
  AND jb.wf_id = 3  -- id of the workflow being reported on
ORDER BY jb_inst.job_submit_seq
```
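To try the correlated-subquery pattern used in this query without a full database, the following Python sketch loads a few hypothetical rows into an in-memory SQLite database and runs a condensed version of the condor_q_time, resource_delay, runtime, and kickstart columns. The table definitions keep only the columns the query touches and are an assumption, not the real schema. Treating negative task_submit_seq values as non-task invocations (the rows the `task_submit_seq >= 0` filter drops) is also an assumption made for illustration.

```python
import sqlite3

# Hypothetical minimal subset of the job / job_instance / jobstate / invocation tables.
SCHEMA = """
CREATE TABLE job (job_id INTEGER PRIMARY KEY, wf_id INTEGER, exec_job_id TEXT);
CREATE TABLE job_instance (job_instance_id INTEGER PRIMARY KEY, job_id INTEGER,
                           job_submit_seq INTEGER, site TEXT, local_duration REAL,
                           cluster_duration REAL);
CREATE TABLE jobstate (job_instance_id INTEGER, state TEXT, timestamp REAL);
CREATE TABLE invocation (job_instance_id INTEGER, wf_id INTEGER,
                         task_submit_seq INTEGER, remote_duration REAL);
"""

# Condensed version of the statistics query; each column is a correlated scalar subquery.
QUERY = """
SELECT jb.exec_job_id AS job_name,
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id
            AND state IN ('GRID_SUBMIT', 'GLOBUS_SUBMIT', 'EXECUTE'))
        - (SELECT timestamp FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id AND state = 'SUBMIT'))
         AS condor_q_time,
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id AND state = 'EXECUTE')
        - (SELECT min(timestamp) FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id
              AND state IN ('GRID_SUBMIT', 'GLOBUS_SUBMIT')))
         AS resource_delay,
       jb_inst.local_duration AS runtime,
       (SELECT sum(remote_duration) FROM invocation
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0) AS kickstart
FROM job AS jb JOIN job_instance AS jb_inst ON jb_inst.job_id = jb.job_id
WHERE jb.wf_id = ?
ORDER BY jb_inst.job_submit_seq
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical job in workflow 3 with a single instance.
conn.execute("INSERT INTO job VALUES (1, 3, 'preprocess_ID1')")
conn.execute("INSERT INTO job_instance VALUES (10, 1, 1, 'local', 42.0, NULL)")
conn.executemany(
    "INSERT INTO jobstate VALUES (10, ?, ?)",
    [("SUBMIT", 100.0), ("GRID_SUBMIT", 103.0), ("EXECUTE", 110.0)],
)
conn.executemany(
    "INSERT INTO invocation VALUES (10, 3, ?, ?)",
    [(-1, 1.0), (1, 30.0), (2, 5.0)],  # the -1 row is excluded by task_submit_seq >= 0
)
rows = conn.execute(QUERY, (3,)).fetchall()
print(rows)
```

The single result row reports a 3-second Condor queue time, a 7-second resource delay, and a Kickstart total of 35 seconds, because the invocation row with a negative task_submit_seq is filtered out of the sum.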
All Jobs Statistics (with the multiplier factor)
```sql
-- API method name: get_job_statistics
SELECT jb.job_id,
       jb_inst.job_instance_id,
       jb_inst.job_submit_seq,
       jb.exec_job_id AS job_name,
       jb_inst.site AS site,
       -- CondorQTime: GRID_SUBMIT/GLOBUS_SUBMIT/EXECUTE - SUBMIT
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id
            AND (state = 'GRID_SUBMIT' OR state = 'GLOBUS_SUBMIT' OR state = 'EXECUTE'))
        - (SELECT timestamp FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id AND state = 'SUBMIT')
       ) AS condor_q_time,
       -- Resource: EXECUTE - GRID_SUBMIT/GLOBUS_SUBMIT
       ((SELECT min(timestamp) FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id AND state = 'EXECUTE')
        - (SELECT min(timestamp) FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id
              AND (state = 'GRID_SUBMIT' OR state = 'GLOBUS_SUBMIT'))
       ) AS resource_delay,
       jb_inst.local_duration AS runtime,
       (SELECT sum(remote_duration) FROM invocation AS invoc
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0
         GROUP BY job_instance_id) AS kickstart,
       ((SELECT timestamp FROM jobstate
          WHERE job_instance_id = jb_inst.job_instance_id
            AND state = 'POST_SCRIPT_TERMINATED')
        - (SELECT max(timestamp) FROM jobstate
            WHERE job_instance_id = jb_inst.job_instance_id
              AND (state = 'POST_SCRIPT_STARTED' OR state = 'JOB_TERMINATED'))
       ) AS post_time,
       jb_inst.cluster_duration AS seqexec,
       (SELECT max(exitcode) FROM invocation AS invoc
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0
         GROUP BY job_instance_id) AS exit_code,
       (SELECT h.hostname FROM host AS h
         WHERE h.host_id = jb_inst.host_id AND h.wf_id = jb.wf_id) AS host_name,
       jb_inst.multiplier_factor,
       (SELECT sum(remote_duration * jb_inst.multiplier_factor) FROM invocation AS invoc
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0
         GROUP BY job_instance_id) AS kickstart_multi,
       (SELECT sum(remote_cpu_time) FROM invocation AS invoc
         WHERE job_instance_id = jb_inst.job_instance_id
           AND wf_id = jb.wf_id AND task_submit_seq >= 0
         GROUP BY job_instance_id) AS remote_cpu_time
FROM job AS jb, job_instance AS jb_inst
WHERE jb_inst.job_id = jb.job_id
  AND jb.wf_id = 3  -- id of the workflow being reported on
ORDER BY jb_inst.job_submit_seq
```
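Neither query above returns the Seqexec-Delay column described in the field list. One possible way to derive it in SQL is sketched below against hypothetical rows in an in-memory SQLite database; the two-table layout is an assumed minimal subset of job_instance and invocation, and it assumes cluster_duration is NULL for jobs that were not clustered.

```python
import sqlite3

# Hypothetical minimal tables: only the columns needed for Seqexec and Seqexec-Delay.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_instance (job_instance_id INTEGER PRIMARY KEY, cluster_duration REAL);
CREATE TABLE invocation (job_instance_id INTEGER, task_submit_seq INTEGER,
                         remote_duration REAL);
""")
# Instance 20 is a clustered job with two tasks; instance 21 is not clustered.
conn.executemany("INSERT INTO job_instance VALUES (?, ?)", [(20, 50.0), (21, None)])
conn.executemany(
    "INSERT INTO invocation VALUES (?, ?, ?)",
    [(20, 1, 20.0), (20, 2, 25.0), (21, 1, 12.0)],
)

SEQEXEC_DELAY_QUERY = """
SELECT ji.job_instance_id,
       ji.cluster_duration AS seqexec,
       -- Seqexec-Delay: cluster_duration minus the summed task remote_duration values
       ji.cluster_duration - (SELECT sum(remote_duration) FROM invocation
                               WHERE job_instance_id = ji.job_instance_id
                                 AND task_submit_seq >= 0) AS seqexec_delay
FROM job_instance AS ji
ORDER BY ji.job_instance_id
"""
rows = conn.execute(SEQEXEC_DELAY_QUERY).fetchall()
print(rows)  # the non-clustered instance gets None for both columns
```

Because SQL subtraction propagates NULL, a non-clustered instance gets no Seqexec-Delay value instead of a misleading negative number, so no extra CASE expression is needed.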