1. Total number of workflows executed
  2. Number of workflows executed per day
  3. Total Runtime of each workflow
    1. Sum of task durations, both with and without pre/post scripts
  4. Total Walltime of each workflow
    1. Sum of the differences between DAGMan end and start times
  5. Total number of workflow jobs/tasks executed
    1. This means all job and task executions for a given workflow, including failed runs, retries, and successes
    2. Total number of workflow jobs and tasks that failed
    3. Total number of workflow jobs and tasks that succeeded
  6. Total number of workflow jobs/tasks that were automatically retried
    1. e.g. select count(*) from (select job_id from job where wf_uuid='xyz' group by job_id having count(job_submit_seq) > 1) as retried;
  7. Workflow jobs/tasks that were retried, broken down by job name or transformation
    1. e.g. select job_name, count(job_submit_seq) from job where wf_uuid='xyz' group by job_name having count(job_submit_seq) > 1;
  8. Number of jobs/tasks executed per hour, per day, per week, per month, per year (see the sketch after this list)
  9. We can also break the above down by job type (data transfer in/out, registration, application, other Pegasus jobs)
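
For the per-period counts, a minimal sketch of the per-day variant, assuming the job table carries a submission timestamp column (called submit_time here; the column name is an assumption, not the actual schema):

  -- jobs executed per day; submit_time is an assumed column name
  select date(submit_time) as day, count(job_id) as jobs
  from job
  group by date(submit_time)
  order by day;

The same grouping works per hour/week/month/year by changing the date truncation, and the job-type breakdown in item 9 just adds the type column to the select and group by lists.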

We can graph workflows/jobs/tasks over time

We also should be able to get the overheads for the jobs (cumulative and average): DAGMan overhead, the amount of time spent in the Condor queue, the time from release into the queue to running, and Kickstart overhead. We should also be able to quantify each overhead as a percentage of the overall job time.
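
A minimal sketch of the Condor queue delay and its share of total job time, assuming hypothetical epoch-second timestamp columns submit_time, execute_start, and execute_end on the job table:

  -- queue delay per job and its percentage of overall job time;
  -- the timestamp columns are assumptions, not the actual schema
  select job_id,
         execute_start - submit_time as queue_delay,
         100.0 * (execute_start - submit_time) / (execute_end - submit_time) as pct_overhead
  from job
  where execute_end > submit_time;

The other overheads (DAGMan, Kickstart) follow the same pattern once their start and end timestamps are recorded.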

Related to resource utilization

  1. Number of jobs executed on a host/glidein for a particular provisioning request
  2. Average number of idle jobs in the queue over time (also maybe min/max); see the sketch after this list
  3. Average number of running jobs over time (also maybe min/max)
  4. Average number of idle glide-ins over time (and min/max)
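
A minimal sketch for these averages, assuming a hypothetical queue_snapshot table filled by polling the queue periodically, with sample_time, idle_jobs, running_jobs, and idle_glideins columns:

  -- average/min/max idle jobs over a time window;
  -- queue_snapshot is an assumption, not an existing schema
  select avg(idle_jobs), min(idle_jobs), max(idle_jobs)
  from queue_snapshot
  where sample_time between :start and :end;

Running jobs and idle glide-ins follow the same pattern against their respective columns.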

Corral

  1. Number of requests made (and for how many processors); see the sketch after this list
  2. Number of failed requests
  3. Number of automatic resubmissions
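
A minimal sketch of this accounting, assuming a hypothetical request table with request_id, processors, status, and resubmits columns:

  -- totals across Corral provisioning requests; the table and its
  -- columns are assumptions, not an existing schema
  select count(*) as total_requests,
         sum(processors) as total_processors,
         sum(case when status = 'FAILED' then 1 else 0 end) as failed_requests,
         sum(resubmits) as automatic_resubmissions
  from request;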