Introduction

The workflow statistics file contains the details of each individual workflow in a tabular format.

Workflow Statistics File Content

Workflow Summary contains the following information.

The table below shows the format of the workflow statistics file.

 

 

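| Workflow | Type          | Succeeded | Failed | Incomplete | Total | Retries | Total Run | Workflow Retries |
|----------|---------------|-----------|--------|------------|-------|---------|-----------|------------------|
| Sub wf 1 | Tasks         |           |        |            |       |         |           |                  |
|          | Jobs          |           |        |            |       |         |           |                  |
|          | Sub Workflows |           |        |            |       |         |           |                  |
| Sub wf 2 | Tasks         |           |        |            |       |         |           |                  |
|          | Jobs          |           |        |            |       |         |           |                  |
|          | Sub Workflows |           |        |            |       |         |           |                  |
| Root wf  | Tasks         |           |        |            |       |         |           |                  |
|          | Jobs          |           |        |            |       |         |           |                  |
|          | Sub Workflows |           |        |            |       |         |           |                  |
| Total    | Tasks         |           |        |            |       |         |           |                  |
|          | Jobs          |           |        |            |       |         |           |                  |
|          | Sub Workflows |           |        |            |       |         |           |                  |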

Legends

The columns of the workflow statistics file are defined as follows.

Succeeded :-
Tasks - Number of succeeded tasks.
Jobs - Number of succeeded jobs.
Sub Workflows - Number of succeeded sub workflows.

Failed :-
Tasks - Number of failed tasks.
Jobs - Number of failed jobs.
Sub Workflows - Number of failed sub workflows.

Incomplete :-
Tasks - Number of tasks that are not in the succeeded or failed state.
Jobs - Number of jobs that are not in the succeeded or failed state.
Sub Workflows - Number of sub workflows that are not in the succeeded or failed state.

Total :-
Tasks - Total number of abstract tasks.
Jobs - Total number of planned jobs.
Sub Workflows - Total number of sub workflows.

Retries :-
Tasks - Total number of task retries.
Jobs - Total number of job retries.
Sub Workflows - Total number of sub workflow retries.

Total Run :-
Tasks - Total number of invocations that were executed during the workflow run. This value is cumulative: it includes retries if a task was executed more than once.
Jobs - Total number of job instances that were executed during the workflow run. This value is cumulative: it includes retries if a job was instantiated more than once.
Sub Workflows - Total number of sub workflows that were executed during the workflow run. This value is cumulative: it includes retries if a sub workflow was executed more than once.

Workflow Retries :- Total number of workflow retries.

How to calculate the values 

A job is any entry in the job table whose type_desc is not dax or dag.
A task is any entry in the task table whose type_desc is not dax or dag.
A sub workflow is any entry in the job table or task table whose type_desc is dax or dag.
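The three definitions above can be sketched as a small classifier. This is only an illustration, not the actual statistics code: the function name classify_entry and the returned labels are hypothetical, and only the type_desc values dax and dag come from the text.

```python
# Hypothetical helper: classify a row of the job or task table
# using only its type_desc, following the definitions above.
SUB_WORKFLOW_TYPES = {"dax", "dag"}


def classify_entry(table, type_desc):
    """Return 'sub_workflow', 'job', or 'task' for a row of `table`."""
    if table not in ("job", "task"):
        raise ValueError("expected 'job' or 'task', got %r" % (table,))
    if type_desc in SUB_WORKFLOW_TYPES:
        return "sub_workflow"
    # Any other type_desc is a plain job or a plain task.
    return table
```

For example, classify_entry("job", "dax") yields "sub_workflow", while classify_entry("task", "compute") yields "task" (the value "compute" is an assumed type_desc used only for illustration).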

Succeeded :-
Tasks - Count of all tasks whose last retry has exitcode == 0 in the invocation table.
Jobs - Count of all jobs whose last retry has a last state of 'JOB_SUCCESS' or 'POST_SCRIPT_SUCCESS' in the jobstate table.
Sub Workflows - Count of all sub workflows whose last retry has a last state of 'JOB_SUCCESS' or 'POST_SCRIPT_SUCCESS' in the jobstate table.

Failed :-
Tasks - Count of all tasks whose last retry has exitcode <> 0 in the invocation table.
Jobs - Count of all jobs whose last retry has a 'PRE_SCRIPT_FAILED', 'SUBMIT_FAILED', 'JOB_FAILURE', or 'POST_SCRIPT_FAILED' state in the jobstate table.
Sub Workflows - Count of all sub workflows whose last retry has a 'PRE_SCRIPT_FAILED', 'SUBMIT_FAILED', 'JOB_FAILURE', or 'POST_SCRIPT_FAILED' state in the jobstate table.

Incomplete :-
Tasks - Total tasks - (Failed tasks + Succeeded tasks).
Jobs - Total jobs - (Failed jobs + Succeeded jobs).
Sub Workflows - Total sub workflows - (Failed sub workflows + Succeeded sub workflows).
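The succeeded, failed, and incomplete rules for jobs can be sketched as follows. This is a minimal illustration under stated assumptions: the function summarize is hypothetical, its input is assumed to be the last recorded jobstate of the last retry of each job that has one, and for simplicity the failed check is also applied to that last state. The state names are the ones listed above.

```python
SUCCESS_STATES = {"JOB_SUCCESS", "POST_SCRIPT_SUCCESS"}
FAILURE_STATES = {
    "PRE_SCRIPT_FAILED",
    "SUBMIT_FAILED",
    "JOB_FAILURE",
    "POST_SCRIPT_FAILED",
}


def summarize(total, last_states):
    """Derive the counts from the total and the last state of each job.

    total       -- number of jobs in the job table (assumed input)
    last_states -- last jobstate of the last retry, one per job that ran
    """
    succeeded = sum(1 for state in last_states if state in SUCCESS_STATES)
    failed = sum(1 for state in last_states if state in FAILURE_STATES)
    # Incomplete is whatever is neither succeeded nor failed.
    incomplete = total - (failed + succeeded)
    return {
        "succeeded": succeeded,
        "failed": failed,
        "incomplete": incomplete,
        "total": total,
    }


# Illustrative input: 3 planned jobs, of which only 2 ran to completion.
print(summarize(3, ["JOB_SUCCESS", "POST_SCRIPT_SUCCESS"]))
```

With this illustrative input, two jobs count as succeeded, none as failed, and one as incomplete.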

Total :-
Tasks - Total count of tasks in the task table.
Jobs - Total count of jobs in the job table.
Sub Workflows - Total count of sub workflows in the job table.

Retries :-
Tasks - The difference between the total number of task invocations in the invocation table and the number of unique task instances in the invocation table.
Jobs - The difference between the total number of job instances in the job_instance table and the number of unique job instances in the job_instance table.
Sub Workflows - The difference between the total number of sub workflow instances in the job_instance table and the number of unique sub workflow instances in the job_instance table.
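The retry rule is a plain "total rows minus unique rows" count. The sketch below assumes each row of the job_instance (or invocation) table can be reduced to a key identifying the job, task, or sub workflow it belongs to; the function name count_retries and the sample keys are hypothetical.

```python
def count_retries(instance_keys):
    """Return the number of retries: total rows minus unique entries.

    instance_keys -- one key per row, naming the job (or task, or
                     sub workflow) that the row belongs to
    """
    return len(instance_keys) - len(set(instance_keys))


# Illustrative input: sub workflow "A3" was attempted three times.
print(count_retries(["A3", "A3", "A3"]))  # prints 2
```

Three attempts of the same sub workflow therefore count as 2 retries, while a list in which every key appears once yields 0.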

Workflow Retries :- Maximum restart_count value of a workflow in the workflowstate table.
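As a rough illustration of this rule, the sketch below runs a MAX(restart_count) query against an in-memory SQLite database. The reduced three-column workflowstate schema, the sample rows, and the function name workflow_retries are assumptions made for the example; the real table may have more columns.

```python
import sqlite3


def workflow_retries(conn, wf_id):
    """Return the maximum restart_count recorded for one workflow."""
    row = conn.execute(
        "SELECT MAX(restart_count) FROM workflowstate WHERE wf_id = ?",
        (wf_id,),
    ).fetchone()
    # MAX() over zero rows yields NULL; treat that as no retries.
    return row[0] if row[0] is not None else 0


conn = sqlite3.connect(":memory:")
# Assumed minimal schema, for illustration only.
conn.execute(
    "CREATE TABLE workflowstate (wf_id INTEGER, state TEXT, restart_count INTEGER)"
)
conn.executemany(
    "INSERT INTO workflowstate VALUES (?, ?, ?)",
    [
        (1, "WORKFLOW_STARTED", 0),
        (1, "WORKFLOW_TERMINATED", 0),
        (1, "WORKFLOW_STARTED", 1),
        (1, "WORKFLOW_TERMINATED", 1),
    ],
)
print(workflow_retries(conn, 1))  # prints 1
```

A workflow that was restarted once reports 1; a workflow id with no rows in the table reports 0.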

Examples

Case 1

 
Figure 1 :- Hierarchical Workflow [Successful Run]

The example in Figure 1 is a hierarchical workflow with 4 tasks in DAX A and 4 tasks in DAX B. A3 is a sub workflow task.

 

 

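| Workflow | Type          | Succeeded | Failed | Incomplete | Total | Retries | Total Run | Workflow Retries |
|----------|---------------|-----------|--------|------------|-------|---------|-----------|------------------|
| Sub wf1  |               |           |        |            |       |         |           | 0                |
|          | Jobs          | 4         | 0      | 0          | 4     | 0       | 4         |                  |
|          | Tasks         | 4         | 0      | 0          | 4     | 0       | 4         |                  |
|          | Sub Workflows | 0         | 0      | 0          | 0     | 0       | 0         |                  |
| Root wf  |               |           |        |            |       |         |           | 0                |
|          | Tasks         | 3         | 0      | 0          | 3     | 0       | 3         |                  |
|          | Jobs          | 3         | 0      | 0          | 3     | 0       | 3         |                  |
|          | Sub Workflows | 1         | 0      | 0          | 1     | 0       | 1         |                  |
| Total    |               |           |        |            |       |         |           | 0                |
|          | Tasks         | 7         | 0      | 0          | 7     | 0       | 7         |                  |
|          | Jobs          | 7         | 0      | 0          | 7     | 0       | 7         |                  |
|          | Sub Workflows | 1         | 0      | 0          | 1     | 0       | 1         |                  |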

Table 1.1 Workflow statistics file

Note: For the sake of simplicity, the Jobs rows consider only compute jobs.
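In Table 1.1 the Total rows appear to be the column-wise sum of the per-workflow rows. The sketch below checks that arithmetic with the values from the table; the dictionary layout and the function name total_rows are hypothetical and are not part of the statistics tool.

```python
COLUMNS = ("succeeded", "failed", "incomplete", "total", "retries", "total_run")

# Values copied from Table 1.1, in the column order given by COLUMNS.
sub_wf1 = {
    "Tasks": (4, 0, 0, 4, 0, 4),
    "Jobs": (4, 0, 0, 4, 0, 4),
    "Sub Workflows": (0, 0, 0, 0, 0, 0),
}
root_wf = {
    "Tasks": (3, 0, 0, 3, 0, 3),
    "Jobs": (3, 0, 0, 3, 0, 3),
    "Sub Workflows": (1, 0, 0, 1, 0, 1),
}


def total_rows(*workflows):
    """Sum the per-workflow rows column by column."""
    totals = {}
    for workflow in workflows:
        for kind, values in workflow.items():
            previous = totals.get(kind, (0,) * len(COLUMNS))
            totals[kind] = tuple(a + b for a, b in zip(previous, values))
    return totals


totals = total_rows(sub_wf1, root_wf)
print(totals["Tasks"])          # (7, 0, 0, 7, 0, 7)
print(totals["Sub Workflows"])  # (1, 0, 0, 1, 0, 1)
```

The printed tuples match the Tasks and Sub Workflows rows under Total in Table 1.1.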

Case 2


Figure 2 :- Hierarchical Workflow [Failed Run]

The example in Figure 2 is a hierarchical workflow with 4 tasks in DAX A and 4 tasks in DAX B. However, the A3 sub workflow task fails at the prescript, so the DAX B workflow is never planned and the database is not populated with DAX B workflow details.

 

 

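| Workflow | Type          | Succeeded | Failed | Incomplete | Total | Retries | Total Run | Workflow Retries |
|----------|---------------|-----------|--------|------------|-------|---------|-----------|------------------|
| Root wf  |               |           |        |            |       |         |           | 0                |
|          | Tasks         | 2         | 0      | 1          | 3     | 0       | 2         |                  |
|          | Jobs          | 2         | 0      | 1          | 3     | 0       | 2         |                  |
|          | Sub Workflows | 0         | 1      | 0          | 1     | 2       | 3         |                  |
| Total    |               |           |        |            |       |         |           | 0                |
|          | Tasks         | 2         | 0      | 1          | 3     | 0       | 2         |                  |
|          | Jobs          | 2         | 0      | 1          | 3     | 0       | 2         |                  |
|          | Sub Workflows | 0         | 1      | 0          | 1     | 2       | 3         |                  |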

Table 2.1 Workflow statistics file

Note: For the sake of simplicity, the Jobs rows consider only compute jobs.

Queries

The queries for the workflow summary and the workflow statistics are the same, except for the setting that specifies whether to calculate against all workflow ids or a single workflow id.

Sub Workflows

// API method name : get_sub_workflow_ids(), expand_workflow = false
SELECT wf_id, wf_uuid, dax_label FROM workflow AS wf WHERE wf.parent_wf_id = 1
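To try the sub workflow query without a real database, it can be run against an in-memory SQLite table. This is a sketch under assumptions: the workflow table below contains only the four columns named in the query, the sample rows are invented, and fetch_sub_workflows is a hypothetical stand-in, not the get_sub_workflow_ids() API method itself. The parent id is passed as a bound parameter rather than the literal 1.

```python
import sqlite3


def fetch_sub_workflows(conn, parent_wf_id):
    """Return (wf_id, wf_uuid, dax_label) for each direct sub workflow."""
    cursor = conn.execute(
        "SELECT wf_id, wf_uuid, dax_label FROM workflow AS wf "
        "WHERE wf.parent_wf_id = ?",
        (parent_wf_id,),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns used by the query.
conn.execute(
    "CREATE TABLE workflow "
    "(wf_id INTEGER, wf_uuid TEXT, dax_label TEXT, parent_wf_id INTEGER)"
)
conn.executemany(
    "INSERT INTO workflow VALUES (?, ?, ?, ?)",
    [
        (1, "uuid-root", "dax-a", None),  # root workflow, no parent
        (2, "uuid-sub", "dax-b", 1),      # sub workflow of wf_id 1
    ],
)
print(fetch_sub_workflows(conn, 1))  # [(2, 'uuid-sub', 'dax-b')]
```

Only direct children are returned, which is consistent with expand_workflow = false; walking the whole hierarchy would need a repeated or recursive lookup.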

Total jobs

// API method name : get_total_jobs_status() expand_workflow = false,

//job_filter = all
select