Motivation

As part of adding notification support in Pegasus and monitord, we want monitord to be managed by Condor. Currently, monitord is launched by pegasus-run as a separate process.
If the system crashes, Condor comes back up automatically but monitord does not. This is a problem, because we will lose notifications in that case.

Solutions

There are several ways to launch monitord via Condor, along with two alternatives that avoid doing so:

Monitord is added as an independent condor job in the executable workflow created by Pegasus.

Open Question: How does monitord know workflow has completed?

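Currently, monitord looks at the DAGMan out file to determine whether a workflow has finished. Once monitord is part of the workflow itself, it can no longer rely on that, because DAGMan does not exit until every job in the workflow, including the monitord job, has finished.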
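
A minimal sketch of the kind of completion check described above. It assumes the DAGMan out file marks termination with a line containing "EXITING WITH STATUS <n>". The pattern and the function name dagman_exit_status are assumptions for illustration, not monitord's actual code.

```python
import re
from typing import Iterable, Optional

# Assumed termination marker in the DAGMan out file (illustrative).
_EXIT_RE = re.compile(r"EXITING WITH STATUS (\d+)")


def dagman_exit_status(lines: Iterable[str]) -> Optional[int]:
    """Return DAGMan's exit status if a termination line is present, else None."""
    status = None
    for line in lines:
        match = _EXIT_RE.search(line)
        if match:
            # Keep the last match, in case the file covers several DAGMan runs.
            status = int(match.group(1))
    return status
```

A monitord job running inside the workflow would never see such a line while it is itself still running, which is the problem this open question raises.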

Monitord either

Open Question: Discount monitor job in workflow.

The extra job in the workflow can also cause confusion: pegasus-statistics, and any other tool that reports statistics, will need to exclude it from its counts.
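
One possible way to discount the monitor job is to filter it out before counting. This is only a sketch: the job name "monitord" and the record layout (dicts with "name" and "state" keys) are hypothetical and do not reflect how pegasus-statistics stores its data.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical name of the monitord job inside the executable workflow.
MONITORD_JOB_NAME = "monitord"


def count_job_states(jobs: Iterable[Dict[str, str]],
                     skip: str = MONITORD_JOB_NAME) -> Counter:
    """Count jobs per state, leaving out the monitor job."""
    return Counter(job["state"] for job in jobs if job["name"] != skip)
```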

Monitord is a condor job separate from the executable workflow Pegasus creates

In this case either

  1. Pegasus creates the separate condor job outside the workflow
  2. Or pegasus-run creates the condor job and submits it in addition to submitting the dag file to condor dagman
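
Either variant needs a Condor submit description for monitord. The sketch below generates one under stated assumptions: the local universe is a guess at a suitable universe, and passing the DAGMan out file as the only argument is an assumption about monitord's command line. The function name and all paths are hypothetical.

```python
def monitord_submit_description(executable: str, dagman_out: str, log_file: str) -> str:
    """Build a Condor submit description that runs monitord as its own job."""
    lines = [
        "universe = local",            # assumption: run on the submit host
        f"executable = {executable}",  # path to monitord (hypothetical)
        f"arguments = {dagman_out}",   # assumption: DAGMan out file as sole argument
        f"log = {log_file}",
        "notification = never",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and handed to condor_submit alongside the DAG submission.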

Open Questions

Wrapper around condor dagman

Open Question

Keep monitord separate Unix process as-is

As described in the motivation, a user stops receiving notifications if something untoward happens to monitord or to the entire system.

Not use monitord for notifications

Certain fine-grained notifications will not be possible. However, we don't have to travel the whole journey in one step. We can push Condor to support multiple PRE and POST scripts, and do some notifications from within DAGMan. Not all use cases can be handled, but it is a start.

Another big disadvantage is the lack of workflow-level notifications, because there are no POST scripts at the DAG level.
In the case of hierarchical workflows, we will still get notifications for sub-workflows, as they are jobs in the parent workflow.
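
As a rough illustration of per-job notification from within DAGMan, the sketch below emits one SCRIPT POST line per job, passing the job name and DAGMan's $RETURN macro (the job's exit code) to a notification script. The script path and the function name are hypothetical, and the sketch assumes one POST script per job.

```python
from typing import Iterable, List


def post_script_lines(job_names: Iterable[str], notify_script: str) -> List[str]:
    """Return DAG file lines attaching a notification POST script to each job."""
    return [
        # $RETURN is expanded by DAGMan to the job's exit code.
        f"SCRIPT POST {name} {notify_script} {name} $RETURN"
        for name in job_names
    ]
```

Because a sub-workflow appears as a job in its parent DAG, the same mechanism would cover sub-workflow notifications, but not a notification for the top-level workflow itself.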