Workflows

The Workflows Service is an API and workflow executor that enables users to automate and execute their research computing tasks in the Tapis Ecosystem.


Overview

Before getting into the details of how to create and run a workflow, here are some important concepts you must understand in order to use Tapis Workflows properly.

Important Concepts

  • Group: A group is a collection of Tapis users that own or have access to workflow resources. In order to create your first workflow, you must first belong to, or create your own group.

  • Pipeline: A pipeline is a collection of tasks and a set of rules governing how those tasks are to be executed.

  • Archive: An archive is the storage medium for the results created during a pipeline run. By default, the results produced by each task are deleted at the end of a pipeline run.

  • Identity: An identity is a mapping between a Tapis identity and an external identity. An example of an external identity is a Github account or Dockerhub account.

  • Task: Tasks are discrete units of work performed during the execution of a workflow. They can be represented as nodes on a directed acyclic graph (DAG), with the order of their execution determined by their dependencies, and where all tasks without dependencies are executed first. There are different types of tasks that users can leverage to perform different types of work. These are called task primitives and they will be discussed in detail later:

| Type | Description | Supported |
|---|---|---|
| image_build | Builds Docker and Singularity images from recipe files and pushes them to repositories | yes |
| request | Sends requests using various protocols to resources external to the workflow (only the HTTP protocol and GET are currently fully supported) | partial |
| tapis_job | Submits a Tapis job | partial |
| tapis_actor | Executes a Tapis actor | no |
| container_run | Runs a container based on the provided image and tag | no |
| function | Runs user-defined code in the language and runtime of their choice | no |


Note

Security Note

In order to create and run workflows that are automated and reproducible, the Workflow Executor must sometimes be furnished with secrets (passwords, access keys, access secrets, ssh keys, etc.) that enable it to access restricted resources.

To ensure the safe storage and retrieval of this sensitive data, the Workflows service integrates with Tapis SK, a built-in secrets management service backed by Hashicorp Vault.

It is also important to note that, when a user creates a task that accesses some restricted resource, the Workflow Executor will execute that task on behalf of that user with the credentials that they provided for every run of the pipeline. If those credentials expire, or the user has their access revoked for those resources, the pipeline run will fail on that task.


Quick start

We will be creating a pipeline that:
  • pulls code from a private Github repository

  • builds an image from a Dockerfile located in that source code

  • then pushes the resultant image to a Dockerhub image registry

In the examples below, we assume that you are using the TACC tenant with a base URL of tacc.tapis.io, and that you have either authenticated using PySDK or obtained an authorization token and stored it in the environment variable JWT (or both).

Summary of Steps

  1. Create a Group

  2. Create the Identities that the workflow executor will use to access both Github and Dockerhub on your behalf

  3. Create an Archive to which the results of the pipeline run will be persisted

  4. Create the Pipeline and its Tasks which act as instructions to the workflow executor

Creating a Group

Create a local file named group.json with json similar to the following:

{
  "id": "<group_id>",
  "users": [
      {
        "username":"<user_id>",
        "is_admin": true
      }
  ]
}

Note

You do not need to add your own Tapis id to the users list. The owner of the Group is added by default.

Replace <group_id> with your desired group id and <user_id> in the user objects with the Tapis user ids of the other users that you want to grant access to this group’s workflow resources.

Submit the definition.

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups -d @group.json
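For readers who prefer Python over curl, the same request can be sketched with the standard library. This is only an illustration: the helper name build_post_request and the sample group values are hypothetical, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://tacc.tapis.io/v3/workflows"  # tenant base URL assumed in this guide


def build_post_request(path, payload, jwt):
    """Build (but do not send) a JSON POST request for the Workflows API."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Tapis-Token": jwt,
        },
        method="POST",
    )


group = {
    "id": "my.group",  # placeholder group id
    "users": [{"username": "jsmith", "is_admin": True}],  # placeholder user
}
request = build_post_request("groups", group, jwt="<your JWT>")
print(request.get_method(), request.full_url)
# To actually submit the definition: urllib.request.urlopen(request)
```

The same helper could be reused for the identity, archive, and pipeline definitions in the following steps by changing the path and payload.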

Creating Identities

We will be creating 2 identity mappings: one for Github and one for Dockerhub. After creating the identities, we will need to retrieve the UUIDs of the newly created identities. You can do this in a separate call, or simply grab the UUID from the url in the result after each operation.

Warning

Do NOT commit these files to source control!

Create the first file named github-identity.json with the following json:

{
  "type": "github",
  "name": "my-github-identity",
  "description": "My github identity",
  "credentials": {
    "username": "<github_username>",
    "personal_access_token": "<github_personal_access_token>"
  }
}

Then submit the definition

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/identities -d @github-identity.json

Create the second file named dockerhub-identity.json with the following json

{
    "type": "dockerhub",
    "name": "my-dockerhub-identity",
    "description": "My Dockerhub identity",
    "credentials": {
      "username": "<dockerhub_username>",
      "token": "<dockerhub_access_token>"
    }
}

Then submit the definition

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/identities -d @dockerhub-identity.json
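Grabbing the UUID from a url can be done with a small regular expression. The sketch below assumes the url returned in the result contains the identity's UUID as a path segment; the response shape is not shown in this guide, so the sample url is hypothetical.

```python
import re

# Matches a canonical 8-4-4-4-12 hexadecimal UUID anywhere in a string.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuid(url):
    """Return the last UUID found in url, or None if there is none."""
    matches = UUID_PATTERN.findall(url)
    return matches[-1] if matches else None


# Hypothetical url; the real response may be shaped differently.
sample_url = (
    "https://tacc.tapis.io/v3/workflows/identities/"
    "78aa5231-7075-428c-b94a-a6b971a444d2"
)
identity_uuid = extract_uuid(sample_url)
print(identity_uuid)
```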

Creating an Archive

In this step, we create the Archive. The results of the pipeline run will be persisted to the archive.

Note

This step requires that you have “MODIFY” permissions on some Tapis System. If you do not have access to one, you can create it by following the instructions in the “Systems” section.

Create a local file named archive.json with json similar to the following:

{
  "id": "my-sample-archive",
  "type": "system",
  "system_id": "<your-tapis-system-id>",
  "archive_dir": "/workflows/archive/"
}

Note

The archive_dir is relative to your system’s rootDir. You can change this value to whatever you like.

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>/archives -d @archive.json

Creating a Pipeline

In this step, we define the pipeline. There are many more properties that can be defined at both the pipeline and task level, but for simplicity, we will be leaving them out.

Create a local file named pipeline.json with json similar to the following:

{
  "id": "my-sample-workflow",
  "archives": [ "<archive_id>" ],
  "tasks": [
    {
      "id": "my-image-build",
      "type": "image_build",
      "builder": "kaniko",
      "context": {
          "branch": "main",
          "recipe_file_path": "<path/to>/Dockerfile",
          "sub_path": null,
          "type": "github",
          "url": "<account>/<repo>",
          "visibility": "private",
          "identity_uuid": "<github_identity_uuid>"
      },
      "destination": {
          "tag": "<some_image_tag>",
          "type": "dockerhub",
          "url": "<account>/<registry>",
          "identity_uuid": "<dockerhub_identity_uuid>"
      }
    }
  ]
}

Go through the definition above and replace all of the placeholders with the correct values.

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>/pipelines -d @pipeline.json
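A quick local check before submitting can catch both malformed JSON and placeholders that were never replaced. The following is a minimal sketch using only the standard library; the function name find_placeholders is hypothetical and the definition is abbreviated for illustration.

```python
import json
import re

# A placeholder is any <...> run that does not start with whitespace.
PLACEHOLDER = re.compile(r"<[^<>\s][^<>]*>")


def find_placeholders(value, path="$"):
    """Recursively collect (path, text) pairs for strings that still contain <...>."""
    found = []
    if isinstance(value, dict):
        for key, item in value.items():
            found.extend(find_placeholders(item, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif isinstance(value, str) and PLACEHOLDER.search(value):
        found.append((path, value))
    return found


# json.loads also rejects malformed JSON, such as a missing comma.
definition = json.loads("""
{
  "id": "my-sample-workflow",
  "archives": ["my-sample-archive"],
  "tasks": [
    {"id": "my-image-build", "destination": {"tag": "<some_image_tag>"}}
  ]
}
""")
leftovers = find_placeholders(definition)
for location, text in leftovers:
    print(f"unreplaced placeholder at {location}: {text}")
```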

Triggering the Workflow

Now it’s time to run the pipeline.

curl -X POST -H "content-type: application/json" -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>/pipelines/<pipeline_id>/events -d "{}"

After the pipeline has finished running, take a look in your Dockerhub image repository and you will find your newly pushed image.

If you SSH into the Tapis System that you selected as your archive, you will also find that you have some new directories and files in your rootDir:

/workflows/archive/<UUID of the pipeline run>/my-image-build/output/.stdout

If you want to find the output for any task for a given pipeline run, simply navigate to the archive directory, cd into the directory named with the pipeline run UUID, then cd into the directory with that task’s name. Inside the output/ directory, you will find all of the data created by that task.
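The directory layout described here can be expressed as a small path helper. This is a sketch based only on the example path in this section; the function name task_output_dir and the run UUID are placeholders, and the result is relative to the system's rootDir.

```python
import posixpath


def task_output_dir(archive_dir, run_uuid, task_id):
    """Join the archive directory, pipeline run UUID, and task id into the output path."""
    return posixpath.join(archive_dir, run_uuid, task_id, "output")


path = task_output_dir(
    "/workflows/archive/",
    "e48ada7a-56b4-4d48-974c-7574d51a8789",  # placeholder pipeline run UUID
    "my-image-build",
)
print(path)
```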


Groups

A group is a collection of Tapis users that own or have access to workflow resources. In order to create your first workflow, you must first belong to, or create a group. A group and all of its resources are tied to a particular tenant which is resolved from the url of the request to create the group.

Each group has an id, which must be unique within the tenant to which the group belongs. Groups also have users. This is a simple list of user objects, where user.username is the Tapis username/id and is_admin is a Boolean. Users with admin permissions (i.e. is_admin == true) are able to add and remove users from a group. Only the owner of a group may add or delete other admin users.
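The membership rules can be read as a small decision function. The sketch below is one interpretation of the text, not the service's actual implementation: it assumes the owner may change any membership, admins may add or remove only non-admin users, and other users may change nothing. The function name and sample data are hypothetical.

```python
def can_change_membership(actor, group, target_is_admin):
    """Return True if actor may add or remove a user with the given admin flag."""
    if actor == group["owner"]:
        return True  # the owner may add or delete any user, including admins
    actor_is_admin = any(
        user["username"] == actor and user["is_admin"] for user in group["users"]
    )
    # Admins may manage regular users, but not other admins.
    return actor_is_admin and not target_is_admin


group = {
    "owner": "someuser",
    "users": [
        {"username": "someuser", "is_admin": True},
        {"username": "adminuser", "is_admin": True},
        {"username": "anotheruser", "is_admin": False},
    ],
}
print(can_change_membership("adminuser", group, target_is_admin=True))
```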

Group Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| id | String | my.group | Must be unique within the tenant. |
| owner | String | someuser | Cannot be removed from group unless ownership is transferred. |
| tenant_id | String | tacc | Automatically set at create-time. Determined from the url of the request. Does not need to be specified. |
| uuid | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | Automatically set at create-time |
| users | Array(Users) | | |

GroupUser Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| username | String | jsmith | Must be unique within the group. |
| is_admin | Boolean | True | Must be an admin or group owner to add a user as admin |
| uuid | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | Automatically set at create-time |
| group | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | Automatically set at create-time |

Retrieval

Retrieve details for a specific group

curl -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>

The response should look similar to the following

{
  "success": true,
  "status": 200,
  "message": "Success",
  "result": {
    "id": "my.group",
    "owner": "someuser",
    "tenant_id": "tacc",
    "uuid": "f60fdf8a-4ceb-4273-b49f-4c0dd94111c3",
    "users": [
      {
        "group": "f60fdf8a-4ceb-4273-b49f-4c0dd94111c3",
        "username": "someuser",
        "is_admin": true,
        "uuid": "c6b7acfd-da4b-4a1d-acbd-adbfa6aa4057"
      },
      {
        "group": "f60fdf8a-4ceb-4273-b49f-4c0dd94111c3",
        "username": "anotheruser",
        "is_admin": false,
        "uuid": "d6ca476a-2c19-4168-8054-264bcaaa70e7"
      }
    ]
  }
}

Deletion

Groups may only be deleted by the group owner. Upon deletion of a group, every workflow object owned by the group (pipelines, tasks, archives, etc.) will also be deleted.


Pipelines

A pipeline is a collection of tasks and a set of rules governing how those tasks are to be executed.

Pipeline Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| id | String | my.pipeline | Must be unique within the group |
| uuid | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | Globally unique identifier for the pipeline |
| owner | String | jsmith | The only user that can delete the pipeline |
| group | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | The uuid of the group that owns this pipeline |
| last_run | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | The UUID of the previous pipeline run |
| current_run | String(UUIDv4) | e48ada7a-56b4-4d48-974c-7574d51a8789 | The UUID of the current running pipeline |
| tasks | Array[Task] | See the Task section for the Task object | |
| execution_profile | Object | See table below | |

Execution Profile Attributes Table

Overrides the default behavior of the Workflow Executor regarding task retries, backoff algorithm, max lifetime of the pipeline, etc. All Execution Profile properties of the pipeline are inherited by the tasks that belong to the pipeline unless otherwise specified in the task definition.

| Attribute | Type | Example | Notes |
|---|---|---|---|
| max_retries | Int | 0, 3, 10 | The number of times that the Workflow Executor will try to rerun the task if it fails. Defaults to 0 |
| invocation_mode | Enum | async, sync | Default is “async”. When “async” is selected, all tasks will be executed concurrently. Currently, async is the only supported option |
| retry_policy | Enum | exponential_backoff | Dictates which policy to employ when restarting tasks. Default (and only supported option) is exponential_backoff |
| max_exec_time | Int | 60, 3600, 10800 | The maximum amount of time in seconds that a pipeline (or task) is permitted to run. As soon as the sum of all task runs equals this limit, the pipeline (or task) is terminated. Defaults to 3600 seconds (1 hour) |
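The inheritance rule for execution profiles (task settings override pipeline settings, which override the defaults) can be sketched as a layered merge. This is an illustration of the rule as described, not the Workflow Executor's code; the function name resolve_profile is hypothetical, and the defaults are those listed in the table above.

```python
# Defaults as listed in the Execution Profile table.
DEFAULT_PROFILE = {
    "max_retries": 0,
    "invocation_mode": "async",
    "retry_policy": "exponential_backoff",
    "max_exec_time": 3600,
}


def resolve_profile(pipeline_profile, task_profile):
    """Layer task settings over pipeline settings over the defaults."""
    resolved = dict(DEFAULT_PROFILE)
    for profile in (pipeline_profile, task_profile):
        # Skip keys set to None so that they fall through to the outer layer.
        resolved.update({k: v for k, v in (profile or {}).items() if v is not None})
    return resolved


pipeline_profile = {"max_retries": 3, "max_exec_time": 10800}
task_profile = {"max_exec_time": 60}
effective = resolve_profile(pipeline_profile, task_profile)
print(effective)
```

In this example the task keeps the pipeline's max_retries of 3 but uses its own max_exec_time of 60 seconds.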

Retrieval

Retrieve details for a specific pipeline

curl -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>/pipelines/<pipeline_id>

The response should look similar to the following:

{
   "success": true,
   "status": 200,
   "message": "Success",
   "result": {
     "id": "some_pipeline_id",
     "group": "c487c25f-6c6e-457d-a781-85120df9f10b",
     "invocation_mode": "async",
     "max_exec_time": 10800,
     "max_retries": 0,
     "owner": "testuser2",
     "retry_policy": "exponential_backoff",
     "uuid": "e48ada7a-56b4-4d48-974c-7574d51a8789",
     "current_run": null,
     "last_run": null,
     "tasks": [
       {
         "id": "build",
         "cache": false,
         "depends_on": [],
         "description": "Build an image from a repository and push it to an image registry",
         "input": null,
         "invocation_mode": "async",
         "max_exec_time": 3600,
         "max_retries": 0,
         "output": null,
         "pipeline": "e48ada7a-56b4-4d48-974c-7574d51a8789",
         "poll": null,
         "retry_policy": "exponential_backoff",
         "type": "image_build",
         "uuid": "e442b5df-8a9e-4d55-b4da-c51b7241a79f",
         "builder": "singularity",
         "context": "5bd771ab-8df5-43cd-a059-fbaa2323841b",
         "destination": "b34d1439-d2c9-4238-ab74-13b5fd7f3b1f",
         "auth": null,
         "data": null,
         "headers": null,
         "http_method": null,
         "protocol": null,
         "query_params": null,
         "url": null,
         "image": null,
         "tapis_job_def": null,
         "tapis_actor_id": null
       }
     ]
   }
 }

Deletion

Deleting a pipeline will delete all of its tasks. This operation can only be performed by the owner of the pipeline.


Tasks

Tasks are discrete units of work performed during the execution of a workflow. They can be represented as nodes on a directed acyclic graph (DAG), with the order of their execution determined by their dependencies, and where all tasks without dependencies are executed first.

Tasks can be defined when creating a pipeline, or after the pipeline’s creation. Every task must have an id that is unique within the pipeline.

Tasks may also specify their dependencies in a number of ways. The first way is by declaring the dependency explicitly in the depends_on property. This is an Array of TaskDependency objects, which have only 2 attributes: the id, which is the id of the task that it depends on, and the can_fail attribute (Boolean), which specifies whether the dependent task is allowed to run if that TaskDependency fails.
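To make the DAG ordering concrete, the sketch below groups tasks into "waves" from their depends_on lists using Python's standard graphlib module. It only illustrates dependency ordering; it is not how the Workflow Executor is implemented, the task ids are made up, and can_fail is carried in the data but does not affect the ordering.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_waves(tasks):
    """Group task ids into waves; each wave depends only on earlier waves."""
    graph = {
        task["id"]: {dep["id"] for dep in task.get("depends_on", [])}
        for task in tasks
    }
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


tasks = [
    {"id": "build", "depends_on": []},
    {"id": "test", "depends_on": [{"id": "build", "can_fail": False}]},
    {"id": "lint", "depends_on": []},
    {"id": "publish", "depends_on": [
        {"id": "test", "can_fail": False},
        {"id": "lint", "can_fail": True},
    ]},
]
waves = execution_waves(tasks)
print(waves)
```

Tasks without dependencies (build and lint here) land in the first wave, matching the rule that they are executed first.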

Task Attributes Table

This table contains all of the properties that are shared by all tasks. Different types of tasks will have other unique properties in addition to all of the properties in the table below.

| Attribute | Type | Example | Notes |
|---|---|---|---|
| id | String | my-task, my.task, my_task | Must be unique within the pipeline that it belongs to |
| type | Enum | image_build, tapis_job, tapis_actor, request, container_run, function | Only image_build is fully supported. Partial support for the request type exists; HTTP GET requests only |
| depends_on | Array[TaskDependency] | see table below | Explicitly declares this task’s dependencies. Task with the specified id must exist or the pipeline will not run. |
| execution_profile | Object | see execution profile table in the pipeline section | Inherits the execution_profile set in the pipeline definition. |
| description | String | My task description | |
| input | Object | | |
| output | Object | | |
| pipeline | String(UUIDv4) | 5bd771ab-8df5-43cd-a059-fbaa2323841b | UUID of the pipeline that this task is a part of |
| uuid | String(UUIDv4) | 5bd771ab-8df5-43cd-a059-fbaa2323841b | A globally unique identifier for this task |

Task Types

There are different types of tasks that users can leverage to perform different types of work. These are called task types or primitives. Task types include the image_build type, the request type, the tapis_job type, the tapis_actor type, the container_run type, and the function type.

When defining tasks on a pipeline, the type must be present in the task definition along with all other attributes specific to the task type.


Image Build

Builds Docker and Singularity images from recipe files and pushes them to repositories or stores the resultant image in some archive (specified in the pipeline definition).

Image Build Task Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| builder | Enum | kaniko, singularity | There are two image builders that can be used. Kaniko, which builds docker images, and Singularity, which builds singularity files |
| cache | Boolean | true, false | Layer caching. Used to make subsequent builds of the same image quicker (if supported by the image builder) |
| context | Object | see context table below | Indicates the source of the image to build. Typically that source is a code repository, or an image registry |
| destination | Object | see destination attribute table below | Indicates the destination to which the image will be stored/pushed. Can be local, or an image registry like Dockerhub |

Context Attribute Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| branch | String | main, dev, feature/some-new-feature | Branch to pull and build from |
| recipe_file_path | String | src/Dockerfile, src/Singularity.myfile | Path to the recipe file (e.g. the Dockerfile) relative to the root directory of the project |
| sub_path | String | /some/sub/path | Equivalent to the build context argument in docker build |
| type | Enum | github, dockerhub, local(unsupported) | Instructs the API and Workflow Executor how to fetch the source |
| url | String | tapis/workflows-api | The url of the repository (or registry) where the source code (or image) is located |
| visibility | Enum | private, public | Informs the API that credentials are required to access the source |
| identity_uuid | String(UUIDv4) | 78aa5231-7075-428c-b94a-a6b971a444d2 | Optional if visibility == "public". The identity that contains the set of credentials required to access the source |
| credentials | Object | | Optional if visibility == "public" and unnecessary if an identity_uuid is provided. An object that contains key/value pairs of the credentials needed to access the source |

Context Examples

"context": {
  "branch": "main",
  "recipe_file_path": "src/Singularity.test",
  "sub_path": null,
  "type": "github",
  "url": "nathandf/jscicd-image-demo-private",
  "visibility": "private",
  "identity_uuid": "78aa5231-7075-428c-b94a-a6b971a444d2",
  "credentials": {
    "username": "<username>",
    "personal_access_token": "<token>"
  }
}
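The rules for identity_uuid and credentials can be checked before a task is submitted. The sketch below reads "optional if visibility is public" as "required if visibility is private", which is an interpretation of the table rather than documented validation behavior; the function name missing_credentials is hypothetical.

```python
def missing_credentials(context):
    """Return True if a private source has neither identity_uuid nor credentials."""
    if context.get("visibility") != "private":
        return False  # public sources need no credentials
    return not (context.get("identity_uuid") or context.get("credentials"))


private_with_identity = {
    "type": "github",
    "visibility": "private",
    "identity_uuid": "78aa5231-7075-428c-b94a-a6b971a444d2",
}
private_without_access = {"type": "github", "visibility": "private"}
public_source = {"type": "github", "visibility": "public"}

for ctx in (private_with_identity, private_without_access, public_source):
    print(ctx["visibility"], "missing credentials:", missing_credentials(ctx))
```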

Destination Attribute Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| type | Enum | dockerhub, local | |
| tag | String | latest, dev, 1.0.0 | type dockerhub only. The tag for the image when pushing to a registry |
| url | String | someaccount/somerepo | type dockerhub only |
| identity_uuid | String(UUIDv4) | 78aa5231-7075-428c-b94a-a6b971a444d2 | The identity that contains the set of credentials required to access the destination |
| credentials | Object | | Unnecessary if an identity_uuid is provided. An object that contains key/value pairs of the credentials needed to access the destination |

Destination Examples

A destination of type dockerhub will push the resultant image to the specified Dockerhub registry using either the credentials provided in the identity (referenced in the request body via the identity_uuid) or a credentials object with the username and token required to push to that repository.

"destination": {
  "tag": "test",
  "type": "dockerhub",
  "url": "nathandf/jscicd-kaniko-test",
  "identity_uuid": "fb949e63-a636-4666-980f-c72f8abc2b29"
}

OR

"destination": {
  "tag": "test",
  "type": "dockerhub",
  "url": "nathandf/jscicd-kaniko-test",
  "credentials": {
    "username": "<username>",
    "token": "<token>"
  }
}



Retrieval

Retrieve details for a specific task in a pipeline

curl -H "X-Tapis-Token: $JWT" https://tacc.tapis.io/v3/workflows/groups/<group_id>/pipelines/<pipeline_id>/tasks/<task_id>

The response should look similar to the following:

{
   "success": true,
   "status": 200,
   "message": "Success",
   "result": {
     "id": "build",
     "cache": false,
     "depends_on": [],
     "description": "Build an image from a repository and push it to an image registry",
     "input": null,
     "invocation_mode": "async",
     "max_exec_time": 3600,
     "max_retries": 0,
     "output": null,
     "pipeline": "ececc546-3ee0-437e-ae50-5882ec03356a",
     "poll": null,
     "retry_policy": "exponential_backoff",
     "type": "image_build",
     "uuid": "01eac121-19bf-4d8e-957e-faa27bdaa1f8",
     "builder": "singularity",
     "context": "ea58c3ef-7175-41b0-9671-e50700a33c77",
     "destination": "6eac73da-5799-4e74-957c-03b5cee97149",
     "auth": null,
     "data": null,
     "headers": null,
     "http_method": null,
     "protocol": null,
     "query_params": null,
     "url": null,
     "image": null,
     "tapis_job_def": null,
     "tapis_actor_id": null
   }
 }

Deletion

Deleting a task can only be done by a pipeline administrator. If any tasks depend on the deleted task, the pipeline will fail when run.


Archives

Archives are the storage mechanisms for pipeline results. Without an archive, the results produced by each task will be permanently deleted at the end of each pipeline run. By default, archiving occurs at the end of a pipeline run when all tasks have reached a terminal state.

Archive Attributes Table

This table contains all of the properties that are shared by all archives. Different types of archives will have other unique properties in addition to all of the properties in the table below.

| Attribute | Type | Example | Notes |
|---|---|---|---|
| id | String | my-task, my.task, my_task | Must be unique within the group that it belongs to |
| archive_dir | String | path/to/archive/dir | Relative to the “root directory” of the archive’s file system |
| type | Enum | system, S3 | |

Archive Types

Tapis System

Store the results of a pipeline run to a specific system. The owner of the archive must have MODIFY permissions on the system. Permission will be checked at the time the archive is created and every time before archiving.

Note

The archiving process does NOT interfere with the task execution process. If archiving fails, the pipeline run can still complete successfully.

Tapis System Archive Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| system_id | String | my.system | Must have MODIFY permissions on this system. Also, by default, the system is assumed to be in the same tenant as the group to which it belongs |

Tapis System Archive Example:

{
  "id": "my.archive",
  "type": "system",
  "system_id": "my.system",
  "archive_dir": "workflows/archive/"
}



Identities

Identities are mappings between a Tapis identity and some external identity. An example of an external identity would be a Github user or Dockerhub user.

Identities have two primary functions. The first is to serve as a reference to some set of credentials that are required to access a restricted external resource, such as a Github repository or Dockerhub image registry. The second is to authenticate the identity of a user that triggered a webhook notification from some external resource.

For example, if:
  • Github user jsmith pushes code to some repository,

  • and has an “on-push” webhook notification configured to make a request to the Workflows API (to trigger a pipeline),

then we need to know which Tapis user (if any) corresponds to that Github user jsmith so we can determine whether jsmith is permitted to trigger that pipeline.
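The lookup described in this example can be sketched as two small functions. Everything about the data layout here is an assumption made for illustration: the "owner" field linking an identity to a Tapis user, the in-memory lists, and the rule that any group member may trigger the pipeline are hypothetical and are not taken from the Workflows API.

```python
def resolve_tapis_user(identities, identity_type, external_username):
    """Return the Tapis username mapped to an external account, or None."""
    for identity in identities:
        if (
            identity["type"] == identity_type
            and identity["credentials"]["username"] == external_username
        ):
            return identity["owner"]  # hypothetical field naming the Tapis user
    return None


def may_trigger(identities, group, identity_type, external_username):
    """Allow the trigger only if the external user maps to a member of the group."""
    tapis_user = resolve_tapis_user(identities, identity_type, external_username)
    members = {user["username"] for user in group["users"]}
    return tapis_user is not None and tapis_user in members


identities = [
    {"owner": "tapisuser1", "type": "github", "credentials": {"username": "jsmith"}},
]
group = {"users": [{"username": "tapisuser1", "is_admin": False}]}

allowed = may_trigger(identities, group, "github", "jsmith")
denied = may_trigger(identities, group, "github", "unknown-user")
print(allowed, denied)
```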

Note

Security Note

In order to create and run workflows that are automated and reproducible, the Workflow Executor must sometimes be furnished with secrets (passwords, access keys, access secrets, ssh keys, etc.) that enable it to access restricted resources.

To ensure the safe storage and retrieval of this sensitive data, the Workflows service integrates with Tapis SK, a built-in secrets management service backed by Hashicorp Vault.

It is also important to note that, when a user creates a task that accesses some restricted resource, the Workflow Executor will execute that task on behalf of that user with the credentials that they provided for every run of the pipeline. If those credentials expire, or the user has their access revoked for those resources, the pipeline run will fail on that task.

Identities Attributes Table

| Attribute | Type | Example | Notes |
|---|---|---|---|
| type | Enum | github, dockerhub | For each type of identity, the credentials object of the identity will be different. Details to follow |
| name | String | my-github-identity | Must be unique to the Tapis user. |
| description | String | This is the identity to access my restricted github repo | |
| credentials | Object | | Contains the secret values to access the restricted external resources |

Identity Examples

{
  "type": "github",
  "name": "my-github-identity",
  "description": "My github identity",
  "credentials": {
    "username": "<username>",
    "personal_access_token": "<token>"
  }
}

Directives (experimental)

Directives are a special set of commands that override the default execution behavior of a pipeline or its tasks. Directives can be provided in either a commit message or the request body.

Commit Message Directives

  • Directives must be placed inside square brackets at the end of the commit message.

  • Multiple directives must be separated by a pipe “|”.

  • Directives that require a key-value pair must have the key and value separated by a colon “:”.

  • The directive string in a commit message must comply with the following regex (Python flavor) pattern: \[[a-zA-Z0-9\s:|._-]+\]

Directive Usage Examples

git commit -m "Some commit message [no_push]"
git commit -m "Some commit message [cache|custom_tag:my-custom-tagV.1]"
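The commit message rules can be exercised with a short parser built on the regex quoted in this section. This is a sketch of the format only, assuming the directive block is the last thing in the message; the function name parse_directives is hypothetical and the code is not the service's own parser.

```python
import re

# Pattern quoted in the text, anchored to the end of the commit message.
DIRECTIVE_PATTERN = re.compile(r"\[[a-zA-Z0-9\s:|._-]+\]$")


def parse_directives(commit_message):
    """Return {directive: value or None} parsed from the trailing [...] block."""
    match = DIRECTIVE_PATTERN.search(commit_message.strip())
    if match is None:
        return {}
    directives = {}
    # Drop the surrounding brackets, then split on the pipe separator.
    for item in match.group(0)[1:-1].split("|"):
        key, _, value = item.strip().partition(":")
        if key:
            directives[key] = value or None
    return directives


simple = parse_directives("Some commit message [no_push]")
combined = parse_directives("Some commit message [cache|custom_tag:my-custom-tagV.1]")
print(simple, combined)
```

Directives without a value map to None, while key-value directives such as custom_tag keep the text after the colon.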

List of Directives

  • custom_tag - Overrides the destination image_tag on an image_build task. Tags an image with the value provided after “custom_tag:”.

  • commit_destination - Overrides the destination image_tag on an image_build task. Dynamically tags the image with the short commit sha of the last commit(and push) that triggered the pipeline.

  • no_push (pending) - Overrides the image build destination. Creates a local file

  • dry_run - prevents the pipeline from running. Used to test whether the desired Pipeline is matched.

  • nocache (unsupported) - prevents the image builder in an “image_build” task from caching the image layers. This will result in longer build times for subsequent builds.