Chapter 5 WDL style guide

The Workflow Description Language (WDL) is a way to specify data processing workflows with a human-readable and writeable syntax. WDL makes it straightforward to define complex analysis tasks, chain them together in workflows, and parallelize their execution.

5.1 Name conventions

The names for the various types of objects should be in the following format:

Workflow: UpperCamelCase
Task: UpperCamelCase
Struct: UpperCamelCase
Variable: lower_case separated with underscore _

To avoid name conflicts, end task names with Task and workflow names with Flow.
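
The conventions above can be sketched as follows. This is only an illustration, assuming WDL 1.0 syntax; every name in it (CountLinesTask, SampleInfo, CountLinesFlow, sample_info and so on) is hypothetical. Everything is kept in one file purely for brevity.

```wdl
version 1.0

# Task: UpperCamelCase, ending with Task.
task CountLinesTask {
    input {
        # Variable: lower_case separated with underscores.
        File reads_file
    }

    command <<<
        wc -l < ~{reads_file}
    >>>

    output {
        Int line_count = read_int(stdout())
    }
}

# Struct: UpperCamelCase.
struct SampleInfo {
    String sample_name
    File reads_file
}

# Workflow: UpperCamelCase, ending with Flow.
workflow CountLinesFlow {
    input {
        SampleInfo sample_info
    }

    call CountLinesTask as count_lines {
        input:
            reads_file = sample_info.reads_file
    }

    output {
        Int line_count = count_lines.line_count
    }
}
```

Because the task ends with Task and the workflow with Flow, the two can share the stem CountLines without clashing.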

5.1.1 Input and output names

Input and output names should mimic the long-form versions of the options they represent as much as possible.
They should be formatted as lower_case.
End the name with _path for an input that represents a file path.
End the name with _file for an input that represents a file.
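
As a hedged illustration of these rules, the sketch below wraps GNU sort, whose long options --parallel, --reverse and --output give the inputs their names. The task name SortTask and the default values are assumptions made for this example.

```wdl
version 1.0

task SortTask {
    input {
        # A file, hence the _file suffix.
        File input_file
        # A file path, hence the _path suffix.
        String output_path

        # Named after the long options --parallel and --reverse.
        Int parallel = 1
        Boolean reverse = false
    }

    command <<<
        sort \
            --parallel=~{parallel} \
            ~{true="--reverse" false="" reverse} \
            --output=~{output_path} \
            ~{input_file}
    >>>

    output {
        File output_file = output_path
    }
}
```

Someone who knows the tool's command line can guess the task inputs without opening the task definition.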

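Avoid referencing absolute paths (except when using Docker); use an input parameter instead.

# Bad: the location of the jar is hard-coded as an absolute path.
task BadTask {
    input {
        File input_file
    }

    command <<<
        java -jar /usr/lib/library.jar -Dinput=~{input_file}
    >>>
}

# Instead, pass the jar as an input:
task GoodTask {
    input {
        File input_file
        File jar_file
    }

    command <<<
        java -jar ~{jar_file} -Dinput=~{input_file}
    >>>
}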

5.2 Indentation

The following should be indented:

Anything within a set of braces {}.
Inputs following input: in a call block.
Continuations of expressions which did not fit on a single line (see Line length and line breaks).

Indentation rules:

Use spaces.
4 spaces per indentation.
Closing braces should be at the same indentation level as the line containing their opening brace.

Example:

workflow ExampleFlow {
    call SomeTask as do_stuff {
        input:
            number = 1,
            letter = "a"
    }
}

5.3 Blank Lines

Blank lines should be used to separate different parts of a workflow:

Different blocks (code surrounded by {} or <<<>>>) should be separated by a single blank line.
Different groupings of inputs (in call, task and workflow blocks) and items in runtime and parameter_meta sections may also be separated by a single blank line.
Between the closing braces of a parent and child block, no blank lines should be placed.
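
A minimal sketch of these blank-line rules, assuming WDL 1.0. The task GreetTask and the ubuntu:22.04 image are illustrative choices only.

```wdl
version 1.0

task GreetTask {
    input {
        String greeting

        # A second grouping of inputs, separated by one blank line.
        String? name
    }

    command <<<
        echo "~{greeting} ~{name}"
    >>>

    output {
        String message = read_string(stdout())
    }

    runtime {
        docker: "ubuntu:22.04"
    }
}
```

Each block is separated from the next by exactly one blank line, while the closing brace of the runtime block is followed directly by the closing brace of the task.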

5.4 Line length and line breaks

Lines should be at most 80 characters long. If a line exceeds this, it should be broken up into multiple lines.

5.4.1 Line break

Always add line breaks:

In the input section, after each input.
In the output section, after each output.
After {, }, <<< or >>>.
Between inputs in a call block.

When line breaks are used to adhere to the line length limit, they should occur at logical places. These include:

Following a comma.
Before the then or else in an if-then-else expression.
Following an opening bracket, (.
Following an operator that would otherwise be followed by a space (see Expression spacing).

Example:

Int value = if defined(a_variable_that_has_a_way_too_long_name)
    then a_variable_that_has_a_way_too_long_name else 10

# or

Int value = if defined(a_variable_that_has_a_way_too_long_name)
    then a_variable_that_has_a_way_too_long_name
    else 10

# ----

call SomeTask as do_stuff {
    input:
        number = 1,
        letter = "a"
}

5.5 Expression spacing

Spaces should be added between values and operators in expressions.
If multiple expressions occur within a single overarching expression, they may be grouped without placing spaces, with spaces being placed between the groups instead. It is advisable to base the groups on the order of operations.
In the case of groupings, an opening bracket, (, should always be preceded by a space and a closing bracket, ), should always be followed by one. There should not be a space between the brackets and their first or last value.
In the case of function calls, there should not be a space between the function's name and the opening bracket of its parameters, but otherwise the same rules apply to the brackets as with groupings. The commas separating the parameters should be followed, but not preceded, by a space.

Example:

1 + 1
1 + 1/2
(1 + 1) / 2
!(a == -1)
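
The same rules, including the one for function calls, can be sketched in a complete WDL 1.0 document. The workflow SpacingFlow and all variable names are hypothetical; sub and length are standard library functions.

```wdl
version 1.0

workflow SpacingFlow {
    input {
        Int a = 3
        Int b = 4
        Array[Int] values = [1, 2, 3]
    }

    # Spaces between values and operators.
    Int sum = a + b

    # Tighter grouping that follows the order of operations.
    Int weighted = a*2 + b*3

    # Grouping brackets: space outside, no space inside.
    Int scaled = (a + b) * 2

    # Function calls: no space before (, a space after each comma.
    String label = sub("sample_1", "_", "-")
    Int count = length(values)

    output {
        Int total = sum + weighted + scaled + count
        String sample_label = label
    }
}
```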

5.6 Tasks

It is recommended to format and name all tasks as shown below.

Separate the sections of a task with one empty line, as in this example:

task EchoTask {
    input {
        String message

        # Optional inputs are separated from mandatory inputs.
        String? output_path
    }

    command <<<
        echo ~{message} ~{"> " + output_path}
    >>>

    output {
        File? output_file = output_path
    }
}

5.6.1 Command section

Each option in a bash command should be on a new line, with the previous line ending in a backslash (\). Some grouping of options on a single line may be acceptable. For example, various Java memory settings may be set on the same line, as long as the line does not exceed the line length limit.
The command section should start with set -e -o pipefail if the task executes more than one bash command.
Use command <<<...>>> rather than command {...}, which means the ~{...} placeholder syntax should be used in all cases rather than the ${...} syntax.

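Full scripts, e.g. in Python, can be embedded with a heredoc:

task HeredocTask {
    input {
        # "in" is a reserved word in WDL, so the input needs another name.
        File input_file
    }

    command <<<
        python3 <<CODE
        with open("~{input_file}") as fp:
            for line in fp:
                if not line.startswith('#'):
                    print(line.strip())
        CODE
    >>>
}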

5.6.2 Runtime section

A Docker container should be specified for every task, e.g. one from BioContainers. The container may be omitted only if a task performs a highly generic operation (e.g. a simple ln command).

5.6.3 The parameter_meta section

It is highly advised to define a parameter_meta section containing descriptions of both the inputs and the outputs of the task.

5.6.4 Example

task DoStuffTask {
    input {
        File input_file
        String output_path
        Int max_ram

        String? pre_command
    }

    command <<<
        set -e -o pipefail
        mkdir -p "$(dirname ~{output_path})"
        ~{pre_command}
        someScript \
            -i ~{input_file} \
            --maxRAM ~{max_ram} \
            -o ~{output_path}
    >>>

    output {
        # "output" is a reserved word, so the output needs another name.
        File output_file = output_path
    }

    runtime {
        docker: "alpine"
    }

    parameter_meta {
        input_file: "A file"
        output_path: "A location to put the output"
        max_ram: "The maximum amount of RAM that can be used (in GB)"
        pre_command: "An optional command to run before someScript"

        output_file: "A file containing the output"
    }
}

5.7 Modularization

5.7.1 Tasks and workflows

In general tasks and structs should be kept in separate files from workflows. Only if a task is small and specific to a certain workflow may it be placed in the same file as the workflow.
Tasks and structs relating to the same tool or toolkit should be in the same file.
If a file contains multiple tasks and/or structs, the structs should be kept below the tasks and both the tasks and structs should be ordered alphabetically.
Calls and value assignments in a workflow should be placed in order of execution.
If there are multiple workflow and task files:
- All tasks should be saved in the tasks/ folder.
- All workflows that contain tasks must go in workflows/.
- Each WDL file must only contain tasks corresponding to a certain tool.

5.7.2 Analyses

Analyses are higher level workflows that string together multiple workflows that all do the same kind of computation.
All analyses should go into analyses/.
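
As a rough sketch of what such an analysis might look like, assuming WDL 1.0 and the folder layout described in this chapter: the file names, workflow names and outputs below are all hypothetical, and the two imported workflow files are assumed to exist.

```wdl
version 1.0

# Hypothetical file: analyses/rna_seq_analysis.wdl
import "../workflows/quantify_flow.wdl" as quantify
import "../workflows/trim_flow.wdl" as trim

workflow RnaSeqAnalysisFlow {
    input {
        File reads_file
    }

    # Calls are placed in order of execution.
    call trim.TrimFlow as trim_reads {
        input:
            reads_file = reads_file
    }

    call quantify.QuantifyFlow as quantify_reads {
        input:
            reads_file = trim_reads.trimmed_reads_file
    }

    output {
        File counts_file = quantify_reads.counts_file
    }
}
```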

Example of ordering tasks and structs within a file:

task A {
    command <<<
        echo A
    >>>
}

task Z {
    command <<<
        echo Z
    >>>
}

struct B {
    String name
}

5.8 Imports

Imports should be placed in alphabetical order at the top of the file. All imports must be named using the as syntax.
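
A minimal sketch of this rule, assuming WDL 1.0. The task files tasks/bwa.wdl and tasks/samtools.wdl, the tasks MemTask and SortTask, and their inputs and outputs are hypothetical and are assumed to be defined elsewhere.

```wdl
version 1.0

# Hypothetical file: workflows/align_flow.wdl
# Imports are in alphabetical order and each is named with "as".
import "../tasks/bwa.wdl" as bwa
import "../tasks/samtools.wdl" as samtools

workflow AlignFlow {
    input {
        File reads_file
        File reference_file
    }

    call bwa.MemTask as bwa_mem {
        input:
            reads_file = reads_file,
            reference_file = reference_file
    }

    call samtools.SortTask as samtools_sort {
        input:
            input_file = bwa_mem.output_file
    }

    output {
        File sorted_bam_file = samtools_sort.output_file
    }
}
```

Naming every import makes the origin of each task visible at the call site, as in bwa.MemTask and samtools.SortTask.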

Good practices in bioinformatic or computational biology projects
