Chapter 4 R Style Guide

4.1 Files

4.1.1 Names

  • File names should be meaningful and end in .R. Avoid using special characters in file names - stick with numbers, letters, -, and _.

    # Good
    fit_models.R
    utility_functions.R
    
    # Bad
    fit models.R
    foo.r
    stuff.r
  • If files should be run in a particular order, prefix them with numbers. If it seems likely you’ll have more than 10 files, left pad with zero:

    00_download.R
    01_explore.R
    ...
    09_model.R
    10_visualize.R
  • Prefer file names that are all lower case, and never have names that differ only in their capitalization.

4.1.2 Internal structure

  • Use commented lines of - and = to break up your file into easily readable chunks.

    # Load data ---------------------------
    
    # Plot data ---------------------------
  • If your script uses add-on packages, load them all at once at the very beginning of the file.

4.2 Naming Convention

  • Variable and function names should use only lowercase letters, numbers, and _. Use underscores (_) (so called snake case) to separate words within a name.

    # Good
    day_one
    day_1
  • It’s better to reserve dots exclusively for the S3 object system. In S3, methods are given the name function.class.

  • Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).

    # Good
    day_one
    
    # Bad
    first_day_of_the_month
    djm1
  • Avoid re-using names of common functions and variables. This will cause confusion for the readers of your code.

    # Bad
    T <- FALSE
    c <- 10
    mean <- function(x) sum(x)

4.3 Spacing

4.3.1 Indentation

Use four spaces.

4.3.2 Commas

Always put a space after a comma, never before, just like in regular English.

# Good
x[, 1]

# Bad
x[,1]
x[ ,1]
x[ , 1]

4.3.3 Parentheses

  • Do not put spaces inside or outside parentheses for regular function calls.

    # Good
    mean(x, na.rm = TRUE)
    
    # Bad
    mean (x, na.rm = TRUE)
    mean( x, na.rm = TRUE )
  • Place a space before and after () when used with if, for, or while.

    # Good
    if (debug) {
        show(x)
    }
    
    # Bad
    if(debug){
        show(x)
    }
  • Place a space after () used for function arguments:

    # Good
    function(x) {}
    
    # Bad
    function (x) {}
    function(x){}

4.3.4 Embracing

The embracing operator, {{ }}, should always have inner spaces to help emphasise its special behaviour:

# Good
max_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(maximum = max({{ var }}, na.rm = TRUE))
}

# Bad
max_by <- function(data, var, by) {
  data %>%
    group_by({{by}}) %>%
    summarise(maximum = max({{var}}, na.rm = TRUE))
}

4.3.5 Infix operators

Most infix operators (==, +, -, <-, etc.) should always be surrounded by spaces:

# Good
height <- (feet * 12) + inches
mean(x, na.rm = TRUE)

# Bad
height<-feet*12+inches
mean(x, na.rm=TRUE)

There are a few exceptions, which should never be surrounded by spaces:

  • The operators with high precedence: ::, :::, $, @, [, [[, ^, unary -, unary +, and :.

  • Single-sided formulas when the right-hand side is a single identifier.

    # Good
    ~foo
    
    # Bad
    ~ foo

    Note that single-sided formulas with a complex right-hand side do need a space:

    # Good
    ~ .x + .y
    
    # Bad
    ~.x + .y
  • When used in tidy evaluation !! (bang-bang) and !!! (bang-bang-bang)

4.4 Function Calls

4.4.1 Named arguments

  • If you override the default value of an argument, use the full name.

  • You can omit the argument names for very common arguments

    # Good
    mean(1:10, na.rm = TRUE)
    
    # Bad
    mean(x = 1:10, , FALSE)
    mean(, TRUE, x = c(1:10, NA))

4.4.2 Assignment

Avoid assignment in function calls:

# Good
x <- complicated_function()
if (nzchar(x) < 1) {
  # do something
}

# Bad
if (nzchar(x <- complicated_function()) < 1) {
  # do something
}

The only exception is in functions that capture side-effects:

output <- capture.output(x <- f())

4.4.3 Long lines

There are two options if the function name and definition can’t fit on a single line:

  • Function-indent: place each argument on its own line, and indent to match the opening ( of function:

    long_function_name <- function(a = "a long argument",
                                   b = "another argument",
                                   c = "another long argument") {
      # As usual code is indented by two spaces.
    }
  • Double-indent: Place each argument of its own double indented line.

    long_function_name <- function(
        a = "a long argument",
        b = "another argument",
        c = "another long argument") {
        # As usual code is indented by two spaces.
    }

In both cases the trailing ) and leading { should go on the same line as the last argument.

Prefer function-indent style to double-indent style when it fits.

These styles are designed to clearly separate the function definition from its body.

# Bad
long_function_name <- function(a = "a long argument",
    b = "another argument",
    c = "another long argument") {
    # Here it's hard to spot where the definition ends and the
    # code begins, and to see all three function arguments
}

4.4.4 Return

  • Only use return() for early returns. Otherwise, rely on R to return the result of the last evaluated expression.

    # Good
    find_abs <- function(x) {
        if (x > 0) {
            return(x)
        }
        x * -1
    }
    add_two <- function(x, y) {
        x + y
    }
    
    # Bad
    add_two <- function(x, y) {
        return(x + y)
    }
  • Return statements should always be on their own line because they have important effects on the control flow

    # Good
    find_abs <- function(x) {
        if (x > 0) {
            return(x)
        }
        x * -1
    }
    
    # Bad
    find_abs <- function(x) {
        if (x > 0) return(x)
        x * -1
    }
  • If your function is called primarily for its side-effects (like printing, plotting, or saving to disk), it should return the first argument invisibly. This makes it possible to use the function as part of a pipe. print methods should usually do this, like this example from httr:

    print.url <- function(x, ...) {
        cat("Url: ", build_url(x), "\n", sep = "")
        invisible(x)
    }

4.4.5 Comments

  • In code, use comments to explain the “why” not the “what” or “how”. Each line of a comment should begin with the comment symbol and a single space: #.

    # Good
    
    # Objects like data frames are treated as leaves
    x <- map_if(x, is_bare_list, recurse)
    
    
    # Bad
    
    # Recurse only with bare lists
    x <- map_if(x, is_bare_list, recurse)
  • Comments should be in sentence case, and only end with a full stop if they contain at least two sentences:

    # Good
    
    # Objects like data frames are treated as leaves
    x <- map_if(x, is_bare_list, recurse)
    
    # Do not use `is.list()`. Objects like data frames must be treated
    # as leaves.
    x <- map_if(x, is_bare_list, recurse)
    
    
    # Bad
    
    # objects like data frames are treated as leaves
    x <- map_if(x, is_bare_list, recurse)
    
    # Objects like data frames are treated as leaves.
    x <- map_if(x, is_bare_list, recurse)

4.5 Control flow

4.5.1 Code blocks

  • { should be the last character on the line. Related code (e.g., an if clause, a function declaration, a trailing comma, …) must be on the same line as the opening brace.

  • The contents should be indented by four spaces.

  • } should be the first character on the line.

    # good
    if (y < 0 && debug) {
        message("y is negative")
    }
    
    if (y == 0) {
        if (x > 0) {
            log(x)
        } else {
            message("x is negative or zero")
        }
    } else {
        y^x
    }
    
    # bad
    if (y == 0)
    {
      if (x > 0) {
        log(x)
      } else {
    message("x is negative or zero")
      }
    } else { y ^ x }

4.5.2 If Statements

  • If used, else should be on the same line as }.
  • & and | should never be used inside of an if clause because they can return vectors. Always use && and || instead.

4.5.3 Inline statement

  • You can write a simple if block on ane single line

    message <- if (x > 10) "big" else "small"
  • Function calls that affect control flow (like return(), stop() or continue) should always go in their own {} block:

    # Good
    if (y < 0) {
        stop("Y is negative")
    }
    
    find_abs <- function(x) {
        if (x > 0) {
            return(x)
        }
        x * -1
    }
    
    # Bad
    if (y < 0) stop("Y is negative")
    
    if (y < 0)
        stop("Y is negative")

4.5.4 Implicit type coercion

Avoid implicit type coercion (e.g. from numeric to logical) in if statements:

# Good
if (length(x) > 0) {
    # do something
}

# Bad
if (length(x)) {
    # do something
}

4.5.5 Switch statements

  • Avoid position-based switch() statements (i.e. prefer names).

  • Each element should go on its own line.

  • Elements that fall through to the following element should have a space after =.

  • Provide a fall-through error, unless you have previously validated the input.

    # Good 
    switch(x, 
        a = ,
        b = 1, 
        c = 2,
        stop("Unknown `x`", call. = FALSE)
    )
    
    # Bad
    switch(x, a = , b = 1, c = 2)
    switch(x, a =, b = 1, c = 2)
    switch(y, 1, 2, 3)

4.6 Long lines

  • Strive to limit your code to 80 characters per line.

  • If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing ). This makes the code easier to read and to change later.

    # Good
    do_something_very_complicated(
        something = "that",
        requires = many,
        arguments = "some of which may be long"
    )
    
    # Bad
    do_something_very_complicated("that", requires, many, arguments,
                                  "some of which may be long"
                                 )
  • Short unnamed arguments can also go on the same line as the function name, even if the whole function call spans multiple lines.

    map(x, f,
        extra_argument_a = 10,
        extra_argument_b = c(1, 43, 390, 210209)
    )
  • You may also place several arguments on the same line if they are closely related to each other.

    # Good
    paste0(
        "Requirement: ", requires, "\n",
        "Result: ", result, "\n"
    )
    
    # Bad
    paste0(
        "Requirement: ", requires,
        "\n", "Result: ",
        result, "\n")

4.7 Semicolons

Don’t put ; at the end of a line, and don’t use ; to put multiple commands on one line.

4.8 Assignment

Use <-, not =, for assignment.

4.9 Character vectors

Use ", not ', for quoting text. The only exception is when the text already contains double quotes and no single quotes.

# Good
"Text"
'Text with "quotes"'
'<a href="http://style.tidyverse.org">A link</a>'

# Bad
'Text'
'Text with "double" and \'single\' quotes'

4.10 Logical vectors

Use TRUE and FALSE rather than T and F.

4.11 Comments

Each line of a comment should begin with the comment symbol # and a single space.

In data analysis code, use comments to record important findings and analysis decisions. If you need comments to explain what your code is doing, consider rewriting your code to be clearer. If you discover that you have more comments than code, consider switching to R Markdown.

4.12 Resources

  • styler: formatting your code according to the tidyverse style guide (or your custom style guide) so you can direct your attention to the content of your code. It helps to keep the coding style consistent across projects and facilitate collaboration.
  • lintr: offering static code analysis for R. It checks adherence to a given style, syntax errors and possible semantic issues.