Getting started with R LanguageData framesReading and writing tabular data in plain-text files (CSV, TSV, etc.)Pipe operators (%>% and others)Linear Models (Regression)data.tableboxplotFormulaSplit functionCreating vectorsFactorsPattern Matching and ReplacementRun-length encodingDate and TimeSpeeding up tough-to-vectorize codeggplot2ListsIntroduction to Geographical MapsBase PlottingSet operationstidyverseRcppRandom Numbers GeneratorString manipulation with stringi packageParallel processingSubsettingDebuggingInstalling packagesArima ModelsDistribution FunctionsShinyspatial analysissqldfCode profilingControl flow structuresColumn wise operationJSONRODBClubridateTime Series and Forecastingstrsplit functionWeb scraping and parsingGeneralized linear modelsReshaping data between long and wide formsRMarkdown and knitr presentationScope of variablesPerforming a Permutation TestxgboostR code vectorization best practicesMissing valuesHierarchical Linear ModelingClassesIntrospection*apply family of functions (functionals)Text miningANOVARaster and Image AnalysisSurvival analysisFault-tolerant/resilient codeReproducible RUpdating R and the package libraryFourier Series and Transformations.RprofiledplyrcaretExtracting and Listing Files in Compressed ArchivesProbability Distributions with RR in LaTeX with knitrWeb Crawling in RArithmetic OperatorsCreating reports with RMarkdownGPU-accelerated computingheatmap and heatmap.2Network analysis with the igraph packageFunctional programmingGet user inputroxygen2HashmapsSpark API (SparkR)Meta: Documentation GuidelinesI/O for foreign tables (Excel, SAS, SPSS, Stata)I/O for database tablesI/O for geographic data (shapefiles, etc.)I/O for raster imagesI/O for R's binary formatReading and writing stringsInput and outputRecyclingExpression: parse + evalRegular Expressions (regex)CombinatoricsPivot and unpivot with data.tableInspecting packagesSolving ODEs in RFeature Selection in R -- Removing Extraneous FeaturesBibliography in RMDWriting functions in RColor schemes for graphicsHierarchical clustering with hclustRandom Forest AlgorithmBar ChartCleaning dataRESTful R ServicesMachine learningVariablesThe Date classThe logical classThe character classNumeric classes and storage modesMatricesDate-time classes (POSIXct and POSIXlt)Using texreg to export models in a paper-ready wayPublishingImplement State Machine Pattern using S4 ClassReshape using tidyrModifying strings by substitutionNon-standard evaluation and standard evaluationRandomizationObject-Oriented Programming in RRegular Expression Syntax in RCoercionStandardize analyses by writing standalone R scriptsAnalyze tweets with RNatural language processingUsing pipe assignment in your own package %<>%: How to ?R Markdown Notebooks (from RStudio)Updating R versionAggregating data framesData acquisitionR memento by examplesCreating packages with devtools

Pipe operators (%>% and others)

Other topics

Remarks:

Packages that use %>%

The pipe operator is defined in the magrittr package, but it gained huge visibility and popularity with the dplyr package (which imports the definition from magrittr). Now it is part of tidyverse, which is a collection of packages that "work in harmony because they share common data representations and API design".

The magrittr package also provides several variations of the pipe operator for those who want more flexibility in piping, such as the compound assignment pipe %<>%, the exposition pipe %$%, and the tee operator %T>%. It also provides a suite of alias functions to replace common functions that have special syntax (+, [, [[, etc.) so that they can be easily used within a chain of pipes.

Finding documentation

As with any infix operator (such as +, *, ^, &, %in%), you can find the official documentation if you put it in quotes: ?'%>%' or help('%>%') (assuming you have loaded a package that attaches pkg:magrittr).

Hotkeys

There is a special hotkey in RStudio for the pipe operator: Ctrl+Shift+M (Windows & Linux), Cmd+Shift+M (Mac).

Performance Considerations

While the pipe operator is useful, be aware that there is a negative impact on performance due mainly to the overhead of using it. Consider the following two things carefully when using the pipe operator:

  • Machine performance (loops)
  • Evaluation (object %>% rm() does not remove object)

Basic use and chaining

The pipe operator, %>%, is used to insert an argument into a function. It is not a base feature of the language and can only be used after attaching a package that provides it, such as magrittr. The pipe operator takes the left-hand side (LHS) of the pipe and uses it as the first argument of the function on the right-hand side (RHS) of the pipe. For example:

library(magrittr)

1:10 %>% mean
# [1] 5.5

# is equivalent to
mean(1:10)
# [1] 5.5

The pipe can be used to replace a sequence of function calls. Multiple pipes allow us to read and write the sequence from left to right, rather than from inside to out. For example, suppose we have years defined as a factor but want to convert it to a numeric. To prevent possible information loss, we first convert to character and then to numeric:

years <- factor(2008:2012)

# nesting
as.numeric(as.character(years))

# piping
years %>% as.character %>% as.numeric

If we don't want the LHS (Left Hand Side) used as the first argument on the RHS (Right Hand Side), there are workarounds, such as naming the arguments or using . to indicate where the piped input goes.

# example with grepl
# its syntax:
# grepl(pattern, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

# note that the `substring` result is the *2nd* argument of grepl
grepl("Wo", substring("Hello World", 7, 11))

# piping while naming other arguments
"Hello World" %>% substring(7, 11) %>% grepl(pattern = "Wo")

# piping with .
"Hello World" %>% substring(7, 11) %>% grepl("Wo", .)

# piping with . and curly braces
"Hello World" %>% substring(7, 11) %>% { c(paste('Hi', .)) }
#[1] "Hi World"

#using LHS multiple times in argument with curly braces and .
"Hello World" %>% substring(7, 11) %>% { c(paste(. ,'Hi', .)) }
#[1] "World Hi World"

Functional sequences

Given a sequence of steps we use repeatedly, it's often handy to store it in a function. Pipes allow for saving such functions in a readable format by starting a sequence with a dot as in:

. %>% RHS

As an example, suppose we have factor dates and want to extract the year:

library(magrittr) # needed to include the pipe operators
library(lubridate)
read_year <- . %>% as.character %>% as.Date %>% year

# Creating a dataset
df <- data.frame(now = "2015-11-11", before = "2012-01-01")
#          now     before
# 1 2015-11-11 2012-01-01

# Example 1: applying `read_year` to a single character-vector
df$now %>% read_year
# [1] 2015

# Example 2: applying `read_year` to all columns of `df`
df %>% lapply(read_year) %>% as.data.frame  # implicit `lapply(df, read_year)
#    now before
# 1 2015   2012

# Example 3: same as above using `mutate_all`
library(dplyr)
df %>% mutate_all(funs(read_year))
# if an older version of dplyr use `mutate_each`
#    now before
# 1 2015   2012

We can review the composition of the function by typing its name or using functions:

read_year
# Functional sequence with the following components:
# 
#  1. as.character(.)
#  2. as.Date(.)
#  3. year(.)
# 
# Use 'functions' to extract the individual functions. 

We can also access each function by its position in the sequence:

read_year[[2]]
# function (.) 
# as.Date(.)

Generally, this approach may be useful when clarity is more important than speed.

Assignment with %<>%

The magrittr package contains a compound assignment infix-operator, %<>%, that updates a value by first piping it into one or more rhs expressions and then assigning the result. This eliminates the need to type an object name twice (once on each side of the assignment operator <-). %<>% must be the first infix-operator in a chain:

library(magrittr)
library(dplyr)

df <- mtcars

Instead of writing

df <- df %>% select(1:3) %>% filter(mpg > 20, cyl == 6)

or

df %>% select(1:3) %>% filter(mpg > 20, cyl == 6) -> df

The compound assignment operator will both pipe and reassign df:

df %<>% select(1:3) %>% filter(mpg > 20, cyl == 6)

Exposing contents with %$%

The exposition pipe operator, %$%, exposes the column names as R symbols within the left-hand side object to the right-hand side expression. This operator is handy when piping into functions that do not have a data argument (unlike, say, lm) and that don't take a data.frame and column names as arguments (most of the main dplyr functions).

The exposition pipe operator %$% allows a user to avoid breaking a pipeline when needing to refer to column names. For instance, say you want to filter a data.frame and then run a correlation test on two columns with cor.test:

library(magrittr)
library(dplyr)
mtcars %>%
  filter(wt > 2) %$%
  cor.test(hp, mpg)

#> 
#>  Pearson's product-moment correlation
#> 
#> data:  hp and mpg
#> t = -5.9546, df = 26, p-value = 2.768e-06
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.8825498 -0.5393217
#> sample estimates:
#>        cor 
#> -0.7595673

Here the standard %>% pipe passes the data.frame through to filter(), while the %$% pipe exposes the column names to cor.test().

The exposition pipe works like a pipe-able version of the base R with() functions, and the same left-hand side objects are accepted as inputs.

Using the pipe with dplyr and ggplot2

The %>% operator can also be used to pipe the dplyr output into ggplot. This creates a unified exploratory data analysis (EDA) pipeline that is easily customizable. This method is faster than doing the aggregations internally in ggplot and has the added benefit of avoiding unnecessary intermediate variables.

library(dplyr)
library(ggplot)


diamonds %>% 
    filter(depth > 60) %>% 
    group_by(cut) %>% 
    summarize(mean_price = mean(price)) %>% 
    ggplot(aes(x = cut, y = mean_price)) + 
        geom_bar(stat = "identity")

Creating side effects with %T>%

Some functions in R produce a side effect (i.e. saving, printing, plotting, etc) and do not always return a meaningful or desired value.

%T>% (tee operator) allows you to forward a value into a side-effect-producing function while keeping the original lhs value intact. In other words: the tee operator works like %>%, except the return values is lhs itself, and not the result of the rhs function/expression.

Example: Create, pipe, write, and return an object. If %>% were used in place of %T>% in this example, then the variable all_letters would contain NULL rather than the value of the sorted object.

all_letters <- c(letters, LETTERS) %>%
    sort %T>% 
    write.csv(file = "all_letters.csv")

read.csv("all_letters.csv") %>% head()
#   x
# 1 a
# 2 A
# 3 b
# 4 B
# 5 c
# 6 C

Warning: Piping an unnamed object to save() will produce an object named . when loaded into the workspace with load(). However, a workaround using a helper function is possible (which can also be written inline as an anonymous function).

all_letters <- c(letters, LETTERS) %>%
    sort %T>% 
    save(file = "all_letters.RData")

load("all_letters.RData", e <- new.env())

get("all_letters", envir = e)
# Error in get("all_letters", envir = e) : object 'all_letters' not found

get(".", envir = e)
#  [1] "a" "A" "b" "B" "c" "C" "d" "D" "e" "E" "f" "F" "g" "G" "h" "H" "i" "I" "j" "J" 
# [21] "k" "K" "l" "L" "m" "M" "n" "N" "o" "O" "p" "P" "q" "Q" "r" "R" "s" "S" "t" "T" 
# [41] "u" "U" "v" "V" "w" "W" "x" "X" "y" "Y" "z" "Z"

# Work-around
save2 <- function(. = ., name, file = stop("'file' must be specified")) {
  assign(name, .)
  call_save <- call("save", ... = name, file = file)
  eval(call_save)
}

all_letters <- c(letters, LETTERS) %>%
    sort %T>%
    save2("all_letters", "all_letters.RData")

Syntax:

  • lhs %>% rhs # pipe syntax for rhs(lhs)

  • lhs %>% rhs(a = 1) # pipe syntax for rhs(lhs, a = 1)

  • lhs %>% rhs(a = 1, b = .) # pipe syntax for rhs(a = 1, b = lhs)

  • lhs %<>% rhs # pipe syntax for lhs <- rhs(lhs)

  • lhs %$% rhs(a) # pipe syntax for with(lhs, rhs(lhs$a))

  • lhs %T>% rhs # pipe syntax for { rhs(lhs); lhs }

Parameters:

lhsrhs
A value or the magrittr placeholder.A function call using the magrittr semantics

Contributors

Topic Id: 652

Example Ids: 2132,5661,13622,18922,23541,25222

This site is not affiliated with any of the contributors.