Getting started with R Language Data frames Reading and writing tabular data in plain-text files (CSV, TSV, etc.)Pipe operators (%>% and others)Linear Models (Regression)data.table boxplot Formula Split function Creating vectors Factors Pattern Matching and Replacement Run-length encoding Date and Time Speeding up tough-to-vectorize code ggplot2 Lists Introduction to Geographical Maps Base Plotting Set operations tidyverse Rcpp Random Numbers Generator String manipulation with stringi package Parallel processing Subsetting Debugging Installing packages Arima Models Distribution Functions Shiny spatial analysis sqldf Code profiling Control flow structures Column wise operation JSON RODBC lubridate Time Series and Forecasting strsplit function Web scraping and parsing Generalized linear models Reshaping data between long and wide forms RMarkdown and knitr presentation Scope of variables Performing a Permutation Test xgboost R code vectorization best practices Missing values Hierarchical Linear Modeling Classes Introspection *apply family of functions (functionals)Text mining ANOVA Raster and Image Analysis Survival analysis Fault-tolerant/resilient code Reproducible R Updating R and the package library Fourier Series and Transformations .Rprofile dplyr caret Extracting and Listing Files in Compressed Archives Probability Distributions with R R in LaTeX with knitr Web Crawling in R Arithmetic Operators Creating reports with RMarkdown GPU-accelerated computing heatmap and heatmap.2 Network analysis with the igraph package Functional programming Get user input roxygen2 Hashmaps Spark API (SparkR)Meta: Documentation Guidelines I/O for foreign tables (Excel, SAS, SPSS, Stata)I/O for database tables I/O for geographic data (shapefiles, etc.)I/O for raster images I/O for R's binary format Reading and writing strings Input and output Recycling Expression: parse + eval Regular Expressions (regex)Combinatorics Pivot and unpivot with data.table Inspecting packages Solving ODEs in R Feature Selection in R -- Removing Extraneous Features Bibliography in RMD Writing functions in R Color schemes for graphics Hierarchical clustering with hclust Random Forest Algorithm Bar Chart Cleaning data RESTful R Services Machine learning Variables The Date class The logical class The character class Numeric classes and storage modes Matrices Date-time classes (POSIXct and POSIXlt)Using texreg to export models in a paper-ready way Publishing Implement State Machine Pattern using S4 Class Reshape using tidyr Modifying strings by substitution Non-standard evaluation and standard evaluation Randomization Object-Oriented Programming in R Regular Expression Syntax in R Coercion Standardize analyses by writing standalone R scripts Analyze tweets with R Natural language processing Using pipe assignment in your own package %<>%: How to ?R Markdown Notebooks (from RStudio)Updating R version Aggregating data frames Data acquisition R memento by examples Creating packages with devtools

Code profiling

System.time

System time gives you the CPU time required to execute a R expression, for example:

system.time(print("hello world"))

# [1] "hello world"
#    user  system elapsed 
#       0       0       0

You can add larger pieces of code through use of braces:

system.time({
    library(numbers)
    Primes(1,10^5)
})

Or use it to test functions:

fibb <- function (n) {   
    if (n < 3) {
        return(c(0,1)[n])
    } else {
        return(fibb(n - 2) + fibb(n -1))
    }
}

system.time(fibb(30))

proc.time()

At its simplest, proc.time() gives the total elapsed CPU time in seconds for the current process. Executing it in the console gives the following type of output:

proc.time()

#       user     system    elapsed 
#    284.507    120.397 515029.305

This is particularly useful for benchmarking specific lines of code. For example:

t1 <- proc.time()
fibb <- function (n) {
    if (n < 3) {
        return(c(0,1)[n])
    } else {
        return(fibb(n - 2) + fibb(n -1))
    }
}
print("Time one")
print(proc.time() - t1)

t2 <- proc.time()
fibb(30)

print("Time two")
print(proc.time() - t2)

This gives the following output:

source('~/.active-rstudio-document')

# [1] "Time one"
#    user  system elapsed 
#       0       0       0 

# [1] "Time two"
#    user  system elapsed 
#   1.534   0.012   1.572

system.time() is a wrapper for proc.time() that returns the elapsed time for a particular command/expression.

print(t1 <- system.time(replicate(1000,12^2)))
##  user  system elapsed 
## 0.000   0.000   0.002

Note that the returned object, of class proc.time, is slightly more complicated than it appears on the surface:

str(t1)
## Class 'proc_time'  Named num [1:5] 0 0 0.002 0 0
##  ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...

Line Profiling

One package for line profiling is lineprof which is written and maintained by Hadley Wickham. Here is a quick demonstration of how it works with auto.arima in the forecast package:

library(lineprof)
library(forecast)

l <- lineprof(auto.arima(AirPassengers))
shine(l)

This will provide you with a shiny app, which allows you to delve deeper into every function call. This enables you to see with ease what is causing your R code to slow down. There is a screenshot of the shiny app below:

Microbenchmark

Microbenchmark is useful for estimating the time taking for otherwise fast procedures. For example, consider estimating the time taken to print hello world.

system.time(print("hello world"))

# [1] "hello world"
#    user  system elapsed 
#       0       0       0

This is because system.time is essentially a wrapper function for proc.time, which measures in seconds. As printing "hello world" takes less than a second it appears that the time taken is less than a second, however this is not true. To see this we can use the package microbenchmark:

library(microbenchmark)
microbenchmark(print("hello world"))
 
# Unit: microseconds
#                 expr    min     lq     mean  median     uq     max neval
# print("hello world") 26.336 29.984 44.11637 44.6835 45.415 158.824   100

Here we can see after running print("hello world") 100 times, the average time taken was in fact 44 microseconds. (Note that running this code will print "hello world" 100 times onto the console.)

We can compare this against an equivalent procedure, cat("hello world\n"), to see if it is faster than print("hello world"):

microbenchmark(cat("hello world\n"))

# Unit: microseconds
#                  expr    min      lq     mean median     uq     max neval
# cat("hello world\\n") 14.093 17.6975 23.73829 19.319 20.996 119.382   100

In this case cat() is almost twice as fast as print().

Alternatively one can compare two procedures within the same microbenchmark call:

microbenchmark(print("hello world"), cat("hello world\n"))
# Unit: microseconds
# expr                    min     lq     mean  median     uq     max neval
# print("hello world") 29.122 31.654 39.64255 34.5275 38.852 192.779   100
# cat("hello world\\n")  9.381 12.356 13.83820 12.9930 13.715  52.564   100

Benchmarking using microbenchmark

You can use the microbenchmark package to conduct "sub-millisecond accurate timing of expression evaluation".

In this example we are comparing the speeds of six equivalent data.table expressions for updating elements in a group, based on a certain condition.

More specifically:

A data.table with 3 columns: id, time and status. For each id, I want to find the record with the maximum time - then if for that record if the status is true, I want to set it to false if the time is > 7

library(microbenchmark)
library(data.table)

set.seed(20160723)
dt <- data.table(id = c(rep(seq(1:10000), each = 10)),
                time = c(rep(seq(1:10000), 10)),
                status = c(sample(c(TRUE, FALSE), 10000*10, replace = TRUE)))
setkey(dt, id, time)  ## create copies of the data so the 'updates-by-reference' don't affect other expressions
dt1 <- copy(dt)
dt2 <- copy(dt)
dt3 <- copy(dt)
dt4 <- copy(dt)
dt5 <- copy(dt)
dt6 <- copy(dt)

microbenchmark(
  
  expression_1 = {
    dt1[ dt1[order(time), .I[.N], by = id]$V1, status := status * time < 7 ] 
    },
  
  expression_2 = {
    dt2[,status := c(.SD[-.N, status], .SD[.N, status * time > 7]), by = id]
    },
  
  expression_3 = {
    dt3[dt3[,.N, by = id][,cumsum(N)], status := status * time > 7]
    },
  
  expression_4 = { 
    y <- dt4[,.SD[.N],by=id]
    dt4[y, status := status & time > 7]
  },
  
  expression_5 = {
    y <- dt5[, .SD[.N, .(time, status)], by = id][time > 7 & status]
    dt5[y, status := FALSE]
  },
  
  expression_6 = {
    dt6[ dt6[, .I == .I[which.max(time)], by = id]$V1 & time > 7, status := FALSE]
    },
  
  times = 10L ## specify the number of times each expression is evaluated
)

# Unit: milliseconds
#         expr         min          lq        mean      median         uq          max neval
# expression_1   11.646149   13.201670   16.808399   15.643384   18.78640    26.321346    10
# expression_2 8051.898126 8777.016935 9238.323459 8979.553856 9281.93377 12610.869058    10
# expression_3    3.208773    3.385841    4.207903    4.089515    4.70146     5.654702    10
# expression_4   15.758441   16.247833   20.677038   19.028982   21.04170    36.373153    10
# expression_5 7552.970295 8051.080753 8702.064620 8861.608629 9308.62842  9722.234921    10
# expression_6   18.403105   18.812785   22.427984   21.966764   24.66930    28.607064    10

The output shows that in this test expression_3 is the fastest.

References

data.table - Adding and modifying columns

data.table - special grouping symbols in data.table

Contributors

Topic Id: 2149

Example Ids: 7049,7050,7051,10661,11117

This site is not affiliated with any of the contributors.