Getting started with R Language Data frames Reading and writing tabular data in plain-text files (CSV, TSV, etc.)Pipe operators (%>% and others)Linear Models (Regression)data.table boxplot Formula Split function Creating vectors Factors Pattern Matching and Replacement Run-length encoding Date and Time Speeding up tough-to-vectorize code ggplot2 Lists Introduction to Geographical Maps Base Plotting Set operations tidyverse Rcpp Random Numbers Generator String manipulation with stringi package Parallel processing Subsetting Debugging Installing packages Arima Models Distribution Functions Shiny spatial analysis sqldf Code profiling Control flow structures Column wise operation JSON RODBC lubridate Time Series and Forecasting strsplit function Web scraping and parsing Generalized linear models Reshaping data between long and wide forms RMarkdown and knitr presentation Scope of variables Performing a Permutation Test xgboost R code vectorization best practices Missing values Hierarchical Linear Modeling Classes Introspection *apply family of functions (functionals)Text mining ANOVA Raster and Image Analysis Survival analysis Fault-tolerant/resilient code Reproducible R Updating R and the package library Fourier Series and Transformations .Rprofile dplyr caret Extracting and Listing Files in Compressed Archives Probability Distributions with R R in LaTeX with knitr Web Crawling in R Arithmetic Operators Creating reports with RMarkdown GPU-accelerated computing heatmap and heatmap.2 Network analysis with the igraph package Functional programming Get user input roxygen2 Hashmaps Spark API (SparkR)Meta: Documentation Guidelines I/O for foreign tables (Excel, SAS, SPSS, Stata)I/O for database tables I/O for geographic data (shapefiles, etc.)I/O for raster images I/O for R's binary format Reading and writing strings Input and output Recycling Expression: parse + eval Regular Expressions (regex)Combinatorics Pivot and unpivot with data.table Inspecting packages Solving ODEs in R Feature Selection in R -- Removing Extraneous Features Bibliography in RMD Writing functions in R Color schemes for graphics Hierarchical clustering with hclust Random Forest Algorithm Bar Chart Cleaning data RESTful R Services Machine learning Variables The Date class The logical class The character class Numeric classes and storage modes Matrices Date-time classes (POSIXct and POSIXlt)Using texreg to export models in a paper-ready way Publishing Implement State Machine Pattern using S4 Class Reshape using tidyr Modifying strings by substitution Non-standard evaluation and standard evaluation Randomization Object-Oriented Programming in R Regular Expression Syntax in R Coercion Standardize analyses by writing standalone R scripts Analyze tweets with R Natural language processing Using pipe assignment in your own package %<>%: How to ?R Markdown Notebooks (from RStudio)Updating R version Aggregating data frames Data acquisition R memento by examples Creating packages with devtools

Column wise operation

sum of each column

Suppose we need to do the sum of each column in a dataset

set.seed(20)
df1 <- data.frame(ID = rep(c("A", "B", "C"), each = 3), V1 = rnorm(9), V2 = rnorm(9))
m1 <- as.matrix(df1[-1])

There are many ways to do this. Using base R, the best option would be colSums

colSums(df1[-1], na.rm = TRUE)

Here, we removed the first column as it is non-numeric and did the sum of each column, specifying the na.rm = TRUE (in case there are any NAs in the dataset)

This also works with matrix

colSums(m1, na.rm = TRUE)

This can be done in a loop with lapply/sapply/vapply

 lapply(df1[-1], sum, na.rm = TRUE)

It should be noted that the output is a list. If we need a vector output

 sapply(df1[-1], sum, na.rm = TRUE)

 vapply(df1[-1], sum, na.rm = TRUE, numeric(1))

For matrices, if we want to loop through columns, then use apply with MARGIN = 1

 apply(m1, 2, FUN = sum, na.rm = TRUE)

There are ways to do this with packages like dplyr or data.table

 library(dplyr)
 df1 %>%
     summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

Here, we are passing a regular expression to match the column names that we need to get the sum in summarise_at. The regex will match all columns that start with V followed by one or more numbers (\\d+).

A data.table option is

library(data.table)   
setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), .SDcols = 2:ncol(df1)]

We convert the 'data.frame' to 'data.table' (setDT(df1)), specified the columns to be applied the function in .SDcols and loop through the Subset of Data.table (.SD) and get the sum.

If we need to use a group by operation, we can do this easily by specifying the group by column/columns

 df1 %>%
   group_by(ID) %>%   
   summarise_at(vars(matches("^V\\d+")), sum, na.rm = TRUE)

In cases where we need the sum of all the columns, summarise_each can be used instead of summarise_at

df1 %>%
    group_by(ID) %>%
    summarise_each(funs(sum(., na.rm = TRUE)))

The data.table option is

setDT(df1)[, lapply(.SD, sum, na.rm = TRUE), by = ID]

Contributors

Topic Id: 2212

Example Ids: 7235

This site is not affiliated with any of the contributors.