Getting started with R Language Data frames Reading and writing tabular data in plain-text files (CSV, TSV, etc.)Pipe operators (%>% and others)Linear Models (Regression)data.table boxplot Formula Split function Creating vectors Factors Pattern Matching and Replacement Run-length encoding Date and Time Speeding up tough-to-vectorize code ggplot2 Lists Introduction to Geographical Maps Base Plotting Set operations tidyverse Rcpp Random Numbers Generator String manipulation with stringi package Parallel processing Subsetting Debugging Installing packages Arima Models Distribution Functions Shiny spatial analysis sqldf Code profiling Control flow structures Column wise operation JSON RODBC lubridate Time Series and Forecasting strsplit function Web scraping and parsing Generalized linear models Reshaping data between long and wide forms RMarkdown and knitr presentation Scope of variables Performing a Permutation Test xgboost R code vectorization best practices Missing values Hierarchical Linear Modeling Classes Introspection *apply family of functions (functionals)Text mining ANOVA Raster and Image Analysis Survival analysis Fault-tolerant/resilient code Reproducible R Updating R and the package library Fourier Series and Transformations .Rprofile dplyr caret Extracting and Listing Files in Compressed Archives Probability Distributions with R R in LaTeX with knitr Web Crawling in R Arithmetic Operators Creating reports with RMarkdown GPU-accelerated computing heatmap and heatmap.2 Network analysis with the igraph package Functional programming Get user input roxygen2 Hashmaps Spark API (SparkR)Meta: Documentation Guidelines I/O for foreign tables (Excel, SAS, SPSS, Stata)I/O for database tables I/O for geographic data (shapefiles, etc.)I/O for raster images I/O for R's binary format Reading and writing strings Input and output Recycling Expression: parse + eval Regular Expressions (regex)Combinatorics Pivot and unpivot with data.table Inspecting packages Solving ODEs in R Feature Selection in R -- Removing Extraneous Features Bibliography in RMD Writing functions in R Color schemes for graphics Hierarchical clustering with hclust Random Forest Algorithm Bar Chart Cleaning data RESTful R Services Machine learning Variables The Date class The logical class The character class Numeric classes and storage modes Matrices Date-time classes (POSIXct and POSIXlt)Using texreg to export models in a paper-ready way Publishing Implement State Machine Pattern using S4 Class Reshape using tidyr Modifying strings by substitution Non-standard evaluation and standard evaluation Randomization Object-Oriented Programming in R Regular Expression Syntax in R Coercion Standardize analyses by writing standalone R scripts Analyze tweets with R Natural language processing Using pipe assignment in your own package %<>%: How to ?R Markdown Notebooks (from RStudio)Updating R version Aggregating data frames Data acquisition R memento by examples Creating packages with devtools

Survival analysis

Random Forest Survival Analysis with randomForestSRC

Just as the random forest algorithm may be applied to regression and classification tasks, it can also be extended to survival analysis.

In the example below a survival model is fit and used for prediction, scoring, and performance analysis using the package randomForestSRC from CRAN.

require(randomForestSRC)

set.seed(130948) #Other seeds give similar comparative results
x1   <- runif(1000)
y    <- rnorm(1000, mean = x1, sd = .3)
data <- data.frame(x1 = x1, y = y)
head(data)

         x1          y
1 0.9604353  1.3549648
2 0.3771234  0.2961592
3 0.7844242  0.6942191
4 0.9860443  1.5348900
5 0.1942237  0.4629535
6 0.7442532 -0.0672639

(modRFSRC <- rfsrc(y ~ x1, data = data, ntree=500, nodesize = 5))

                             Sample size: 1000
                     Number of trees: 500
          Minimum terminal node size: 5
       Average no. of terminal nodes: 208.258
No. of variables tried at each split: 1
              Total no. of variables: 1
                            Analysis: RF-R
                              Family: regr
                      Splitting rule: mse
                % variance explained: 32.08
                          Error rate: 0.11

x1new   <- runif(10000)
ynew    <- rnorm(10000, mean = x1new, sd = .3)
newdata <- data.frame(x1 = x1new, y = ynew)

survival.results <- predict(modRFSRC, newdata = newdata)
survival.results

  Sample size of test (predict) data: 10000
                Number of grow trees: 500
  Average no. of grow terminal nodes: 208.258
         Total no. of grow variables: 1
                            Analysis: RF-R
                              Family: regr
                % variance explained: 34.97
                 Test set error rate: 0.11

Introduction - basic fitting and plotting of parametric survival models with the survival package

survival is the most commonly used package for survival analysis in R. Using the built-in lung dataset we can get started with Survival Analysis by fitting a regression model with the survreg() function, creating a curve with survfit(), and plotting predicted survival curves by calling the predict method for this package with new data.

In the example below we plot 2 predicted curves and vary sex between the 2 sets of new data, to visualize its effect:

require(survival)
s <- with(lung,Surv(time,status))

sWei  <- survreg(s ~ as.factor(sex)+age+ph.ecog+wt.loss+ph.karno,dist='weibull',data=lung)

fitKM <- survfit(s ~ sex,data=lung)
plot(fitKM)

lines(predict(sWei, newdata = list(sex      = 1, 
                                   age      = 1, 
                                   ph.ecog  = 1, 
                                   ph.karno = 90,
                                   wt.loss  = 2),
                                 type = "quantile",
                                 p    = seq(.01, .99, by = .01)),
                                 seq(.99, .01, by        =-.01),
                                 col = "blue")

lines(predict(sWei, newdata = list(sex      = 2, 
                                   age      = 1, 
                                   ph.ecog  = 1, 
                                   ph.karno = 90,
                                   wt.loss  = 2),
                                 type = "quantile",
                                 p    = seq(.01, .99, by = .01)),
                                 seq(.99, .01, by        =-.01),
                                 col = "red")

Kaplan Meier estimates of survival curves and risk set tables with survminer

Base plot

install.packages('survminer')
source("https://bioconductor.org/biocLite.R")
biocLite("RTCGA.clinical") # data for examples
library(RTCGA.clinical)
survivalTCGA(BRCA.clinical, OV.clinical,
             extract.cols = "admin.disease_code") -> BRCAOV.survInfo
library(survival)
fit <- survfit(Surv(times, patient.vital_status) ~ admin.disease_code,
               data = BRCAOV.survInfo)
library(survminer)
ggsurvplot(fit, risk.table = TRUE)

More advanced

ggsurvplot(
   fit,                     # survfit object with calculated statistics.
   risk.table = TRUE,       # show risk table.
   pval = TRUE,             # show p-value of log-rank test.
   conf.int = TRUE,         # show confidence intervals for 
                            # point estimaes of survival curves.
   xlim = c(0,2000),        # present narrower X axis, but not affect
                            # survival estimates.
   break.time.by = 500,     # break X axis in time intervals by 500.
   ggtheme = theme_RTCGA(), # customize plot and risk table with a theme.
 risk.table.y.text.col = T, # colour risk table text annotations.
  risk.table.y.text = FALSE # show bars instead of names in text annotations
                            # in legend of risk table
)

Based on

http://r-addict.com/2016/05/23/Informative-Survival-Plots.html

Contributors

Topic Id: 3788

Example Ids: 13086,13087,14660

This site is not affiliated with any of the contributors.