
boxplot


Create a box-and-whisker plot with boxplot() {graphics}

This example uses the default boxplot() function and the iris data frame.

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Simple boxplot (Sepal.Length)

Create a box-and-whisker graph of a numerical variable

boxplot(iris[,1],xlab="Sepal.Length",ylab="Length (in centimeters)",
           main="Summary Characteristics of Sepal.Length (Iris Data)")


Boxplot of sepal length grouped by species

Create a boxplot of a numerical variable grouped by a categorical variable

boxplot(Sepal.Length~Species,data = iris)


Bring order

To change the order of the boxes in the plot, change the order of the categorical variable's levels.
For example, to get the order virginica, versicolor, setosa:

newSpeciesOrder <- factor(iris$Species, levels=c("virginica","versicolor","setosa"))
boxplot(Sepal.Length~newSpeciesOrder,data = iris)
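
As a quick check of the step above, the following sketch confirms that only the level order changes, not the data. The reorder() call is an alternative that the original example does not use; it is shown here on the assumption that you want the boxes sorted by a statistic (here the median, largest first) rather than in a hand-written order. The name byMedian is made up for illustration.

```r
# Re-level the grouping factor; the underlying values are unchanged
newSpeciesOrder <- factor(iris$Species,
                          levels = c("virginica", "versicolor", "setosa"))
levels(iris$Species)     # original level order
levels(newSpeciesOrder)  # order used for the boxes

# Alternative (assumption: sort boxes by decreasing median).
# reorder() sorts levels by increasing FUN value, so negate the values.
byMedian <- reorder(iris$Species, -iris$Sepal.Length, FUN = median)
levels(byMedian)
boxplot(iris$Sepal.Length ~ byMedian,
        xlab = "Species", ylab = "Sepal.Length")
```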


Change groups names

To give the groups more descriptive labels, use the names parameter. It takes a character vector with one label per level of the categorical variable.

boxplot(Sepal.Length~newSpeciesOrder,data = iris,names= c("name1","name2","name3"))
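
One way to keep the labels in step with the level order is to build them from the levels themselves. This is a minimal sketch; the "Iris" prefix and the variable name groupLabels are illustrative choices, not part of the original example.

```r
newSpeciesOrder <- factor(iris$Species,
                          levels = c("virginica", "versicolor", "setosa"))

# Derive one label per level, in the same order as the boxes
groupLabels <- paste("Iris", levels(newSpeciesOrder))

boxplot(Sepal.Length ~ newSpeciesOrder, data = iris, names = groupLabels)
```

Because the labels are computed from levels(), reordering the factor later also reorders the labels.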


Small improvements

Color

col: a vector of colors, one per level of the categorical variable

boxplot(Sepal.Length~Species,data = iris,col=c("green","yellow","orange"))
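
A positional color vector silently mismatches if the level order changes. A possible safeguard, sketched here with a hypothetical named vector speciesColors, is to look the colors up by level name:

```r
# Map each species to a color by name (names are illustrative)
speciesColors <- c(setosa = "green", versicolor = "yellow", virginica = "orange")

# Index by the factor's levels so colors follow the box order
boxColors <- unname(speciesColors[levels(iris$Species)])

boxplot(Sepal.Length ~ Species, data = iris, col = boxColors)
```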


Proximity of the box

boxwex: a scale factor applied to the width of all boxes. Smaller values give narrower boxes with more space between them.
Left: boxplot(Sepal.Length~Species,data = iris,boxwex = 0.1)
Right: boxplot(Sepal.Length~Species,data = iris,boxwex = 1)
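
To compare the two settings in one figure, the calls can be placed in a two-panel layout. This is a sketch using par(mfrow = ...), the same layout call used later in this topic; the panel titles are added for illustration.

```r
# Two panels side by side; keep the old settings so they can be restored
op <- par(mfrow = c(1, 2))

boxplot(Sepal.Length ~ Species, data = iris, boxwex = 0.1,
        main = "boxwex = 0.1")   # narrow boxes
boxplot(Sepal.Length ~ Species, data = iris, boxwex = 1,
        main = "boxwex = 1")     # wide boxes

par(op)  # restore the previous layout
```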


See the summaries on which the boxplots are based: plot=FALSE

To see the summaries instead of a plot, set the plot parameter to FALSE.
The result is a list with several components:

> boxplot(Sepal.Length~newSpeciesOrder,data = iris,plot=FALSE)
$stats # summary of the numerical variable, one column per group
     [,1] [,2] [,3]
[1,]  5.6  4.9  4.3 # lower whisker end
[2,]  6.2  5.6  4.8 # lower hinge (about the first quartile)
[3,]  6.5  5.9  5.0 # median
[4,]  6.9  6.3  5.2 # upper hinge (about the third quartile)
[5,]  7.9  7.0  5.8 # upper whisker end

$n # number of observations in each group
[1] 50 50 50

$conf # lower and upper extremes of the notches
         [,1]     [,2]     [,3]
[1,] 6.343588 5.743588 4.910622
[2,] 6.656412 6.056412 5.089378

$out # outliers (values beyond the whiskers)
[1] 4.9

$group # index of the group each outlier belongs to
[1] 1

$names # group names
[1] "virginica"  "versicolor" "setosa"    
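
The returned list can be stored and used directly. The sketch below picks out individual components and then draws the plot from the stored summaries with bxp() from the graphics package; the variable name b is arbitrary.

```r
newSpeciesOrder <- factor(iris$Species,
                          levels = c("virginica", "versicolor", "setosa"))

# Compute the summaries without drawing anything
b <- boxplot(Sepal.Length ~ newSpeciesOrder, data = iris, plot = FALSE)

b$stats[3, ]       # medians, one per group
b$out              # outlying values
b$names[b$group]   # name of the group each outlier belongs to

# Cross-check the medians against a direct computation
tapply(iris$Sepal.Length, newSpeciesOrder, median)

# Draw the boxplot later from the stored summaries
bxp(b)
```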

Additional boxplot style parameters.

Box

  • boxlty - box line type
  • boxlwd - box line width
  • boxcol - box line color
  • boxfill - box fill colors

Median

  • medlty - median line type ("blank" for no line)
  • medlwd - median line width
  • medcol - median line color
  • medpch - median point (NA for no symbol)
  • medcex - median point size
  • medbg - median point background color

Whisker

  • whisklty - whisker line type
  • whisklwd - whisker line width
  • whiskcol - whisker line color

Staple

  • staplelty - staple line type
  • staplelwd - staple line width
  • staplecol - staple line color

Outliers

  • outlty - outlier line type ("blank" for no line)
  • outlwd - outlier line width
  • outcol - outlier line color
  • outpch - outlier point type (NA for no symbol)
  • outcex - outlier point size
  • outbg - outlier point background color

Example

Default and heavily modified plots side by side

par(mfrow=c(1,2))
# Default
boxplot(Sepal.Length ~ Species, data=iris)
# Modified
boxplot(Sepal.Length ~ Species, data=iris,
        boxlty=2, boxlwd=3, boxfill="cornflowerblue", boxcol="darkblue",
        medlty=2, medlwd=2, medcol="red", medpch=21, medcex=1, medbg="white",
        whisklty=2, whisklwd=3, whiskcol="darkblue",
        staplelty=2, staplelwd=2, staplecol="red",
        outlty=3, outlwd=3, outcol="grey", outpch=NA
        )


Syntax:

  • boxplot(x, ...) # generic function

  • boxplot(formula, data = NULL, ..., subset, na.action = NULL) ## S3 method for class 'formula'

  • boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE, notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"), col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL) ## Default S3 method
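
Several arguments of the default method are not demonstrated elsewhere in this topic. The following sketch tries three of them on the iris data; the particular combination is only an illustration.

```r
# notch: draw notches around the medians
# horizontal: draw the boxes horizontally
boxplot(Sepal.Length ~ Species, data = iris,
        notch = TRUE, horizontal = TRUE)

# range = 0 extends the whiskers to the data extremes,
# so no points are reported as outliers
b0 <- boxplot(Sepal.Length ~ Species, data = iris, range = 0, plot = FALSE)
b0$out       # empty
b0$stats[1, ]  # lower whisker ends now equal the group minima
```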

Parameters:

Parameter - Details (source: R documentation)

  • formula - a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
  • data - a data.frame (or list) from which the variables in formula should be taken.
  • subset - an optional vector specifying a subset of observations to be used for plotting.
  • na.action - a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
  • boxwex - a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.
  • plot - if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.
  • col - if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.
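
The subset parameter is not used in the examples above. As a minimal sketch, the condition below (an arbitrary choice for illustration) keeps only flowers whose sepal width is at least 3; it is evaluated using the columns of data.

```r
# Keep only observations with Sepal.Width >= 3 (illustrative condition)
bSub <- boxplot(Sepal.Length ~ Species, data = iris,
                subset = Sepal.Width >= 3, plot = FALSE)

bSub$n        # observations per group after subsetting
sum(bSub$n)   # total number of observations kept

# The same call with plot = TRUE (the default) draws the subsetted data
boxplot(Sepal.Length ~ Species, data = iris, subset = Sepal.Width >= 3)
```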
