The official vignette, "Reference semantics", is the best introduction to this topic.
A reminder: DT[where, select|update|do, by] syntax is used to work with columns of a data.table.
i argumentj argumentThese two arguments are usually passed by position instead of by name.
All modifications to columns can be done in j. Additionally, the set function is available for this use.
# example data
DT = as.data.table(mtcars, keep.rownames = TRUE)
Use the := operator inside j to create new columns or modify existing ones:
DT[, mpg_sq := mpg^2]
Use the i argument to subset to rows "where" edits should be made:
DT[1:3, newvar := "Hello"]
As in a data.frame, we can subset using row numbers or logical tests. It is also possible to use [a "join" in i when modifying][need_a_link].
Remove columns by setting to NULL:
DT[, mpg_sq := NULL]
Note that we do not <- assign the result, since DT has been modified in-place.
Add multiple columns by using the := operator's multivariate format:
DT[, `:=`(mpg_sq = mpg^2, wt_sqrt = sqrt(wt))]
# or
DT[, c("mpg_sq", "wt_sqrt") := .(mpg^2, sqrt(wt))]
The .() syntax is used when the right-hand side of LHS := RHS is a list of columns.
If the columns are dependent and must be defined in sequence, some ways to do that are:
DT[, c("mpg_sq", "mpg2_hp") := .(temp1 <- mpg^2, temp1/hp)]
# or
DT[, c("mpg_sq", "mpg2_hp") := {temp1 = mpg^2; .(temp1, temp1/hp)}]
For dynamically-determined column names, use parentheses:
vn = "mpg_sq"
DT[, (vn) := mpg^2]
setColumns can also be modified with set for a small reduction in overhead, though this is rarely necessary:
set(DT, j = "hp_over_wt", v = mtcars$hp/mtcars$wt)
# example data
DT = as.data.table(mtcars, keep.rownames = TRUE)
To rearrange the order of columns, use setcolorder. For example, to reverse them
setcolorder(DT, rev(names(DT)))
This costs almost nothing in terms of performance, since it is just permuting the list of column pointers in the data.table.
# example data
DT = as.data.table(mtcars, keep.rownames = TRUE)
To rename a column (while keeping its data the same), there is no need to copy the data to a column with a new name and delete the old one. Instead, we can use
setnames(DT, "mpg_sq", "mpq_squared")
to modify the original column by reference.
# example data
DT = data.table(iris)
To modify factor levels by reference, use setattr:
setattr(DT$Species, "levels", c("set", "ver", "vir")
# or
DT[, setattr(Species, "levels", c("set", "ver", "vir"))]
The second option might print the result to the screen.
With setattr, we avoid the copy usually incurred when doing levels(x) <- lvls, but it will also skip some checks, so it is important to be careful to assign a valid vector of levels.