Shifting and Diffing Columns in R Data Frames
Goal
This post shows how to shift and diff columns in R data frames. This is useful when a data frame holds absolute values and you want to analyze how they vary from one row to the next.
Setup
For this tutorial we will use a data frame with the forecast temperature in Genoa for a week in August:
day <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(day, t_min, t_max)
df
Perform operations on rows
Row-wise computations are straightforward: you just need to add a column defined by the desired operation on the existing columns.
For instance, to get the difference between the maximum and minimum temperature of each day, we can do as follows:
df$variation <- df$t_max - df$t_min
df
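The same column can also be added with base R's transform(), which lets the expression refer to columns by name without the df$ prefix. The following is a minimal sketch; it rebuilds the data frame from the setup so that it runs on its own, and the name df2 is only used here for illustration:

```r
day <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(day, t_min, t_max)

# transform() evaluates the expression using the data frame's own columns
# and returns a new data frame with the extra column appended
df2 <- transform(df, variation = t_max - t_min)
df2$variation
```

Whether to use df$variation <- ... or transform() is mostly a matter of taste; transform() leaves the original data frame untouched.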
Diffing Values in a Column
To compute the variations of a variable, we can use the diff function.
The following code, for instance, computes the day-to-day variations in the maximum temperature. Notice that diff returns one element fewer than its input (or lag elements fewer, in general), so to insert the values in the data frame we need to pad the initial value(s) with NA.
t_max_variation <- diff(df$t_max, 1)
df$t_max_variation <- c(NA, t_max_variation)
df
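The padding step can be generalized to any lag. The sketch below wraps it in a small helper; padded_diff is a hypothetical name introduced here, not a base R function, and it assumes lag is smaller than the length of the vector:

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Hypothetical helper (not part of base R): diff() padded with leading NAs
# so that the result has the same length as the input.
# Assumes lag < length(x).
padded_diff <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

padded_diff(t_max)           # change with respect to the previous day
padded_diff(t_max, lag = 2)  # change with respect to two days earlier
```

Because the result has as many elements as the input, it can be assigned directly to a new column of the data frame.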
Shifting Values
Other operations require shifting the values of a column. For instance, to compute the percent variation in the maximum temperature, we first create a new column that replicates the maximum temperature shifted by one day, and then perform an operation on the columns of the data frame.
The functions head and tail can be used to shift a vector. The following code, for instance, takes all elements of t_max but the last.
t_max_shifted <- head(df$t_max, -1)
t_max_shifted
We can now use the same trick we used earlier to add t_max_shifted to the data frame.
df$t_max_shifted <- c(NA, head(df$t_max, -1))
df
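Shifting in the opposite direction works the same way with tail. As a minimal sketch (the name t_max_next is only illustrative), dropping the first element and padding at the end gives, for each day, the value of the following day:

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# tail(x, -1) drops the first element; padding the end with NA keeps the
# original length, so each position holds the *next* day's value
t_max_next <- c(tail(t_max, -1), NA)
t_max_next
```

Here the NA lands on the last row, since there is no following day to take the value from.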
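The relative variation in the maximum temperature can now be computed as an operation on columns. Note that the result is expressed as a fraction of the previous day's value (0.07 means 7%); multiply it by 100 to get a percentage: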
df$t_perc_var <- round(df$t_max_variation / df$t_max_shifted, digits=2)
df
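If the intermediate columns are not needed, the diff and the shift can be combined in a single expression. The following sketch works directly on the vector of maximum temperatures; the name t_rel_var is only illustrative:

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# diff(t_max) is the day-to-day change; head(t_max, -1) is the previous
# day's value. Both have length(t_max) - 1 elements, so they divide
# element by element; the leading NA restores the original length.
t_rel_var <- c(NA, round(diff(t_max) / head(t_max, -1), digits = 2))
t_rel_var
```

This should give the same values as the t_perc_var column, without adding t_max_variation and t_max_shifted to the data frame.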