Shifting and Diffing Columns in R Data Frames


Goal

This post shows how to shift and diff columns in R data frames. This is useful when a data frame holds absolute values and you want to analyze how they vary from row to row.

Setup

For this tutorial we will use a data frame with the forecast minimum and maximum temperatures in Genoa for a week in August:

day   <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)

df <- data.frame(day, t_min, t_max)
df

Performing Operations on Rows

Row-wise computations are straightforward: you add a new column defined by an operation on existing columns, and R applies it element by element to every row.

For instance, to get the difference between the maximum and minimum temperature of each day, we can do the following:

df$variation <- df$t_max - df$t_min
df
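As an alternative sketch, base R's transform() can add the same column without repeating df$ in front of every column name, because it evaluates the expression with the data frame's columns in scope. The data is rebuilt here so the snippet runs on its own.

```r
# Rebuild the setup data so the example is self-contained
day   <- c("Fri", "Sat", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri")
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_min <- c(13, 14, 17, 18, 20, 18, 22, 20)
df <- data.frame(day, t_min, t_max)

# transform() returns a copy of df with the new column appended;
# t_max and t_min inside the call refer to the columns of df
df <- transform(df, variation = t_max - t_min)
df
```

Both forms give the same result; transform() mainly saves typing when the expression involves several columns.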

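Diffing Values in a Column

To compute the variations of a variable, we can use the diff function. Its lag argument sets how many positions apart the two subtracted values are, and defaults to 1.

The following code, for instance, computes the day-to-day variation in the maximum temperature. Since diff returns lag fewer values than its input (here, one fewer), we need to pad the beginning with NA before inserting the result into the data frame.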

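# diff() returns length(df$t_max) - 1 day-to-day differences
t_max_variation <- diff(df$t_max, lag = 1)

# Prepend NA so the vector has one value per row (the first day has no previous day)
df$t_max_variation <- c(NA, t_max_variation)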
df
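The padding generalizes to larger lags: diff(x, lag = k) returns length(x) - k values, so k leading NAs keep the result aligned with the rows. The helper below, padded_diff, is a hypothetical name used only for illustration, and it assumes lag is smaller than the length of the vector.

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Hypothetical helper: pad diff() with `lag` leading NAs so the output
# has the same length as the input (assumes lag < length(x))
padded_diff <- function(x, lag = 1) {
  c(rep(NA, lag), diff(x, lag = lag))
}

padded_diff(t_max)           # change with respect to the previous day
padded_diff(t_max, lag = 2)  # change with respect to two days earlier
```

With lag = 2 each value compares a day with the day before yesterday, so the first two rows have no counterpart and are NA.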

Shifting Values

Other operations may require shifting the values of a column. For instance, to compute the percent variation in the maximum temperature, we first create a new column holding the maximum temperature shifted by one day, and then combine it with the existing columns.

The functions head and tail can be used to shift a vector: with a negative n, head(x, -n) drops the last n elements and tail(x, -n) drops the first n. The following code, for instance, takes all elements of t_max except the last.

t_max_shifted <- head(df$t_max, -1)
t_max_shifted
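The same idea can shift a vector in either direction. The two helpers below, lag_by and lead_by, are hypothetical names for illustration: lag_by moves values down so each row sees an earlier value, and lead_by moves them up so each row sees a later one. Both pad with NA to preserve the original length and assume n is smaller than the length of the vector.

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Hypothetical helper: each element takes the value from n positions earlier
lag_by <- function(x, n = 1) c(rep(NA, n), head(x, -n))

# Hypothetical helper: each element takes the value from n positions later
lead_by <- function(x, n = 1) c(tail(x, -n), rep(NA, n))

lag_by(t_max)
lead_by(t_max)
```

The names avoid stats::lag, which exists in base R but is designed for time-series objects and does not shift plain vectors this way.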

We can now use the same trick we used earlier to add t_max_shifted to the data frame.

df$t_max_shifted <- c(NA, t_max_shifted)
df
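Shifting and diffing are closely related: subtracting the shifted column from the original reproduces the padded diff result, including the leading NA, since any arithmetic with NA yields NA. A small check, on a standalone copy of the data:

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)
t_max_shifted <- c(NA, head(t_max, -1))

# Current value minus previous value, row by row
by_shift <- t_max - t_max_shifted

# The same quantity computed with diff() and padded with NA
by_diff <- c(NA, diff(t_max))

all.equal(by_shift, by_diff)
```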

The percent variation in the maximum temperature can now be computed as an operation on columns: divide each day's change by the previous day's value and multiply by 100.

df$t_perc_var <- round(100 * df$t_max_variation / df$t_max_shifted, digits = 2)
df
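As a sketch, the percent change can also be computed directly on the vector, without intermediate columns. diff(x) supplies the numerators and head(x, -1) the matching previous-day denominators; both have length(x) - 1 elements, so a single leading NA keeps the result aligned with the rows.

```r
t_max <- c(28, 28, 30, 31, 31, 31, 33, 30)

# Percent change between consecutive days:
#   100 * (t[i] - t[i-1]) / t[i-1]
perc_change <- c(NA, 100 * diff(t_max) / head(t_max, -1))

round(perc_change, digits = 2)
```

Note that the division by the previous value assumes it is never zero; a zero would produce Inf or NaN in the corresponding row.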