One of the most common issues that new (and experienced!) R users have is formatting dates both prior to data import and during analyses. The simplest approach to dealing with dates is to ensure that the flat file (csv or txt) that is being read into R contains a separate column for each component of date and time. So, if date is a variable in your dataset, convert the date to three separate columns (day, month, year) prior to import, and if date and time are variables, convert those to six separate columns (second, minute, hour, day, month, year).

If you’re using Excel to prepare your data for import, you can use the =YEAR() function, for example, to extract the year value from a cell formatted as a date. Note: If you ever try this and it results in an obscure date like “1900-01-01”, it means that you just need to format your year column differently. If you’re stuck, here’s a link to a good explanation of how to deal with this!

http://www.excelfunctions.net/YearFunction.html

Here’s a simple dataset that we’ll use for this tutorial: https://raw.githubusercontent.com/DanielleQuinn/RLessons/master/FormattingDates/date_data.txt

Load lubridate Package

library(lubridate)

Import Data

mydata<-read.delim("date_data.txt")

As you can see, the date and time information has been separated into individual columns prior to the data being imported into R

head(mydata)
  year month day hour minute second
1 2014     1  15   13     45     34
2 2014     1  15   22      0      0
3 2014     1  16   14     45     34
4 2014     1  16   23      0      0
5 2014     1  17   15     45     36
6 2014     1  17    0      0      0

We’re going to be using the ymd() and ymd_hms() functions. These functions are designed to take character and numeric vectors, and convert it to a POSIXct object. Essentially, a POSIXct object is recognized by R as being a date or date and time and will be handled as such. The format of the vectors that these functions can recognize is flexible, but it’s generally a good idea to use a standardized method. Here, we’re going to set up the input as a character vector formatted as “YYYY-MM-DD” (for ymd()) or “YYYY-MM-DD-HH-MM-SS” (for ymd_hms()).

Using ymd() to format dates

For now, we’ll only be concerned with creating a column of dates consisting of year, month, and day, and exclude the time components.

Use the values of year, month, and day to create a character string that is formatted as “YYYY-MM-DD”

To do this, use the paste() function:

paste(mydata$year, mydata$month, mydata$day, sep="-")
 [1] "2014-1-15" "2014-1-15" "2014-1-16" "2014-1-16" "2014-1-17"
 [6] "2014-1-17" "2014-1-18" "2014-1-18" "2014-1-19" "2014-1-19"
[11] "2014-1-20" "2014-1-20" "2014-1-21" "2014-1-21" "2014-1-22"
[16] "2014-1-22" "2014-1-23" "2014-1-23"

Note: The sep argument in the paste() function determines what character(s) are used to separate each piece that is being pasted together.

Now we can wrap that piece of code in the ymd() function to create a vector of dates, properly formatted as POSIXct.

form.dates<-ymd(paste(mydata$year, mydata$month, mydata$day, sep="-"))
form.dates
 [1] "2014-01-15 UTC" "2014-01-15 UTC" "2014-01-16 UTC" "2014-01-16 UTC"
 [5] "2014-01-17 UTC" "2014-01-17 UTC" "2014-01-18 UTC" "2014-01-18 UTC"
 [9] "2014-01-19 UTC" "2014-01-19 UTC" "2014-01-20 UTC" "2014-01-20 UTC"
[13] "2014-01-21 UTC" "2014-01-21 UTC" "2014-01-22 UTC" "2014-01-22 UTC"
[17] "2014-01-23 UTC" "2014-01-23 UTC"
str(form.dates)
 POSIXct[1:18], format: "2014-01-15" "2014-01-15" "2014-01-16" "2014-01-16" ...

We can use this vector to create a new column in mydata

mydata$date<-form.dates
str(mydata)
'data.frame':   18 obs. of  7 variables:
 $ year  : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ month : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day   : int  15 15 16 16 17 17 18 18 19 19 ...
 $ hour  : int  13 22 14 23 15 0 16 1 17 2 ...
 $ minute: int  45 0 45 0 45 0 45 0 45 0 ...
 $ second: int  34 0 34 0 36 0 38 0 40 0 ...
 $ date  : POSIXct, format: "2014-01-15" "2014-01-15" ...

Using ymd_hms() to format date-times

Let’s create a second column that includes both date and time, both properly formatted.

Use the values of year, month, day, hour, minute, and secon to create a character string that is formatted as “YYYY-MM-DD-HH-mm-SS”, wrap it in the ymd_hms() function and create a new column in mydata. .

mydata$datetime<-ymd_hms(paste(mydata$year, mydata$month, mydata$day, mydata$hour, mydata$minute, mydata$second, sep="-"))
str(mydata)
'data.frame':   18 obs. of  8 variables:
 $ year    : int  2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
 $ month   : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day     : int  15 15 16 16 17 17 18 18 19 19 ...
 $ hour    : int  13 22 14 23 15 0 16 1 17 2 ...
 $ minute  : int  45 0 45 0 45 0 45 0 45 0 ...
 $ second  : int  34 0 34 0 36 0 38 0 40 0 ...
 $ date    : POSIXct, format: "2014-01-15" "2014-01-15" ...
 $ datetime: POSIXct, format: "2014-01-15 13:45:34" "2014-01-15 22:00:00" ...

There are lots of other ways to format dates in R; some are more efficient given the particular format of the original data. Here’s a link to a good overview of some of the common approaches:

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ColeBeck/datestimes.pdf

Other Handy Functions in lubridate

Time Zones

head(mydata$datetime)
[1] "2014-01-15 13:45:34 UTC" "2014-01-15 22:00:00 UTC"
[3] "2014-01-16 14:45:34 UTC" "2014-01-16 23:00:00 UTC"
[5] "2014-01-17 15:45:36 UTC" "2014-01-17 00:00:00 UTC"

At the moment, the datetime column is in UTC, or Coordinated Universal Time because it’s the default when lubridate parses dates. To change this, you can use the function force_tz():

mydata$datetime<-force_tz(mydata$datetime, tzone="EST") # Set as Eastern Standard Time
mydata$datetime
 [1] "2014-01-15 13:45:34 EST" "2014-01-15 22:00:00 EST"
 [3] "2014-01-16 14:45:34 EST" "2014-01-16 23:00:00 EST"
 [5] "2014-01-17 15:45:36 EST" "2014-01-17 00:00:00 EST"
 [7] "2014-01-18 16:45:38 EST" "2014-01-18 01:00:00 EST"
 [9] "2014-01-19 17:45:40 EST" "2014-01-19 02:00:00 EST"
[11] "2014-01-20 18:45:42 EST" "2014-01-20 03:00:00 EST"
[13] "2014-01-21 19:45:44 EST" "2014-01-21 04:00:00 EST"
[15] "2014-01-22 20:45:46 EST" "2014-01-22 05:00:00 EST"
[17] "2014-01-23 21:45:48 EST" "2014-01-23 06:00:00 EST"

Set time zone based on location:

mytime<-ymd_hms("2015-08-14-05-30-00", tz="America/Halifax")
mytime
[1] "2015-08-14 05:30:00 ADT"

Check corresponding time in another location:

with_tz(mytime, "America/Vancouver")
[1] "2015-08-14 01:30:00 PDT"

Extracting Information From Date-Times

Extract the minute information from mytime

minute(mytime)
[1] 30

Change the minute to 34

minute(mytime)<-34
mytime
[1] "2015-08-14 05:34:00 ADT"

Find out the weekday a date falls on

wday(mytime) # As a number
[1] 6
wday(mytime, label=TRUE) # As a name
[1] Fri
Levels: Sun < Mon < Tues < Wed < Thurs < Fri < Sat

Or the month

month(mytime, label=TRUE)
[1] Aug
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

Time Intervals

Set up two dates

date1<-ymd_hms("2011-09-23-03-45-23")
date2<-ymd_hms("2011-10-03-21-02-19")

How much time has passed?

difftime(date2,date1)
Time difference of 10.72009 days
difftime(date2, date1, unit="mins") # In minutes
Time difference of 15436.93 mins
difftime(date2, date1, unit="secs") # In seconds
Time difference of 926216 secs

Leap Years

Check to see if a year was a leap year

leap_year(2011)
[1] FALSE
leap_year(2012)
[1] TRUE