Often, when we read in a data set that is supposed to contain date values, R will treat them as character types. This can cause problems if we’re attempting to use that date for a visualization, to extract the individual components (month, day, year), or simply order our data set from the most recent data to the oldest.
Luckily, we can convert our character strings to date variable types in R with just a few short lines of code. In this article, I’ll show you how to do this using the handy lubridate package.
To get started, we are going to use the same simple revenue data set that we used in the How to Load a CSV in R tutorial.
With that in mind, we first need to load the lubridate package (or install it with the install.packages() function if you haven’t done so yet). I’ll also load dplyr to make use of some of its functionality later on.
library(lubridate)
library(dplyr)
revenue <- read.csv("simple_revenue.csv")
Now that we’ve loaded our data set, let’s take a look at it as a reminder of what the data inside this file looks like:
## Rows: 30
## Columns: 2
## $ date    <chr> "6/1/2022", "6/2/2022", "6/3/2022", "6/4/2022", "6/5/2022", "6…
## $ revenue <chr> "$307.00 ", "$557.00 ", "$549.00 ", "$1,159.00 ", "$1,525.00 "…
Here, we can see that we have two columns, date and revenue, both of which are read in as characters. We won’t deal with the revenue column in this article (you can see that here), though it will also need to be converted to a numeric. Our focus here is on the date column.
To transform a date column of this structure into a proper date format, we can use the mdy() function from the lubridate package. mdy() simply stands for ‘Month Day Year’ and will work for separators such as / or -, which are the most likely separators between the individual values in your dates. We use this particular function because it matches the way our data is structured.
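As a quick illustration of the separator point, here is a minimal sketch using two made-up strings (not values read from the revenue file) that describe the same day with different separators. Both should parse to the same Date value:

```r
library(lubridate)

# The same calendar day written with two different separators
# (illustrative strings, not taken from the revenue file)
slash_date <- mdy("6/1/2022")
dash_date  <- mdy("6-1-2022")

class(slash_date)        # "Date"
slash_date == dash_date  # TRUE
```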
If our data were in day/month/year format, we would use the dmy() function, and if it were in year/month/day format we could use the ymd() function, for example.
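To make the choice of function concrete, the sketch below parses the first day of our data (June 1, 2022) written in three different orders. The strings are made up for illustration. It also shows why the choice matters: a string like "6/1/2022" is ambiguous, so the wrong function will parse it into a different, perfectly valid date without raising an error.

```r
library(lubridate)

# June 1, 2022 written in three different orders (illustrative strings)
from_mdy <- mdy("6/1/2022")   # month/day/year
from_dmy <- dmy("1/6/2022")   # day/month/year
from_ymd <- ymd("2022/6/1")   # year/month/day

# All three describe the same day
from_mdy == from_dmy
from_mdy == from_ymd

# Using the wrong function on an ambiguous string gives a different date:
# here "6/1/2022" is read as the 6th of January
wrong_order <- dmy("6/1/2022")
wrong_order
```

If you are unsure which order a file uses, look for a value where the day is greater than 12, since that can only be read one way.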
revenue %>%
  mutate(new_date = mdy(date)) %>%
  glimpse()
## Rows: 30
## Columns: 3
## $ date     <chr> "6/1/2022", "6/2/2022", "6/3/2022", "6/4/2022", "6/5/2022", "…
## $ revenue  <chr> "$307.00 ", "$557.00 ", "$549.00 ", "$1,159.00 ", "$1,525.00 …
## $ new_date <date> 2022-06-01, 2022-06-02, 2022-06-03, 2022-06-04, 2022-06-05, …
Here, we create a new column using mutate() and title it new_date. We do this just so we can keep our original date column and get a peek at the difference. As you can see from the output above, our new_date variable has a data type of <date>. Now you may be wondering: ‘Okay, but what was the point of that? It looks like it just reformatted it a bit.’
I’m so glad you asked.
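One immediate payoff is sorting. The sketch below uses a small made-up data frame (the name toy and its three values are hypothetical, not part of the revenue file) to show that sorting a character column compares text, while sorting the parsed Date column is truly chronological:

```r
library(lubridate)
library(dplyr)

# Hypothetical mini data set, used only to illustrate sorting
toy <- data.frame(
  date = c("6/9/2022", "6/10/2022", "12/1/2021"),
  stringsAsFactors = FALSE
)

# Sorting the character column compares text character by character,
# so "6/9/2022" is treated as "later" than "6/10/2022"
toy %>% arrange(desc(date))

# Sorting the parsed Date column runs from most recent to oldest
sorted <- toy %>%
  mutate(new_date = mdy(date)) %>%
  arrange(desc(new_date))

sorted
```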
Now that we have a proper date variable, we’re able to use a handful of functions within the lubridate package to extract details about our date. For example, the day(), month(), and year() functions will extract the components of the same name. Let’s take a look.
revenue %>%
  mutate(new_date = mdy(date)) %>%
  mutate(day = day(new_date),
         month = month(new_date),
         year = year(new_date)) %>%
  head()
##       date    revenue   new_date day month year
## 1 6/1/2022   $307.00  2022-06-01   1     6 2022
## 2 6/2/2022   $557.00  2022-06-02   2     6 2022
## 3 6/3/2022   $549.00  2022-06-03   3     6 2022
## 4 6/4/2022 $1,159.00  2022-06-04   4     6 2022
## 5 6/5/2022 $1,525.00  2022-06-05   5     6 2022
## 6 6/6/2022 $1,310.00  2022-06-06   6     6 2022
Notice that all we did was apply those functions to the new_date variable we created, which let us easily separate out the basic components that make up our date. If we were to try that with our original date column, an error would be thrown.
revenue %>%
  mutate(day = day(date),
         month = month(date),
         year = year(date)) %>%
  head()
## Error in `mutate()`:
## ! Problem while computing `day = day(date)`.
## Caused by error in `as.POSIXlt.character()`:
## ! character string is not in a standard unambiguous format
The only change in the code above is that we used the original date column, which is of a character type. This gives us the error that the character string is not in a standard unambiguous format. If you get something like this, be sure to use the glimpse() or head() functions to take a look at the data and each variable’s type, or the str() function to see similar information. If it says the column you’re passing into year() is a character, make sure you convert it to a Date type using the steps from above.
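If you would rather check a single value or column directly, the sketch below shows one way to do it, using base R's class() alongside lubridate's is.Date() helper. The variable names raw_date and parsed_date are made up for illustration:

```r
library(lubridate)

# A date stored as plain text (illustrative value)
raw_date <- "6/1/2022"
class(raw_date)      # "character"
is.Date(raw_date)    # FALSE

# The same value after parsing with mdy()
parsed_date <- mdy(raw_date)
class(parsed_date)   # "Date"
is.Date(parsed_date) # TRUE

# Extraction functions are now safe to use
year(parsed_date)    # 2022
```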
While the date in our example was structured in a way that looks like it should automatically be treated as a date type, we can also parse this information if it is written in a more text-heavy way. For example:
mdy("July 4th, 2022")
## [1] "2022-07-04"
dmy("4th of July '22")
## [1] "2022-07-04"
As you can see, the lubridate package gives us a lot of flexibility when dealing with dates. After we convert this information to a formal Date type, many more functions are at our disposal. For example, we can get the ISO week number with the isoweek() function, the day of the week with the wday() function, and even check whether the year is a leap year with leap_year(), among many other things.
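As a brief sketch of those three helpers, here they are applied to the first date in our data, June 1, 2022. The variable name first_day is made up for illustration, and the weekday number assumes lubridate's default setting in which the week starts on Sunday:

```r
library(lubridate)

# First date in the revenue data, parsed as a Date
first_day <- mdy("6/1/2022")

# ISO 8601 week number of the year
isoweek(first_day)                  # 22

# Day of the week as a number (1 = Sunday when week_start = 7)
wday(first_day, week_start = 7)     # 4, a Wednesday

# Day of the week as a labelled factor instead of a number
wday(first_day, label = TRUE)

# Is the date's year a leap year?
leap_year(first_day)                # FALSE
```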
We’ll likely continue covering the lubridate package throughout this series, as working with dates can be an important part of analyzing your data. For now, I hope this has helped you convert your character string into a formal Date variable so you can use it properly.
For a cheat-sheet on the lubridate package, check out the lubridate website.