Parse JSON with R - with dates
As at June 2020, StackOveflow.com counts nearly half a million views to four questions about JSON and date parsing in R:
- 213k views - Importing data from a JSON file into R
- 100k views - Parse JSON with R
- 86k views - Convert character to time in R
- 57k views - Difference between as.POSIXct/as.POSIXlt
In many programming languages dealing with JSON and dates is an easy task, but R is a unique programming language and it’s easy to lost some hours struggling with them when you’re doing this for the first time.
Example dataset
We’ll use a very simple orders.json
example file, containings data about orders:
[
{ "_id" : "1", "total" : 14.8, "created_at" : "2019-07-04T08:51:26.442Z"},
{ "_id" : "2", "total" : 10.6, "created_at" : "2019-09-23T10:11:38.566Z"},
{ "_id" : "3", "total" : 0.02, "created_at" : "2019-10-11T13:02:55.907Z"},
// ...
]
Loading with jsonlite
Importing JSON data and renaming colums is a trivial with the jsonlite package:
library(jsonlite) // load the library
orders <- fromJSON("orders.json") // load the json file
colnames(orders) <- c('ID', 'Total', 'Date') // rename columns with better names
A quick review of the data will show the shape of imported data:
summary(orders)
ID Total Date
Length:500 Min. : 0.01 Length:500
Class :character 1st Qu.: 24.75 Class :character
Mode :character Median : 47.13 Mode :character
Mean : 83.98
3rd Qu.: 97.85
Max. :905.00
The json file was successfull loaded but the Date
column was recognized as a simple string column (character
). This is not wrong, because dates aren’t part of the JSON specification.
In JSON, a string like "2019-10-11T13:02:55.907Z"
is just a string generated by toISOString method of the JS class Date:
The toISOString() method returns a string in simplified extended ISO format (ISO 8601), which is always 24 or 27 characters long (YYYY-MM-DDTHH:mm:ss.sssZ or ±YYYYYY-MM-DDTHH:mm:ss.sssZ, respectively). The timezone is always zero UTC offset, as denoted by the suffix “Z”.
This function is also used by toJSON method of the JS class Date and JSON.stringify during the conversion of complex objects:
The instances of Date implement the toJSON() function by returning a string (the same as date.toISOString()). Thus, they are treated as strings.
So, when importing JSON inside R, a manual conversion of dates is required.
Converting characters to Date with as.Date
An easy starting point is the as.Date
function from base R:
orders$MyDate <- as.Date(orders$Date)
Looking at the head of the data frame will show the new MyDate
column:
> head(orders)
ID Total Date MyDate
1 14.80 2019-07-04T08:51:26.442Z 2019-07-04
2 10.60 2019-09-23T10:11:38.566Z 2019-09-23
3 0.02 2019-10-11T13:02:55.907Z 2019-10-11
The as.Date
function converts only the date part, leaving out time, as explained in the documentation:
[…] convert between character representations and objects of class “Date” representing calendar dates.
The default formats follow the rules of the ISO 8601 international standard which expresses a day as “2001-02-03”.
Converting characters with as.POSIXct
Base R contains more classes to represents date with times:
There are two basic classes of date/times. Class “POSIXct” represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. Class “POSIXlt” is a named list of vectors (representing seconds, minutes, hours, day, month, year, day of the week, day of the year, daylight saving, zone, gmt offset).
With as.POSIXct
(and the format
option) you can parse the JS date easily:
orders$PosixDate <- as.POSIXct(orders$Date, tz = "UTC", format = "%Y-%m-%dT%H:%M:%OSZ")
> head(orders)
ID Total Date PosixDate
1 14.80 2019-07-04T08:51:26.442Z 2019-07-04 08:51:26
2 10.60 2019-09-23T10:11:38.566Z 2019-09-23 10:11:38
3 0.02 2019-10-11T13:02:55.907Z 2019-10-11 13:02:55
Now the PosixDate
column contains a PosixCt
date/time object that can be used to further analyze your data.
More
Loading JSON data from an URL
MyOrgs <- fromJSON("https://api.github.com/users/lifeisfoo/orgs")
Extract part of the POSIXct date as a new column
orders$Hour <- format(orders$PosixDate, "%H")
Convert UNIX timestamps to POSIXct
# with seconds expressed as an integer
as.POSIXct(1592204799, origin = "1970-01-01")
# with seconds as a string
as.POSIXct(as.integer("1592204799"), origin = "1970-01-01")
# with milliseconds (default JS Date format) and loss of precision
as.POSIXct(as.integer(1592205386848 / 1000), origin = "1970-01-01")