miliucci.org

Software developer

(currently in love with Vidra.com)

Parse JSON with R - with dates

As at June 2020, StackOveflow.com counts nearly half a million views to four questions about JSON and date parsing in R:

In many programming languages dealing with JSON and dates is an easy task, but R is a unique programming language and it’s easy to lost some hours struggling with them when you’re doing this for the first time.

Example dataset

We’ll use a very simple orders.json example file, containings data about orders:

[
  { "_id" : "1", "total" : 14.8, "created_at" : "2019-07-04T08:51:26.442Z"},
  { "_id" : "2", "total" : 10.6, "created_at" : "2019-09-23T10:11:38.566Z"},
  { "_id" : "3", "total" : 0.02, "created_at" : "2019-10-11T13:02:55.907Z"},
  // ...
]

Loading with jsonlite

Importing JSON data and renaming colums is a trivial with the jsonlite package:

library(jsonlite) // load the library

orders <- fromJSON("orders.json") // load the json file
colnames(orders) <- c('ID', 'Total', 'Date') // rename columns with better names

A quick review of the data will show the shape of imported data:

summary(orders)

      ID                Total            Date             
Length:500         Min.   :  0.01   Length:500          
Class :character   1st Qu.: 24.75   Class :character  
Mode  :character   Median : 47.13   Mode  :character  
                   Mean   : 83.98                                        
                   3rd Qu.: 97.85                                        
                   Max.   :905.00                                        

The json file was successfull loaded but the Date column was recognized as a simple string column (character). This is not wrong, because dates aren’t part of the JSON specification.

In JSON, a string like "2019-10-11T13:02:55.907Z" is just a string generated by toISOString method of the JS class Date:

The toISOString() method returns a string in simplified extended ISO format (ISO 8601), which is always 24 or 27 characters long (YYYY-MM-DDTHH:mm:ss.sssZ or ±YYYYYY-MM-DDTHH:mm:ss.sssZ, respectively). The timezone is always zero UTC offset, as denoted by the suffix “Z”.

This function is also used by toJSON method of the JS class Date and JSON.stringify during the conversion of complex objects:

The instances of Date implement the toJSON() function by returning a string (the same as date.toISOString()). Thus, they are treated as strings.

So, when importing JSON inside R, a manual conversion of dates is required.

Converting characters to Date with as.Date

An easy starting point is the as.Date function from base R:

orders$MyDate <- as.Date(orders$Date)

Looking at the head of the data frame will show the new MyDate column:

> head(orders)
  ID  Total         Date               MyDate
  1   14.80 2019-07-04T08:51:26.442Z 2019-07-04
  2   10.60 2019-09-23T10:11:38.566Z 2019-09-23
  3    0.02 2019-10-11T13:02:55.907Z 2019-10-11

The as.Date function converts only the date part, leaving out time, as explained in the documentation:

[…] convert between character representations and objects of class “Date” representing calendar dates.

The default formats follow the rules of the ISO 8601 international standard which expresses a day as “2001-02-03”.

Converting characters with as.POSIXct

Base R contains more classes to represents date with times:

There are two basic classes of date/times. Class “POSIXct” represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. Class “POSIXlt” is a named list of vectors (representing seconds, minutes, hours, day, month, year, day of the week, day of the year, daylight saving, zone, gmt offset).

With as.POSIXct (and the format option) you can parse the JS date easily:

orders$PosixDate <- as.POSIXct(orders$Date, tz = "UTC", format = "%Y-%m-%dT%H:%M:%OSZ")

> head(orders)
  ID  Total         Date                  PosixDate
  1   14.80 2019-07-04T08:51:26.442Z 2019-07-04 08:51:26
  2   10.60 2019-09-23T10:11:38.566Z 2019-09-23 10:11:38
  3    0.02 2019-10-11T13:02:55.907Z 2019-10-11 13:02:55

Now the PosixDate column contains a PosixCt date/time object that can be used to further analyze your data.

More

Loading JSON data from an URL

MyOrgs <- fromJSON("https://api.github.com/users/lifeisfoo/orgs")

Extract part of the POSIXct date as a new column

orders$Hour <- format(orders$PosixDate, "%H")

Convert UNIX timestamps to POSIXct

# with seconds expressed as an integer
as.POSIXct(1592204799, origin = "1970-01-01")

# with seconds as a string
as.POSIXct(as.integer("1592204799"), origin = "1970-01-01")

# with milliseconds (default JS Date format) and loss of precision
as.POSIXct(as.integer(1592205386848 / 1000), origin = "1970-01-01")