odbc R Package Truncates JSON at 255 Characters

Jul 29, 2018 · 409 words · 2 minutes read Rdatabasesodbcpostgres

If you know me you probably know I’m a Postgres evangelist. One of the truly great features of Postgres is its native JSON(B) support.

If you’re a data scientist using R and a Postgres database, you’re probably using the odbc package. But if you’re using odbc in R you can run into a problem: odbc truncates JSON at 255 characters.

Here’s an example with a 457 character JSON blob.

First let’s create a temporary table with a JSON column. Temporary tables are automatically dropped at the end of a session, so we don’t need to remember to drop it later. Then we’ll insert an object with a key called text and value containing lorem ipsum.

CREATE TEMPORARY TABLE json_test (
  blob JSON
);

INSERT INTO json_test
VALUES ('{"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."}');

We can verify that the record was inserted correctly by casting the blob to text and checking the number of characters.

SELECT CHAR_LENGTH(blob :: TEXT) AS nchar
FROM json_test;
nchar
457

The query above returns 457 as expected, but look what happens when we try to select this record using odbc and dplyr:

tbl(conn, "json_test") %>%
  collect() %>%
  pull() %>%
  nchar()
## [1] 255

As you can see the data was truncated at 255 characters. I’m not exactly sure where the problem lies - somewhere in odbc (possibly here). If we cast the data as text before calling collect then it works fine.

tbl(conn, "json_test") %>%
  mutate(
    blob = as.character(blob)
  ) %>%
  collect() %>%
  pull() %>%
  nchar()
## [1] 457

This suggests odbc (or nanodbc) doesn’t know how to handle JSON and falls back to VARCHAR(255).

As it turns out I’m not the only one with this problem. Hopefully someone (who knows more than me) can put in the time to come up with a more elegant solution. It would be really nice if there were a way to register types similar to psycopg2 in Python. That way users could provide support for their own types. For now just try to remember that the odbc R package truncates JSON at 255 characters.