Reading JSONL with Rails
A year or so ago I mothballed one of my projects by dumping the database to AWS DynamoDB and throwing a static JS site up on AWS S3. This allowed me to still query the data, but only pay for it when I was actually using it rather than keeping the $20/mo server running 24/7. Recently though, I’ve decided to pick the project back up and bring it out of retirement.
I thought I had a backup of the Postgres database before I shut the server down, but time makes fools of us all and I wasn’t able to find where I put it. Whoops. At least I have the live database running in DynamoDB though! Except for one problem: DynamoDB exports in JSONL format, and JSONL breaks Ruby’s JSON parser because… well, because it’s not exactly standard JSON.
The part where I explain JSONL vs JSON
Feel free to scroll down to the solution if you’re not interested in this part.
JSON is key/value pairs with some curly braces around it. Something like
{
  "employee": {
    "name": "sonoo",
    "salary": 56000,
    "married": true
  }
}
This matches how JavaScript manages objects, hence JavaScript Object Notation. Some developers, though (AWS included), don’t like how strict this format becomes with large, continuously changing datasets. Basically, it isn’t “webscale” for streaming transactions: if something changes in the middle of a write, you might have to rewrite the whole file. So someone along the way came up with JSON Lines (JSONL). JSONL still uses JSON, just not as one hierarchical document: each line is straight-up JSON, treated as an independent record.
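So the same sort of employee data from above might look something like this as JSONL (the extra rows here are just made up for illustration):

{"name": "sonoo", "salary": 56000, "married": true}
{"name": "jai", "salary": 70000, "married": false}
{"name": "vivek", "salary": 60000, "married": true}

Three lines, three independent JSON objects, and no outer braces tying them together.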
Which means that when you export from DynamoDB, you get not a JSON file, but a file containing a bunch of JSON lines. Each line is JSON, but the entire file is not, so each line needs to be read individually and then parsed as JSON. As far as I know, Ruby doesn’t do this out of the box.
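Just to show what that looks like in practice, here’s roughly what happens if you hand the whole file to Ruby’s parser (the filename comes from my setup; yours will differ):

require 'json'

# The file as a whole isn't a single JSON document, so the parser
# chokes once it gets past the first line:
JSON.parse(File.read("import.jsonl"))
# => raises JSON::ParserError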
What I ended up doing
Knowing how JSONL works, it’s not a big leap to figure out how to parse it. I have a JSONL file called “import.jsonl” that I need to import line by line, and here’s how I do it in my model.
def self.importj
  require 'json'

  path = Rails.root.join("public", "import.jsonl")
  File.readlines(path).each do |line|
    parsed = JSON.parse(line)
    puts(parsed) # parsed is now a single line of parsed JSON (a Ruby hash)
  end
end
First I build the path to the file in my public directory, then read each line one by one in a loop. Inside that loop, I parse the JSON and do whatever I need to do with it (in my real-world case, add each field inside each line to the database, roughly like the sketch below).
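For what it’s worth, here’s a rough sketch of that loop with the database step filled in. The Employee model and its columns are hypothetical, and depending on how your DynamoDB export was produced, each line may wrap the attributes in an “Item” key with type descriptors, so you may need to dig a level deeper than this:

File.foreach(Rails.root.join("public", "import.jsonl")) do |line|
  attrs = JSON.parse(line)
  # Hypothetical model and columns; map whatever fields your lines actually contain
  Employee.create!(
    name:    attrs["name"],
    salary:  attrs["salary"],
    married: attrs["married"]
  )
end

File.foreach reads the file a line at a time instead of loading every line into memory the way File.readlines does, which matters if the export is large.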
Does any of this make my job easier? Absolutely not. Is it faster? Heck no, it’s dramatically slower for me both to implement and to import. I get zero advantage from AWS’s decision. But it’s easy to overcome and didn’t take that much thinking once I understood what JSONL really was.
And maybe next time I’ll actually save a .sql file somewhere (and remember where I put it).