↖️ Show all posts

Stream Database Rows with sequel_pg

Reading the docs (thoroughly, maybe twice) can be a good thing.

As the haiku goes:

Pages filled with light,
Guiding hands through code’s maze,
Errors fade away.

TIL

I’ve been using Sequel for a while now. But up until today, I didn’t know that I could stream database rows with sequel_pg. This will reduce memory usage when working with large datasets and if I understand correctly, it will also speed up the processing of the data.

If your database is postgres, sequel_pg should be already your go-to gem.
Yet, you might not know that you can stream database rows with it.

== Streaming

If you are using PostgreSQL 9.2+ on the client, then sequel_pg
should enable streaming support.  This allows you to stream returned
rows one at a time, instead of collecting the entire result set in
memory (which is how PostgreSQL works by default).  You can check
if streaming is supported by:

  Sequel::Postgres.supports_streaming?

If streaming is supported, you can load the streaming support into the
database:

  DB.extension(:pg_streaming)

Then you can call the Dataset#stream method to have the dataset use
the streaming support:

  DB[:table].stream.each{|row| ...}

If you want to enable streaming for all of a database's datasets, you
can do the following:

  DB.stream_all_queries = true

In my project, I set up the streaming like this (the most basic way, I guess):

DB.extension(:pg_streaming)
DB.stream_all_queries = true

And voilà, free speed and memory savings. 🚀

Side Note

If you were using use_cursor in your project, make sure to benchmark it against the streaming feature. It can be faster than using a cursor (in a rare circumstance). In my case, streaming was the clear winner.


⬅️ Read previous Read next ➡️