2024-10-26
Stream Database Rows with sequel_pg
Reading the docs (thoroughly, maybe twice) can be a good thing.
As the haiku goes:
Pages filled with light,
Guiding hands through code's maze,
Errors fade away.
TIL
I've been using Sequel for a while now. But up until today, I didn't know that I
could stream database rows with sequel_pg. This will reduce memory usage when
working with large datasets and if I understand correctly, it will also speed up
the processing of the data.
If your database is postgres,
sequel_pg should be already your
go-to gem.
Yet, you might not know that you can stream database rows with it.
== Streaming
If you are using PostgreSQL 9.2+ on the client, then sequel_pg should enable
streaming support. This allows you to stream returned rows one at a time,
instead of collecting the entire result set in memory (which is how PostgreSQL
works by default). You can check if streaming is supported by:
Sequel::Postgres.supports_streaming?
If streaming is supported, you can load the streaming support into the database:
DB.extension(:pg_streaming)
Then you can call the Dataset#stream method to have the dataset use the
streaming support:
DB[:table].stream.each{|row| ...}
If you want to enable streaming for all of a database's datasets, you can do the
following:
DB.stream_all_queries = trueIn my project, I set up the streaming like this (the most basic way, I guess):
DB.extension(:pg_streaming)
DB.stream_all_queries = trueAnd voilà, free speed and memory savings. 🚀
Side Note
If you were using use_cursor in your project, make sure to benchmark it
against the streaming feature. It can be faster than using a cursor (in a rare
circumstance). In my case, streaming was the clear winner.
← Previous Post | Next Post →