Materialized View Alternatives

A while back I wrote about materialized views in Postgres. My main motivation was making Postgres’ full-text search faster, and to be able to filter and sort using indexes when searching over data coming from many different tables. Since then, I’ve learned about some alternatives that I want to share.

First, a note about why materialized views are not always the right tool for the job:

1. When creating materialized views, you end writing SQL that duplicates logic you’ve already implemented in Ruby, like this:

class Person < ActiveRecord::Base
  def relatives_names
    relatives.map(&:name).join(", ")
  end
end


class CreateMissionsDatatables < ActiveRecord::Migration
  def up
    execute <<<-eos
      CREATE MATERIALIZED VIEW denormalized_people
      AS
      SELECT
        people.*,
        string_agg(relatives.name, ", ") as relative_names
      FROM people
        LEFT OUTER JOIN relatives ON people.id = relatives.person_id
    eos

In that example, you have to write two versions of how to find and format the relative_names method: one in Ruby and one in SQL. You also duplicate other things such as knowledge of how to join the people and relatives tables (you probably already have this as a has_many association in ActiveRecord).

2. It is pretty messy to write all the SQL to keep a materialized view up to date, even when using my ActiveRecord Materialized Views gem.  It’s even messier to do so in a way that updates rows immediately for 1-to-1 updates, but invalidates them for later updating for 1-to-n updates. (If that last sentence made no sense to you, then you can either check out Dan Chak’s chapter on materialized views, or you can just take my word that optimizing materialized view update triggers is NOT something you want to be thinking about)

Now, the first alternative I can recommend is Sunspot. Backed by the Solr search engine, Sunspot builds indexes that can be used either for full-text search, or for filtering and ordering. Sunspot can index the results of Ruby methods from your model layer, which means no more need to duplicate your logic in SQL!  Furthermore, it is easy to configure Sunspot to update indexes in the background, so you can worry less about optimizing performance of updating your indexes. The drawback is that Sunspot runs as a separate process, making your test suite and production setup all that more complicated.

The next alternative is probably not production ready, but it is an exciting big-picture idea about the future of databases. In a talk called Turning the Database Inside Out, Martin Kleppmann notes that there is this fundamental conflict in database design, between optimizing for DRYness and write speed through normalization, and optimizing for read speed through denormalization. Furthermore, write-ahead-logs are thought of as an implementation detail that are not really part of a database’s public API.  Kleppmann is working on a project called Samza that is rethinking that, by promoting the log to be the basic unit of data storage (that is, optimizing for write speed because appending to a log is very fast), then streaming those logs to other processes that build materialized views on top of them (that is, optimizing for speed on the read side as well).  You should watch the talk: it’s very interesting, and Kleppmann does a better job of explaining his project than I do.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s