Skip to content

frestyl/queryalize

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Queryalize lets you use Rails 3 to build queries just like with ActiveRecord::QueryMethods, except you can serialize the end result. This is useful for running queries that potentially return large result sets in the background using something like Resque or Delayed::Job.

Normally, using ActiveRecord::QueryMethods, you build queries like this:

query = User.where(:name => "something").order("created_at DESC")

With Queryalize, it's only a little different:

query = Querialize.new(User).where(:name => "something").order("created_at DESC")

However, now you get all of this goodness:

# NOTE the following methods DO NOT query the database,
# they return a representation of the query itself in one 
# of the following formats

json = query.to_json # => query as json data
yaml = query.to_yaml # => query as yaml data
hash = query.to_hash # => query as ruby hash

new_query_from_json = Queryalize.from_json(json)
new_query_from_yaml = Queryalize.from_yaml(yaml)
new_query_from_hash = Queryalize.from_hash(hash)

Why?

Imagine, for example, that you have a database that organizes music into several genres. You have built an admin interface that allows the administrator to filter the catalog of music by genre, and run updates against the result set. However, the database is large, and the query for "electronica" returns 1,000,000+ results. The administrator wants to re-process these entries such that the genre is "electronic" (without the annoying 'a' at the end).

Unfortunately, your schema is setup in such a way that you cannot simply run a single "UPDATE." Rather, you must iterate through each individual record and update its genre. Ouch. There is no way you can allow this to happen during the request, or it will certainly timeout. So you decide to queue the update, but how do you tell the queue workers which records to update? You could try to capture just the ids from the records, but you'd still need to store 1,000,000+ ids somewhere so the queue worker can reference them later, not to mention that actually collecting the ids takes a healthy amount of time and memory, and will probably also time out. You could build up your query and then use to_sql to pass the raw SQL to the queue worker, but then you can't use useful methods like 'find_each' in the queue task.

The solution is to serialize the query you've built, and then rebuild it in the queue task. It ends up looking something like this (if you're using Delayed::Job):

query = Queryalize.new(Music).joins("JOIN #{Genre.table_name} ON #{Genre.table_name}.music_id = #{Music.table_name}.id").where(["#{Genre.table_name}.name = ?", 'electronica'])
# see 1. below

worker = GenreWorker.new({
  :update => 'electronic',
  :query  => query.to_json
})

Delayed::Job.enqueue(worker)

# 1.
# written this way to demonstrate chaining, but a slightly cleaner way would be:
# genres = Genre.table_name
# query  = Queryalize.new(Music)
# query  = query.joins("JOIN #{genres} ON #{genres}.music_id = #{genres}.id")
# query  = query.where(["#{genres}.name = ?", 'electronica'])

The GenreWorker class looks something like this:

class GenreWorker

    def initialize(args)
      @update = args[:update]
      @query  = args[:query]
    end
    
    def perform
      Queryalize.from_json(@query).find_each do |music|
        music.genre.update_attribute(:name => @update)
      end
    end
  end
end

Notice the query was serialized and reconstructed to its original state, so you can seamlessly use ActiveRecord features like find_each. Simple!

About

Serialize chainable queries constructed with ActiveRecord::QueryMethods

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages