Wednesday, December 10, 2008

Paged Enumerable

Scraping some pages with WWW::Mechanize, I found myself needing to download multiple pages of results, so I wrote a paged enumerable module. PagedEnumerable gives you everything Enumerable does, except that instead of implementing a meaningful each, you implement a meaningful each_page, which yields the collection of items on each page (an array, or anything else that responds to each).

To use PagedEnumerable, you would write a class like this:
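```ruby
# Assumes PagedEnumerable (shown below) has already been loaded.
class MultiplePageSearch
  include PagedEnumerable

  # Yield one page of results at a time.
  def each_page
    10.times { |page| yield download_page(page) } # simulate 10 pages
  end

  private

  # Page n holds the numbers n*10 ... n*10+9.
  def download_page(page)
    puts "downloading page #{page}..."
    sleep 1 # simulate a slooow operation
    start = page * 10
    start...(start + 10)
  end
end

paged = MultiplePageSearch.new
puts paged.any? { |x| x > 50 } # stops at page 5 (where 51 is found): only 6 pages downloaded
paged.each { |x| puts x }      # will process all pages
```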
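To see how many pages each call actually fetches, without the one-second sleeps, here is a small self-contained sketch (not from the original post). It repeats the PagedEnumerable module so it runs on its own, and swaps in a hypothetical CountingSearch class that simply counts its downloads instead of printing and sleeping.

```ruby
# Repeated here so the sketch runs on its own.
module PagedEnumerable
  def self.included(base)
    base.send :include, Enumerable
  end

  def each(&blk)
    each_page { |page| page.each(&blk) }
  end
end

# Hypothetical stand-in for MultiplePageSearch: no sleep, no output,
# just a counter of how many pages have been "downloaded".
class CountingSearch
  include PagedEnumerable

  attr_reader :downloads

  def initialize(pages = 10)
    @pages = pages
    @downloads = 0
  end

  def each_page
    @pages.times { |page| yield download_page(page) }
  end

  private

  def download_page(page)
    @downloads += 1
    start = page * 10
    start...(start + 10)
  end
end

s = CountingSearch.new
s.any? { |x| x > 50 }
puts s.downloads # 6: pages 0-5, since 51 on page 5 is the first match

s = CountingSearch.new
s.first(3)
puts s.downloads # 1: the first page already holds 0, 1 and 2

s = CountingSearch.new
s.to_a
puts s.downloads # 10: every page is needed
```

Because every Enumerable method goes through each, and each pulls pages from each_page one at a time, a method that stops early stops downloading early too.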

The implementation of PagedEnumerable is quite simple:
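```ruby
module PagedEnumerable
  # Pull in Enumerable whenever a class includes PagedEnumerable.
  def self.included(base)
    base.send :include, Enumerable
  end

  # Flatten the pages into one stream of items. Every Enumerable method
  # is built on this each, so methods that stop early (any?, find, first)
  # also stop downloading pages early.
  def each(&blk)
    return enum_for(:each) unless block_given? # e.g. paged.each.with_index
    each_page { |page| page.each(&blk) }
    self
  end
end
```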
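If you would rather not mix a module into the scraping class, the same flattening can be sketched with Ruby's built-in Enumerator. This is an alternative to the approach in this post, not what it does, and the names `paged_items` and `PageSource` are hypothetical. Since Enumerator includes Enumerable and only produces items on demand, short-circuiting calls still stop fetching early, and `lazy` chains work as well.

```ruby
# Turn any object with an each_page method into a flat Enumerator of items.
def paged_items(source)
  Enumerator.new do |yielder|
    source.each_page do |page|
      page.each { |item| yielder << item }
    end
  end
end

# Hypothetical page source that records which pages it has fetched.
class PageSource
  attr_reader :fetched

  def initialize
    @fetched = []
  end

  def each_page
    10.times do |n|
      @fetched << n
      start = n * 10
      yield(start...(start + 10))
    end
  end
end

src = PageSource.new
items = paged_items(src)
p items.lazy.select(&:odd?).map { |x| x * 2 }.first(3) # [2, 6, 10]
p src.fetched                                          # [0]
```

The mixin keeps the Enumerable methods on the search object itself; the Enumerator version keeps the search class untouched at the cost of an extra wrapper object.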

In most cases you would also want to cache the downloaded pages, so that a second pass (say, an any? followed by an each) doesn't download them all over again.
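One way to add that caching, sketched here with hypothetical names (`CachedSearch`, `@cache`, `fetch_page`), is to memoize each page by its index inside the download method. Caching per page rather than per pass means that a pass which stops early, such as any?, still leaves the pages it did fetch available to the next pass. The module is repeated so the sketch runs on its own.

```ruby
module PagedEnumerable
  def self.included(base)
    base.send :include, Enumerable
  end

  def each(&blk)
    each_page { |page| page.each(&blk) }
  end
end

class CachedSearch
  include PagedEnumerable

  attr_reader :downloads

  def initialize
    @cache = {}    # page index => page contents
    @downloads = 0 # number of real (uncached) fetches
  end

  def each_page
    10.times { |page| yield download_page(page) }
  end

  private

  # Return the cached page if we have it; otherwise fetch and remember it.
  def download_page(page)
    @cache[page] ||= fetch_page(page)
  end

  # Hypothetical slow fetch (e.g. an HTTP request); here it just builds a range.
  def fetch_page(page)
    @downloads += 1
    start = page * 10
    start...(start + 10)
  end
end

search = CachedSearch.new
search.any? { |x| x > 50 } # fetches pages 0-5
search.to_a                # reuses pages 0-5, fetches only 6-9
puts search.downloads      # 10
```

This cache never expires, which suits a one-off scrape; a long-running process would need some way to invalidate or bound it.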
