Wednesday, December 10, 2008

Paged Enumerable

I was scraping some pages with WWW::Mechanize and found myself needing to download multiple pages of results, so I wrote a paged enumerable module. PagedEnumerable does everything Enumerable does, except that instead of implementing a meaningful each you implement a meaningful each_page, which yields an array of items (or any other object that responds to each) for each page.

To use PagedEnumerable, you would write a class like this:

class MultiplePageSearch
  include PagedEnumerable

  # Yields one collection of items per page.
  def each_page
    10.times { |page| yield download_page(page) } # simulate 10 pages
  end

  private

  def download_page(page)
    puts "downloading page #{page}..."
    sleep 1 # simulate slooow operation
    start = page * 10

    start...(start + 10) # the ten items on this page
  end
end

paged = MultiplePageSearch.new
puts paged.any? { |x| x > 50 } # stops after 6 pages (0 through 5)
paged.each { |x| puts x }      # will process all pages

The implementation of PagedEnumerable is quite simple:

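module PagedEnumerable
  # Including PagedEnumerable also pulls in Enumerable.
  def self.included(obj)
    obj.send :include, Enumerable
  end

  # Flattens the pages: every item of every page goes to the block.
  def each(&blk)
    each_page { |page| page.each(&blk) }
  end
end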
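
Because each only forwards the block to every page, any Enumerable method that stops early also stops each_page early. Here is a small sketch of that, assuming a hypothetical CountingSearch class with a pages_fetched counter (neither is part of the original code) in place of real downloads:

```ruby
module PagedEnumerable
  def self.included(base)
    base.send :include, Enumerable
  end

  def each(&blk)
    each_page { |page| page.each(&blk) }
  end
end

# Hypothetical class for illustration: counts how many pages were requested.
class CountingSearch
  include PagedEnumerable

  attr_reader :pages_fetched

  def initialize
    @pages_fetched = 0
  end

  def each_page
    10.times do |page|
      @pages_fetched += 1
      start = page * 10
      yield(start...(start + 10)) # ten items per simulated page
    end
  end
end

search = CountingSearch.new
search.first(15)            # items 0..14 live on the first two pages
puts search.pages_fetched   # 2

search = CountingSearch.new
search.include?(35)         # 35 is on the fourth page (30...40)
puts search.pages_fetched   # 4

search = CountingSearch.new
search.map { |x| x * 2 }    # has to visit every item
puts search.pages_fetched   # 10
```

Methods such as map, select or to_a still walk every page, so the saving only shows up with short-circuiting calls like any?, first, find or include?.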

In most cases you would also cache the pages, for better performance on a second pass.
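
One way the caching could look is sketched below. It is only an illustration under some assumptions: CachedSearch, its @cache hash and the downloads counter are hypothetical names, and pages are keyed by page number so that a pass that stopped early can be resumed without fetching the first pages again.

```ruby
module PagedEnumerable
  def self.included(base)
    base.send :include, Enumerable
  end

  def each(&blk)
    each_page { |page| page.each(&blk) }
  end
end

# Hypothetical class for illustration: remembers every page it has downloaded.
class CachedSearch
  include PagedEnumerable

  attr_reader :downloads

  def initialize
    @cache = {}     # page number => items on that page
    @downloads = 0  # number of simulated downloads so far
  end

  def each_page
    # Only call download_page for pages that are not in the cache yet.
    10.times { |page| yield(@cache[page] ||= download_page(page)) }
  end

  private

  def download_page(page)
    @downloads += 1
    start = page * 10
    (start...(start + 10)).to_a
  end
end

search = CachedSearch.new
search.any? { |x| x > 50 }  # downloads pages 0 through 5
search.any? { |x| x > 50 }  # served entirely from the cache
puts search.downloads       # 6
search.to_a                 # downloads only the remaining pages 6 through 9
puts search.downloads       # 10
```

The cache lives as long as the object does, so a long-running scraper would need some way to expire or clear it.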
