RSS aggregation using Rails

I’ve been playing around with aggregating RSS feeds, and thought I’d share a few observations:

  • Handling all the different RSS variants is still not simple (many feeds aren’t even valid). After harvesting feeds directly turned out to be challenging, I switched to authenticating with Google via Google Base and then retrieving feeds from the undocumented Reader API (still waiting for an official Google Data API), e.g. http://www.google.com/reader/atom/feed/http://www.yourfeedaddresshere.com. This way Google has at least done the work of standardizing the feed to a common format, Atom, and ensuring validity.
  • Content encoded with HTML entities is easily decoded using the htmlentities gem.
  • You can skip ruby-feedparser and simple-rss and move straight to Hpricot, especially given the clean results from Google Reader.
  • A rake task is perfect for scheduling and queuing updates (make sure you throttle and cache appropriately).
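
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind this post: to stay dependency-free it swaps in Ruby's standard library (REXML and CGI.unescapeHTML) for Hpricot and htmlentities, it assumes the Atom XML has already been fetched, and the helper names (reader_url, entry_titles, due_for_refresh?) and the 15-minute interval are hypothetical.

```ruby
require 'cgi'
require 'rexml/document'

# URL prefix taken from the example in the post.
READER_ATOM_PREFIX = 'http://www.google.com/reader/atom/feed/'

# Hypothetical helper: builds the Reader URL for a feed address by
# appending the address to the prefix, as in the post's example.
def reader_url(feed_url)
  READER_ATOM_PREFIX + feed_url
end

# Direct child elements of +node+ with the given local name.
def child_elements(node, name)
  node.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Hypothetical helper: returns the title of every <entry> in an Atom
# document, with HTML entities decoded.
def entry_titles(atom_xml)
  doc = REXML::Document.new(atom_xml)
  child_elements(doc.root, 'entry').map do |entry|
    title = child_elements(entry, 'title').first
    # REXML resolves the XML-level escaping; CGI.unescapeHTML then
    # decodes HTML entities that were embedded in the text itself.
    CGI.unescapeHTML(title ? title.text.to_s : '')
  end
end

# Hypothetical throttle check for a scheduled task: refresh only when
# there is no cached copy or it is at least +min_interval+ seconds old.
# The 900-second default is an assumption, not a recommendation.
def due_for_refresh?(last_fetched_at, now, min_interval = 900)
  last_fetched_at.nil? || (now - last_fetched_at) >= min_interval
end
```

Note that CGI.unescapeHTML only covers the basic entities (&amp;amp;, &amp;lt;, &amp;gt;, &amp;quot;) and numeric references, so the htmlentities gem is still the better fit for named entities such as &amp;eacute;. A rake task would loop over the stored feeds, call due_for_refresh? with each feed's last fetch time, and only then fetch and parse.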
