Yahoo Pipes Regex Module

Yahoo Pipes recently added a powerful module called Regex. Regex, if you’re not familiar with the term, is short for regular expression. Regular expressions are a compact, pseudo-mathematical notation for describing text patterns, which makes them a powerful tool for matching and processing text data. Yahoo Pipes’ Regex module lets you do the same for web feeds, an idea long overdue. The previous options were to write your own parser or to run an XSLT processor on the XML data, and neither approach caught on.

Now, Yahoo Pipes makes it easy to manipulate web feed data. Unfortunately, the module is limited: you can replace or remove text that matches a pattern, but you can’t extract it. Still, it’s pretty powerful and will come in handy for numerous Pipes applications.
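To make the replace/remove versus extract distinction concrete, here is a minimal sketch in Python rather than in Pipes itself. The sample title, including its "Digg:" prefix and "(video)" suffix, is made up for illustration, and Python's `re` module is only standing in for the Pipes module here.

```python
import re

# Hypothetical item title; the "Digg:" prefix and "(video)" suffix are invented.
title = "Digg: IBM doubles basic CPU cooling capabilities (video)"

# Replace/remove: the kind of operation the post says the Regex module supports.
cleaned = re.sub(r"\s*\(video\)$", "", title)   # drop the trailing label
cleaned = re.sub(r"^Digg:\s*", "", cleaned)     # drop the leading prefix

# Extract: pull the matched text out as a value of its own.
# This is the step the post says Pipes cannot do.
match = re.search(r"\((\w+)\)$", title)
label = match.group(1) if match else None

print(cleaned)
print(label)
```

Removing text only changes the string in place, while extracting produces a new value (`label` here) that you could store or route elsewhere.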

An example: You want to take the current top stories at the social bookmarking site Digg (feed: http://digg.com/rss/index.xml), find the username of the person who submitted each story, and insert that name in square brackets, “[]”, at the start of each respective item’s title, then sort the output feed by item title. Because each title now begins with the username, sorting by title effectively sorts by submitter. Here is some sample output directly from my “Digg home page sorted by submitter” pipe:

[Andy.D] IBM doubles basic CPU cooling capabilities
[Blakovitch] Halo 3 Xbox 360 Case Mod
[Blakovitch] Why Do Great Movies Get Terrible DVD Cover Art?
[BradGroux] Black Xbox 360 “Elite” Spy Shots
[Cinin] Color matching sphere (Great for people that need color schemes)
[DiggityMcDigg] Microsoft says Vista sales doubling Windows XP pace

The text in brackets is the username of the person who submitted the story. My pipe, which you can clone and edit, just inserts the username into each title. It’s very simple, but it illustrates one use. The problem is that Pipes isn’t powerful enough (yet?) to group information. So if you wanted to count how many home page stories were submitted by each user, you’d have to use a custom application to parse and mine the web feed from my pipe. Hopefully Yahoo will add ever more powerful operators to Pipes, making it a full-blown visual interface with the power of database querying, but for RSS XML.
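For the code geeks, here is a rough Python sketch of both halves of that idea: tagging and sorting the titles, and then the per-user count that Pipes can’t do. It is not how the pipe itself is built. The item dictionaries and their `title` and `submitter` keys are assumptions made for illustration, and a real custom application would first have to parse the feed to obtain them.

```python
import re
from collections import Counter

# Hypothetical items; the "title" and "submitter" keys are assumed field names.
items = [
    {"title": "Halo 3 Xbox 360 Case Mod", "submitter": "Blakovitch"},
    {"title": "IBM doubles basic CPU cooling capabilities", "submitter": "Andy.D"},
    {"title": "Why Do Great Movies Get Terrible DVD Cover Art?", "submitter": "Blakovitch"},
]

def tag_title(item):
    """Insert "[submitter] " at the start of the title with a regex replace."""
    # "^" is a zero-width match at the start of the string. A function is used
    # as the replacement so the username is inserted literally.
    return re.sub(r"^", lambda _m: f"[{item['submitter']}] ", item["title"], count=1)

# Sorting the tagged titles groups each submitter's stories together.
tagged = sorted(tag_title(item) for item in items)

# The grouping step: recover each username from the tagged title and count it.
counts = Counter()
for title in tagged:
    m = re.match(r"\[([^\]]+)\]", title)
    if m:
        counts[m.group(1)] += 1

for title in tagged:
    print(title)
print(dict(counts))
```

The counting loop reads the username back out of the bracketed prefix, which is the same thing a custom application mining the pipe’s output feed would have to do.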

Now, for those of you who are code geeks, you’ll understand when I say that Pipes’ Regex supports the regex syntax of the Perl scripting language. But you don’t need to know Perl to understand regexes. I’m going to be doing a mini-series here on how to use the Regex module in Yahoo Pipes.

Be forewarned that my Pipes Regex series will be tech-heavy, but it will still require no programming. So if you’re not well-versed in regex syntax and you have a specific text manipulation you need done on a web feed, drop a comment here and specify the feed URL and what you want to do. I will try to showcase a solution for any requests.

(c) Raj Kumar Dash / Chameleon Integration Systems.

View full post on RSS Cases – From Technology to Praxis
