I've been experimenting with Groovy lately, and I wanted to write a simple script to help me learn more about the language. The script pulls job listings from the job RSS feeds I've bookmarked and collects them in one place. The main advantage of this strategy is that it's relatively easy to do and gives you very consistent results that are easy to parse. Also, if you're tired of wading through job postings at companies you're not interested in, this gives you a way of focusing your efforts on a particular set of companies.

Setting Things Up

Before starting this project, I downloaded the latest release of Groovy, which includes the groovyConsole application. This makes it easier to develop your scripts and run them. You can download it from the Groovy website. Since I'm going to be parsing XML, I wanted to make sure that I had the right libraries. Groovy uses the ~/.groovy/lib directory to pick up user-specific libraries, so I downloaded the Apache Xalan libraries and installed them into that directory.

I puttered around with the Groovy Eclipse plugin, but I couldn't quite get what I wanted. I wanted it to simply execute a Groovy script and show me the results in the console. Instead, it seemed to insist that I create a class for everything. If I had wanted to do that, I could have stuck with Java, but since the goal was to teach myself the basics of Groovy, I used the Groovy Console instead.

The next step was to bookmark the RSS feeds that I wanted. I tagged each of the bookmarks as jobs/rss to help distinguish them from other sites that I had bookmarked.

After that, I needed to download the bookmarks from del.icio.us. The del.icio.us REST API normally requires authentication; however, if all you want to do is fetch an RSS version of the bookmarks with a particular tag, then no authentication is required. Let's take a look at the code for this:

// write to the console while testing
Writer writer = new PrintWriter(System.out);
// open the HTML document (assumed markup)
writer << "<html><body>\n";

// fetch the delicious rss feed and parse it
def deliciousRoot = fetchData("http://feeds.delicious.com/rss/mfortner/jobs/rss");
parseDeliciousFeed(deliciousRoot, writer);
writer.flush();

// close the HTML document (assumed markup)
writer << "</body></html>\n";
writer.close();

Since I'm going to be testing as I write the code, I want to output the results to the console. I also know that I'll eventually be writing the results to a file, so I define a writer and send all of the output through it. Initially I define a PrintWriter that wraps the System.out print stream, which lets me write to the console. Afterwards, I'll replace it with a FileWriter and write the results to an output file.
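The reason for routing everything through one writer can be sketched like this. It is only an illustration, not part of the script: writeReport and the sample markup are made-up names, and a StringWriter stands in for the eventual file so the result can be inspected.

```groovy
// Hypothetical helper: all output goes through whatever Writer is passed in,
// so the caller decides whether that is the console, a file, or a buffer.
def writeReport(Writer target) {
    target << "<html><body>\n"
    target << "<h2>Sample feed</h2>\n"
    target << "</body></html>\n"
    target.flush()
}

// While testing: wrap System.out so the output shows up in the console.
writeReport(new PrintWriter(System.out))

// Same code, different destination: an in-memory buffer.
def buffer = new StringWriter()
writeReport(buffer)
def html = buffer.toString()
```

Because the helper only knows about the Writer type, switching destinations means changing a single line at the call site.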

I create a fetchData function that downloads and parses the RSS feed and gives me a handle on the root node of the RSS feed.

import groovy.xml.DOMBuilder  // belongs with the imports at the top of the script

/**
 * This function fetches the contents of the specified URL string and
 * returns the root node of the document.
 * @param urlStr - a valid URL string.
 * @return the root node of the document
 */
def fetchData(urlStr) {
    def url = new URL(urlStr);
    // withReader closes the underlying stream once parsing is done
    return url.withReader { reader ->
        DOMBuilder.parse(reader).documentElement
    }
}
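To try the parsing step without hitting the network, something like the following should work. It is a sketch under assumptions: the RSS snippet is made up, and a StringReader takes the place of the URL stream.

```groovy
import groovy.xml.DOMBuilder

// Made-up RSS snippet standing in for a downloaded feed.
def sampleRss = '''<rss version="2.0">
  <channel>
    <title>Example Jobs</title>
    <item><title>Build Engineer</title><link>http://example.com/jobs/1</link></item>
  </channel>
</rss>'''

// DOMBuilder.parse takes a Reader and returns an org.w3c.dom.Document.
def doc = DOMBuilder.parse(new StringReader(sampleRss))
def root = doc.documentElement

// The root node is a plain DOM Element, so the usual DOM methods apply.
def rootName = root.nodeName
def itemCount = root.getElementsByTagName("item").length
println "root element: ${rootName}, items: ${itemCount}"
```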

Once I have the data, I need to extract each of the bookmarks from the RSS feed using the parseDeliciousFeed function.

/**
 * This function parses a jobs feed from delicious.
 * @param root The root node of the document.
 * @param writer A writer used to output the results.
 */
def parseDeliciousFeed(root, writer){
    def items = root.getElementsByTagName("item");
    def feedRootNode = null;
    def feedUrl = null;

    for (item in items){
        // each bookmark's link is the URL of a job site's RSS feed
        feedUrl = getSubTagValue(item, "link");

        feedRootNode = fetchData(feedUrl);
        parseJobFeed(feedRootNode, writer);
    }
}

I use the getElementsByTagName method to extract each of the bookmarks. These bookmarks are stored in a NodeList called items, which I can then loop through. Inside the loop, I get the link for each bookmark and download the feed that it points to. I then call parseJobFeed to actually parse each of the job entries out of the bookmarked RSS feed. The getSubTagValue function is a convenience function that returns the text of any subtag of a node.

/**
 * This function parses an RSS feed from a job site and gives you the relevant listings.
 * @param root the document root.
 * @param writer A writer used to output the results.
 */
def parseJobFeed(root, writer){
    def jobs = root.getElementsByTagName("item");
    def channel = getSubNode(root, "channel");
    def channelTitle = getSubTagValue(channel, "title");
    // feed title as a heading (the heading level is an assumption)
    writer << "<h2>" << channelTitle << "</h2>\n";

    def link = null;
    def title = null;
    def desc = null;

    for (item in jobs){
        link = getSubTagValue(item, "link");
        title = getSubTagValue(item, "title");
        desc = getSubTagValue(item, "description");
        // job title linked to the posting, followed by its description
        writer << "<a href=\"" << link << "\">";
        writer << title;
        writer << "</a><br/>\n";
        writer << "<p>" << desc << "</p>";
    }
}

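/**
 * This function gets the text for a subtag for a given node.
 * Returns an empty string if the subtag is missing.
 */
def getSubTagValue(node, nodeName){
    return getSubNode(node, nodeName)?.textContent ?: ""
}

// getSubNode is called by parseJobFeed but is not listed in this post;
// a minimal version consistent with how it is used might be:
def getSubNode(node, nodeName){
    return node.getElementsByTagName(nodeName).item(0)
}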
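To see how the lookup and the loop behave together, here is a small offline sketch. The feed content is made up, and subTagText is a hypothetical stand-in that mirrors getSubTagValue so the example runs on its own.

```groovy
import groovy.xml.DOMBuilder

// Stand-in for getSubTagValue: text of the first matching descendant tag.
def subTagText(node, name) {
    node.getElementsByTagName(name).item(0).textContent
}

// Made-up feed with two listings.
def sampleRss = '''<rss version="2.0"><channel>
  <title>Example Jobs</title>
  <item><title>Build Engineer</title><link>http://example.com/jobs/1</link>
        <description>Maintain the build.</description></item>
  <item><title>QA Lead</title><link>http://example.com/jobs/2</link>
        <description>Own the test plan.</description></item>
</channel></rss>'''

def root = DOMBuilder.parse(new StringReader(sampleRss)).documentElement

// The channel title comes first in document order, so it wins the lookup.
def channel = root.getElementsByTagName("channel").item(0)
def channelTitle = subTagText(channel, "title")

// Collect the link of every item, the same way the loops above do.
def links = []
for (item in root.getElementsByTagName("item")) {
    links << subTagText(item, "link")
}
println "${channelTitle}: ${links}"
```

One thing to keep in mind: getElementsByTagName searches all descendants, so if a channel had no title of its own, the lookup would return the title of the first item instead.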


The Finishing Touches

This is fine and dandy for viewing the results in the GroovyConsole, but a couple of finishing touches are needed. I want to be able to execute the script from the command line so that I can stick it into a crontab and run it periodically. I also want to be able to pass the URL as a parameter rather than having it hard-coded in the script itself. To do this, I use CliBuilder, which is part of Groovy. It is based on the Apache Commons CLI project, and it makes it easy to get the parameters you want from the command line.


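// On Groovy 3 and later, CliBuilder is no longer imported automatically;
// add an import such as groovy.cli.commons.CliBuilder at the top of the script.
def cli = new CliBuilder( usage: 'groovy fetch_delicious.gs' )
cli.h(longOpt: 'help', 'usage information')
cli.u(argName:'url', longOpt:'url', args: 1, required: true,
      'delicious rss feed url')
cli.o(argName:'out', longOpt:'out', args: 1, required: true,
      'output file')

def opt = cli.parse(args)

// parse() prints the usage message and returns null when the arguments
// are invalid, so check the result before reading any options
if (!opt) return

if (opt.h) cli.usage()

println "-o: " << opt.o;
println "-u: " << opt.u;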


To run the script I entered the following command:

 

groovy fetch_delicious.gs --url=http://feeds.delicious.com/rss/mfortner/jobs/rss --out=./jobs2.html
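To check how those arguments are parsed without going through the shell, a sketch like the one below may help. It assumes Groovy 2.5 or later with the groovy-cli-commons module available (hence the import), and sampleArgs is a made-up stand-in for the args array that Groovy hands to a script.

```groovy
// Assumption: Groovy 2.5+, where CliBuilder lives in groovy.cli.commons.
// On older releases, where CliBuilder is imported automatically, drop this line.
import groovy.cli.commons.CliBuilder

def cli = new CliBuilder(usage: 'groovy fetch_delicious.gs')
cli.u(longOpt: 'url', args: 1, argName: 'url', required: true, 'delicious rss feed url')
cli.o(longOpt: 'out', args: 1, argName: 'out', required: true, 'output file')

// Stand-in for the args array of the command shown above.
def sampleArgs = ['--url=http://feeds.delicious.com/rss/mfortner/jobs/rss',
                  '--out=./jobs2.html'] as String[]

def opt = cli.parse(sampleArgs)
def url = opt.u
def outFile = opt.o
println "url=${url} out=${outFile}"

// Leaving out a required option makes parse() print the usage and return null.
def missing = cli.parse(['--out=./jobs2.html'] as String[])
```

The null result in the second call is the reason the script has to test opt before reading any option values.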



In addition to running it from the console, I want to output the results to a file. So I changed the writer definition line to read:


Writer writer = new BufferedWriter(new FileWriter(opt.o));
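A variation worth considering, which is not what the script above does: Groovy's File.withWriter opens a buffered writer, runs a closure, and closes the writer afterwards, even if the closure throws. The sketch below uses a temporary file and placeholder markup so that it is self-contained; in the real script the file would come from opt.o.

```groovy
// Hypothetical output location; a temp file keeps the sketch self-contained.
def outFile = File.createTempFile("jobs", ".html")

// withWriter opens a buffered writer, runs the closure, then closes the writer.
outFile.withWriter("UTF-8") { writer ->
    writer << "<html><body>\n"
    writer << "<p>listings go here</p>\n"
    writer << "</body></html>\n"
}

// Read the file back to confirm what was written, then clean up.
def written = outFile.getText("UTF-8")
outFile.delete()
```

With this approach there is no explicit flush or close to forget, at the cost of wrapping the parsing calls inside the closure.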

The Good, The Bad and The Ugly

The Groovy Console was a good choice for trying out little snippets of code. I was able to do "twitch coding," where you scribble a little code, execute it, go fix the problem, and start over. However, GroovyConsole has a lot of limitations that made it poorly suited for a Groovy newbie like me:
  • No line numbers: Since all of the error messages use line numbers, it becomes rather tedious to count lines to find the likely culprit.
  • No code lookup: Since I'm a newbie, I need all of the help I can get to figure out what functions are available. I found myself trying out snippets of code in Eclipse just so I could see what was available.
  • No debugger: I guess most people who do a lot of scripting feel that debuggers are unnecessary. After all, you have the println statement; what more do you need? But the nice thing about a debugger is that you're not littering your code with println statements that you have to remove later.

Acknowledgements

Many thanks to the writers and publishers of the Groovy In Action eBook which definitely made the process of learning Groovy easier.


