Web tools are what I call sites like Waralytics and WAR Watcher. They take information from one place and use it to provide a new service. This post aims to serve as an introduction to how to create such a tool. It will require some programming knowledge, but nothing difficult. If you don't know how to program, now is a great time to start!
Purpose: First off, what do we want to do? This example will show a list of Warhammer Online servers. This may not be particularly useful by itself, but it serves as a great starting point.
Where is the data? Next, we need to find a place to get the server list. It just so happens this information is readily available on the Mythic Realm War page. So the answer to our question is: http://realmwar.warhammeronline.com/realmwar/Index.war
Yep, it is just the web page. More often than not, a plain web page will be the primary source for web tools. Other times, the source is an actual web service or a set of XML files.
Get the data. Now we need to do three basic steps: read the web page, pull out the information we want, and display it.
If you haven't already, bring up the web page in your browser of choice. When it has loaded, view its source (Right Click, View Page Source). You should see a lot of HTML. That is what we will have to go through to pull out the server information.
What we are doing is commonly called parsing or screen scraping. It is not an ideal way of doing things, but we have little choice. To do this, I use the Python programming language. Almost any scripting language (e.g. Perl, Ruby, VBScript) can do the same. They are easy to learn (compared to C) and have a number of tutorials available.
To make my life even easier, I use a Python module called BeautifulSoup. This module goes through all of that HTML and parses it for us, so we can easily navigate our way through. Now, on to the code!
First, we fetch the web page. This puts all of the raw HTML into the html variable.
import urllib2
url = 'http://realmwar.warhammeronline.com/realmwar/Index.war'
response = urllib2.urlopen(url)
html = response.read()
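The snippet above uses urllib2, which exists only in Python 2. If you are on Python 3, a rough equivalent might look like the sketch below. The fetch_html name, the 10 second timeout, and the UTF-8 fallback are my own assumptions for illustration, not part of the original code.

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
        # Use the charset the server declares; fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes rather than failing on a stray character.
    return raw.decode(charset, errors="replace")
```

Calling fetch_html(url) with the url from above would then give you the same kind of html value to hand to the parser.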
Next, we give it to BeautifulSoup to parse and do all the hard work for us.
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
We still need to figure out where our data is in that web page. From looking at it, we can see that the server name is contained within a div tag with a class name of PairSelect-Name. That should do nicely, so let's pull it out.
for x in soup.findAll('div'):
    if x.has_key('class') and x['class'] == 'PairSelect-Name':
        print x.a.string
In the above code, we go through each div tag and check that it has a class attribute with the name we want. If it does, we pull out its anchor tag (a) and print its string version (x.a.string). The string version gives us what the anchor tag contains, which is the server name.
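If you want to try the same idea without installing anything, here is a minimal sketch that uses Python 3's built-in html.parser in place of BeautifulSoup. It is a different technique from the one above, not a drop-in replacement. The sample HTML and the server names in it are made up to match the structure described here, and the real page may well look different; ServerNameParser and extract_server_names are hypothetical names.

```python
from html.parser import HTMLParser

# Made-up markup modeled on the description above; the real page may differ.
SAMPLE_HTML = """
<div class="PairSelect-Name"><a href="#">Example Server One</a></div>
<div class="Other"><a href="#">Not a server</a></div>
<div class="PairSelect-Name"><a href="#">Example Server Two</a></div>
"""


class ServerNameParser(HTMLParser):
    """Collects the text of each <a> inside a div with class PairSelect-Name."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._div_depth = 0      # greater than 0 while inside a matching div
        self._in_anchor = False  # True while inside an <a> in a matching div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            if self._div_depth:
                # Nested div inside a matching div: track depth so we know when we leave.
                self._div_depth += 1
            elif dict(attrs).get("class") == "PairSelect-Name":
                self._div_depth = 1
        elif tag == "a" and self._div_depth:
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "div" and self._div_depth:
            self._div_depth -= 1
        elif tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Keep only non-blank text found inside a matching anchor.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def extract_server_names(html):
    parser = ServerNameParser()
    parser.feed(html)
    return parser.names


server_names = extract_server_names(SAMPLE_HTML)
for name in server_names:
    print(name)
```

This is a good deal longer than the BeautifulSoup loop, which is exactly why I prefer the module: it keeps track of the tag nesting for you.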
That is it! It only took about 9 lines to pull down the server names.
In the next installment, I will show how you can make this information available to everyone else.