more hip visualizations, part 0.1
A while back, I started messing around with data visualization stuff, and came up with a choropleth North American map that attempted to show the places the Tragically Hip played most often. I now have a slightly shinier and more granular map that shows cities, which is step 0.1 on the way to glory. Not that I have any idea what constitutes ‘glory’ in this instance, but I’m told that life is about the journey, not the destination.
Once again, I started with the [admittedly unreliable] show archive on the Hip’s site. (Actually, I started with the .csv I generated last time, but whatever. That’s the data source. It’s still only North American shows, but I think it’d be easy enough to extend to the rest of them.) To get the shiny googlemap with markers for places they’ve played, I came up with the following steps:
- Pull a list of cities & states/provinces out of the csv.
- Ask google for the lat/lng coordinates of those cities.
- Turn said coordinates into xml tags.
- Feed the xml into googlemaps and get a marked map.
This went way better than my experiments last time, but there’s a lot I still want to do with this map to make it more useful and interesting. It’s still some lazy, sloppy code, but here it is anyway.
Step one: get a list of cities to look up.
import csv

# collect a "City+State" string for every North American show
queries = []
reader = csv.reader(open('concerts.csv'), delimiter=",")
for row in reader:
    if row[4].strip() in ['United States', 'Canada']:
        city = row[2] + "+" + row[3]
        queries.append(city)
# a set drops the duplicates
locales = set(queries)
So that gives us a list of unique cities. (I may want to know that they’ve played Toronto 1938459 times or whatever, but I don’t need to look up Toronto’s coordinates more than once.) The way the geocode api works is that you pass in a URL with a bunch of parameters in the query string, and then it returns the coordinates in whatever format you’ve requested. So the next step is to set up all the junk to build those URLs.
# Python 2 module locations
import urllib
from urlparse import urlunparse

# the six pieces urlunparse expects
scheme = 'http'
netloc = 'maps.google.com'
path = '/maps/geo'
params = ''
fragment = ''
query_dict = {'output': 'csv',
              'sensor': 'false',
              'key': 'REDACTED',
              'q': '',}

# one query string per city
all_query_strings = []
for city in locales:
    query_dict['q'] = city
    newqs = urllib.urlencode(query_dict)
    all_query_strings.append(newqs)

# one full URL per query string
all_urls = []
for place_qs in all_query_strings:
    url = urlunparse((scheme, netloc, path, params, place_qs, fragment))
    all_urls.append(url)
Yeah, I probably should have used list comprehensions there, but I tend to write those when I’m refactoring stuff. The first pass at something tends to be step by rudimentary step. At any rate, all_urls is now a list of URLs to give to google, which is step two.
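For the record, the refactored version would probably look something like the sketch below. It's written for Python 3, where urlencode and urlunparse both live in urllib.parse (the code above uses the Python 2 locations), and build_geocode_urls is a name made up for illustration. The endpoint and parameters are the same ones used above.

```python
from urllib.parse import urlencode, urlunparse

def build_geocode_urls(locales, key="REDACTED"):
    """Return one geocoder URL per city string, built in a single comprehension.

    build_geocode_urls is a hypothetical helper name, not part of any library.
    """
    base = {'output': 'csv', 'sensor': 'false', 'key': key}
    return [
        urlunparse(('http', 'maps.google.com', '/maps/geo', '',
                    urlencode(dict(base, q=city)), ''))
        for city in sorted(locales)  # sorted only to make the order predictable
    ]

urls = build_geocode_urls({'Toronto+Ontario', 'Kingston+Ontario'})
for url in urls:
    print(url)
```

One thing this makes easy to spot: urlencode percent-encodes the literal "+" in "Toronto+Ontario" as %2B. Joining city and region with a space instead would let urlencode produce the "+" itself.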
import time
import httplib2

f = open('coords.csv', 'wb')
# using regular file handling instead of the csv module because i am lazy
# and google sends back strings.
h = httplib2.Http()
for url in all_urls:
    resp, content = h.request(url)
    f.write(content)
    f.write('\n')
    # be polite: one request per second
    time.sleep(1)
f.close()
There’s a limit to how many requests you can send in a given time period, but I don’t know what that limit is and am not in any hurry. Also, I should only have to run this once, so I’m not uptight about that one-second sleep in there. At any rate, I now have a csv called coords.csv that looks like this:
200,4,28.3936186,-81.5386842
200,4,32.9911550,-117.2711481
Etc. The format is: response code, accuracy, latitude, longitude. Now I want to turn all of that into xml, which is just straight-up string interpolation.
import csv

all_xml = open('coords.xml', 'wb')
reader = csv.reader(open('coords.csv'), delimiter=",")
for row in reader:
    # columns: response code, accuracy, latitude, longitude
    latitude = row[2]
    longitude = row[3]
    xml_tag = "<marker lat='%s' lng='%s'/>\n" % (latitude, longitude)
    all_xml.write(xml_tag)
all_xml.close()
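A slightly more careful variant, sketched in Python 3: it skips blank lines and any row whose response code isn't 200, and wraps the markers in a root element so the result is a complete XML document. The root name markers and the function name coords_to_xml are assumptions for illustration (the JavaScript further down only looks for marker elements, so any root name should work), and the 602 row in the sample is made up to stand in for a failed lookup.

```python
import csv
import io
import xml.etree.ElementTree as ET

def coords_to_xml(csv_text):
    """Turn 'code,accuracy,lat,lng' lines into one XML document string.

    coords_to_xml and the 'markers' root name are hypothetical choices.
    """
    root = ET.Element('markers')
    for row in csv.reader(io.StringIO(csv_text)):
        # skip blank or malformed lines and anything that is not a 200
        if len(row) != 4 or row[0] != '200':
            continue
        ET.SubElement(root, 'marker', lat=row[2], lng=row[3])
    return ET.tostring(root, encoding='unicode')

# two rows from the post, plus a blank line and a made-up failed lookup
sample = (
    "200,4,28.3936186,-81.5386842\n"
    "\n"
    "602,0,0,0\n"
    "200,4,32.9911550,-117.2711481\n"
)
xml_doc = coords_to_xml(sample)
print(xml_doc)
```

Letting ElementTree build the tags also takes care of attribute quoting, which string interpolation leaves up to you.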
And that’s pretty much that. Add in the other xml stuff to make sure it’s a valid xml document, and then call the whole thing from an html file with some canned javascript.
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=REDACTED" type="text/javascript"></script>
    <script type="text/javascript">
    function initialize() {
      if (GBrowserIsCompatible()) {
        var map = new GMap2(document.getElementById("map_canvas"));
        map.setCenter(new GLatLng(37, -92), 4);
        map.setUIToDefault();
        GDownloadUrl("coords.xml", function(data) {
          var xml = GXml.parse(data);
          var markers = xml.documentElement.getElementsByTagName("marker");
          for (var i = 0; i < markers.length; i++) {
            var point = new GLatLng(parseFloat(markers[i].getAttribute("lat")),
                                    parseFloat(markers[i].getAttribute("lng")));
            map.addOverlay(new GMarker(point));
          }
        });
      }
    };
    </script>
  </head>
  <body onload="initialize()" onunload="GUnload()">
    <div id="map_canvas" style="width: 1000px; height: 600px"></div>
  </body>
</html>
Et voilà. An unuseful map of unlabeled markers, some of which are incorrect (like, why is there one in the Cayman Islands?). Having the coordinates, though, opens up a lot of possibilities for future awesomeness, which I’m sure someone else will come up with. My own current thoughts involve things like grouping the markers by year (or possibly by tour) and letting people turn them on and off. One marker per show would also be better than one marker per city (although, in that case, do I stick with city markers and just say they’ve played Toronto a lot, or do I do the data massaging necessary to get coordinates for, and show, Lee’s Palace vs the Horseshoe vs the ACC?). And some way to make it collaborative and interactive would be completely amazing. Like, there would be one giant hipmap, and if someone were logged into their google account, they could click on a marker for a show they’ve attended and add their info and their two cents. And who knows what I’m going to do about setlists. Something, something, something, someday, maybe.
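As a starting point for the "say they've played Toronto a lot" option, here's a rough Python 3 sketch that counts shows per city and hangs a label and a count on each marker. The column positions are the ones from the first snippet. Everything else is an assumption: the function names, the label and shows attribute names, and especially the coords dict, since the coords.csv above doesn't record which city each row belongs to. The sample rows and the Toronto coordinates are made up and only approximate.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def count_shows(rows):
    """Count shows per 'City+Region' key, using the columns from the first snippet."""
    return Counter(
        row[2] + "+" + row[3]
        for row in rows
        if row[4].strip() in ('United States', 'Canada')
    )

def labelled_markers(counts, coords):
    """Build <marker> elements carrying a label and a show count.

    coords is a hypothetical {city_key: (lat, lng)} dict.
    """
    root = ET.Element('markers')
    for city, shows in counts.most_common():
        if city not in coords:
            continue  # no coordinates known for this city
        lat, lng = coords[city]
        ET.SubElement(root, 'marker', lat=str(lat), lng=str(lng),
                      label=city.replace('+', ', '), shows=str(shows))
    return root

# made-up rows in the concerts.csv layout (only columns 2 to 4 matter here)
rows = [
    ['', '', 'Toronto', 'Ontario', 'Canada'],
    ['', '', 'Toronto', 'Ontario', 'Canada'],
    ['', '', 'Buffalo', 'New York', 'United States'],
    ['', '', 'Amsterdam', '', 'Netherlands'],
]
counts = count_shows(rows)
root = labelled_markers(counts, {'Toronto+Ontario': (43.65, -79.38)})
print(ET.tostring(root, encoding='unicode'))
```

The map's JavaScript could then read the extra attributes with getAttribute, the same way it already reads lat and lng.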
