Archive for August, 2008

A Tale of Two Cities

Thursday, August 28th, 2008

How else could one title this post? Here’s the story: Birmingham City Council in the UK recently sent out 720,000 leaflets advertising their services; the picture on the leaflet, however, depicts a different city with the same name: Birmingham, Alabama (US). Classic local authority snafu.

It’s clear that the individual charged with illustrating the pamphlet searched for ‘Birmingham’, found what looked like a nice skyline, and failed to fact-check. It is also possible that this individual never realized that there are multiple towns called ‘Birmingham’ (eighteen, actually). Whatever the cause, the story highlights some fundamental-but-oft-overlooked challenges in geoparsing that we embrace at Yahoo! Geo Technologies.

Geoparsing is, of course, the process of identifying places referenced in free or unstructured text, and it is the essential ingredient of any system in which we want to geolocate content through machine analysis. Successful geoparsing has two steps: (1) token identification and (2) geographic disambiguation. Let’s take a brief look at each:

The first step in geoparsing is token identification: identifying place names, such as ‘Wayne’ or ‘The Bay Area’, in unstructured content like newspaper articles or web pages, while making sure not to falsely identify terms like ‘New England Clam Chowder’ as places (a post on our fun with these potential false positives will follow).
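To make the token-identification step concrete, here is a minimal sketch of a dictionary-based matcher. It assumes a tiny, hypothetical in-memory gazetteer (`PLACE_NAMES`) and a hypothetical list of known false positives (`NON_PLACE_PHRASES`); a real geoparser would draw on a full gazetteer and far richer linguistic models. The key idea shown is greedy longest-match lookup, so that ‘New England Clam Chowder’ is consumed as a non-place before ‘New England’ can match.

```python
import re

# Hypothetical mini-gazetteer of place names, lower-cased for matching.
PLACE_NAMES = {"wayne", "the bay area", "new england", "birmingham", "rome", "yorkshire"}

# Hypothetical list of phrases that contain a place name but are not places.
NON_PLACE_PHRASES = {"new england clam chowder", "yorkshire pudding"}

# Longest phrase (in words) that we try to match.
MAX_PHRASE_WORDS = 4


def find_place_tokens(text):
    """Return the place-name tokens in `text`, using greedy longest-match lookup."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    found = []
    i = 0
    while i < len(words):
        match_len, is_place = 0, False
        # Try the longest phrase first, so that 'New England Clam Chowder'
        # is consumed as a non-place before 'New England' can match.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(lowered[i:i + n])
            if phrase in NON_PLACE_PHRASES:
                match_len, is_place = n, False
                break
            if phrase in PLACE_NAMES:
                match_len, is_place = n, True
                break
        if match_len:
            if is_place:
                found.append(" ".join(words[i:i + match_len]))
            i += match_len  # skip past the whole matched phrase
        else:
            i += 1
    return found
```

For example, `find_place_tokens("Try the New England Clam Chowder in Wayne, or anywhere in New England.")` returns `['Wayne', 'New England']`: the chowder is skipped, while the later, genuine ‘New England’ is kept. Note that this step only finds the tokens; it says nothing about *which* Wayne is meant.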

But token identification is the easy half of the battle. Many entity-recognition applications, such as the otherwise excellent OpenCalais, cannot geotag the BBC article on the ‘Birmingham’ story above: OpenCalais correctly identifies seven occurrences of ‘Birmingham’, but does not tell us whether they refer to the UK city, one of its seventeen US namesakes, or a mix of both. (You can try this yourself with any text using the Calais Viewer.) A human reader can settle this with a quick read-through, but here we are concerned specifically with machine parsing.

To do this properly, we first require the means to refer to a place in a permanent, unambiguous, and machine-friendly manner: usually this is attempted by expanding the geographic context so that the token ‘Wayne’, when found in text, can be indexed as ‘Wayne, PA, USA’; this works sometimes but is hardly machine-friendly. (Furthermore, there are ten towns called ‘Wayne’ in Pennsylvania, so the above string gets us no closer to our goal.) In truth, string-based indexing will always have its exceptions, so we have opened GeoPlanet, our gazetteer of places and their unique Where-on-Earth Identifiers (WOEIDs), to provide the vocabulary to describe the world’s places without ambiguity.
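The difference between a string index and an identifier index can be sketched in a few lines. In this minimal example the two WOEIDs are the ones quoted later in this post for the two Romes; the `Place` record layout and the helper names are hypothetical simplifications, not the GeoPlanet data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    woeid: int    # permanent, unambiguous identifier
    name: str     # human-readable (and ambiguous) name
    region: str   # enclosing state or region (hypothetical simplified field)
    country: str


# The two WOEIDs are those quoted in this post; the record layout is a
# hypothetical simplification of a gazetteer entry.
PLACES = [
    Place(721943, "Rome", "Lazio", "Italy"),
    Place(2484261, "Rome", "Georgia", "USA"),
]

# A string index maps one name to possibly many places...
by_name = defaultdict(list)
for place in PLACES:
    by_name[place.name].append(place)

# ...whereas an ID index maps each key to exactly one place.
by_woeid = {place.woeid: place for place in PLACES}


def places_named(name):
    """All places sharing a name: the caller still has to disambiguate."""
    return by_name.get(name, [])


def place_for(woeid):
    """Exactly one place, or None if the ID is unknown."""
    return by_woeid.get(woeid)
```

Here `places_named("Rome")` returns two candidates, while `place_for(2484261)` returns exactly one. Even the expanded string ‘Rome, Georgia’ would not fully escape ambiguity, since ‘Georgia’ is also a country; an integer key sidesteps that problem entirely.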

So, now that we’ve found the correct tokens (‘Wayne’) in our hypothetical text, and dismissed misleading, place-sounding terms (‘Yorkshire Pudding’), we need to determine which place, of all the places with that name, is actually being referenced. This is geographic disambiguation (or geodisambiguation, for the portmanteau-inclined). Take ‘Rome’, for example, of which there are over thirty: there is of course ‘the’ Rome, in Italy (WOEID: 721943), and for many of us this is the only Rome we know. Residents of Rome, Georgia (WOEID: 2484261), however, would argue otherwise. This highlights the problem: how can we be certain which place is being referenced when the text gives us only ‘Rome’? The language of the document helps in some instances, as does context (is ‘Georgia’ or ‘Italy’ mentioned elsewhere in the document?).

But when geodisambiguating at Yahoo! (and this is the fun bit), we also take into account the location of the user (or publisher) to capture the ‘locality’ of the term, and really put geography in the first person. For example, although ‘Rome’ on its own will usually refer to Rome, Italy, the probability that it refers to Rome, Georgia increases as we move geographically closer to the latter. This is how a search for ‘Birmingham’ performed in the UK can return a different, and locally ‘correct’, city from the same search performed in the US, and how content originating from Rome, Georgia can be geoparsed and disambiguated to the local ‘Rome’.
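The combination of context and locality described above can be illustrated with a toy scoring function. This is a minimal sketch under stated assumptions, not Yahoo!'s actual model: the WOEIDs come from this post, but the priors, boosts, distance scale, and approximate coordinates are illustrative values chosen by hand. Each candidate starts from a global prior, is boosted if an enclosing region is mentioned in the text (a naive substring check, which would also fire on Georgia the country), and is boosted again by proximity to the user, with the boost decaying exponentially with great-circle distance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    woeid: int
    label: str
    context_terms: tuple  # enclosing-region names that may appear in the text
    lat: float            # approximate coordinates
    lon: float
    prior: float          # hypothetical global prior: how often this place is meant


# WOEIDs from the post; priors and coordinates are illustrative assumptions.
ROMES = [
    Candidate(721943, "Rome, Italy", ("Lazio", "Italy"), 41.90, 12.50, 0.9),
    Candidate(2484261, "Rome, Georgia", ("Georgia",), 34.26, -85.16, 0.1),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def score(c, text, user=None, context_boost=20.0, locality_boost=20.0, scale_km=500.0):
    """Toy score: prior x context boost x locality boost (all weights hypothetical)."""
    s = c.prior
    # Context: is an enclosing region mentioned elsewhere in the text?
    if any(term in text for term in c.context_terms):
        s *= context_boost
    # Locality: the boost decays exponentially with distance from the user.
    if user is not None:
        d = haversine_km(user[0], user[1], c.lat, c.lon)
        s *= 1.0 + locality_boost * math.exp(-d / scale_km)
    return s


def disambiguate(text, candidates, user=None):
    """Return the WOEID of the highest-scoring candidate."""
    return max(candidates, key=lambda c: score(c, text, user)).woeid
```

With no other clues, `disambiguate("Dinner in Rome tonight.", ROMES)` picks Rome, Italy; the same sentence from a user near Atlanta, `user=(33.75, -84.39)`, picks Rome, Georgia, whereas a user in London, `user=(51.51, -0.13)`, still gets Rome, Italy. A multiplicative score keeps each signal independent, so any one of them can be tuned or dropped without restructuring the others.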

Acknowledging that geography is in the eye of the beholder is just one way that Yahoo! Geo Technologies provides our users with the most personally georelevant results. Shame Birmingham Council did not come to us first.

Tyler Bell, Advanced Products Manager, Yahoo! Geo Technologies

Your Location, Your Data

Friday, August 15th, 2008

On Tuesday the talented team at Brickhouse launched Fire Eagle, Yahoo!’s user location management platform, to loud acclaim.

I’ve been a huge fan of Fire Eagle since its inception. As a business driver, it was conceived to slice horizontally through the vertical towers that now dominate the Location Based Services landscape. Its launch not only returns ownership of User Location to the user, but will undoubtedly trigger some tearing out of hair and significant rewriting of business plans. This can only be good: opening up user location means that new businesses will be built on the opportunities it affords, rather than on the ‘captive audience, closed service’ concepts that currently dominate.

The product ethos is focused wholly on protecting user privacy while exposing the power of location, and this is what I find most impressive. While developers usually tout the ‘heavy lifting’ that Fire Eagle does to make geolocation appear easy (more on this below), I would suggest that Fire Eagle’s greatest success is the care and attention evident throughout the product to ensure that users have complete control over who has access to their location, and at what granularity it is exposed. Far from shying away from the complex and at times intractably confused technical and policy issues surrounding user location, privacy, and geolocation, the Fire Eagle team has carefully met them head-on and delivered a well-conceived, innovative, and enabling technology. This is delicate ground, certainly, and each subsequent step must be taken with similar diligence, but I am very excited to see what new ideas, products, and businesses emerge from Open User Location.

I am, of course, hardly an unbiased observer: the Yahoo! Geo Technologies team provides the machinery that performs the aforementioned ‘heavy lifting’. Our technology helps Fire Eagle determine where on earth its users are, assists with its geographic granularity protection, and ensures that developers can integrate the geographic data returned by Fire Eagle with other systems via ‘where on earth’ IDs (WOEIDs) and our GeoPlanet Web service. In a similar manner, we also power the geoinformatics underlying Flickr’s new geotagging service. Together with the considerable geo wizardry and craft of the Fire Eagle, Flickr, and other teams at Yahoo!, we continue to provide the tools and platforms to spatially enable the Web and to give our users the most personally georelevant experience possible.
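The idea of granularity protection can be illustrated with a small sketch. This is not Fire Eagle's API: the level names, the `share_location` function, and the sample data are all hypothetical. We assume the user's location is held as a hierarchy from exact position up to country, and that each application is granted a level; the application receives the location at that level, or at the next coarser level available, but never anything finer. In a real system each level would more usefully be a WOEID, so that developers could join it with other data; plain strings keep the example short.

```python
from enum import IntEnum


class Granularity(IntEnum):
    """Hypothetical precision levels, finest (0) to coarsest (4)."""
    EXACT = 0
    NEIGHBORHOOD = 1
    CITY = 2
    REGION = 3
    COUNTRY = 4


def share_location(hierarchy, granted):
    """Return the user's location at the precision an app has been granted.

    hierarchy: dict mapping Granularity -> place label (some levels may be missing)
    granted:   the finest Granularity the user allows this app, or None for no access
    """
    if granted is None:
        return None
    # Walk from the granted level towards coarser levels, and never finer.
    for level in Granularity:
        if level >= granted and level in hierarchy:
            return hierarchy[level]
    return None


# Hypothetical user located in Birmingham, UK (no neighbourhood known).
user_location = {
    Granularity.EXACT: "52.48, -1.90",
    Granularity.CITY: "Birmingham",
    Granularity.REGION: "West Midlands",
    Granularity.COUNTRY: "United Kingdom",
}
```

An app granted `Granularity.CITY` sees only `"Birmingham"`; one granted `Granularity.NEIGHBORHOOD` also falls back to `"Birmingham"`, because no neighbourhood is known and the function rounds up to a coarser level rather than down to the exact position.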

Tyler Bell, Advanced Products Manager, Yahoo! Geo Technologies

Yahoo! GeoPlanet Forums Are Now Active

Wednesday, August 13th, 2008

Thanks to the good folk of the Yahoo! Developer Network, we now have a set of forums dedicated to discussing all things associated with Yahoo! GeoPlanet.

You’ll find forum categories for requesting enhancements, for showcasing applications or demos that make use of GeoPlanet, for requests to use GeoPlanet in a commercial environment, and for general discussion, conversation, and Geo-related chat.

Members of the Yahoo! Geo Technologies group will be on the forums and we look forward to meeting and chatting with you all there.

You can find out more by pointing your browsers to the Yahoo! Developer Network GeoPlanet forums at http://developer.yahoo.net/forum/index.php?showforum=31 now.

Cheers,

Gary Gale, Head of UK Engineering, Yahoo! Geo Technologies