Posts Tagged ‘WOEIDs’

The First Annual Geo-Year Review

Friday, December 18th, 2009

The season of Christmas, Snow, Holidays, New Year, eating too much and getting caught up in the inevitable Christmas travel chaos is fast approaching, but before we hang up our stockings for Santa to geotag, what sort of a geo-year was 2009?

In January we decided that people seemed to be having far too much fun on Twitter and we launched @yahoogeo on an unsuspecting world. GeoPlanet, launched at Where 2.0 in Burlingame in 2008, went from strength to strength and we helped host one of the London #geomob meetups.

In February we set out the “non golden rules of Geo”, which state:

  1. Any attempt to codify a series of geo rules into a formal, one size fits all, taxonomy will fail due to Rule 2.
  2. Geo is bizarre, odd, eclectic and utterly human.
  3. People will in the main agree with Rule 1 with the exception of the rules governing their own region, area or country, which they will think are perfectly logical.
  4. People will, in the main, think that postal, administrative and colloquial hierarchies are one and the same thing and will overlap.
  5. Taking Rule 4 into account, they will then attempt to codify a one size fits all geo taxonomy.
  6. There is no Rule 6, see Rule 1.

Then in May we built upon the success of GeoPlanet and launched Placemaker and GeoPlanet Data at Where 2.0 in San Jose. A lot of people liked this.

In June we took our Open Location concept on the road, turning up at a few conferences and talking for as long as we could get away with on matters geo, including WOEIDs, GeoPlanet, Fire Eagle and Placemaker. From London to San Jose, by way of Palo Alto, Amsterdam, Southampton, Stratford-upon-Avon, Munich and Harrogate, the geo message reached an amazing set of audiences at Where 2.0, WhereCamp, State of the Map, GeoCommunity, #geomob, Telematics, mashup* and the Association for Geographic Information, to name but a few. This took up almost all of the remainder of the year.

And finally in October, after a brief holiday, we brought GeoPlanet Data back online. A lot of people liked this.

So to 2010; what’s coming up next year? We’ve got some new products bubbling away which we hope you’ll like; all WOEID enabled of course. We’ll continue to be at some conferences and will be at Location Based Services Evolution 2010 in Berlin, Embedded Mobility 2010 in London, Where 2.0 in San Jose, Telematics in Detroit and State of the Map in Girona.

Have a geotastic Holiday season and a geotagged New Year.

Gary Gale, Director of Engineering, Yahoo! Geo Technologies

WOEIDs Are Trending On Twitter

Tuesday, November 10th, 2009

As we’ve mentioned before in this blog, the Geo Technologies group are aptly geographically disparate, split across Sunnyvale CA, London UK and Bangalore India. The London part of the group are responsible for the care, feeding and well-being of our WOEIDs, but while we were asleep last night in WOEID 44418 something big happened in WOEID 2487956.

Twitter are going to be using WOEIDs to help people track trending topics, in real time, by their location, so helping to answer the perennial question “what’s happening where I am?”. Twitter understands the power and flexibility that geotagging by WOEID yields:

We’re using Yahoo!’s Where on Earth IDs (WOEIDs) to name each location that we have information for — we’re doing so because those IDs give not only language-agnostic, but also permanent, stable, and unique identifiers for geographic locations

The WOEIDs returned by the Twitter API can be easily used with any other API which knows how to speak WOEIDs, such as Flickr, Fire Eagle and GeoPlanet, which in turn adds to the success of our Open Location ethos.

You can read the full announcement post on the Twitter API Announcement list together with complementary coverage over on TechCrunch.

(WOEID 44418 is London and WOEID 2487956 is San Francisco, by the way.)
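Because a WOEID is just a stable integer, it can act as a join key between services. The following is a minimal sketch of that idea only, not any real API client: the two WOEIDs are the ones quoted in this post, while the per-service dictionaries, their placeholder values and the `join_on_woeid` helper are hypothetical.

```python
# WOEIDs quoted in this post, mapped to place names.
PLACE_NAMES = {44418: "London", 2487956: "San Francisco"}

# Hypothetical records from two different services, both keyed by WOEID.
# The values are placeholders, not real trends or photos.
trends_by_woeid = {
    44418: ["placeholder-topic-1"],
    2487956: ["placeholder-topic-2"],
}
photos_by_woeid = {44418: ["photo-a", "photo-b"]}


def join_on_woeid(woeid):
    """Combine what each (hypothetical) service knows about one place."""
    return {
        "woeid": woeid,
        "name": PLACE_NAMES.get(woeid),
        "trends": trends_by_woeid.get(woeid, []),
        "photos": photos_by_woeid.get(woeid, []),
    }


if __name__ == "__main__":
    print(join_on_woeid(44418))
```

No string matching on ‘London’ is needed at any point; the identifier alone ties the records together.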

Gary Gale, Director of Engineering, Yahoo! Geo Technologies

A Tale of Two Cities

Thursday, August 28th, 2008

How else could one title this post? Here’s the story: Birmingham City Council in the UK recently sent out 720,000 leaflets advertising their services; the picture on the leaflet, however, depicts a different city with the same name: Birmingham, Alabama (US). Classic local authority snafu.

It’s clear that the individual charged with illustrating the pamphlet searched for ‘Birmingham’, found what looked like a nice skyline, and failed to fact-check. It is also possible that this individual never realized that there are multiple towns called ‘Birmingham’ (eighteen, actually). Whatever the cause, the story highlights some fundamental-but-oft-overlooked challenges in geoparsing that we embrace at Yahoo! Geo Technologies.

Geoparsing is of course the process of identifying places referenced in free- or unstructured text, and is the essential ingredient of any system where we want to geolocate content with machine analysis. The two steps of successful geoparsing are (1) token identification, and (2) geographic disambiguation. Let’s take a look at each briefly:

The first step in geoparsing is token identification: identifying place-names, such as ‘Wayne’ or ‘The Bay Area’, in unstructured content like newspaper articles or web pages, while ensuring at the same time that one does not falsely identify terms like ‘New England Clam Chowder’ as a place (a post on our fun with these potential false-positives will follow).
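To make the idea concrete, here is a minimal sketch of gazetteer-based token identification with a false-positive list. It is an illustration only, not how Yahoo!'s geoparser works: the tiny `GAZETTEER` and `FALSE_POSITIVES` sets and the `find_place_tokens` function are assumptions made for this example.

```python
import re

# Hypothetical mini-gazetteer of place names we are willing to recognise.
GAZETTEER = {"wayne", "the bay area", "new england", "yorkshire", "birmingham"}

# Hypothetical list of place-sounding phrases that are not places.
FALSE_POSITIVES = {"new england clam chowder", "yorkshire pudding"}


def find_place_tokens(text):
    """Return sorted (start, end, matched_text) spans for gazetteer names,
    skipping any name that sits inside a known false-positive phrase."""
    lowered = text.lower()

    # Mark the character ranges covered by false-positive phrases.
    blocked = []
    for phrase in FALSE_POSITIVES:
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            blocked.append((m.start(), m.end()))

    spans = []
    # Try longer names first so a longer match beats a shorter overlapping one.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", lowered):
            inside_blocked = any(
                b0 <= m.start() and m.end() <= b1 for b0, b1 in blocked
            )
            overlaps_kept = any(
                m.start() < end and start < m.end() for start, end, _ in spans
            )
            if not inside_blocked and not overlaps_kept:
                spans.append((m.start(), m.end(), text[m.start():m.end()]))
    return sorted(spans)


if __name__ == "__main__":
    sample = "We ate New England Clam Chowder in Wayne before driving to the Bay Area."
    for start, end, token in find_place_tokens(sample):
        print(start, end, token)
```

In this sketch ‘Wayne’ and ‘the Bay Area’ are reported as place tokens, while the ‘New England’ inside the chowder is suppressed. A real system would need a far larger gazetteer and a less brittle notion of context than a fixed phrase list.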

But token identification is the easy half of the battle. Many entity-recognition applications, like the otherwise excellent OpenCalais, are not capable of geotagging the above BBC article on ‘Birmingham’, for example: OpenCalais correctly identifies seven ‘Birminghams’, but does not tell us whether those referred to within are the UK city, one of its seventeen US namesakes, or a mix of both. (You can try this yourself with any text using the Calais Viewer.) Human cognition can certainly determine this with a quick read-through, but we’re looking at machine parsing specifically here.

To do this properly, we first require the means to refer to a place in a permanent, unambiguous, and machine-friendly manner: usually this is attempted by expanding the geographic context so that the token ‘Wayne’, when found in text, can be indexed as ‘Wayne, PA, USA’; this works sometimes but is hardly machine-friendly. (Furthermore, there are ten towns called ‘Wayne’ in Pennsylvania, so the above string gets us no closer to our goal.) In truth, string-based indexing will always have its exceptions, so we have opened GeoPlanet, our gazetteer of places and their unique Where-on-Earth Identifiers (WOEIDs), to provide the vocabulary to describe the world’s places without ambiguity.

So, now that we’ve found the correct tokens (‘Wayne’) in our hypothetical text, and dismissed misleading, place-sounding terms (‘Yorkshire Pudding’), we then determine which place, of all the places with that name, is specifically being referenced. This is geographic disambiguation (or geodisambiguation for the portmanteau-inclined). Let’s take for example ‘Rome’, of which there are over thirty: there is of course ‘the’ Rome, in Italy (WOEID: 721943), and for many of us, this is the only Rome we know. However, residents of Rome, Georgia (WOEID: 2484261) would argue otherwise. This highlights the problem: how can we be certain which place is being referenced when we have only ‘Rome’ in the text?

Obviously the language helps in some instances, as does context (is ‘Georgia’ or ‘Italy’ mentioned elsewhere in the document?). But when geodisambiguating at Yahoo! (and this is the fun bit), we take into account the location of the user (or publisher) to capture the ‘locality’ of the term, and really put geography in the first-person. For example, although ‘Rome’ by itself will usually refer to ‘Rome, Italy’, the probability of its referring to ‘Rome, Georgia’ increases as we move geographically towards the latter. This approach ensures that Yahoo! returns the ‘correct’ city when a search for ‘Birmingham’ is performed in the UK, compared to the same search in the US. It also ensures that content originating from Rome, Georgia will be geoparsed and disambiguated to the correct, local ‘Rome’.
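The location-aware idea can be sketched as a simple scoring function. This is a toy model under stated assumptions, not Yahoo!'s actual algorithm: the WOEIDs are those quoted above, but the coordinates are approximate, and the `prior` weights, the exponential distance decay and the `scale_km` parameter are invented purely for illustration.

```python
import math

# Candidates for the token 'Rome'. WOEIDs are from the post; coordinates are
# approximate; the 'prior' weights are made-up illustrative values.
CANDIDATES = [
    {"woeid": 721943, "name": "Rome, Italy", "lat": 41.9, "lon": 12.5, "prior": 0.95},
    {"woeid": 2484261, "name": "Rome, Georgia", "lat": 34.26, "lon": -85.16, "prior": 0.05},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(candidates, user_lat=None, user_lon=None, scale_km=500.0):
    """Return the highest-scoring candidate.

    Without a user location the score is just the prior; with one, the prior
    is damped by exp(-distance / scale_km), so nearby places gain weight.
    """
    def score(candidate):
        if user_lat is None or user_lon is None:
            return candidate["prior"]
        distance = haversine_km(user_lat, user_lon, candidate["lat"], candidate["lon"])
        return candidate["prior"] * math.exp(-distance / scale_km)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(disambiguate(CANDIDATES)["name"])                  # no location known
    print(disambiguate(CANDIDATES, 33.75, -84.39)["name"])   # user near Atlanta
    print(disambiguate(CANDIDATES, 51.5, -0.13)["name"])     # user in London
```

With no location, the sketch falls back to the globally dominant Rome; for a user near Atlanta, the small prior of Rome, Georgia is outweighed by its proximity. A production system would combine this with language and document context rather than rely on distance alone.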

Acknowledging that geography is in the eye of the beholder is just one way that Yahoo! Geo Technologies provides our users with the most personally georelevant results. Shame Birmingham Council did not come to us first.

Tyler Bell, Advanced Products Manager, Yahoo! Geo Technologies