Of Duplicates, Line Endings, PlaceTypes and Other Data Critters

Thursday, November 26th, 2009

The GeoPlanet Data download contains places, a lot of places: almost 5.5M of them, if you’re of a mind to count them all. Then there are the adjacencies, or neighbours, of which there are over 8.5M, and the aliases, of which there are almost 2M.

That’s a reasonable amount of data.

Naturally we do our best to ensure that it’s fully QA’d before we release it and that it’s as error free as is possible. But sometimes errors, minor niggles and other pesky data critters slip through.

i iz in ur GeoPlanet Data  messin with ur WOEIDz

The first critter is a set of duplicated WOEIDs; these were spotted by GJ (Zorgspliff). A small set of Indian postcode WOEIDs were duplicated and ended up with MS-DOS (CRLF) line endings. This was due to a back-end processing error which categorised these postcodes as both current and historical.
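
If you want to check your own copy for this critter, something like the following Python sketch may help. It assumes the places file is tab-separated with a header row and that the WOEID lives in a column called WOE_ID; both the column name and the function name scan_places are assumptions made for illustration, so adjust them to match the file you actually have.

```python
import csv
import io
from collections import Counter


def scan_places(raw: bytes, id_column: str = "WOE_ID"):
    """Return (duplicate_ids, crlf_line_numbers) for a tab-separated places file.

    The column name "WOE_ID" is an assumption; pass the real header name if it differs.
    """
    # Look for MS-DOS (CRLF) endings on the raw bytes, before any newline translation.
    lines = raw.split(b"\n")
    crlf_lines = [n for n, line in enumerate(lines, start=1) if line.endswith(b"\r")]

    # newline="" leaves line endings untouched so the csv module can handle both kinds.
    text = io.StringIO(raw.decode("utf-8"), newline="")
    reader = csv.DictReader(text, delimiter="\t")

    # Count how often each WOEID appears and keep the ones seen more than once.
    counts = Counter(row[id_column] for row in reader)
    duplicates = sorted(woeid for woeid, seen in counts.items() if seen > 1)
    return duplicates, crlf_lines
```

To run it against a real file, read the file in binary mode and pass the bytes in, for example scan_places(open(path, "rb").read()). Line numbers in the second result are 1-based and count the header row as line 1.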

The next critter is less a critter and more a need for clarity. Each WOEID has a placetype, and Alison Wheeler (AlisonW) commented that some of the placetypes appeared to be duplicated, such as “Street”, which looks like it has both placetype 4 and placetype 6. Actually, there are two different sorts of “Street” placetype, which you can see clearly if you look at the long form of the placetype display on GeoPlanet:

http://where.yahooapis.com/v1/placetypes?select=long&appid=

Placetypes 1 through 5 can be considered for future use; we’re not currently using them and you won’t come across them in the GeoPlanet Data.
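
As a rough sketch of how you might sort this out in your own code, the Python below groups placetype codes by name and splits each shared name into codes that are in use and codes in the reserved 1 through 5 range. The function name ambiguous_placetypes, the constant RESERVED_CODES and the list of (code, name) pairs are all hypothetical; the sketch does not call the GeoPlanet API, so you would need to supply the pairs yourself.

```python
from collections import defaultdict

# Placetypes 1 through 5 are set aside for future use, per the post.
RESERVED_CODES = range(1, 6)


def ambiguous_placetypes(placetypes):
    """Map each name shared by several codes to its active and reserved codes.

    `placetypes` is an iterable of (code, name) pairs supplied by the caller.
    """
    by_name = defaultdict(list)
    for code, name in placetypes:
        by_name[name].append(code)

    # Only names that map to more than one code are ambiguous.
    return {
        name: {
            "active": [c for c in sorted(codes) if c not in RESERVED_CODES],
            "reserved": [c for c in sorted(codes) if c in RESERVED_CODES],
        }
        for name, codes in by_name.items()
        if len(codes) > 1
    }
```

Fed the pairs (4, "Street") and (6, "Street"), this reports 6 as the active code and 4 as the reserved one, which matches the explanation above.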

The current, v7.4, release of GeoPlanet Data has two of these data critters; naturally we’ve fixed them for the next release but we wanted to point them out to you rather than let you find them for yourself.

Gary Gale, Director of Engineering, Yahoo! Geo Technologies