The GeoPlanet Data download contains places, a lot of places; almost 5.5M of them if you’re of a mind to count them all. Then there are the adjacencies, or neighbours, of which there are over 8.5M, and the aliases, of which there are almost 2M.
That’s a reasonable amount of data.
Naturally we do our best to ensure that it’s fully QA’d before we release it and that it’s as error-free as possible. But sometimes errors, minor niggles and other pesky data critters slip through.
The first critter is a set of duplicated WOEIDs; these were spotted by GJ (Zorgspliff). A small set of Indian postcode WOEIDs were duplicated and ended up with MS-DOS line endings. This was due to a back-end processing error which categorised these postcodes as both current and historical.
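If you want to check your own copy for this critter, something along these lines might help. It’s only a rough sketch: it assumes the places file is tab-separated with a header row and the WOEID in the first column, and the function name find_critters and the sample rows are made up for illustration rather than taken from the GeoPlanet Data itself.

```python
from collections import Counter
import io


def find_critters(raw_lines, woeid_column=0, has_header=True):
    """Return (duplicated WOEIDs, WOEIDs on lines ending in CRLF).

    raw_lines is any iterable of bytes, one line each, such as a file
    opened in binary mode. The column layout is an assumption.
    """
    counts = Counter()
    crlf_woeids = []
    for index, raw in enumerate(raw_lines):
        if has_header and index == 0:
            continue  # skip the assumed header row
        has_crlf = raw.endswith(b"\r\n")
        line = raw.rstrip(b"\r\n").decode("utf-8")
        if not line:
            continue  # ignore blank lines
        woeid = line.split("\t")[woeid_column].strip('"')
        counts[woeid] += 1
        if has_crlf:
            crlf_woeids.append(woeid)
    duplicates = sorted(w for w, n in counts.items() if n > 1)
    return duplicates, crlf_woeids


# Made-up sample data: WOEID 2 appears twice, once with an MS-DOS line ending.
sample = io.BytesIO(
    b"WOE_ID\tISO\tName\n"
    b"1\tIN\tA\n"
    b"2\tIN\tB\r\n"
    b"2\tIN\tB\n"
)
duplicates, crlf_woeids = find_critters(sample)
```

To run it against a real download, open the file in binary mode (for example open(path, "rb")) so that the line endings aren’t silently normalised before the check sees them.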
The next critter is less a critter and more a need for clarity. Each WOEID has a placetype, and Alison Wheeler (AlisonW) commented that some of the placetypes appeared to be duplicated, such as “Street”, which looks like it has placetype 4 and placetype 6. Actually, there are two different sorts of “Street” placetype, which you can see clearly if you look at the long form of the placetype display on GeoPlanet:
http://where.yahooapis.com/v1/placetypes?select=long&appid=
Placetypes 1 through 5 can be considered for future use; we’re not currently using them and you won’t come across them in the GeoPlanet Data.
The current, v7.4, release of GeoPlanet Data has two of these data critters; naturally we’ve fixed them for the next release but we wanted to point them out to you rather than let you find them for yourself.
Gary Gale, Director of Engineering, Yahoo! Geo Technologies
Tags: data, duplicates, GeoPlanet, placetypes

Hi Gary – I’m wondering who I should speak to in London at Yahoo! about a start-up that we operate to do with geo based augmented reality/retail shopping. We’re interested in possible deals with Yahoo. Thanks, Peter
Hi Peter — thanks for getting in touch. I’ve dropped you a mail.