Permanent Links


What should be the topic for the next Impossibly Stupid poll?

A Town Square Poll Space

Tech Corner

See Also

[ICO]NameLast modifiedSizeDescription

[PARENTDIR]Parent Directory  -  
[TXT]README.html2014-11-29 11:06 4.3K 
[   ]info.json2014-12-09 01:41 40  
[   ]tags=diaf2015-03-26 18:29 0  

Wait a minute (of arc) Mister Postman

Isn't it about time we got rid of ZIP codes, and all locally-designated postal codes in general? Some of the work I've been doing recently involves GIS data, and the uniformity has been amazingly useful. Further, given the ubiquity of GPS devices these days, I think that people are ready for something that is more directly tied to the real world. Let's explore how impossibly stupid this might be.

In the USA, we use a 5 digit number that often is extended by 4 more digits (aka, ZIP+4) for reasons only the USPS cares about. That makes for, at most, a billion unique territories. The amount of area represented by the initial 5 digits varies, but in my local metro they seem to be about the size of a square arc minute. In decimal degrees, that would require upwards of 14 characters (-YY.yy,-XXX.xx). Not a win length-wise, but in the balance we get uniform global coverage.

Let's say we don't care about global coverage. A smaller representation is possible if we note that the continental United States latitudinal span is within 26 degrees (50 to 24 N) and the longitudinal span is within 60 degrees (125 to 66 W). That means the degree values could be encoded as A-Z (base 26) for latitude and 0-9A-Za-z (base 62) for longitude. If we then use the same range for minutes that we do for longitude, that allows us to compact the location down to 4 characters (YyXx). A mixed representation (YXyx) might be even more interesting, since it allows the precision to be easily adjusted by simply appending pairs of digits.

That seems a pretty arbitrary encoding to use for just part of the United States, so lets try to apply the same idea of encoding to reduce the length of standard geographical coordinates. The entire degree-space that needs to be covered represents 180 x 360 = 64800 distinct values. To represent that in 2 characters, we'd need at least a base 255 encoding; easy for computers, but not for humans. Even the base 62 representation used in the last paragraph is pushing it, because humans aren't that particular about uppercase and lowercase differences, nor are they universally able to tell the difference between 1 and l and I when written down without context. Base 30 is about the highest you should expect out of people with the regular alphanumeric character set, and maybe throw in a few punctuation marks to get up to 32 if you want to make computers happier. Going to 3 characters only allows for base 41, but the step up to 4 characters gets us to base 16 without that much waste, which is all kinds of awesome for both people and computers.

Unfortunately, just another character gets us back to something that can be represented in base 10, and in all cases the mixing of latitude and longitude into a single value would make it harder to figure out what the neighboring areas are. Plus we still need 100x100 more resolution in order to get down to the same detail that the current system handles. We've outsmarted ourselves into a swamp; time to go back to stupid.

If we can't exploit a compressed encoding, lets see how far we can get by looking at symmetry. Since we want to keep latitude and longitude split, we're back to the original 180 x 360 space. Even that is commonly further split from 90N to 90S (or 90,-90) and 180W to 180E (or -180,180). If we take 90 degrees as the base unit of symmetry, we have 3 combinations of division (north/south, east/west, far/near) that represent the global octants, all of which can be represented by a single base 8 character (probably best not alphanumeric). Within that area, we still have to represent at least 9000 points (hundredths of a degree) of accuracy. That would be exceeded by 3 times using 3 characters if we use the aforementioned base 30 (as opposed to 5 characters for 90.00).

Whew! So where does that leave us? A 7 character long world-wide postal code that is tied to geographical position with sub-kilometer accuracy. Seems like a pretty good scheme for all concerns, even without going into how the position is ordered and encoded. For example, New York City (at 40.716667, -74) could be represented as DH5^R20. If you need to see it in action to make more sense of it, let me know and I can create a reference implementation in JavaScript.