I recently tweaked my old Chaos Game script to support larger canvas sizes (go ahead and open its frame in another tab/window to see), and that got me thinking back to another old post about replacing ZIP codes. It's time to revisit that impossibly stupid idea. So let's jump right into it with a reference implementation to make an example location code from a latitude/longitude position.
Of course, you may need to go in the other direction, too. For example, the old NYC location (40.716667, -74) is now
Q7N-9DH in this new scheme. Or maybe you're currently more interested in the Rio de Janeiro area (-22.90277778, -43.2075), which has a location code of
To explain what's happening here, I'm going to compare and contrast my approach with a similar scheme for encoding a location: an OpenStreetMap Shortlink. The main difference to note as we go forward is that OpenStreetMap's main intent is to create a shorter URL for computers to share, while my intent is to create something that is easier for people to share.
As noted in my ZIP code article, there are some basic symmetries (north/south, east/west, far/near) that are obvious starting points for dividing global coordinates into 2 parts. But as the Chaos Game shows, there's no reason to stop after the first 3 divisions, because you can keep halving the space that way and eventually cover all points on the map. OpenStreetMap properly calls it a quadtree because it is used for only two dimensions here, but it's a useful approach for partitioning any number of dimensions.
So my basic approach in doing that is the same as OpenStreetMap. We both fundamentally have the same starting data type for a location along each latitude/longitude dimension (-90 to 90 degrees and -180 to 180 degrees respectively): a sequence of bits that tell you if the desired point is less than or greater than the midpoint of a range along that dimension, done recursively until an arbitrary precision is reached. It is essentially a binary search for the location.
The OpenStreetMap resolution in each dimension is computed using 32-bit values. While this is fine (and very fast) for the specific purposes of mapping, it is not very good for arbitrary uses of binary partitioning nor is it very good for how humans might want to use it. For example, I live very near the 45th parallel in the Twin Cities, and that can be encoded as a single bit (i.e., test 45 against range -90 to 90 at 0 and we get
1 (45 being larger than 0), then range 0 to 90 at 45 and we're done (exact value matched). Similarly, there may be some applications that demand a greater precision than 32 bits. So to keep it flexible, I use bit strings to allow for as few or as many characters as are appropriate for whatever my needs are.
Of course, ZIP codes cover fairly large areas, so our need for precision here is relatively low! To get down to the desired hundredths of a degree of accuracy, you need at least 14 bits (180 / 214 = 0.010986328125). Note that you actually need one more bit for longitude than for latitude to get the same precision because it covers twice the range of degrees.
Once you have all these bits, you have to decide if you want to do anything special with them. For example, my ZIP code article discussed the possibility of interleaving the output values so that precision could be increased in both dimensions just by appending two more characters. OpenStreetMap takes that idea to the extreme and interleaves all the input bits themselves to create a Morton code.
Unfortunately, this is another idea that's great for computers using quadtrees, but terrible for human usability. There's a reason we don't already mix the numbers in latitude and longitude values. People tend to prefer dealing with each dimension independently. So I again sacrifice a small amount of computational efficiency to use something humans can more easily think about.
By keeping two values instead of creating a single combined value, we also get the side benefit of not needing to use the same precision for both. So for the Twin Cities example, the latitude (about 45) has the smallest binary partition of
1, but a suitable longitude (say -93.1640625) would have the smallest binary partition of
001111011. Unlike the OpenStreetMap method, there's no need to figure out how to mix the two at the representation stage.
As noted in the ZIP code article, base 32 encoding is really what I would put as the limit for what people can easily exchange in casual conversation. In particular, I've gone with Douglas Crockford's Base32 Encoding here. OpenStreetMap uses base 64 encoding, which is very common for computing purposes, but again not so great for human purposes. This is especially true when it comes to encoding a very small amount of data like 64-bits, since a base 32 encoding would be 13 characters long, yet the base 64 encoding will be still be 11 characters long and it gives up a lot of usability to shave off just those 2 characters.
A very nice coincidence for us is also the fact that we're looking to encode two ~15-bit numbers, and at 5 bits per character that makes base 32 a very nice fit for us, needing just 6 characters total to represent the values we're dealing with. That's global coverage for just 1 character more than a current ZIP code uses, so you could even add on 2 more characters to get even more accurate locations (within meters) for fewer characters than are in a ZIP+4 code.
But as we've seen, not all binary partitions are multiples of 5 bits. To solve this problem we use stop bits (
0 for sequences ending in
1 for sequences ending in
0) to pad our value to length. So, using the Twin Cities example again, latitude is represented as
1, and padding it out to 5 bits gives us
10000, which encodes to a
G. Likewise, longitude as
001111011 gets padded for encoding to
00111 10110 which is
P respectively. To add readability, we separate the latitude and longitude values with a dash, giving us the general location of the Twin Cities as
G-7P. That's even shorter and easier to communicate than the T~f1VV OpenStreetMap might give you.
Another interesting location to look at is the “origin” at (0, 0). As a location code it is simply “
-”, because 0 is the place where the search starts in both dimensions, so you don't even need a single bit to represent it, just give the dash! Very convenient. Contrast that to OpenStreetMap, which would encode (0, 0) at a low zoom level as wAAA (which you must also note is different from WAAA or Waaa or waaa or any number of other possible case-sensitive variations).
And there we have it. A location code that could globally replace postal codes by using just 6 characters (and optionally one separator character) to give nearly kilometer-level accuracy. And it's expandable to as many characters as you want if you need even greater precision. I'm at
QZP-7PN (within 10 blocks). Generate one for where you live and leave a comment if you happen to have an extra-cool location code (e.g.,
goo-gle must be near someone).
The foundation of a location code is also ripe to be used with other encodings, too. I'm already considering another follow-up article that uses a 4096 word dictionary to output a “location phrase” similar to an xkcd password. Stay tuned for that update.