address matching: another term for geocoding
geocoding: the process of using the text of an address to plot a point at that location on a map
reference database: the base network data used as a source for geocoding
Whenever you use a program like MapQuest or Google Maps to find a map of a location, you’re typing in something (like “1600 Pennsylvania Avenue, Washington, D.C.”) and somehow the Website translates this string of characters into a map of a spatial location (like the White House). The process of taking a bunch of numbers and letters and finding the corresponding location that matches up with them is called address matching or geocoding (Figure 8.4). Although the process seems instantaneous, there are several steps involved in geocoding that are happening “behind the scenes” when you use an address-matching system (like those in GIS).
First, you need to have some sort of reference database in place: a road network that the addresses will be matched to. A TIGER/Line file, a similar type of street centerline file, or some other road-network data (many are produced commercially) is needed here. What’s essential is that the line segments contain attributes along the lines of those found in a TIGER/Line file, such as street direction, name of the street, address ranges on the left and right sides of the street, street suffix, and zip codes on the left and right sides of the street. This information serves both as the source that addresses are matched against and as a source for the final plotted map.
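To make the idea of a reference database record concrete, here is a minimal sketch of one line segment and its attributes. The class name, field names, coordinates, right-side address range, and zip codes are all hypothetical placeholders chosen for illustration; they are not the actual TIGER/Line field layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreetSegment:
    """One line segment in a hypothetical reference database."""
    prefix_dir: str                 # direction before the name, e.g. "N" ("" if none)
    name: str                       # street name, e.g. "CONSTITUTION"
    street_type: str                # e.g. "AVE", "ST"
    suffix_dir: str                 # direction after the type, e.g. "NW"
    left_from: int                  # address range on the left side
    left_to: int
    right_from: int                 # address range on the right side
    right_to: int
    left_zip: str
    right_zip: str
    start_xy: Tuple[float, float]   # coordinates of the segment's endpoints
    end_xy: Tuple[float, float]

    def side_for(self, number: int) -> Optional[str]:
        """Return "L" or "R" if the number fits that side's range, else None.

        Assumes odd numbers lie on one side of the street and even numbers
        on the other, so the number's parity must agree with the range.
        """
        sides = (("L", self.left_from, self.left_to),
                 ("R", self.right_from, self.right_to))
        for side, first, last in sides:
            low, high = min(first, last), max(first, last)
            if low <= number <= high and number % 2 == first % 2:
                return side
        return None


# Placeholder values: only the name, type, suffix, and left range come from the text.
segment = StreetSegment(
    prefix_dir="", name="CONSTITUTION", street_type="AVE", suffix_dir="NW",
    left_from=401, left_to=451, right_from=400, right_to=450,
    left_zip="00000", right_zip="00000",
    start_xy=(0.0, 0.0), end_xy=(100.0, 0.0),
)
print(segment.side_for(401))  # prints L
```

Storing a separate range for each side is what lets the street number select both a segment and a side of the street.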
parsing: breaking an address up into its component parts
address standardization: setting up the components of an address in a regular format
Next, the address information is parsed into its component pieces, and address standardization is performed to set up the data in a consistent format. The geocoding process needs standardized addresses so that each component can be compared with the appropriate attribute in the reference database. Table 8.1 shows a number of locations in the Washington, D.C., area with their addresses, as well as these addresses parsed and standardized. For instance, in the National Gallery of Art’s address, the street name is “Constitution.” When this address is matched, the system looks for line segments with a name attribute of “Constitution,” a street-type attribute of “AVE” (rather than ST, BLVD, LN, or any other), and a suffix direction attribute of NW (instead of some other direction). The street number, 401, is used to determine which road segment has an address range that contains it (on the left or right side of the street, depending on whether the number is odd or even).
Location | Address | Prefix | Number | Street name | Street type | Suffix |
---|---|---|---|---|---|---|
White House | 1600 Pennsylvania Avenue NW | | 1600 | Pennsylvania | AVE | NW |
National Gallery of Art | 401 Constitution Avenue NW | | 401 | Constitution | AVE | NW |
U.S. Capitol | 1 1st Street NE | | 1 | 1st | ST | NE |
Office of the Federal Register | 800 North Capitol Street NW | N | 800 | Capitol | ST | NW |
Washington National Cathedral | 3101 Wisconsin Avenue NW | | 3101 | Wisconsin | AVE | NW |
United States Holocaust Memorial Museum | 100 Raoul Wallenberg Place SW | | 100 | Raoul Wallenberg | PL | SW |
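The parsing and standardization shown in Table 8.1 can be sketched in a few lines of code. This is a simplified illustration, not the method any particular GIS package uses: the function name `parse_address` and the two small lookup tables are assumptions, and a real system would handle many more street types, unit numbers, and irregular addresses.

```python
# Illustrative lookup tables (assumed, and far shorter than a real system's).
STREET_TYPES = {"AVENUE": "AVE", "AVE": "AVE", "STREET": "ST", "ST": "ST",
                "PLACE": "PL", "PL": "PL", "BOULEVARD": "BLVD", "BLVD": "BLVD",
                "LANE": "LN", "LN": "LN"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
              "N": "N", "S": "S", "E": "E", "W": "W",
              "NE": "NE", "NW": "NW", "SE": "SE", "SW": "SW"}


def parse_address(address):
    """Split a street address into standardized components."""
    tokens = address.replace(".", "").replace(",", "").split()
    number = int(tokens.pop(0))            # the street number comes first

    suffix = ""
    if tokens and tokens[-1].upper() in DIRECTIONS:
        suffix = DIRECTIONS[tokens.pop().upper()]

    street_type = ""
    if tokens and tokens[-1].upper() in STREET_TYPES:
        street_type = STREET_TYPES[tokens.pop().upper()]

    prefix = ""
    # Treat a leading direction as a prefix only if a street name remains after it.
    if len(tokens) > 1 and tokens[0].upper() in DIRECTIONS:
        prefix = DIRECTIONS[tokens.pop(0).upper()]

    return {"prefix": prefix, "number": number, "name": " ".join(tokens),
            "type": street_type, "suffix": suffix}


print(parse_address("800 North Capitol Street NW"))
```

The components are peeled off from the ends of the address toward the middle, so whatever is left over is taken to be the street name. That is why a two-word name such as “Raoul Wallenberg” survives intact.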
linear interpolation: a method used in geocoding to place an address location among a range of addresses along a segment
After the address has been parsed and standardized, the matching takes place. The geocoding system will find the line segments in the reference database that are the best match to the component pieces of the address and (in ArcGIS) rank them. For instance, in trying to address match the National Gallery of Art, a line segment with attributes of Name = “Constitution,” Type = “AVE,” Suffix Direction = “NW,” and an address range on the left side of “401–451” would likely be the best (or top-ranked) match. A point corresponding with this line segment is placed at the approximate location along the line segment to match the street number. For instance, our address of 401 would have a point placed near the start of the segment, while an address of 425 would be placed close to the middle. The method used to plot a point at its approximate distance along the segment is called linear interpolation.
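The arithmetic behind linear interpolation is short enough to sketch directly. The function name and the segment endpoint coordinates below are hypothetical, and the sketch assumes a straight segment with addresses spaced evenly along it.

```python
def interpolate_address(number, range_from, range_to, start_xy, end_xy):
    """Estimate where a street number falls along a straight line segment.

    Returns the fraction of the way along the segment (0.0 = start,
    1.0 = end) and the interpolated (x, y) position.
    """
    if range_to == range_from:
        fraction = 0.5                      # single-address range: use the midpoint
    else:
        fraction = (number - range_from) / (range_to - range_from)
    fraction = min(max(fraction, 0.0), 1.0)  # keep the point on the segment

    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return fraction, (x, y)


# Hypothetical endpoints for the segment with the address range 401-451.
start, end = (0.0, 0.0), (100.0, 0.0)
print(interpolate_address(401, 401, 451, start, end))
print(interpolate_address(425, 401, 451, start, end))
```

For the 401–451 range, number 401 gives a fraction of 0 (the start of the segment), while 425 gives 24/50, or 0.48, which is just short of the midpoint.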
Keep in mind that the plotting is an approximation of where a specific point should be. For instance, if a road segment for “Smith Street” has an address range of 302 through 318, an address of 308 will be placed near the middle. However, if the actual real-world location of house number 308 is closer to the end of the street, then the placement of the plotted point will not necessarily match up with the actual location of the house. Where streets contain only a handful of houses that correspond with the address range in the reference file, plotted locations may be estimated incorrectly. See Figure 8.5 for an example of plotting a geocoded point on a road network in GIS.
batch geocoding: matching a group of addresses together at once
In GIS, or in a vehicle navigation system, or in a smartphone equipped with geospatial technology, whenever you specify an address, the system will match the address and fix it as a destination point to be found. The process of geocoding multiple addresses at once is referred to as batch geocoding. For instance, in batch geocoding, you can have a list of the addresses of all the coffee shops in Seattle, and the GIS will match every address on the list, so you won’t have to input the addresses one at a time (for an example, see Hands-on Application 8.2: Geocoding Using Online Resources). If no match can be found, or if the ranking is so poor as to be below a certain threshold for a match, sometimes a point will not be matched at all for that address (or it may be matched incorrectly, or placed at something like the center of a zip code). You will sometimes be prompted to recheck the address, or to try to match the address interactively by going through the process manually.
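Batch geocoding with a match threshold can be sketched as a simple loop. Everything named here is an assumption made for illustration: the function `batch_geocode`, the cutoff score of 80, and the toy geocoder with its made-up addresses, scores, and coordinates. A real system would supply its own matching function and its own scoring scale.

```python
def batch_geocode(addresses, geocode_one, min_score=80):
    """Geocode a list of addresses, separating matches from failures.

    geocode_one(address) is expected to return (score, (x, y)) for its
    best candidate, or None when it finds no candidate at all.
    """
    matched, unmatched = [], []
    for address in addresses:
        result = geocode_one(address)
        if result is not None and result[0] >= min_score:
            score, point = result
            matched.append({"address": address, "point": point, "score": score})
        else:
            unmatched.append(address)   # left for interactive rematching
    return matched, unmatched


# Toy stand-in for a real geocoder: made-up scores and coordinates.
TOY_RESULTS = {
    "100 Example St": (95, (10.0, 20.0)),
    "200 Sample Ave": (60, (30.0, 40.0)),   # weak candidate, below the cutoff
}


def toy_geocode(address):
    return TOY_RESULTS.get(address)


matched, unmatched = batch_geocode(
    ["100 Example St", "200 Sample Ave", "300 Missing Rd"], toy_geocode)
print(len(matched), unmatched)
```

Keeping the unmatched addresses in their own list, rather than discarding them, mirrors the prompt to recheck an address or match it interactively.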
Hands-on Application 8.2: Geocoding Using Online Resources
Many online mapping resources (see Hands-on Application 8.4: Online Mapping and Routing Applications and Shortest Paths for details on what they are and how they’re used) will geocode a single address, or a pair of addresses (so that you can calculate directions between the two). You can use GIS software to geocode multiple addresses, or you can use another resource, like the BatchGeo Website. Open your Web browser and go to http://www.batchgeo.com—this is a free online service that allows you to geocode an address (or multiple addresses in batches), and then view the plotted point (or points) on a map. Set up a batch of three or more addresses (such as home addresses for a group of family members, a group of friends, or several workplaces) and run the batch geocode utility. The Website will give you examples of how the addresses need to be formatted. From there, you can view your results on the Google Map that the Website generates.
Expansion Questions:
Finally, when an address is plotted, the system may be able to calculate the x and y coordinates of that point and return those values to the user. For example, the street address of the Empire State Building in New York City is “350 5th Ave., New York, NY 10118.” Address matching with the free online gpsvisualizer.com utility (also used in this chapter’s lab) computes the coordinates for that address as latitude 40.74827 and longitude –73.98531 (Figure 8.6). Thus, geocoded points have a spatial reference attached to them for further use as a geospatial dataset.
Geocoding is a simple yet powerful process, but it’s not infallible. There are several potential sources of error that can give an incorrect result and plot an address in an incorrect location. Since the addresses you’re entering are parsed and compared with the attributes of the segments in the reference database, an address that is entered incompletely or inconsistently can produce a matching error. For instance, the address of the White House in Table 8.1 is listed as “1600 Pennsylvania Avenue NW.” The line segments that match up with this street should have “Avenue” (standardized as “AVE”) as the street type and “NW” as the suffix. If you input something different, like “1600 Pennsylvania Street” or “1600 Penn Avenue,” the location could potentially receive a lower ranking, end up plotted somewhere else, or otherwise fail to get properly matched. With more complete information (like “1600 Pennsylvania Avenue NW, Washington, D.C., 20500”), the system will be better able to identify an accurate match.
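One way to see why a mistyped address receives a lower ranking is to score each candidate segment by how many parsed components agree with its attributes. The weights below are invented for illustration and do not reproduce the scoring used by ArcGIS or any other package; the function names are likewise hypothetical.

```python
# Assumed weights: agreement on the street name counts most.
WEIGHTS = {"name": 50, "type": 20, "suffix": 20, "prefix": 10}


def score_candidate(parsed, segment):
    """Add up the weights of every component that matches exactly."""
    score = 0
    for field, weight in WEIGHTS.items():
        if parsed.get(field, "").upper() == segment.get(field, "").upper():
            score += weight
    return score


def rank_candidates(parsed, segments):
    """Return (score, segment) pairs, best match first."""
    scored = [(score_candidate(parsed, seg), seg) for seg in segments]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


segment = {"prefix": "", "name": "PENNSYLVANIA", "type": "AVE", "suffix": "NW"}

complete = {"prefix": "", "name": "Pennsylvania", "type": "AVE", "suffix": "NW"}
wrong_type = {"prefix": "", "name": "Pennsylvania", "type": "ST", "suffix": ""}
short_name = {"prefix": "", "name": "Penn", "type": "AVE", "suffix": ""}

for parsed in (complete, wrong_type, short_name):
    print(parsed["name"], parsed["type"], score_candidate(parsed, segment))
```

Under these assumed weights, the complete address scores 100, “1600 Pennsylvania Street” scores 60, and “1600 Penn Avenue” scores 30, so the last two could fall below a match threshold or lose out to a different segment.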
Keep in mind, too, that the geocoding system can only properly identify locations if the line segments are in the reference database. If you’re searching for an address established after the reference database network file was put together, the system will not be able to match the address properly. If the point is plotted on the correct street but at an incorrect location on the street, it’s likely because of a problem with the address range data in the reference database and how it reflects the real world (for instance, a house at #50 Smith Street is not necessarily halfway down a street segment that begins with 2 and ends with 100). The geocoding will usually be only as accurate as the base data it’s being matched to. If the reference database doesn’t contain all line segments (for instance, if it’s missing subdivisions, streets, or new freeway bypasses that haven’t yet been mapped and added), or if its attributes contain inaccurate address ranges or other incorrect information, or if any of its attributes are missing, the geocoding process will probably be unable to match the addresses, or it may plot them in the wrong location.
New methods for geocoding addresses to get a more accurate match have been developed. Rather than using a line segment and interpolating the address location, point databases are being created in which a point represents the center of a parcel of land (for instance, a house or a commercial property). When geocoding with this point data, address information can be matched to the point representing the parcel, and the address location can be found for the road immediately next to the parcel.
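The difference between the two approaches can be sketched as a lookup that prefers a stored parcel point and falls back on an estimate only when no point exists. The fallback behavior, the function names, the key format, and all coordinates are assumptions made for illustration; the text does not say how any particular system combines the two methods.

```python
def geocode_point_first(address_key, parcel_points, interpolate):
    """Prefer a stored parcel point; otherwise fall back to interpolation.

    address_key is a standardized (number, name, type) tuple.
    Returns the (x, y) location and a label naming the method used.
    """
    point = parcel_points.get(address_key)
    if point is not None:
        return point, "parcel"
    return interpolate(address_key), "interpolated"


# Hypothetical parcel database with a single surveyed point.
parcel_points = {(308, "SMITH", "ST"): (12.5, 80.0)}


def rough_estimate(address_key):
    """Stand-in for linear interpolation along a segment (made-up result)."""
    return (12.5, 37.5)


print(geocode_point_first((308, "SMITH", "ST"), parcel_points, rough_estimate))
print(geocode_point_first((310, "SMITH", "ST"), parcel_points, rough_estimate))
```

A point lookup either finds the exact parcel or finds nothing, whereas interpolation always returns an estimate, even when that estimate is far from the real location.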
Once locations are geocoded, the system (or GIS) can begin to examine the routes between locations to determine the shortest path from one location to another. With a vehicle navigation system, you enter the address of the destination you want to travel to, and the system will match that address. The device’s current position is determined using GPS and plotted on the network map—this will be the origin. The system will then compute the shortest route between the origin and destination across the network. The same holds true for an online system to find directions—it has a matched origin and destination, and it will compute what it considers the best route for you to follow between the two points. With so many different ways to get from the origin to the destination, the system now needs a way to determine the “shortest path” between these locations.
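One widely used way to compute a shortest path across a network is Dijkstra’s algorithm, sketched below. The text has not said which method a given navigation system uses, so treat this only as an illustration. The small network, its node names, and its edge costs are made up, and the costs could stand for distance or travel time.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm on a graph of {node: [(neighbor, cost), ...]}.

    Returns (path, total_cost), or (None, inf) if no route exists.
    Assumes all edge costs are non-negative.
    """
    best = {origin: 0.0}          # lowest cost found so far to each node
    previous = {}                 # node we arrived from on that best route
    queue = [(0.0, origin)]
    visited = set()

    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue              # stale queue entry; a cheaper one was used
        visited.add(node)
        if node == destination:
            break
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))

    if destination not in best:
        return None, float("inf")

    # Walk backward from the destination to rebuild the route.
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]


# Made-up road network: intersections A-D with one-way edge costs.
roads = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 8)],
    "B": [("D", 5)],
}
print(shortest_path(roads, "A", "D"))
```

In this toy network the direct-looking routes A-B-D and A-C-D each cost 9, while the route A-C-B-D costs 8, so the shortest path is not always the one with the fewest segments.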