Consider the Social Security number 189-31-9431. The only information we can deduce about the holder of this number is that the person’s mailing address when he or she applied for the number was in Pennsylvania (Social Security numbers are assigned based on the ZIP code of the mailing address; see Spotlight 16.4). Figure 16.11 shows an Illinois driver’s license number: P142-4754-2173. What information about the holder can be deduced from this number? This time, we can determine the date of birth, sex, and much about the person’s name.
These two examples illustrate the extremes in coding personal data. The Social Security number has no personal data encoded in the number. It is entirely determined by the place and time that it is issued, not the individual to whom it is assigned. In contrast, in some states, driver’s license numbers are determined entirely by personal information about the holders. It is no coincidence that the unsophisticated Social Security numbering scheme predates computers. Agencies that have large databases that include personal information such as names, sex, and dates of birth find it convenient to encode these data into identification numbers. Examples of such agencies are the National Archives (where census records are kept), genealogical research centers, the Library of Congress, and state motor vehicle departments.
Spotlight 16.4: Ten Fun Facts about Social Security Numbers
There are many methods in use to encode personal data such as name, sex, and date of birth. These methods are perhaps most widely used in assigning driver’s license numbers in some states. Coding license numbers solely from personal data enables automobile insurers, government entities, and law enforcement agencies to determine the number from the personal data. Many states encode the surname, first name, middle initial, date of birth, and sex by very sophisticated schemes.
In one scheme that is based on sound, the first four characters of the license number are obtained by applying the Soundex Coding System to the surname as follows:
Assign numbers to the remaining letters as follows:
Figure 16.12 shows three examples.
What is the advantage of this method? It is an error-correcting scheme. Indeed, it is designed so that likely misspellings of a name nevertheless result in the correct coding of the name. For example, frequent misspellings of the name Erickson are Ericksen, Eriksen, Ericson, and Ericsen. Observe that all of these yield the same coding as Erickson. If a law enforcement official, a genealogical researcher, or a librarian wanted to pull up the file from a data bank for someone whose name was pronounced “Erickson,” the correct spelling isn’t essential because the computer searches for records that are coded as E-625 for all spelling variations. The search feature of a website where many mathematicians post their research papers uses the Soundex Coding System. This system was designed for the U.S. Census Bureau when much census information was obtained orally (see Spotlight 16.5).
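To make the error-correcting behavior concrete, here is a minimal Python sketch of a Soundex coder. The letter groups and the treatment of h and w are assumptions taken from the commonly published American Soundex rules, since the step list is not reproduced in this passage. The function name `soundex` is illustrative only.

```python
# Assumed letter groups (standard American Soundex); vowels and y get no digit.
_GROUPS = [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
           ("l", "4"), ("mn", "5"), ("r", "6")]
_CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(surname: str) -> str:
    """Return a code such as 'E-625' for the given surname."""
    letters = [c for c in surname.lower() if c.isalpha()]
    if not letters:
        raise ValueError("surname must contain at least one letter")

    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter can absorb a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are ignored and do not separate equal codes
        code = _CODES.get(ch, "")  # "" for vowels and y
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a later repeat is coded again

    # Keep three digits, padding with zeros when fewer remain.
    return first.upper() + "-" + "".join(digits)[:3].ljust(3, "0")


for name in ["Erickson", "Ericksen", "Eriksen", "Ericson", "Ericsen"]:
    print(name, soundex(name))
```

Under these assumed rules, every spelling in the loop prints the same code, E-625, which is why a search keyed on the code finds the record regardless of the spelling entered.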
Spotlight 16.5: Census Records at the National Archives
One of the best places to look for information pertaining to family history is the old censuses that are kept by the National Archives in Washington, D.C. By law, census records are open to the public 72 years after the census was taken. Data from the 1880, 1900, 1910, and 1920 censuses (records from 1890 were destroyed by fire) were put on cards during the 1930s as a Works Progress Administration (WPA) project. This information was coded using the Soundex system so that names that sound alike, regardless of how they are spelled, are grouped together. On old documents, family names, especially those that were not of British origin, were so often misspelled that genealogists say several variations of a name may apply to one set of ancestors. To look for a surname in the index, researchers must first work out its Soundex code. This code, together with the state record, identifies a page number on microfilm where the data are located.
A typical census Soundex card is shown. Note the Soundex code in the upper-left corner (B350).
What is the Soundex code for the surname Jackman?
There are many schemes for encoding the date of birth and the sex in driver’s license numbers. For example, the last five digits of Illinois and Florida driver’s license numbers capture the year and date of birth as well as the sex. In Illinois, each day of the year is assigned a three-digit number in sequence beginning with 001 for January 1. However, each month is assumed to have 31 days. Thus, March 1 is given the number 063 because both January and February are assumed to have 31 days. These numbers are then used to identify the month and day of birth of male drivers. For females, the scheme is identical except that 600 is added to the number. The last two digits of the year of birth, separated by a dash (probably to obscure the fact that they represent the year of birth), are listed in the fifth and fourth positions from the end of the driver’s license number. Thus, a male born on October 13, 1940, would have the last five digits 4-0292 (October 13 is day 9 × 31 + 13 = 292), whereas a female born on the same day would have 4-0892.
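The Illinois rule can be sketched in a few lines of Python. The function name `illinois_suffix` and its parameters are hypothetical, and the sketch covers only the final five digits described above, not the name-derived part of the license number.

```python
from datetime import date


def illinois_suffix(birth: date, female: bool) -> str:
    """Last five digits of an Illinois license number, as described in the text."""
    # Every month is treated as having 31 days, so January 1 is 001
    # and March 1 is 2 * 31 + 1 = 063.
    day_number = (birth.month - 1) * 31 + birth.day
    if female:
        day_number += 600
    yy = f"{birth.year % 100:02d}"
    # The dash falls between the two digits of the year.
    return f"{yy[0]}-{yy[1]}{day_number:03d}"


print(illinois_suffix(date(1940, 10, 13), female=False))  # 4-0292
print(illinois_suffix(date(1940, 10, 13), female=True))   # 4-0892
```

Because the largest male day number is 12 × 31 = 372, adding 600 for females can never collide with a male value.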
The scheme to identify birth date and sex in Florida is the same as in Illinois except that each month is assumed to have 40 days and 500 is added for women. Moreover, a dash occurs between the two digits for the year and the three digits for the day. For example, the five digits 49-585 belong to a woman born on March 5, 1949.
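Because the two states differ only in the assumed month length and the amount added for women, a single decoder can handle both. The following sketch is illustrative only: `decode_birth_suffix` and its parameter names are hypothetical, and it recovers just the two-digit year, since the century is not encoded.

```python
def decode_birth_suffix(suffix: str, days_per_month: int, female_offset: int):
    """Decode the last five license digits into (yy, month, day, sex)."""
    digits = suffix.replace("-", "")
    if len(digits) != 5 or not digits.isdigit():
        raise ValueError("expected five digits, optionally with a dash")

    yy = int(digits[:2])
    number = int(digits[2:])

    # Day numbers above the offset belong to women.
    female = number > female_offset
    if female:
        number -= female_offset

    # Invert number = (month - 1) * days_per_month + day.
    month_index, day_index = divmod(number - 1, days_per_month)
    return yy, month_index + 1, day_index + 1, "F" if female else "M"


# Florida: 40-day months, 500 added for women.
print(decode_birth_suffix("49-585", days_per_month=40, female_offset=500))
# Illinois: 31-day months, 600 added for women.
print(decode_birth_suffix("4-2173", days_per_month=31, female_offset=600))
```

The first call returns (49, 3, 5, 'F'), matching the March 5, 1949 example. The second applies the Illinois rule to the last five digits of the license number in Figure 16.11 and returns (42, 6, 18, 'M'), that is, a male born on June 18 of a year ending in 42.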
In this chapter, we have investigated how mathematics is used to append a check digit to an identification number for error detection. In the next chapter, we will show how codes consisting of 0s and 1s can be devised so that errors can be corrected.