16.4 Encoding Personal Data

Consider the Social Security number 189-31-9431. The only information we can deduce about the holder of this number is that the person’s mailing address when he or she applied for the number was in Pennsylvania (Social Security numbers are assigned based on the ZIP code of the mailing address; see Spotlight 16.4). Figure 16.11 shows an Illinois driver’s license number: P142-4754-2173. What information about the holder can be deduced from this number? This time, we can determine the date of birth, sex, and much about the person’s name.

These two examples illustrate the extremes in coding personal data. The Social Security number has no personal data encoded in the number. It is entirely determined by the place and time that it is issued, not the individual to whom it is assigned. In contrast, in some states, driver’s license numbers are determined entirely by personal information about the holders. It is no coincidence that the unsophisticated Social Security numbering scheme predates computers. Agencies that have large databases that include personal information such as names, sex, and dates of birth find it convenient to encode these data into identification numbers. Examples of such agencies are the National Archives (where census records are kept), genealogical research centers, the Library of Congress, and state motor vehicle departments.

Ten Fun Facts about Social Security Numbers Spotlight 16.4

  1. The first Social Security record was established for John David Sweeney on December 1, 1936, with a Social Security number (SSN) of 055-09-0001. Sweeney died at the age of 61 without receiving any Social Security benefits.
  2. The lowest SSN, 001-01-0001, was given to Grace D. Owen, of Concord, New Hampshire, in 1936.
  3. From 1937 until 1940, Social Security paid benefits in the form of a single, lump-sum payment. The average lump-sum payment during this period was $58.06. The first recipient of Social Security benefits was Ernest Ackerman, who retired one day after the program began. He paid in 5 cents and received a lump-sum payment of 17 cents.
  4. Payment of monthly Social Security benefits began in 1940. Ida May Fuller was the first American to receive a monthly Social Security benefit check. She received the check, amounting to $22.54, on January 31, 1940. By the time of her death in 1975 at the age of 100, Fuller had collected $22,888.92 in monthly Social Security benefits, compared to her total contribution of $24.75 to the system.
  5. Over 40,000 people have claimed to own the SSN 078-05-1120. In 1938, the wallet manufacturer E. H. Ferree Company included a sample Social Security card with the number 078-05-1120 in each wallet.
  6. From the start of the program in 1936 until 2005, an estimated $8.9 trillion was paid out as Social Security benefits, while $10.7 trillion was collected as income.
  7. Since 1936, over 420 million different Social Security numbers have been issued. Over 5.5 million new numbers are assigned every year. As of June 30, 2013, 18% of the U.S. population were receiving monthly Social Security benefits.
  8. Prior to June 25, 2011, the first three digits of Social Security numbers were assigned geographically by state, with the lowest numbers in the Northeast and the highest in the Northwest. Since then, a randomized assignment methodology has been used.
  9. Currently invalid Social Security numbers include numbers with sets of zeros (as in 000-xx-xxxx, xxx-00-xxxx, and xxx-xx-0000), numbers starting with 666, numbers from 987-65-4320 through 987-65-4329, and numbers whose first three digits are above 770.
  10. Social Security numbers are not reassigned after people die.
Figure 16.11: Illinois driver’s license.

There are many methods in use to encode personal data such as name, sex, and date of birth. These methods are perhaps most widely used in assigning driver’s license numbers in some states. Coding license numbers solely from personal data enables automobile insurers, government entities, and law enforcement agencies to determine the number from the personal data. Many states encode the surname, first name, middle initial, date of birth, and sex by very sophisticated schemes.

In one scheme that is based on sound, the first four characters of the license number are obtained by applying the Soundex Coding System to the surname as follows:

  1. Delete all occurrences of h and w. (For example, Schworer becomes Scorer and Hughgill becomes uggill.)
  2. Assign numbers to the remaining letters as follows:
       b, f, p, v → 1
       c, g, j, k, q, s, x, z → 2
       d, t → 3
       l → 4
       m, n → 5
       r → 6
  3. If two or more letters with the same numeric value are adjacent, omit all but the first. (For example, Scorer becomes Sorer and uggill becomes ugil.)
  4. Delete the first character of the original name if still present (Sorer becomes orer).
  5. Delete all occurrences of a, e, i, o, u, and y.
  6. Retain only the first three digits corresponding to the remaining letters; append trailing 0s if fewer than three letters remain.
  7. Precede the three digits obtained in Step 6 with the first letter of the surname.
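
The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code any agency actually runs: the names `soundex` and `CODES` are chosen here for the example, the letter groups are the standard Soundex ones, and surnames are assumed to use only the unaccented letters a through z.

```python
# Step 2 table: the standard Soundex letter groups (names here are illustrative).
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(surname):
    """Return the four-character Soundex code for a surname."""
    # Keep letters only, so that O'Brien is treated as obrien.
    name = "".join(ch for ch in surname.lower() if ch.isalpha())
    if not name:
        raise ValueError("surname must contain at least one letter")
    first = name[0]

    # Step 1: delete all occurrences of h and w.
    letters = [ch for ch in name if ch not in "hw"]

    # Step 3: where adjacent letters share a numeric value, keep only the first.
    collapsed = []
    for ch in letters:
        if collapsed and ch in CODES and CODES.get(collapsed[-1]) == CODES[ch]:
            continue
        collapsed.append(ch)

    # Step 4: delete the first character of the original name if still present.
    if collapsed and collapsed[0] == first:
        collapsed = collapsed[1:]

    # Step 5: a, e, i, o, u, and y have no entry in CODES, so they drop out here.
    digits = [CODES[ch] for ch in collapsed if ch in CODES]

    # Final steps: keep three digits, pad with 0s, and prefix the first letter.
    return first.upper() + "".join(digits[:3]).ljust(3, "0")


print(soundex("Schworer"))  # S660
print(soundex("Hughgill"))  # H240
```

Because the h in Hughgill is removed in Step 1, Step 4 finds nothing to delete for that name, which is why the check compares against the first letter of the original surname rather than always dropping the leading character.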

Figure 16.12 shows three examples.

Figure 16.12: The Soundex Coding System.

What is the advantage of this method? It is an error-correcting scheme. Indeed, it is designed so that likely misspellings of a name nevertheless result in the correct coding of the name. For example, frequent misspellings of the name Erickson are Ericksen, Eriksen, Ericson, and Ericsen. Observe that all of these yield the same coding as Erickson. If a law enforcement official, a genealogical researcher, or a librarian wanted to pull up the file from a data bank for someone whose name was pronounced “Erickson,” the correct spelling isn’t essential because the computer searches for records that are coded as E-625 for all spelling variations. The search feature of a website where many mathematicians post their research papers uses the Soundex Coding System. This system was designed for the U.S. Census Bureau when much census information was obtained orally (see Spotlight 16.5).

Census Records at the National Archives Spotlight 16.5

One of the best places to look for information pertaining to family history is the old censuses that are kept by the National Archives in Washington, D.C. By law, census records are open to the public 72 years after the census was taken. The data from 1880, 1900, 1910, and 1920 censuses (records from 1890 were destroyed by fire) were put on cards during the 1930s as a Works Progress Administration (WPA) project. This information was coded using the Soundex system so that names that sound alike regardless of how they are spelled are grouped together. On old documents, family names were so often misspelled—especially those that were not of British origin—that genealogists say several variations of a name may apply to one set of ancestors. To look for a surname on the index, the researchers must work out the Soundex code. This code, together with the state record, identifies a page number on microfilm where the data are located.

A typical census Soundex card is shown. Note the Soundex code in the upper-left corner (B350).

Self Check 6

What is the Soundex code for the surname Jackman?

  • J255

There are many schemes for encoding the date of birth and the sex in driver’s license numbers. For example, the last five digits of Illinois and Florida driver’s license numbers capture the year and date of birth as well as the sex. In Illinois, each day of the year is assigned a three-digit number in sequence beginning with 001 for January 1. However, each month is assumed to have 31 days. Thus, March 1 is given the number 063 because both January and February are assumed to have 31 days. These numbers are then used to identify the month and day of birth of male drivers. For females, the scheme is identical except that 600 is added to the number. The last two digits of the year of birth, separated by a dash (probably to obscure the fact that they represent the year of birth), are listed in the fifth and fourth positions from the end of the driver’s license number. Thus, a male born on October 13, 1940, would have the last five digits 4-0292, whereas a female born on the same day would have 4-0892.
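
The Illinois arithmetic can be checked with a short Python sketch. It follows only the rules described in this paragraph; the function name `illinois_birth_digits` is made up for the illustration, and nothing here is taken from an official state specification.

```python
from datetime import date


def illinois_birth_digits(birth, female):
    """Return the last five digits (with the dash) as described in the text."""
    # Every month is treated as having 31 days, so March 1 becomes 2 * 31 + 1 = 63.
    day_number = 31 * (birth.month - 1) + birth.day
    # The same number is used for males; 600 is added for females.
    if female:
        day_number += 600
    # The two year digits are split by a dash: Y-Yddd.
    yy = f"{birth.year % 100:02d}"
    return f"{yy[0]}-{yy[1]}{day_number:03d}"


print(illinois_birth_digits(date(1940, 10, 13), female=False))  # 4-0292
print(illinois_birth_digits(date(1940, 10, 13), female=True))   # 4-0892
```

October is the tenth month, so the day number is 9 × 31 + 13 = 292, and adding 600 gives 892 for a female driver.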

The scheme to identify birth date and sex in Florida is the same as in Illinois except that each month is assumed to have 40 days and 500 is added for women. Moreover, a dash occurs between the two digits for the year and the three digits for the day. For example, the five digits 49-585 belong to a woman born on March 5, 1949.
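
Going the other way, the Florida digits can be decoded with a few lines of Python. This is a sketch of the rule as stated above (40-day months, 500 added for women); the function name `decode_florida_digits` and the returned tuple layout are choices made for this example, and the year is left as two digits because the code itself does not record the century.

```python
def decode_florida_digits(digits):
    """Decode a string such as '49-585' into (year, month, day, sex)."""
    year_part, day_part = digits.split("-")
    number = int(day_part)
    # The largest male value is 11 * 40 + 31 = 471, so anything above 500 is female.
    female = number > 500
    if female:
        number -= 500
    # With 40-day months, day d of month m is encoded as 40 * (m - 1) + d.
    month_index, day_index = divmod(number - 1, 40)
    return int(year_part), month_index + 1, day_index + 1, "F" if female else "M"


print(decode_florida_digits("49-585"))  # (49, 3, 5, 'F')
```

Here 585 − 500 = 85 = 2 × 40 + 5, which is the fifth day of the third month, matching the March 5 example.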

In this chapter, we have investigated how mathematics is used to append a check digit to an identification number for error detection. In the next chapter, we will show how codes consisting of 0s and 1s can be devised so that errors can be corrected.