Confidentiality

Ethical problems do not disappear once a study has been cleared by the review board, has obtained consent from its subjects, and has actually collected data about the subjects. It is important to protect the subjects’ privacy by keeping all data about individuals confidential. The report of an opinion poll may say what percentage of the 1500 respondents felt that legal immigration should be reduced. It may not report what you said about this or any other issue.


Confidentiality is not the same as anonymity. Anonymity means that subjects are anonymous: their names are not known even to the director of the study, so it is not possible to determine which subject produced which data. Anonymity is rare in statistical studies. Even where anonymity is possible (mainly in surveys conducted by mail), it prevents any follow-up to reduce nonresponse or to inform subjects of the results.
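The difference shows up in how a study might store its records. The Python sketch below assumes a hypothetical survey file with name, address, and response fields; the function split_identity and all field names are illustrative, not a standard procedure. Each subject gets a random study ID, and the identifying details go into a separate key table. While the director keeps that table locked away, the data are confidential and follow-up on nonrespondents is still possible. Destroying the table cuts the link between names and responses, so no follow-up could be done.

```python
import secrets

def split_identity(records):
    """Give each subject a random study ID and separate identity from data.

    Returns (key_table, study_data). Field names are hypothetical.
    """
    key_table = {}   # study ID -> identifying details, stored separately
    study_data = []  # analysis file: study ID and response only
    for rec in records:
        study_id = secrets.token_hex(8)
        while study_id in key_table:  # guard against a (very unlikely) repeat ID
            study_id = secrets.token_hex(8)
        key_table[study_id] = {"name": rec["name"], "address": rec["address"]}
        study_data.append({"id": study_id, "response": rec["response"]})
    return key_table, study_data

# Made-up survey records, for illustration only.
records = [
    {"name": "A. Smith", "address": "12 Elm St", "response": "reduce"},
    {"name": "B. Jones", "address": "4 Oak Ave", "response": "keep"},
]
key_table, study_data = split_identity(records)

# The analysis file holds no names or addresses.
for row in study_data:
    assert "name" not in row and "address" not in row

# Only someone holding the key table can trace a response back to a person.
print(key_table[study_data[0]["id"]]["name"])  # A. Smith
```

Analysts work only with study_data; the key table is consulted solely to contact subjects, which is exactly the kind of follow-up that true anonymity rules out.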


Any breach of confidentiality is a serious violation of data ethics. The best practice is to separate the identity of the subjects from the rest of the data at once. Sample surveys, for example, use the identification only to check on who did or did not respond.

In an era of advanced technology, however, it is no longer enough to be sure that each individual set of data protects people’s privacy. The U.S. government, for example, maintains a vast amount of information about citizens in many separate databases: census responses, tax returns, Social Security information, data from surveys such as the Current Population Survey, and so on. Many of these databases can be searched by computers for statistical studies. A clever computer search of several databases might be able, by combining information, to identify you and learn a great deal about you, even if your name and other identification have been removed from the data available for search. A colleague from Germany once remarked that “female full professor of statistics with a PhD from the United States” was enough to identify her among all the 83 million residents of Germany.

Privacy and confidentiality of data are hot issues among statisticians in the computer age. Computer hacking and thefts of laptops containing data add to the difficulties. Is it even possible to guarantee confidentiality of data stored in databases that can be hacked or stolen? Figure 7.1 displays the Internet privacy policy that appears on the Social Security website.
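The German colleague’s remark can be illustrated with a simple count. The Python sketch below uses a made-up table from which names have already been removed; the records and the helper unique_combinations are hypothetical. It counts how many records share each combination of descriptive attributes. A combination matched by exactly one record points to a single person, even though no name appears. Combining databases works like adding more columns, which is why linking several “de-identified” sources can single someone out. Requiring every combination to be shared by at least k records is the idea behind the privacy criterion often called k-anonymity.

```python
from collections import Counter

# Made-up records with names removed; descriptive attributes remain.
records = [
    {"sex": "F", "job": "full professor", "field": "statistics", "phd_country": "US"},
    {"sex": "M", "job": "full professor", "field": "statistics", "phd_country": "US"},
    {"sex": "M", "job": "full professor", "field": "statistics", "phd_country": "US"},
    {"sex": "F", "job": "lecturer", "field": "statistics", "phd_country": "DE"},
    {"sex": "F", "job": "lecturer", "field": "statistics", "phd_country": "DE"},
]

def unique_combinations(rows, keys):
    """Return the attribute combinations that match exactly one record."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Few attributes: every combination is shared, so no one stands out.
print(unique_combinations(records, ["job", "field"]))  # []

# More attributes (as when databases are combined): one person is exposed.
print(unique_combinations(records, ["sex", "job", "field", "phd_country"]))
# [('F', 'full professor', 'statistics', 'US')]
```

Neither output contains a name, yet the second identifies one record uniquely, which is the danger the German colleague described.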

EXAMPLE 3 Use of government databases

Citizens are required to give information to the government. In the United States, think of tax returns and Social Security contributions, for example. The government needs these data for administrative purposes: to see whether we paid the right amount of tax and how large a Social Security benefit we are owed when we retire. Some people feel that individuals should be able to forbid any other use of their data, even with all identification removed. This would prevent using government records to study, say, the ages, incomes, and household sizes of Social Security recipients. Such a study could well be vital to debates on reforming Social Security.

NOW IT’S YOUR TURN


7.3 Anonymous or confidential? A website describes one of its procedures for HIV testing as completely private. Your results are delivered to you and no one else—nothing is reported to your insurance or placed on your medical records. Does this practice offer anonymity or confidentiality?


Figure 7.1 The privacy policy of the government’s Social Security Administration website. (Source: Social Security Administration.)