| Section | Page |
|---|---|
| Preface | 5 |
| Contents | 9 |
| 1 Why Statistical Confidentiality? | 13 |
| 1.1 What Is Statistical Confidentiality? | 14 |
| 1.2 Stakeholders in the Statistical Process | 15 |
| 1.3 The Data Stewardship Organization's Dilemma | 15 |
| 1.4 The Value of Statistical Data | 18 |
| 1.5 Why Are DSOs Concerned About Statistical Confidentiality? | 20 |
| 1.5.1 A Difficult Context for a DSO | 20 |
| 1.5.1.1 Privacy Worries | 21 |
| 1.5.1.2 Confidentiality Concerns | 21 |
| 1.5.1.3 Changing Legal and Social Context | 22 |
| 1.5.1.4 Sensitivity to Social Impact—"Group Harm" | 22 |
| 1.5.2 Providing Data and Protecting Confidentiality | 23 |
| 1.5.3 Consequences of a Confidentiality Breach | 24 |
| 1.5.4 What Motivates a DSO to Provide Confidentiality? | 25 |
| 1.5.4.1 Legal Requirements and Fair Information Practices | 25 |
| 1.5.4.2 Pragmatic Considerations | 28 |
| 1.5.4.3 Ethical Obligations | 29 |
| 1.6 High-Quality Statistical Data Raise Confidentiality Concerns | 30 |
| 1.6.1 Characteristics of High-Quality Statistical Data | 30 |
| 1.6.2 Disclosure Risk Problems Stemming from Characteristics of High-Quality Statistical Data | 33 |
| 1.7 Disclosure Risk and the Concept of the Data Snooper | 34 |
| 1.8 Strategies of Statistical Disclosure Limitation | 35 |
| 1.8.1 Restricted Access | 35 |
| 1.8.2 Restricted Data | 36 |
| 1.9 Summary | 36 |
| 2 Concepts of Statistical Disclosure Limitation | 39 |
| 2.1 Conceptual Models of Disclosure Risk | 39 |
| 2.1.1 Elements of the Disclosure Risk Problem | 41 |
| 2.1.1.1 Microdata | 41 |
| 2.1.1.2 Deliberate Linkage | 42 |
| 2.1.1.3 Aggregate Data | 43 |
| 2.1.1.4 Attribution and Subtractive Attack | 43 |
| 2.1.1.5 Linking Tables | 45 |
| 2.1.1.6 Hierarchical Tables | 46 |
| 2.1.1.7 Linking Anonymized Data Sets | 47 |
| 2.1.1.8 Spontaneous Recognition | 47 |
| 2.1.2 Perceived and Actual Risk | 47 |
| 2.1.3 Scenarios of Disclosure | 48 |
| 2.1.3.1 Motivation | 48 |
| 2.1.3.2 Means | 49 |
| 2.1.3.3 Opportunity | 49 |
| 2.1.3.4 Types of Attacks | 50 |
| 2.1.3.5 Key Variables | 51 |
| 2.1.3.6 Target Variables | 51 |
| 2.1.3.7 Effect of Data Divergence | 51 |
| 2.1.3.8 Likelihood of Success | 52 |
| 2.1.4 Data Environment Analysis | 54 |
| 2.2 Assessing the Risk | 54 |
| 2.2.1 Uniqueness | 54 |
| 2.2.2 Matching/Reidentification Experiments | 55 |
| 2.2.3 Disclosure Risk Assessment for Aggregate Data | 55 |
| 2.3 Controlling the Risk | 56 |
| 2.3.1 Metadata Level Controls | 56 |
| 2.3.2 Distorting the Data | 57 |
| 2.3.3 Controlling Access | 57 |
| 2.4 Data Utility Impact | 58 |
| 2.5 Summary | 59 |
| 3 Assessment of Disclosure Risk | 60 |
| 3.1 Thresholds and Other Proxies | 61 |
| 3.2 Risk Assessment for Microdata: Types of Matching | 62 |
| 3.2.1 File-Level Risk Metrics | 62 |
| 3.2.1.1 Population Uniqueness | 62 |
| 3.2.1.2 The Proportion of Sample Uniques that are Population Unique | 63 |
| 3.2.1.3 The Skinner and Elliot Method | 63 |
| 3.2.2 Record-Level Risk Metrics | 65 |
| 3.2.2.1 Probability Modeling Approaches | 65 |
| 3.2.2.2 Special Uniqueness | 66 |
| 3.3 Record Linkage Studies | 67 |
| 3.3.1 Using an External Data Set | 68 |
| 3.3.2 Using the Pre-SDL Data Set | 69 |
| 3.3.2.1 Distance-Based Record Linkage | 69 |
| 3.3.2.2 Probabilistic Record Linkage | 70 |
| 3.4 Risk Assessment for Count Data | 71 |
| 3.5 What is at Risk?: Understanding Sensitivity | 73 |
| 3.6 Summary | 74 |
| 4 Protecting Tabular Data | 76 |
| 4.1 Basic Concepts | 78 |
| 4.1.1 Structure of a Tabular Array | 78 |
| 4.1.2 Risky Cells | 81 |
| 4.1.2.1 Dominance Rule or (n, k)-Rule | 81 |
| 4.1.2.2 Prior/Posterior Ambiguity Rule | 81 |
| 4.1.2.3 n-Rule | 82 |
| 4.1.3 The Secondary Problem: The Data Snooper's Knowledge | 82 |
| 4.1.3.1 A Priori Knowledge | 82 |
| 4.1.3.2 The Output Pattern | 83 |
| 4.1.4 Disclosure Limitation | 86 |
| 4.1.5 Loss of Information | 87 |
| 4.1.6 The DSO's Problem | 87 |
| 4.1.7 Disclosure Auditing | 88 |
| 4.2 Four Methods to Protect Tables | 88 |
| 4.2.1 Cell Suppression | 89 |
| 4.2.2 Interval Publication | 92 |
| 4.2.3 Controlled Rounding | 93 |
| 4.2.4 Cell Perturbation | 96 |
| 4.2.5 All-in-One Method | 97 |
| 4.3 Other Methods | 97 |
| 4.3.1 Table Redesign | 98 |
| 4.3.2 Introducing Noise to Microdata | 98 |
| 4.3.3 Data Swapping | 99 |
| 4.3.4 Cyclic Perturbation | 99 |
| 4.3.5 Random Rounding | 100 |
| 4.3.6 Controlled Tabular Adjustment | 101 |
| 4.4 Summary | 103 |
| 5 Providing and Protecting Microdata | 104 |
| 5.1 Why Provide Access? | 106 |
| 5.2 Confidentiality Concerns | 110 |
| 5.3 Why Protect Microdata? | 114 |
| 5.4 Restricted Data | 116 |
| 5.4.1 In Order to Limit Disclosure, What Shall We Mask? | 119 |
| 5.5 Matrix Masking | 120 |
| 5.6 Masking Through Suppression | 121 |
| 5.7 Local S
|