
In addition to perturbing counts, it is important to ensure that your summation options (measures) do not allow individuals to be identified. This is particularly important if you have "sensitive" measures, where outlying values in the data might allow identification of specific individuals. One example might be salary information, where the CEO can potentially be identified because this individual's salary is significantly higher than that of any other employee.

Both SuperWEB2 and SuperCROSS allow users to create ranges and quantiles from your summation options:

In SuperWEB2, users can access the median for a summation option if you have enabled this using the cat addstatfunction command in SuperADMIN.

In addition, they can create ranges and quantiles using the Range button.

 
 

In SuperCROSS, users can create User Defined Fields for ranges and quantiles.

They can also access the median and various percentiles from the Define Recode window.

  

When you have sensitive measures, it is important to ensure that the quantile and range options do not allow individuals to be identified.

When you activate perturbation, quantile perturbation is also activated automatically, but it is not configured by default. This means that unless you add the configuration, all quantiles will automatically be disabled.

If you do not want to use quantile perturbation on your system (i.e., you want to allow users to create all quantiles), follow the steps in the "Disable Quantile Perturbation" section below.

Quantile Perturbation

Quantile perturbation adjusts the sizes of each quantile to avoid revealing sensitive information:

  1. SuperSERVER first calculates the fractional quantile or percentile position (i.e., a number between 0 and 1). For a median there are two quantile groups, so this number would be 0.5.
  2. It then perturbs this number before working out the quantile boundary of the percentile. So it effectively works out another percentile:

    p' = p + perturbation_factor/number_of_contributors

The perturbation factor is used to determine how many values to move forwards or backwards from the original boundary of the percentile. For example, if the population is 1,000 and we want the median value (two quantile ranges), then without perturbation the median would be the 500th (or 501st) value. If the perturbation factor is -5, then the fraction would be adjusted down to 0.495, and SuperSERVER would return the 495th value as the median instead.
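The worked example above can be reproduced with a short Python sketch. The function name, the clamping to the range 0 to 1, and the use of rounding to turn the fraction into a rank are assumptions made for illustration; they are not taken from SuperSERVER itself.

```python
def perturbed_position(p, perturbation_factor, contributors):
    """Apply p' = p + perturbation_factor / number_of_contributors.

    Returns the adjusted fraction and an illustrative 1-based rank.
    """
    p_adjusted = p + perturbation_factor / contributors
    # Assumption: keep the fraction inside [0, 1]; not documented behaviour.
    p_adjusted = min(1.0, max(0.0, p_adjusted))
    # Assumption: map the fraction to a rank by simple rounding.
    rank = max(1, round(p_adjusted * contributors))
    return p_adjusted, rank


# Median of 1,000 contributors with a perturbation factor of -5.
fraction, rank = perturbed_position(0.5, -5, 1000)
print(fraction, rank)
```

With these inputs the adjusted fraction is approximately 0.495 and the rank is 495, matching the example in the text. A factor of 0 leaves the rank at 500.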

Configure Quantile Perturbation

When perturbation is configured, quantile perturbation is switched on automatically. To configure the settings, you need three quantile perturbation configuration files for each SXV4.

By default, these must be located in the same directory as the SXV4 file, although you can configure an alternative location for the files if necessary. See below for details.

<sxv4_filename>.sxv4.quantile_validation.csv

This file determines how quantile ranges and percentile summations can be used, and sets the minimum values for allowing the quantile with or without perturbation.

It is in CSV format, and contains the following columns:

Number of Ranges: The number of ranges in the quantile. Users will not be able to generate quantiles unless they are listed in this column.
Minimum number of cells: The minimum number of records required to allow this quantile to be generated.
Perturbation Threshold: The threshold above which no perturbation will be applied. If the number of contributing records is above this value then there will be no quantile perturbation. To have no perturbation at all, set this value to be the same as the previous column.
Description and Comments: (Optional) This column allows you to add comments to the file. The comments are not shown to users.

The following example allows only quantiles with 2, 4, 5 and 10 ranges, and sets the minimum numbers of cells and the threshold value for each:

Number of Ranges, Minimum number of cells, Perturbation Threshold, Description and Comments
2,500,50000, "Two ranges. Also for median"
4,10,100000
5,100,100000
10,1000,100000
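To see how these rows might be interpreted, the following Python sketch parses the example file and classifies a request. It is only an illustration of the rules described above: the function names are hypothetical, and the handling of values exactly equal to the minimum or the threshold is an assumption.

```python
import csv
import io

SAMPLE = '''Number of Ranges, Minimum number of cells, Perturbation Threshold, Description and Comments
2,500,50000, "Two ranges. Also for median"
4,10,100000
5,100,100000
10,1000,100000
'''


def load_rules(text):
    """Map number of ranges -> (minimum records, perturbation threshold)."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    rules = {}
    for row in reader:
        if row:  # ignore blank lines
            rules[int(row[0])] = (int(row[1]), int(row[2]))
    return rules


def classify(rules, ranges, contributors):
    """Return 'disallowed', 'perturbed' or 'unperturbed' for a request."""
    if ranges not in rules:
        return "disallowed"  # quantile size is not listed in the file
    minimum, threshold = rules[ranges]
    if contributors < minimum:
        return "disallowed"  # too few contributing records
    # Assumption: strictly above the threshold means no perturbation.
    return "unperturbed" if contributors > threshold else "perturbed"


rules = load_rules(SAMPLE)
print(classify(rules, 3, 1_000_000))  # 3 ranges are not listed
print(classify(rules, 2, 1000))       # between the minimum and the threshold
```

Under these assumptions, the first call reports "disallowed" and the second reports "perturbed".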

<sxv4_filename>.sxv4.quantile_perturbation.csv

This file contains the perturbation factors, which can be positive, zero, or negative values. It is in CSV format with 128 rows and 100 columns.

Each column is for a successive percentile, so the first column is for the 1st percentile, and the 50th column is for the 50th percentile (i.e. the median).

For example:

3,-4,-2,3,4, ...
-1,2,-2,0,-4,
-4,4,-1,-3,-3,
3,0,-2,1,2,2,
-4,4,2,3,4,3,
-2,2,0,-2,-4,-4,
...
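If you need to create a file of this shape yourself, a Python sketch like the following could generate one. The range of -4 to 4 is an assumption based only on the values in the example; choose factors that suit your own confidentiality requirements. The function names are hypothetical.

```python
import csv
import io
import random

ROWS, COLUMNS = 128, 100  # dimensions stated in the text


def make_table(low=-4, high=4, seed=None):
    """Build a ROWS x COLUMNS table of integer perturbation factors."""
    rng = random.Random(seed)
    return [[rng.randint(low, high) for _ in range(COLUMNS)]
            for _ in range(ROWS)]


def to_csv(table):
    """Serialise the table as CSV text, one table row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


csv_text = to_csv(make_table(seed=1))
print(len(csv_text.splitlines()))  # number of rows written
```

Calling make_table(0, 0) produces a table in which every factor is zero.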

<sxv4_filename>.sxv4.quantile_config.properties

This file provides other configuration settings for quantile perturbation. It currently contains one setting:

RSEPerturbationFactor

This setting accounts for the effect on RSE (Relative Standard Error) for surveys that are configured with weighting.

It adjusts the jackknife variance used to calculate the RSE:

Adjusted variance = variance + RSEPerturbationFactor / (number of contributors)^2

If you are not using weighted surveys (or you do not want to use quantile perturbation), set the RSEPerturbationFactor property to 0.

For example:

RSEPerturbationFactor = 0
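As a rough illustration of the adjustment, the following Python sketch reads the formula as the factor divided by the square of the number of contributors. That reading, the function name, and the sample numbers are assumptions.

```python
def adjusted_variance(variance, rse_perturbation_factor, contributors):
    """variance + RSEPerturbationFactor / (number of contributors)^2."""
    return variance + rse_perturbation_factor / contributors ** 2


# Hypothetical numbers: variance 4.0, factor 100, 50 contributors.
print(adjusted_variance(4.0, 100, 50))
# With a factor of 0 the variance is returned unchanged.
print(adjusted_variance(4.0, 0, 50))
```

Because the factor is divided by the squared number of contributors, its effect shrinks quickly as the number of contributors grows.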

Configure the Location of the Quantile Perturbation Files

By default, SuperSERVER expects the three quantile perturbation configuration files to be located in the same directory as the SXV4 file. If you wish, you can configure an alternative location for the files using the following module properties:

QUANTILEVALIDATION: quantile_validation.csv
QUANTILEPTABLE: quantile_perturbation.csv
QUANTILECONFIG: quantile_config.properties

To set the location, specify the full path and filename of the configuration file you want to use. Any backslashes in the path will need to be escaped with an additional backslash (forward slashes can also be used but do not need to be escaped).

For example:

method perturbation_method perturbation addproperty QUANTILEVALIDATION "C:\\my\\path\\my.sxv4.quantile_validation.csv"
method perturbation_method perturbation addproperty QUANTILEPTABLE "C:\\my\\path\\my.sxv4.quantile_perturbation.csv"
method perturbation_method perturbation addproperty QUANTILECONFIG "C:\\my\\path\\my.sxv4.quantile_config.properties"
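If you have many SXV4s, you may prefer to generate these commands rather than type them. The following Python sketch builds the three command strings with the backslashes escaped. The command prefix is copied from the example above; the function name and the file naming pattern are assumptions for illustration.

```python
# Property names and file suffixes as listed in this section.
PROPERTIES = {
    "QUANTILEVALIDATION": "quantile_validation.csv",
    "QUANTILEPTABLE": "quantile_perturbation.csv",
    "QUANTILECONFIG": "quantile_config.properties",
}

# Copied from the example commands; adjust to match your own method.
PREFIX = "method perturbation_method perturbation addproperty"


def location_commands(directory, sxv4_name):
    """Return one addproperty command per configuration file."""
    separator = "\\" if "\\" in directory else "/"
    commands = []
    for prop, suffix in PROPERTIES.items():
        path = f"{directory.rstrip(separator)}{separator}{sxv4_name}.sxv4.{suffix}"
        escaped = path.replace("\\", "\\\\")  # double every backslash
        commands.append(f'{PREFIX} {prop} "{escaped}"')
    return commands


for command in location_commands("C:\\my\\path", "my"):
    print(command)
```

Paths written with forward slashes pass through unchanged, since only backslashes are doubled.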

Disable Quantile Perturbation

Quantile perturbation is enabled automatically when you activate perturbation. However, none of the configuration files described above is created by default, so quantiles are disabled for all of your SXV4s until you create and add the three configuration files.

If you do not want to apply quantile perturbation for some or all of your SXV4s, then you need to add the three configuration files for each of your SXV4s, as follows:

To help you with this configuration, we have provided examples of the files as they need to be set up to disable quantile perturbation. Click the links below to download these examples, then simply make as many copies as you need, rename them so they include the name of your SXV4 in the filename, and copy to the same directory as the SXV4 file(s).

<sxv4_filename>.sxv4.quantile_validation.csv

To disable quantile perturbation, ensure this file has contents similar to the following. This example allows all quantiles from 2 to 10 ranges, and sets both the minimum number of cells and the threshold to 1 in all cases, so no perturbation is applied:

Number of Ranges, Minimum number of cells, Perturbation Threshold, Description and Comments
2,1,1,"Median"
3,1,1,"3 ranges"
4,1,1,"4 ranges"
5,1,1,"5 ranges"
6,1,1,"6 ranges"
7,1,1,"7 ranges"
8,1,1,"8 ranges"
9,1,1,"9 ranges"
10,1,1,"Deciles"
Download Example
<sxv4_filename>.sxv4.quantile_perturbation.csv 

To disable quantile perturbation, make sure this file contains 128 rows and 100 columns with all the values set to zero. For example:

0,0,0,0,0, ...
0,0,0,0,0,
...
Download Example
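Before deploying a copy of this file, you may want to confirm that it has the expected shape. The following Python sketch checks that CSV text contains 128 rows of 100 zero values. The function name is hypothetical, and the check is only an illustration, not part of SuperSERVER.

```python
import csv
import io


def is_zero_table(text, rows=128, columns=100):
    """True if the CSV text is a rows x columns table of zeros."""
    table = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(table) != rows:
        return False
    return all(
        len(row) == columns and all(value.strip() == "0" for value in row)
        for row in table
    )


# Build a zero-filled table in memory and check it.
zero_text = "\n".join(",".join(["0"] * 100) for _ in range(128))
print(is_zero_table(zero_text))
```

To check a file on disk, read its contents into a string and pass that string to the function.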
<sxv4_filename>.sxv4.quantile_config.properties 

Set the value of the RSEPerturbationFactor property to 0:

RSEPerturbationFactor = 0
Download Example

Ranges

It is also important to make sure that users cannot create ranges that are small enough to allow individuals to be identified. You can use the ranges command in SuperADMIN to control the minimum and maximum acceptable values for ranges, as well as the minimum increment.
