This is the documentation for SuperSTAR 9.8

Summary

Defines the formulas for the statistical functions available in SuperCROSS.
Default Location

C:\ProgramData\STR\SuperCROSS

Weighted and Unweighted Summations

The statistics.xml file defines all the formulas for unweighted summation options in SuperCROSS.

If you have weighted datasets, the formulas come from either statistics.xml or formulas.xml, depending on the type of weighting in use:

Weighting Mode              | Unweighted Summations | Weighted Summations | RSE of Weighted Summations
SuperSERVER weighting       | statistics.xml        | statistics.xml      | statistics.xml
SuperCROSS client weighting | statistics.xml        | formulas.xml        | -

Example

<STATISTICSFUNCTION id="type1">
    <TAGS measure="measure"/>
    
    <LABELTEMPLATE expression="%FUNCTION of %MEASURE"/> 
    <LABELTEMPLATE expression="Weighted %FUNCTION of %MEASURE" weightType="weighted"/>
    <LABELTEMPLATE expression="RSE of Weighted %FUNCTION of %MEASURE" weightType="rse"/>

    <!-- SUM is available for all summations -->
    <FORMULA id="SUM" name="Sum"/>
    
    <!-- MEAN is available only for measures -->
    <FORMULA id="MEAN" name="Mean" sumType="measure"/>

    <!-- STANDDEV is available only for unweighted measures -->
    <FORMULA id="STANDDEV" name="Standard Deviation" weightType="unweighted" sumType="measure"/>

    <FORMULA id="VARIANCE" name="Variance" weightType="unweighted" sumType="measure"/>

    <FORMULA id="PERCENTILE" name="First Quartile" sumType="measure">
        <PARAMETER>0.25</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Median" sumType="measure">
        <PARAMETER>0.5</PARAMETER>
    </FORMULA>
    
    <FORMULA id="PERCENTILE" name="Last Quartile" sumType="measure">
        <PARAMETER>0.75</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="First Decile (10%)" sumType="measure">
        <PARAMETER>0.1</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Second Decile (20%)" sumType="measure">
        <PARAMETER>0.2</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Third Decile (30%)" sumType="measure">
        <PARAMETER>0.3</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Fourth Decile (40%)" sumType="measure">
        <PARAMETER>0.4</PARAMETER>
    </FORMULA>
 
    <FORMULA id="PERCENTILE" name="Fifth Decile (50%)" sumType="measure">
        <PARAMETER>0.5</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Sixth Decile (60%)" sumType="measure">
        <PARAMETER>0.6</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Seventh Decile (70%)" sumType="measure">
        <PARAMETER>0.7</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Eighth Decile (80%)" sumType="measure">
        <PARAMETER>0.8</PARAMETER>
    </FORMULA>

    <FORMULA id="PERCENTILE" name="Last Decile (90%)" sumType="measure">
        <PARAMETER>0.9</PARAMETER>
    </FORMULA>

    <FORMULA id="GINI" name="Gini" weightType="unweighted" sumType="measure"/>

    <FORMULA id="LARGE_N" name="Largest (3)" weightType="unweighted" sumType="measure">
        <PARAMETER>3</PARAMETER>
    </FORMULA>

    <FORMULA id="SMALL_N" name="Smallest (3)" weightType="unweighted" sumType="measure">
        <PARAMETER>3</PARAMETER>
    </FORMULA>
    
    <FORMULA id="COUNT_DISTINCT" name="Count Distinct" weightType="unweighted" sumType="measure"/>

    <!-- The following "PERCENT" formula is an example of a custom formula that will be filtered from the SuperCROSS GUI but may be used 
    directly in a TXD -->
    <FORMULA id="PERCENT" name="%" type="SMSW" dp="1">
        <UDF id="x_udf"/>
        <DER_EXPRESSION>("x_udf"%refF("x_udf"))
        </DER_EXPRESSION>
    </FORMULA>
    
    <UDF_DEFINITION id="x_udf">
        <UDF_FORMULA>("measure")</UDF_FORMULA>
    </UDF_DEFINITION>
</STATISTICSFUNCTION>

Structure

<STATISTICSFUNCTION>
The root node.
<TAGS>
This node contains tags that can be used in custom weighting formulas (such as the example PERCENT formula at the bottom of the file).
<LABELTEMPLATE>

A template label string used when the summation option is added to a table. You can use the following variables in the template and they will be replaced with the relevant values when the string is displayed:

%FUNCTION
The name of the formula.
%MEASURE
The name of the measure.
%WEIGHT
The weight applied to this measure.

For example, the template string %FUNCTION of %MEASURE, weighted by %WEIGHT might be displayed in SuperCROSS as Mean of Income, weighted by Household.

This node has an optional attribute, weightType. Use this to define different labels for different weighting types. For example:

  • <LABELTEMPLATE expression="Weighted %FUNCTION of %MEASURE" weightType="weighted"/> applies to weighted summations only.
  • <LABELTEMPLATE expression="RSE of Weighted %FUNCTION of %MEASURE" weightType="rse"/> applies to RSE of weighted summations only.

This label template string is ignored for unweighted sums and counts.

As shown above, you can include multiple <LABELTEMPLATE> nodes. Nodes that appear later in the file take precedence over earlier ones, so specify the most general version first and then override it for specific weightType values as required.
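
As a minimal sketch of this ordering, the fragment below places a general template first and then overrides it for weighted summations with a template that uses %WEIGHT. The expression strings are illustrative rather than taken from the default file; only the expression and weightType attributes and the three variables come from the description above.

```xml
<!-- General template: applies unless a later, more specific template overrides it -->
<LABELTEMPLATE expression="%FUNCTION of %MEASURE"/>

<!-- Override for weighted summations only (illustrative wording) -->
<LABELTEMPLATE expression="%FUNCTION of %MEASURE, weighted by %WEIGHT" weightType="weighted"/>
```

With these two templates, a weighted mean of an Income measure with a Household weight would be labelled along the lines of Mean of Income, weighted by Household.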

<FORMULA>

A formula that will be available in the summation options. The formula node is described in more detail below.

<PARAMETER>

Some formulas have a child node that sets a parameter for the formula.

Formula

Each formula node has the following attributes:

id

An ID that identifies the type of formula. The SuperCROSS GUI supports the following id values:

SUM
The total of all the values.
MEAN
The total of all the values divided by the number of values.
STANDDEV
How spread out a distribution is, defined as the square root of the variance.
VARIANCE
How spread out a distribution is. This function is closely related to the Standard Deviation. The variance is defined as the average squared deviation of each number from its mean.
PERCENTILE

Divides the values from a continuous variable into a series of ranges. In the default statistics.xml file this formula is used to generate quartiles, deciles and the median.

This formula has a child <PARAMETER> node that specifies the required percentile, expressed as a value between 0 and 1.

GINI
A measure of inequality of a distribution, defined as a ratio with values between 0 and 1. The numerator is the area between the Lorenz curve of the distribution and the uniform (perfect) distribution line; the denominator is the area under the uniform distribution line.
LARGE_N

The x largest values from contributing unit records.

This formula has a child <PARAMETER> node that specifies how many values to include. In the example above the formula is configured to take the 3 largest values from the unit records.

SMALL_N

The x smallest values from contributing unit records.

This formula has a child <PARAMETER> node that specifies how many values to include. In the example above the formula is configured to take the 3 smallest values from the unit records.

COUNT_DISTINCT
A count of the number of distinct results.
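
The textual definitions of the mean, variance, standard deviation and Gini coefficient above can be written out as follows. This is a sketch that takes the wording literally: "average squared deviation" is read as dividing by n, and this page does not say whether SuperCROSS instead divides by n - 1. The symbols x_i, n, A and B are introduced here for illustration only.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
\sigma = \sqrt{\sigma^2}

G = \frac{A}{A + B}
```

Here A is the area between the uniform distribution line and the Lorenz curve, and B is the area under the Lorenz curve, so A + B is the area under the uniform distribution line.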

Each type of formula can only be specified in the file once.

For formulas that take a parameter, the combination of ID and parameter must be unique (for example you can only specify one instance of a PERCENTILE with a parameter of 0.5).
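
For instance, an additional percentile could be added alongside the defaults, provided its parameter differs from every existing PERCENTILE entry. The following is a sketch only: the name "95th Percentile" and the value 0.95 are assumptions chosen for illustration and do not appear in the default file.

```xml
<!-- Hypothetical extra percentile: 0.95 is not used by any other PERCENTILE entry in the example file -->
<FORMULA id="PERCENTILE" name="95th Percentile" sumType="measure">
    <PARAMETER>0.95</PARAMETER>
</FORMULA>
```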

You can also use the statistics.xml file to define formulas for use directly in TXDs (as shown in the example above). These formulas will be filtered out by the SuperCROSS GUI and will not appear in the list of available Functions in the Define Recode window in SuperCROSS.

The example above includes a custom formula for "percent". Custom formulas are specified in the same way as in the weighting formulas.xml file. Refer to the documentation on that file for more details.

name
The formula name. This will be displayed in the list of available Functions in the Define Recode window in SuperCROSS.
sumType

(Optional). Specifies whether the formula applies to counts only (sumType="count") or measures only (sumType="measure").

If this attribute is omitted, the formula applies to both counts and measures by default.

weightType

(Optional). Specifies whether the formula applies to unweighted summations only (weightType="unweighted"), weighted summations only (weightType="weighted"), RSE of weighted summations only (weightType="rse"), or a combination specified as a comma-separated list (for example, weightType="unweighted,weighted").

If this attribute is omitted then by default formulas apply to all summations.
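
As a sketch of the comma-separated form, the entry below restricts a formula to measures, and to unweighted and weighted summations but not RSE. It is an illustrative variant of the MEAN entry from the example above, not the default configuration, and it would replace the existing MEAN entry rather than sit alongside it.

```xml
<!-- Illustrative variant: MEAN offered for measures, for unweighted and weighted summations but not RSE -->
<FORMULA id="MEAN" name="Mean" sumType="measure" weightType="unweighted,weighted"/>
```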
