Contents:
1. Model DTD and tag description.
2. Sample exported XML.
3. Scoring algorithm.
1. Model DTD and tag description.
Note: the model below assumes that dictionary tags are defined elsewhere.
Variables are referred to by name.
<!-- multinomial-regression scoring model. -->
<!ENTITY % NUMBER "NMTOKEN">
<!ELEMENT regression-model
    (factor-list?,
     covariate-list?,
     predictor-to-parameter-correlation-matrix?,
     parameter-table)>
<!ATTLIST regression-model
    model-id                     CDATA     #REQUIRED
    response-variable-name       CDATA     #REQUIRED
    number-parameters            %NUMBER;  #REQUIRED
    model-type                   (regression | general-linear | log-linear | multinomial-logistic) #REQUIRED
    verbose-model-specification  CDATA     #IMPLIED>
<!ELEMENT factor-list (var-name+)>
<!ELEMENT covariate-list (var-name+)>
<!ELEMENT var-name (#PCDATA)>
<!ELEMENT predictor-to-parameter-correlation-matrix (predictor-to-parameter-cell+)>
<!ELEMENT predictor-to-parameter-cell (#PCDATA)>
<!ATTLIST predictor-to-parameter-cell
    predictor-name  CDATA  #REQUIRED
    parameter-name  CDATA  #REQUIRED>
<!ELEMENT parameter-table (parameter-cell+)>
<!ELEMENT parameter-cell EMPTY>
<!ATTLIST parameter-cell
    target-category  CDATA     #REQUIRED
    parameter-name   CDATA     #REQUIRED
    beta             %NUMBER;  #REQUIRED
    std-error        %NUMBER;  #IMPLIED
    df               %NUMBER;  #IMPLIED>
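A consumer of this format needs to read an exported model into memory before scoring. The following is a minimal Python sketch of such a reader, using only the standard-library `xml.etree.ElementTree` module. The DTD declares lower-case names while the sample exported XML uses upper case, and XML names are case-sensitive, so the sketch assumes a tolerant reader that lower-cases tag and attribute names before matching them. The function `load_model` and the dictionary layout it returns are hypothetical, not part of the format.

```python
import xml.etree.ElementTree as ET


def _norm(elem):
    """Lower-case an element's tag and attribute names (assumption: the reader
    should accept both the lower-case DTD names and the upper-case sample)."""
    return elem.tag.lower(), {k.lower(): v for k, v in elem.attrib.items()}


def load_model(xml_text):
    """Parse a regression-model document into plain Python structures."""
    root = ET.fromstring(xml_text)
    tag, attrs = _norm(root)
    if tag != "regression-model":
        raise ValueError(f"unexpected root element: {root.tag}")
    model = {
        "model_id": attrs["model-id"],
        "response": attrs["response-variable-name"],
        "number_parameters": int(attrs["number-parameters"]),
        "model_type": attrs["model-type"],
        "factors": [],
        "covariates": [],
        "correlation": {},  # (predictor-name, parameter-name) -> cell text
        "beta": {},         # (target-category, parameter-name) -> BETA
    }
    for child in root:
        ctag, _ = _norm(child)
        if ctag == "factor-list":
            model["factors"] = [(v.text or "").strip() for v in child]
        elif ctag == "covariate-list":
            model["covariates"] = [(v.text or "").strip() for v in child]
        elif ctag == "predictor-to-parameter-correlation-matrix":
            # Only non-empty cells are exported; a missing key means "empty".
            for cell in child:
                _, a = _norm(cell)
                key = (a["predictor-name"], a["parameter-name"])
                model["correlation"][key] = (cell.text or "").strip()
        elif ctag == "parameter-table":
            for cell in child:
                _, a = _norm(cell)
                key = (a["target-category"], a["parameter-name"])
                model["beta"][key] = float(a["beta"])
    return model


# A cut-down document in the style of the sample export (MODEL-ID is made up).
sample = """<REGRESSION-MODEL MODEL-ID="m1" RESPONSE-VARIABLE-NAME="jobcat"
  NUMBER-PARAMETERS="9" MODEL-TYPE="multinomial-logistic">
  <FACTOR-LIST><VAR-NAME>sex</VAR-NAME></FACTOR-LIST>
  <COVARIATE-LIST><VAR-NAME>age</VAR-NAME></COVARIATE-LIST>
  <PREDICTOR-TO-PARAMETER-CORRELATION-MATRIX>
    <PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="age" PARAMETER-NAME="age">1</PREDICTOR-TO-PARAMETER-CELL>
  </PREDICTOR-TO-PARAMETER-CORRELATION-MATRIX>
  <PARAMETER-TABLE>
    <PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="AGE" BETA="-.133" STD-ERROR=".086" DF="1"/>
  </PARAMETER-TABLE>
</REGRESSION-MODEL>"""
m = load_model(sample)
print(m["factors"], m["beta"])
```

Storing the matrix as a dictionary keyed by (predictor, parameter) mirrors the sparse export: a lookup that misses simply means the cell is empty.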
For each predictor variable v and each parameter p, the corresponding cell of the predictor-to-parameter-correlation-matrix is missing (empty) if there is no correlation between v and p, that is, if v does not appear in the expression that defines p. Empty cells are not exported with the model, and any cell found to be missing from the XML file when the model is parsed is assumed to be empty. Because empty cells usually make up a large part of the matrix, this reduces the size of the exported model.
If there is a correlation between a covariate predictor and the parameter, the cell value is the exponent to which the covariate is raised in the parameter's expression. Example: assuming the variable jobcat is a factor and work is a covariate, the parameter [jobcat=professional] * work * work is correlated to the covariate work, and the cell for (work, [jobcat=professional] * work * work) holds 2, because work appears at the second power in the expression.
If there is a correlation between a factor variable and the parameter, the cell value is the category of the factor variable that determines the correlation. Example: assuming the categories of the factor variable jobcat are professional, clerical, skilled, and unskilled, the cell for the predictor jobcat and the parameter [jobcat=skilled] has the value skilled.
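These cell semantics are enough to compute the value a parameter takes for a given case. The sketch below assumes the usual indicator coding: a factor cell contributes 1 if the case's value equals the stored category and 0 otherwise, a covariate cell multiplies in the covariate raised to the stored exponent, and a parameter with no non-empty cells (such as the intercept) is the constant 1. The function name `design_value` and its argument layout are hypothetical, and category values are compared as strings because cell contents are #PCDATA.

```python
def design_value(cells, case, factors):
    """Value of one parameter for a single case.

    cells   -- non-empty cells of the parameter, {predictor: cell text}
    case    -- the observation, {predictor: value}
    factors -- names of the factor predictors; the rest are covariates
    """
    value = 1.0  # no non-empty cells (e.g. the intercept): constant 1
    for predictor, cell in cells.items():
        if predictor in factors:
            # Factor cell: indicator of the stored category.
            if str(case[predictor]) != str(cell):
                return 0.0
        else:
            # Covariate cell: covariate raised to the stored exponent.
            value *= float(case[predictor]) ** int(cell)
    return value


# The example from the text: [jobcat=professional] * work * work
cells = {"jobcat": "professional", "work": "2"}
print(design_value(cells, {"jobcat": "professional", "work": 3}, {"jobcat"}))  # 9.0
print(design_value(cells, {"jobcat": "clerical", "work": 3}, {"jobcat"}))      # 0.0
print(design_value({}, {"jobcat": "clerical", "work": 3}, {"jobcat"}))         # 1.0
```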
2. Sample exported XML.
Here is the information about the variables:
Name | Type | Number of categories | Categories (numeric coding in parentheses) |
JOBCAT | Response | 7 | Clerical (1), Office trainee (2), Security officer (3), College trainee (4), Exempt employee (5), MBA trainee (6), and Technical (7) |
SEX | Factor | 2 | Males (0) and Females (1) |
MINORITY | Factor | 2 | White (0) and Nonwhite (1) |
AGE | Covariate | | |
WORK | Covariate | | |
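The parameter estimates (BETA), with their standard errors (STD-ERROR) and degrees of freedom (DF), are listed in the PARAMETER-TABLE element of the XML model below.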
The predictor-to-parameter-correlation-matrix is shown below; a dot (.) marks an empty cell:
Parameter | SEX | MINORITY | AGE | WORK |
Intercept | . | . | . | . |
[SEX = 0] | 0 | . | . | . |
[SEX = 1] | 1 | . | . | . |
[MINORITY = 0]([SEX = 0]) | 0 | 0 | . | . |
[MINORITY = 1]([SEX = 0]) | 0 | 1 | . | . |
[MINORITY = 0]([SEX = 1]) | 1 | 0 | . | . |
[MINORITY = 1]([SEX = 1]) | 1 | 1 | . | . |
AGE | . | . | 1 | . |
WORK | . | . | . | 1 |
This predictor-to-parameter mapping is the same for every response category.
The corresponding XML model is:
<REGRESSION-MODEL
MODEL-ID="{ 0xf7292af1, 0x3df1, 0x11d3, { 0xb4, 0xd6, 0x0, 0x60, 0x97, 0x59, 0x4f, 0xa1 } }"
RESPONSE-VARIABLE-NAME="jobcat"
NUMBER-PARAMETERS="9"
MODEL-TYPE="multinomial-logistic"
VERBOSE-MODEL-SPECIFICATION="NOMREG jobcat BY sex minority WITH age work /INTERCEPT = INCLUDE /MODEL = sex minority(sex) age work">
<FACTOR-LIST>
<VAR-NAME>sex</VAR-NAME>
<VAR-NAME>minority</VAR-NAME>
</FACTOR-LIST>
<COVARIATE-LIST>
<VAR-NAME>age</VAR-NAME>
<VAR-NAME>work</VAR-NAME>
</COVARIATE-LIST>
<PREDICTOR-TO-PARAMETER-CORRELATION-MATRIX>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[SEX=0]">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[SEX=1]">2</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[MINORITY=0]([SEX=0])">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[MINORITY=1]([SEX=0])">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[MINORITY=0]([SEX=1])">2</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="sex" PARAMETER-NAME="[MINORITY=1]([SEX=1])">2</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="minority" PARAMETER-NAME="[MINORITY=0]([SEX=0])">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="minority" PARAMETER-NAME="[MINORITY=1]([SEX=0])">2</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="minority" PARAMETER-NAME="[MINORITY=0]([SEX=1])">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="minority" PARAMETER-NAME="[MINORITY=1]([SEX=1])">2</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="age" PARAMETER-NAME="age">1</PREDICTOR-TO-PARAMETER-CELL>
<PREDICTOR-TO-PARAMETER-CELL PREDICTOR-NAME="work" PARAMETER-NAME="work">1</PREDICTOR-TO-PARAMETER-CELL>
</PREDICTOR-TO-PARAMETER-CORRELATION-MATRIX>
<PARAMETER-TABLE>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="Intercept" BETA="26.836" STD-ERROR="3526.252" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="[SEX=0]" BETA="-.719" STD-ERROR="3526.250" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-19.214" STD-ERROR="1.187" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="-.114" STD-ERROR="2606.65" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="AGE" BETA="-.133" STD-ERROR=".086" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="1" PARAMETER-NAME="WORK" BETA="7.885E-02" STD-ERROR=".104" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="Intercept" BETA="31.077" STD-ERROR="3526.252" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="[SEX=0]" BETA="-.869" STD-ERROR="3526.250" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-18.99" STD-ERROR="1.213" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="1.01" STD-ERROR="2606.65" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="AGE" BETA="-.3" STD-ERROR=".091" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="2" PARAMETER-NAME="WORK" BETA=".152" STD-ERROR=".111" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="Intercept" BETA="6.836" STD-ERROR="4061.421" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="[SEX=0]" BETA="16.305" STD-ERROR="4061.419" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-20.041" STD-ERROR="1.297" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="-.73" STD-ERROR="3449.165" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="AGE" BETA="-.156" STD-ERROR=".107" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="3" PARAMETER-NAME="WORK" BETA=".267" STD-ERROR=".124" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="Intercept" BETA="8.816" STD-ERROR="2862.832" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="[SEX=0]" BETA="15.264" STD-ERROR="2862.829" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-16.799" STD-ERROR="1.546" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="16.48" STD-ERROR="0.00" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="AGE" BETA="-.133" STD-ERROR=".091" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="4" PARAMETER-NAME="WORK" BETA="-.16" STD-ERROR=".126" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="Intercept" BETA="5.862" STD-ERROR="5011.208" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="[SEX=0]" BETA="16.437" STD-ERROR="5011.207" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-17.309" STD-ERROR="1.383" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="15.888" STD-ERROR="4412.753" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="AGE" BETA="-.105" STD-ERROR=".090" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="5" PARAMETER-NAME="WORK" BETA="6.914E-02" STD-ERROR=".109" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="Intercept" BETA="6.495" STD-ERROR="9095.723" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="[SEX=0]" BETA="17.297" STD-ERROR="9095.722" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="[MINORITY=0]([SEX=0])" BETA="-19.098" STD-ERROR=".000" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="[MINORITY=0]([SEX=1])" BETA="16.841" STD-ERROR="8780.225" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="AGE" BETA="-.141" STD-ERROR=".119" DF="1"/>
<PARAMETER-CELL TARGET-CATEGORY="6" PARAMETER-NAME="WORK" BETA="-5.058E-02" STD-ERROR=".184" DF="1"/>
</PARAMETER-TABLE>
</REGRESSION-MODEL>
3. Scoring algorithm.
We will use the above example to illustrate the steps that should be followed in the scoring process. Say the following case (observation) must be scored:
obs = (sex=1 minority=0 age=25 work=4)
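As a rough illustration of where the scoring steps lead, the sketch below scores this case with the usual multinomial-logit (softmax) formula. It rests on assumptions that the format does not state explicitly: the predictor-to-parameter cells follow the table above (factor cells hold the category value), any parameter not listed in the PARAMETER-TABLE for a category (such as [SEX=1] or [MINORITY=1]([SEX=1])) has a coefficient of 0, and the last response category, 7 (Technical), is the reference category whose linear predictor is 0. The names `design_value`, `score`, `CELLS`, `NAMES`, `BETAS`, and `REFERENCE` are hypothetical.

```python
import math

# Non-empty cells of the predictor-to-parameter table above.
# Factor cells hold the category value; covariate cells hold the exponent.
CELLS = {
    "Intercept": {},
    "[SEX=0]": {"sex": 0},
    "[SEX=1]": {"sex": 1},
    "[MINORITY=0]([SEX=0])": {"sex": 0, "minority": 0},
    "[MINORITY=1]([SEX=0])": {"sex": 0, "minority": 1},
    "[MINORITY=0]([SEX=1])": {"sex": 1, "minority": 0},
    "[MINORITY=1]([SEX=1])": {"sex": 1, "minority": 1},
    "AGE": {"age": 1},
    "WORK": {"work": 1},
}
FACTORS = {"sex", "minority"}

# BETA values copied from the PARAMETER-TABLE. Parameters that are not
# exported (e.g. [SEX=1]) are assumed to have a coefficient of 0.
NAMES = ["Intercept", "[SEX=0]", "[MINORITY=0]([SEX=0])",
         "[MINORITY=0]([SEX=1])", "AGE", "WORK"]
BETAS = {
    1: [26.836, -0.719, -19.214, -0.114, -0.133, 7.885e-02],
    2: [31.077, -0.869, -18.99, 1.01, -0.3, 0.152],
    3: [6.836, 16.305, -20.041, -0.73, -0.156, 0.267],
    4: [8.816, 15.264, -16.799, 16.48, -0.133, -0.16],
    5: [5.862, 16.437, -17.309, 15.888, -0.105, 6.914e-02],
    6: [6.495, 17.297, -19.098, 16.841, -0.141, -5.058e-02],
}
REFERENCE = 7  # assumption: last category is the reference (eta = 0)


def design_value(cells, case):
    """1 for the intercept; factor cells act as 0/1 indicators and
    covariate cells multiply in the covariate raised to the exponent."""
    value = 1.0
    for predictor, cell in cells.items():
        if predictor in FACTORS:
            if case[predictor] != cell:
                return 0.0
        else:
            value *= case[predictor] ** cell
    return value


def score(case):
    """Return (predicted category, {category: probability}) for one case."""
    x = {name: design_value(cells, case) for name, cells in CELLS.items()}
    # Linear predictor per non-reference category.
    eta = {k: sum(b * x[name] for name, b in zip(NAMES, betas))
           for k, betas in BETAS.items()}
    eta[REFERENCE] = 0.0
    # Softmax over all categories, shifted by the max eta to avoid overflow.
    top = max(eta.values())
    weights = {k: math.exp(e - top) for k, e in eta.items()}
    total = sum(weights.values())
    probs = {k: w / total for k, w in weights.items()}
    return max(probs, key=probs.get), probs


obs = {"sex": 1, "minority": 0, "age": 25, "work": 4}
predicted, probs = score(obs)
print(predicted, round(probs[predicted], 3))
```

Under these assumptions the sketch assigns the case to category 2 (Office trainee), whose linear predictor (about 25.2) is the largest of the seven. Subtracting the largest linear predictor before exponentiating leaves the probabilities unchanged but keeps `math.exp` from overflowing when coefficients are as large as the intercepts here.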