Data Dictionary
Last Modified: 7/19/99 11:59 AM

Contents:

1. Model DTD and tag description.
2. Sample exported XML.


1. Model DTD and tag description.
Note: the model below assumes that dictionary tags are defined elsewhere. Variables are referred to by name.

<!ELEMENT data-dictionary ((categorical | ordinal | continuous)+)>
<!ELEMENT categorical (category+)>
<!ATTLIST categorical
    name CDATA #REQUIRED
>
<!ELEMENT category EMPTY>
<!ATTLIST category
    value CDATA #REQUIRED
    display-value CDATA #IMPLIED
    proportion CDATA #IMPLIED
    missing (true | false) "false"
>
<!ELEMENT ordinal (order+)>
<!ATTLIST ordinal
    name CDATA #REQUIRED
>
<!ELEMENT order EMPTY>
<!ATTLIST order
    value CDATA #REQUIRED
    display-value CDATA #IMPLIED
    rank CDATA #REQUIRED
    proportion CDATA #IMPLIED
    missing (true | false) "false"
>
<!-- The predicates indicate the values that represent missing values -->
<!ELEMENT continuous ((%predicates;)*)>
<!ATTLIST continuous
    name CDATA #REQUIRED
    minimum CDATA #IMPLIED
    maximum CDATA #IMPLIED
    mean CDATA #IMPLIED
    median CDATA #IMPLIED
    standard-deviation CDATA #IMPLIED
    inter-quartile-range CDATA #IMPLIED
>

data-dictionary  - marks the beginning of the container whose contents define the complete set of fields referenced in any model in the file. Any field referenced any place in the PMML file must be declared here.
categorical - one of the three data types that a field may have.
category - one of the values that a categorical field may take. When exporting a PMML file, an entry should be made for each category that was found in the training data. If this value represents missing data, the missing attribute should be set to true.
ordinal - one of the three data types that a field may have.
order - one of the values that an ordinal field may take. When exporting a PMML file, an entry should be made for each ordinal value that was found in the training data. If this value represents missing data, the missing attribute should be set to true.
continuous - one of the three data types that a field may have. To indicate which continuous values represent missing values, construct a series of predicates.

2. Sample exported XML.

<data-dictionary> <categorical name="gender"> <category value="M"/> <category value="F"/> <category value="." missing="true"/> </categorical> <ordinal name="grade"> <order value="A" rank="4"/> <order value="B" rank="3"/> <order value="C" rank="2"/> <order value="D" rank="1"/> <order value="F" rank="0"/> <order value="I" rank="N/A" missing="true"/> </ordinal> <continuous name="height"> <compound-predicate bool-op="or"> <predicate name="height" op="le" value="0"/> <predicate name="height" op="ge" value="11"/> </compound-predicate> </continuous> </data-dictionary>