DDI Tag Library


This Tag Library describes the five main sections of the Document Type Definition (DTD) for social science data documentation developed by the Data Documentation Initiative (DDI) Committee. These documents present English-language descriptions of the XML (eXtensible Markup Language) DTD elements and attributes, along with instructions for their use, as of Version 1 (Final) by Jerome McDonough, UC-Berkeley Library.


The following are the highest level components of any document that will be marked up in compliance with this DTD.

A graphical representation of the document hierarchy is also available.


  1. Document Description
    Items describing the marked-up document itself as well as its source documents (citation, title, etc.)

    Element -- optional, repeatable.

  2. Study Description
    Items describing the overall data collection (title, citation, methodology, study scope, data access, etc.)

    Element -- required, repeatable.

  3. Data Files Description
    Items relating to the format, size, and structure of the data files

    Element -- optional, repeatable.

  4. Variables Description
    Items relating to variables in the data collection

    Element -- optional, repeatable.

  5. Other Study-Related Materials
    Other study-related material not included in the other sections (bibliography, separate questionnaire file, etc.)

    Element -- optional, repeatable.
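
As a rough sketch of how these five sections nest, the skeleton below uses the element names from the DDI DTD (codeBook, docDscr, stdyDscr, fileDscr, dataDscr, otherMat). Only docDscr is named in this part of the tag library, so treat the other names as assumptions to check against the DTD. The contents are elided with comments, so this is an outline rather than a valid instance.

```xml
<codeBook>
  <!-- 1. Document Description: optional -->
  <docDscr>
    <!-- citation, guide, status, and source of the marked-up document -->
  </docDscr>
  <!-- 2. Study Description: required -->
  <stdyDscr>
    <!-- title, citation, methodology, study scope, data access -->
  </stdyDscr>
  <!-- 3. Data Files Description: optional -->
  <fileDscr>
    <!-- format, size, and structure of the data files -->
  </fileDscr>
  <!-- 4. Variables Description: optional -->
  <dataDscr>
    <!-- variables in the data collection -->
  </dataDscr>
  <!-- 5. Other Study-Related Materials: optional -->
  <otherMat>
    <!-- bibliography, separate questionnaire file, etc. -->
  </otherMat>
</codeBook>
```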


Document Description
(Codebook Header)

Section 1.0 of the Data Documentation Initiative (DDI) DTD


Document Description's Place within the Document Structure


    Document

          |---DOCUMENT DESCRIPTION

          |---Study Description

          |---Data Files Description

          |---Variables Description

          |---Other Study-Related Materials


Role of the Document Description

The Document Description consists of bibliographic information describing the DDI-compliant document as a whole. It can be considered the wrapper or header whose elements uniquely describe the full contents of the compliant DDI file. Since the Document Description section is used to identify the DDI-compliant file within an electronic resource discovery environment, this section should be as complete as possible. The author in the Document Description should be the individual(s) or organization(s) directly responsible for the intellectual content of the DDI version, as distinct from the person(s) or organization(s) responsible for the intellectual content of the earlier paper or electronic edition from which the DDI edition may have been derived. The producer in the Document Description should be the agency or person that prepared the marked-up document. Note that the Document Description section contains a Documentation Source subsection (1.4) consisting of information about the source of the DDI-compliant file, that is, the hardcopy or electronic codebook that served as the source for the marked-up codebook. Together, the citation and Documentation Source subsections allow the creator of the DDI file to record version, responsibility, and other descriptions relating both to the creation of that DDI file as a separate and reformatted version of source materials (either print or electronic) and to the original source materials themselves.


To comply with the Dublin Core, it is recommended that the following elements in the Document Description be used when the appropriate information is available:


DUBLIN CORE    DDI
------------------

Title          1.1.1.1 title (Title of Marked-up Document)    

Creator        1.1.2.1 AuthEnty (Authoring Entity)       

Publisher      1.1.3.1 producer (Producer)               
               [NOTE: The Dublin Core specifies that the 
               publisher should be "the entity    
               responsible for making the resource
               available *in its present form*"   
               (emphasis added).  For a DDI codebook
               the publisher should be the entity 
               responsible for making the         
               *electronic* DDI version available.]

Contributor    1.1.2.2 othId (Other Ident. & Acknowl.)   

Date           1.1.3.3 prodDate (Date of Production)     
               [NOTE: The DC Date element 
               should refer to the date the       
               electronic resource (e.g., the DDI 
               version of the codebook) was created,
               not any preceding paper version.]  

Identifier     Suggested DC Identifier: URL for DDI
               Codebook, if applicable.
               Alternatively, use the IDNo element 
               (1.1.1.5) within the Document Description 
               citation element.

Relation       Partially maps to 1.4 docSrc (Documentation
               Source).  No mapping currently exists
               for the relation type component.

Rights         1.1.3.2 copyright (Copyright)
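
To illustrate the mapping, the sketch below shows a Document Description citation with each Dublin Core counterpart marked in a comment. The values are borrowed from unrelated examples elsewhere in this tag library and do not describe a real codebook; the pairing of title, identifier, and author is an assumption made purely for illustration.

```xml
<docDscr>
  <citation>
    <titlStmt>
      <!-- DC Title -->
      <titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
      <!-- DC Identifier (alternative to a URL for the codebook) -->
      <IDNo agency='ICPSR'>6678</IDNo>
    </titlStmt>
    <rspStmt>
      <!-- DC Creator -->
      <AuthEnty>United States Department of Commerce. Bureau of the Census</AuthEnty>
      <!-- DC Contributor -->
      <othId role='editor' affiliation='INRA'><p>Jane Smith</p></othId>
    </rspStmt>
    <prodStmt>
      <!-- DC Publisher: the entity that made the electronic DDI version available -->
      <producer abbr='ICPSR'>Inter-university Consortium for Political and Social Research</producer>
      <!-- DC Rights -->
      <copyright>Copyright(c) ICPSR, 2000</copyright>
      <!-- DC Date: creation date of the DDI version, not of any paper version -->
      <prodDate date='1999-01-25'>January 25, 1999</prodDate>
    </prodStmt>
  </citation>
</docDscr>
```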


Document Description

<docDscr> 1.0
Description: This section contains information about both the document being created (the marked-up document) and the source document (the electronic or print codebook which is the source(s) of information), if one exists. In addition, it provides information on how to use the document contents and on the status of the document itself. Although this element is optional, it is strongly recommended that all marked-up documents contain at minimum the following nested set of elements: <docDscr> 1.0, <citation> 1.1, <titlStmt> 1.1.1, and <titl> 1.1.1.1 (required).
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Citation -- Marked-up Document, Guide to Documentation, Documentation Status, Documentation Source, Notes (Document Description)
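
The recommended minimum nesting described above could look like the following sketch, which reuses one of the title examples from this tag library.

```xml
<docDscr>
  <citation>
    <titlStmt>
      <titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>
    </titlStmt>
  </citation>
</docDscr>
```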


Citation -- Marked-up Document
<citation> 1.1 (Generic element A.6)
Description: Citation for the marked-up document. This element encodes the bibliographic information describing the marked-up codebook, including title information, statement of responsibility, production and distribution information, series and version information, text of a preferred bibliographic citation, and notes (if any). A MARCURI attribute is provided to link to the MARC record for this citation. Remarks: Note that it is the elements within this citation element that are the primary source for most generic search engines through their relationship to the Dublin Core tags.

Optional
Not Repeatable
Attributes: ID, xml:lang, source, MARCURI
Contains Elements: Title Statement -- Marked-up Document, Responsibility Statement -- Marked-up Document, Production Statement -- Marked-up Document, Distributor Statement -- Marked-up Document, Series Statement -- Marked-up Document, Version Statement -- Marked-up Document, Bibliographic Citation -- Marked-up Document, Holdings Information -- Marked-up Document, Notes (Citation) -- Marked-up Document


Title Statement -- Marked-up Document
<titlStmt> 1.1.1 (Generic element A.6.1)
Description: Title statement for the marked-up document.
Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Title -- Marked-up Document, Subtitle -- Marked-up Document, Alternative Title -- Marked-up Document, Parallel Title -- Marked-up Document, ID Number -- Marked-up Document


Title -- Marked-up Document
<titl> 1.1.1.1 (Generic element A.6.1.1)
Description: Contains the full authoritative title of the marked-up codebook. The marked-up codebook title will in most cases be identical to the title for the data collection (2.1.1). A full title should indicate the geographic scope of the data collection as well as the time period covered. Equivalent to Dublin Core Title.
Examples:
<titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>

Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Subtitle -- Marked-up Document
<subTitl> 1.1.1.2 (Generic element A.6.1.2)
Description: A subtitle is a secondary title used to amplify or state certain limitations of the main title. It may repeat information already in the main title.
Examples:
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>
<subTitl>A Continuing Study of American Youth, 1995</subTitl>

<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<subTitl>Public Use Microdata Sample</subTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Alternative Title -- Marked-up Document
<altTitl> 1.1.1.3 (Generic element A.6.1.3)
Description: The alternative title may be the title by which a data collection is commonly referred to or it may be an abbreviation for the title.
Examples:
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<altTitl>PUMS</altTitl>

<titl>Equality of Educational Opportunity (Coleman) Study (EEOS), 1966</titl>
<altTitl>The Coleman Study</altTitl>
<altTitl>EEOS</altTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Parallel Title -- Marked-up Document
<parTitl> 1.1.1.4 (Generic element A.6.1.4)
Description: Title translated into another language.
Example:
<titl>Politbarometer West [Germany], Partial Accumulation, 1977-1995</titl>
<parTitl>Politbarometer, 1977-1995: Partielle Kumulation</parTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


ID Number -- Marked-up Document
<IDNo> 1.1.1.5 (Generic element A.6.1.5)
Description: Unique string or number (producer's or archive's number) for the marked-up document. An "agency" attribute is supplied. Equivalent to Dublin Core Identifier.
Examples:
<IDNo agency='ICPSR'>6678</IDNo>
<IDNo agency='ZA'>2010</IDNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency
Contains: #PCDATA, Link to other element(s) within the codebook.
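
Putting the title elements together, a complete title statement might look like the sketch below. The children appear in the order in which this tag library numbers them, which is assumed here to be the order the DTD expects; a parTitl, if present, would sit between altTitl and IDNo. The identifier is a hypothetical placeholder, not a real archive number.

```xml
<titlStmt>
  <titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
  <subTitl>Public Use Microdata Sample</subTitl>
  <altTitl>PUMS</altTitl>
  <!-- Hypothetical identifier, for illustration only -->
  <IDNo agency='EXAMPLE'>0000</IDNo>
</titlStmt>
```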


Responsibility Statement -- Marked-up Document
<rspStmt> 1.1.2 (Generic element A.6.2)
Description: Responsibility for the creation of the marked-up codebook.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Authoring Entity / Primary Investigator -- Marked-up Document, Other Identifications / Acknowledgments -- Marked-up Document


Authoring Entity / Primary Investigator -- Marked-up Document
<AuthEnty> 1.1.2.1 (Generic element A.6.2.1)
Description: The person, corporate body, or agency responsible for the marked-up document's substantive and intellectual content. Usually the same as the authoring entity responsible for the data collection (2.1.2.1). Repeat the element for each author, and use the affiliation attribute if available. Invert first and last name and use commas. Equivalent to Dublin Core Creator. Remarks: The author in the Document Description should be the individual(s) or organization(s) directly responsible for the intellectual content of the DDI version, as distinct from the person(s) or organization(s) responsible for the intellectual content of the earlier paper or electronic edition from which the DDI edition may have been derived. The producer (1.1.3.1) in the Document Description should be the agency or person that prepared the marked-up document.

Examples:
<AuthEnty>United States Department of Commerce. Bureau of the Census</AuthEnty>
<AuthEnty affiliation='European Commission'>Rabier, Jacques-Rene</AuthEnty>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Other Identifications / Acknowledgments -- Marked-up Document
<othId> 1.1.2.2 (Generic element A.6.2.2)
Description: Statements of responsibility not recorded in the title and statement of responsibility areas. Indicate here the persons or bodies connected with the work, or significant persons or bodies connected with previous editions and not already named in the description. For example, the name of the person who edited the marked-up documentation might be cited here, using the role and affiliation attributes. Remarks: The paragraph tag <p> must be used in this element.

Example:
<othId role='editor' affiliation='INRA'><p>Jane Smith</p></othId>

Optional
Repeatable
Attributes: ID, xml:lang, source, type, role, affiliation
Contains: <p>, othId
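
A responsibility statement with two authoring entities and one acknowledgment might be sketched as follows. The names come from examples in this tag library; listing them together as co-authors of a marked-up document is an assumption for illustration. Note the inverted names in AuthEnty and the required paragraph tag inside othId.

```xml
<rspStmt>
  <!-- One AuthEnty per author, last name first -->
  <AuthEnty affiliation='European Commission'>Rabier, Jacques-Rene</AuthEnty>
  <AuthEnty>Inglehart, Ronald</AuthEnty>
  <!-- othId content must be wrapped in <p> -->
  <othId role='editor' affiliation='INRA'><p>Jane Smith</p></othId>
</rspStmt>
```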


Production Statement -- Marked-up Document
<prodStmt> 1.1.3 (Generic element A.6.3)
Description: Production statement for the marked-up document.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Producer -- Marked-up Document, Copyright -- Marked-up Document, Date of Production -- Marked-up Document, Place of Production -- Marked-up Document, Software Used in Production -- Marked-up Document, Funding Agency -- Marked-up Document, Grant Number -- Marked-up Document


Producer -- Marked-up Document
<producer> 1.1.3.1 (Generic element A.6.3.1)
Description: The producer of the marked-up document is the person or organization with the financial or administrative responsibility for the physical processes whereby the marked-up document was brought into existence. Use the role attribute to distinguish different stages of involvement in the production process, such as original producer. Equivalent to Dublin Core Publisher.
Example:
<producer abbr='ICPSR' affiliation='Institute for Social Research'>Inter-university Consortium for Political and Social Research</producer>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Copyright -- Marked-up Document
<copyright> 1.1.3.2 (Generic element A.6.3.2)
Description: Copyright statement for the marked-up document. Equivalent to Dublin Core Rights.
Example:
<copyright>Copyright(c) ICPSR, 2000</copyright>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Production -- Marked-up Document
<prodDate> 1.1.3.3 (Generic element A.6.3.3)
Description: Date the marked-up document was produced (not distributed or archived). The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Equivalent to Dublin Core Date.
Example:
<prodDate date='1999-01-25'>January 25, 1999</prodDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Place of Production -- Marked-up Document
<prodPlac> 1.1.3.4 (Generic element A.6.3.4)
Description: Address of the archive or agency that produced the marked-up document.
Example:
<prodPlac>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</prodPlac>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Software Used in Production -- Marked-up Document
<software> 1.1.3.5 (Generic element A.6.3.5)
Description: Software used to produce the marked-up document. A "version" attribute permits specification of the software version number. The "date" attribute is provided to enable specification of the date (if any) for the software release. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Examples:
<software version='1.0'>MRDC Codebook Authoring Tool</software>
<software version='8.0'>Arbortext Adept Editor</software>

Optional
Repeatable
Attributes: ID, xml:lang, source, version, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Funding Agency -- Marked-up Document
<fundAg> 1.1.3.6 (Generic element A.6.3.6)
Description: The source(s) of funds for production of the marked-up document. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish them.
Examples:
<fundAg abbr='NSF' role="infrastructure">National Science Foundation</fundAg>
<fundAg abbr='SUN' role="equipment">Sun Microsystems</fundAg>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Grant Number -- Marked-up Document
<grantNo> 1.1.3.7 (Generic element A.6.3.7)
Description: The grant/contract number of the project that sponsored the markup effort. If more than one, indicate the appropriate agency using the "agency" attribute. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish the grant numbers.
Example:
<grantNo agency='Bureau of Justice Statistics'>J-LEAA-018-77</grantNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency, role
Contains: #PCDATA, Link to other element(s) within the codebook.
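
The production elements above can be combined into a single production statement, sketched below in the order in which this tag library numbers them. Most values are taken from the individual examples; the grant number is a hypothetical placeholder.

```xml
<prodStmt>
  <producer abbr='ICPSR' affiliation='Institute for Social Research'>Inter-university Consortium for Political and Social Research</producer>
  <copyright>Copyright(c) ICPSR, 2000</copyright>
  <prodDate date='1999-01-25'>January 25, 1999</prodDate>
  <prodPlac>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</prodPlac>
  <software version='1.0'>MRDC Codebook Authoring Tool</software>
  <fundAg abbr='NSF' role='infrastructure'>National Science Foundation</fundAg>
  <!-- Hypothetical grant number; role matches the funding agency above -->
  <grantNo agency='National Science Foundation' role='infrastructure'>XXX-0000000</grantNo>
</prodStmt>
```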


Distributor Statement -- Marked-up Document
<distStmt> 1.1.4 (Generic element A.6.4)
Description: Distribution statement for the marked-up document.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Distributor -- Marked-up Document, Contact Person -- Marked-up Document, Depositor -- Marked-up Document, Date of Deposit -- Marked-up Document, Date of Distribution -- Marked-up Document


Distributor -- Marked-up Document
<distrbtr> 1.1.4.1 (Generic element A.6.4.1)
Description: The organization designated by the author or producer to generate copies of particular marked-up documentation including any necessary editions or revisions. Names and addresses may be specified and other archives may be co-distributors. A URI attribute is included to provide a URN or URL to the ordering service or download facility on a website.
Example:
<distrbtr abbr='ICPSR' affiliation='Institute for Social Research' URI='http://www.icpsr.umich.edu'>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Contact Person -- Marked-up Document
<contact> 1.1.4.2 (Generic element A.6.4.2)
Description: Names and addresses of individuals responsible for the marked-up document. Individuals listed as contact persons will be used as resource persons regarding problems or questions raised by the user community. The URI attribute should be used to indicate a URN or URL for the homepage of the contact individual. The email attribute is used to indicate an email address for the contact individual.
Example:
<contact affiliation='University of Wisconsin' email='jsmith@...'>Jane Smith</contact>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation, URI, email
Contains: #PCDATA, Link to other element(s) within the codebook.


Depositor -- Marked-up Document
<depositr> 1.1.4.3 (Generic element A.6.4.3)
Description: The name of the person (or institution) who provided this marked-up documentation to the archive storing it.
Example:
<depositr abbr='BJS' affiliation='U.S. Department of Justice'>Bureau of Justice Statistics</depositr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Deposit -- Marked-up Document
<depDate> 1.1.4.4 (Generic element A.6.4.4)
Description: The date that the marked-up document was deposited with the archive that originally received it. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<depDate date='1999-01-25'>January 25, 1999</depDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Distribution -- Marked-up Document
<distDate> 1.1.4.5 (Generic element A.6.4.5)
Description: Date that the marked-up document was made available for distribution/presentation. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<distDate date='1999-01-25'>January 25, 1999</distDate>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.
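
A full distributor statement might be assembled as in the sketch below. The contact entry (name, URI, and email address) and the distribution date are hypothetical placeholders; the remaining values come from the individual examples above.

```xml
<distStmt>
  <distrbtr abbr='ICPSR' affiliation='Institute for Social Research' URI='http://www.icpsr.umich.edu'>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>
  <!-- Hypothetical contact; URI and email are placeholders -->
  <contact affiliation='Example Data Archive' URI='http://www.example.org/support' email='help@example.org'>User Support</contact>
  <depositr abbr='BJS' affiliation='U.S. Department of Justice'>Bureau of Justice Statistics</depositr>
  <depDate date='1999-01-25'>January 25, 1999</depDate>
  <!-- Hypothetical distribution date, later than the deposit date -->
  <distDate date='1999-02-15'>February 15, 1999</distDate>
</distStmt>
```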


Series Statement -- Marked-up Document
<serStmt> 1.1.5 (Generic element A.6.5)
Description: Series statement for the marked-up document. The URI attribute is provided to point to a central Internet repository of series information.

Optional
Not Repeatable
Attributes: ID, xml:lang, source, URI
Contains Elements: Series Name -- Marked-up Document, Series Information -- Marked-up Document


Series Name -- Marked-up Document
<serName> 1.1.5.1 (Generic element A.6.5.1)
Description: The name of the series to which the marked-up document belongs. This will probably be the same as the Series Name for the study or data collection (2.1.5.1).
Example:
<serName abbr='CPS'>Current Population Survey Series</serName>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr
Contains: #PCDATA, Link to other element(s) within the codebook.


Series Information -- Marked-up Document
<serInfo> 1.1.5.2 (Generic element A.6.5.2)
Description: Contains a history of the series and a summary of those features that apply to the series as a whole. This will probably be the same as the Series Information for the study or data collection (2.1.5.2).
Example:
<serInfo>The Current Population Survey (CPS) is a household sample survey conducted monthly by the Census Bureau to provide estimates of employment, unemployment, and other characteristics of the general labor force, estimates of the population as a whole, and estimates of various subgroups in the population. The entire non-institutionalized population of the United States is sampled to obtain the respondents for this survey series.</serInfo>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.
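
As a sketch, a series statement combining both child elements could look like this. The URI on serStmt is a hypothetical placeholder for a series information repository, and the series description is shortened from the example above.

```xml
<!-- Hypothetical repository URI -->
<serStmt URI='http://www.example.org/series/cps'>
  <serName abbr='CPS'>Current Population Survey Series</serName>
  <serInfo>The Current Population Survey (CPS) is a household sample survey conducted monthly by the Census Bureau.</serInfo>
</serStmt>
```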


Version Statement -- Marked-up Document
<verStmt> 1.1.6 (Generic element A.6.6)
Description: Version statement for the marked-up document.

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Version -- Marked-up Document, Version Responsibility Statement -- Marked-up Document, Notes (Version) -- Marked-up Document


Version -- Marked-up Document
<version> 1.1.6.1 (Generic element A.6.6.1)
Description: Also known as release or edition. If there have been substantive changes in the marked-up document since its creation, this statement should be used. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Remarks: ICPSR distinguishes among the terms "release," "version," and "edition" in the following ways:
  • ICPSR Edition: Used only for intensively processed collections, for which ICPSR has produced a unique edition of the data. This usually involves checking for undocumented codes and consistency checks. Signals that additional intellectual effort has gone into producing the collection.
  • ICPSR Version: Used to indicate that ICPSR has revised the format of a collection or added components to it, in most cases without changing any data values. A study is considered an "ICPSR version" if one or more of these steps has been performed: (1) Converting software-specific system files or export/transport files to raw data; (2) Generating SAS and/or SPSS data definition statements; (3) Reformatting files, e.g., removing blanks to use space more efficiently; (4) Scanning hardcopy documentation; or (5) Reformatting machine-readable documentation, e.g., converting text created in a word-processing package to ASCII text.
  • Release: Used for data collections that are being disseminated exactly as they came from the data depositor (except for the addition of an ICPSR cover and ICPSR front matter).
Example:
<version type='edition' date='1999-01-25'>Second ICPSR Edition</version>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type (release, version, edition), date
Contains: #PCDATA, Link to other element(s) within the codebook.


Version Responsibility Statement -- Marked-up Document
<verResp> 1.1.6.2 (Generic element A.6.6.2)
Description: Used to indicate the organization or person responsible for the version of the marked-up document.
Example:
<verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes (Version) -- Marked-up Document
<notes> 1.1.6.3 (Generic element A.4)
Description: Used to indicate additional information regarding the version or the version responsibility statement for the marked-up document, in particular to indicate what makes a new version different from its predecessor. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>Additional information on derived variables has been added to this marked-up version of the documentation.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.
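
The three version elements might be combined as in the sketch below. The version and notes values come from the examples above; naming ICPSR in verResp is an assumption chosen to match the "Second ICPSR Edition" label.

```xml
<verStmt>
  <version type='edition' date='1999-01-25'>Second ICPSR Edition</version>
  <!-- Assumed to match the edition named above -->
  <verResp affiliation='Institute for Social Research'>Inter-university Consortium for Political and Social Research</verResp>
  <notes resp='Jane Smith'>Additional information on derived variables has been added to this marked-up version of the documentation.</notes>
</verStmt>
```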


Bibliographic Citation -- Marked-up Document
<biblCit> 1.1.7 (Generic element A.6.7)
Description: Complete bibliographic reference containing all of the standard elements of a citation that can be used to cite the marked-up document. The "format" attribute is provided to enable specification of the particular citation style used, e.g. APA, MLA, Chicago, etc.
Example:
<biblCit format='MRDF'>Rabier, Jacques-Rene, and Ronald Inglehart. EURO-BAROMETER 11: YEAR OF THE CHILD IN EUROPE, APRIL 1979 [Codebook file]. Conducted by Institut Francais D'Opinion Publique (IFOP), Paris, et al. ICPSR ed. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 1981.</biblCit>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, format
Contains: #PCDATA, Link to other element(s) within the codebook.


Holdings Information -- Marked-up Document
<holdings> 1.1.8 (Generic element A.6.8)
Description: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work.
Example:
<holdings location='ICPSR DDI Repository' callno='inap.' URI='http://www.icpsr.umich.edu/DDIrepository/'> Marked-up Codebook for Current Population Survey, 1999: Annual Demographic File</holdings>

Optional
Repeatable
Attributes: ID, xml:lang, source, location, callno, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes (Citation) -- Marked-up Document
<notes> 1.1.9 (Generic element A.4)
Description: Used to indicate additional information regarding the citation for the marked-up document. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>This citation was prepared by the archive based on information received from the markup authors.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Guide to the Documentation -- Marked-up Document
<guide> 1.2
Description: List of terms and definitions used in the document. Provided to assist users in using the document correctly. For further examples, see the Codebook Information section of any of the printed, bound codebooks distributed by ICPSR.
Example:
<guide>Metro Area OR Twin Cities = Minneapolis/St. Paul MSA; Greater MN = All Minnesota Counties not included in the Minneapolis/St. Paul MSA; The Range = Upper Northeast quadrant of Minnesota traditionally associated with iron ore and taconite mining.</guide>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Documentation Status -- Marked-up Document
<docStatus> 1.3
Description: Use this field to indicate if the document is being presented/distributed before it has been finalized. Some data producers and social science data archives employ data processing strategies that provide for release of data and documentation at various stages of processing.
Example:
<docStatus>This marked-up document includes a provisional data dictionary and brief citation only for the purpose of providing basic access to the data file. A complete codebook will be published at a later date.</docStatus>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.
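
To show where the guide and status elements sit relative to the citation, the sketch below places all three inside one Document Description. The title is hypothetical, and the guide and status texts are shortened from the examples above.

```xml
<docDscr>
  <citation>
    <titlStmt>
      <!-- Hypothetical title, for illustration only -->
      <titl>Minnesota State Survey, 1999</titl>
    </titlStmt>
  </citation>
  <guide>Metro Area OR Twin Cities = Minneapolis/St. Paul MSA; Greater MN = All Minnesota Counties not included in the Minneapolis/St. Paul MSA.</guide>
  <docStatus>This marked-up document includes a provisional data dictionary and brief citation only. A complete codebook will be published at a later date.</docStatus>
</docDscr>
```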


Documentation Source
<docSrc> 1.4 (Generic element A.6)
Description: Citation for the source document. This element encodes the bibliographic information describing the source codebook, including title information, statement of responsibility, production and distribution information, series and version information, text of a preferred bibliographic citation, and notes (if any). Information for this section should be taken directly from the source document whenever possible. If additional information is obtained and entered in the elements within this section, the source of this information should be noted in the source attribute of the particular element tag. A MARCURI attribute is provided to link to the MARC record for this citation.
Optional
Repeatable
Attributes: ID, xml:lang, source, MARCURI
Contains Elements: Title Statement -- Source Document, Responsibility Statement -- Source Document, Production Statement -- Source Document, Distributor Statement -- Source Document, Series Statement -- Source Document, Version Statement -- Source Document, Bibliographic Citation -- Source Document, Holdings Information -- Source Document, Notes (Citation) -- Source Document
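
As a sketch of noting where supplementary information came from, the fragment below marks an identifier that was added by the archive rather than copied from the source codebook. It assumes the source attribute accepts the value 'archive' (with 'producer' as the alternative), which this tag library does not spell out and which should be checked against the DTD; the pairing of title and identifier is illustrative only.

```xml
<docSrc>
  <titlStmt>
    <!-- Taken directly from the source codebook -->
    <titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
    <!-- Added by the archive; the attribute value 'archive' is an assumption -->
    <IDNo agency='ICPSR' source='archive'>6678</IDNo>
  </titlStmt>
</docSrc>
```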


Title Statement -- Source Document
<titlStmt> 1.4.1 (Generic element A.6.1)
Description: Title statement for the source document.
Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Title -- Source Document, Subtitle -- Source Document, Alternative Title -- Source Document, Parallel Title -- Source Document, ID Number -- Source Document


Title -- Source Document
<titl> 1.4.1.1 (Generic element A.6.1.1)
Description: Contains the full authoritative title of the source document. The source document title will in many cases be identical to the title for the marked-up document. If the source document contains no title, the title provided in this element should indicate the geographic scope of the data collection as well as the time period covered.
Examples:
<titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>

Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Subtitle -- Source Document
<subTitl> 1.4.1.2 (Generic element A.6.1.2)
Description: A subtitle is a secondary title used to amplify or state certain limitations of the main title. It may repeat information already in the main title.
Examples:
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>
<subTitl>A Continuing Study of American Youth, 1995</subTitl>

<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<subTitl>Public Use Microdata Sample</subTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Alternative Title -- Source Document
<altTitl> 1.4.1.3 (Generic element A.6.1.3)
Description: The alternative title may be the title by which a data collection is commonly referred to or it may be an abbreviation for the title.
Examples:
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<altTitl>PUMS</altTitl>

<titl>Equality of Educational Opportunity (Coleman) Study (EEOS), 1966</titl>
<altTitl>The Coleman Study</altTitl>
<altTitl>EEOS</altTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Parallel Title -- Source Document
<parTitl> 1.4.1.4 (Generic element A.6.1.4)
Description: Title translated into another language.
Example:
<titl>Politbarometer West [Germany], Partial Accumulation, 1977-1995</titl>
<parTitl>Politbarometer, 1977-1995: Partielle Kumulation</parTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


ID Number -- Source Document
<IDNo> 1.4.1.5 (Generic element A.6.1.5)
Description: Unique string or number (producer's or archive's number) for the source document. An "agency" attribute is supplied.
Examples:
<IDNo agency='ICPSR'>6678</IDNo>
<IDNo agency='ZA'>2010</IDNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency
Contains: #PCDATA, Link to other element(s) within the codebook.
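
The Title Statement children described above can be combined in a single fragment. The following Python sketch assumes the standard-library xml.etree.ElementTree parser and a hypothetical helper named summarize_title_statement; it is one possible way to read such a fragment, not part of the DTD. The fragment reuses example values from this section, and the pairing of the ID number with this particular title is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment assembled from the example values in this section.
# Pairing this ID number with this title is an assumption for the sketch,
# not a statement about a real codebook.
TITL_STMT = """
<titlStmt>
  <titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
  <subTitl>Public Use Microdata Sample</subTitl>
  <altTitl>PUMS</altTitl>
  <IDNo agency='ICPSR'>6678</IDNo>
</titlStmt>
"""


def summarize_title_statement(xml_text):
    """Collect the children of a titlStmt (hypothetical helper).

    <titl> is required and not repeatable, so exactly one is expected.
    The remaining children are optional and repeatable, so they are
    returned as lists that may be empty.
    """
    stmt = ET.fromstring(xml_text)
    titles = stmt.findall("titl")
    if len(titles) != 1:
        raise ValueError("titlStmt must contain exactly one <titl>")
    return {
        "title": titles[0].text,
        "subtitles": [e.text for e in stmt.findall("subTitl")],
        "alternative_titles": [e.text for e in stmt.findall("altTitl")],
        "parallel_titles": [e.text for e in stmt.findall("parTitl")],
        "id_numbers": [(e.get("agency"), e.text) for e in stmt.findall("IDNo")],
    }


summary = summarize_title_statement(TITL_STMT)
print(summary["title"])
print(summary["id_numbers"])
```

Returning lists for the repeatable elements keeps the caller from having to distinguish between "absent" and "present once".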


Responsibility Statement -- Source Document
<rspStmt> 1.4.2 (Generic element A.6.2)
Description: Responsibility for the creation of the source document.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Authoring Entity / Primary Investigator -- Source Document, Other Identifications / Acknowledgments -- Source Document


Authoring Entity / Primary Investigator -- Source Document
<AuthEnty> 1.4.2.1 (Generic element A.6.2.1)
Description: The person, corporate body, or agency responsible for the source document's substantive and intellectual content. Usually the same as the authoring entity responsible for the data collection (2.1.2.1). Repeat the element for each author, and use the affiliation attribute if available. Invert first and last name and use commas. Remarks: The author in this element should be the individual(s) or organization(s) directly responsible for the intellectual content of the source document, as distinct from the person(s) or organization(s) responsible for the intellectual content of the marked-up document.

Examples:
<AuthEnty>United States Department of Commerce. Bureau of the Census</AuthEnty>
<AuthEnty affiliation='European Commission'>Rabier, Jacques-Rene</AuthEnty>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Other Identifications / Acknowledgments -- Source Document
<othId> 1.4.2.2 (Generic element A.6.2.2)
Description: Statements of responsibility not recorded in the title and statement of responsibility areas. Indicate here the persons or bodies connected with the work, or significant persons or bodies connected with previous editions and not already named in the description. For example, the name of the person who edited the source document might be cited here, using the role and affiliation attributes. Remarks: The paragraph tag <p> must be used in this element.

Example:
<othId role='editor' affiliation='INRA'><p>Jane Smith</p></othId>

Optional
Repeatable
Attributes: ID, xml:lang, source, type, role, affiliation
Contains: <p>, othId
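
As a rough illustration of how the two Responsibility Statement children differ in content, the Python sketch below reads a fragment with the standard-library xml.etree.ElementTree parser. The helper name read_responsibility is hypothetical, and listing these two authoring entities together is an assumption made for the example; the values themselves come from the examples above. Note that AuthEnty holds its text directly, while othId wraps its text in paragraph tags.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment; combining these entries in one rspStmt is an
# assumption for the sketch.
RSP_STMT = """
<rspStmt>
  <AuthEnty>United States Department of Commerce. Bureau of the Census</AuthEnty>
  <AuthEnty affiliation='European Commission'>Rabier, Jacques-Rene</AuthEnty>
  <othId role='editor' affiliation='INRA'><p>Jane Smith</p></othId>
</rspStmt>
"""


def read_responsibility(xml_text):
    """Return authoring entities and other identifications (hypothetical helper)."""
    stmt = ET.fromstring(xml_text)
    # AuthEnty is repeatable: one element per author, text held directly.
    authors = [(a.text, a.get("affiliation")) for a in stmt.findall("AuthEnty")]
    # othId holds its text inside one or more <p> children.
    others = [
        (o.get("role"), o.get("affiliation"), [p.text for p in o.findall("p")])
        for o in stmt.findall("othId")
    ]
    return authors, others


authors, others = read_responsibility(RSP_STMT)
print(authors)
print(others)
```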


Production Statement -- Source Document
<prodStmt> 1.4.3 (Generic element A.6.3)
Description: Production statement for the source document.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Producer -- Source Document, Copyright -- Source Document, Date of Production -- Source Document, Place of Production -- Source Document, Software Used in Production -- Source Document, Funding Agency -- Source Document, Grant Number -- Source Document


Producer -- Source Document
<producer> 1.4.3.1 (Generic element A.6.3.1)
Description: The producer of the source document is the person or organization with the financial or administrative responsibility for the physical processes whereby the source document was brought into existence. Use the role attribute to distinguish different stages of involvement in the production process, such as original producer.
Examples:
<producer abbr='MNPoll' affiliation='Minneapolis Star Tribune Newspaper' role = 'original producer'>Star Tribune Minnesota Poll</producer>
<producer abbr='MRDC' affiliation='University of Minnesota' role = 'final production'>Machine Readable Data Center</producer>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Copyright -- Source Document
<copyright> 1.4.3.2 (Generic element A.6.3.2)
Description: Copyright statement for the source document.
Example:
<copyright>Copyright(c) ICPSR, 2000</copyright>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Production -- Source Document
<prodDate> 1.4.3.3 (Generic element A.6.3.3)
Description: Date the source document was produced (not distributed or archived). The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<prodDate date='1999-01-25'>January 25, 1999</prodDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Place of Production -- Source Document
<prodPlac> 1.4.3.4 (Generic element A.6.3.4)
Description: Address of the archive or agency that produced the source document.
Example:
<prodPlac>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</prodPlac>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Software Used in Production -- Source Document
<software> 1.4.3.5 (Generic element A.6.3.5)
Description: Identifies the software used in creating or storing the source document. A "version" attribute permits specification of the software version number. The "date" attribute is provided to enable specification of the date (if any) for the software release. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<software version='4.0'>PageMaker</software>

Optional
Repeatable
Attributes: ID, xml:lang, source, version, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Funding Agency -- Source Document
<fundAg> 1.4.3.6 (Generic element A.6.3.6)
Description: The source(s) of funds for production of the source document. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish them.
Example:
<fundAg abbr='NSF'>National Science Foundation</fundAg>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Grant Number -- Source Document
<grantNo> 1.4.3.7 (Generic element A.6.3.7)
Description: The grant/contract number of the project that sponsored the documentation effort. If more than one, indicate the appropriate agency using the "agency" attribute. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish the grant numbers.
Example:
<grantNo agency='Bureau of Justice Statistics'>J-LEAA-018-77</grantNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency, role
Contains: #PCDATA, Link to other element(s) within the codebook.
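
Several Production Statement elements carry a date attribute for which the ISO form YYYY-MM-DD is recommended. The Python sketch below shows one way a reviewer might flag date attributes that do not follow the recommendation. It assumes the standard-library xml.etree.ElementTree parser; the helper names is_iso_date and non_iso_dates are hypothetical, and the second prodDate in the fragment is deliberately non-conforming so that the check has something to report. Because the DTD only recommends the ISO form, the sketch reports problems rather than rejecting the document.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Illustrative fragment. The second <prodDate> uses a deliberately
# non-ISO date attribute to exercise the check.
PROD_STMT = """
<prodStmt>
  <producer abbr='MRDC' affiliation='University of Minnesota' role='final production'>Machine Readable Data Center</producer>
  <prodDate date='1999-01-25'>January 25, 1999</prodDate>
  <prodDate date='25/01/1999'>January 25, 1999</prodDate>
  <software version='4.0'>PageMaker</software>
</prodStmt>
"""

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """True if value has the shape YYYY-MM-DD and names a real calendar day."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def non_iso_dates(xml_text):
    """List (tag, date attribute) pairs whose date is not in ISO form."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.get("date"))
        for el in root.iter()
        if el.get("date") is not None and not is_iso_date(el.get("date"))
    ]


problems = non_iso_dates(PROD_STMT)
print(problems)
```

The shape check with a regular expression comes first because calendar validation alone may accept other ISO 8601 spellings on some Python versions.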


Distributor Statement -- Source Document
<distStmt> 1.4.4 (Generic element A.6.4)
Description: Distribution statement for the source document.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Distributor -- Source Document, Contact Person -- Source Document, Depositor -- Source Document, Date of Deposit -- Source Document, Date of Distribution -- Source Document


Distributor -- Source Document
<distrbtr> 1.4.4.1 (Generic element A.6.4.1)
Description: The organization designated by the author or producer to generate copies of a particular source document, including any necessary editions or revisions. Names and addresses may be specified, and other archives may be co-distributors. A URI attribute is included to provide a URN or URL to the ordering service or download facility on a website.
Example:
<distrbtr abbr='ICPSR' affiliation='Institute for Social Research' URI='http://www.icpsr.umich.edu'>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Contact Person -- Source Document
<contact> 1.4.4.2 (Generic element A.6.4.2)
Description: Names and addresses of individuals responsible for the source document. May be PIs. Individuals listed as contact persons will be used as resource persons regarding problems or questions raised by the user community. The URI attribute should be used to indicate a URN or URL for the homepage of the contact individual. The email attribute is used to indicate an email address for the contact individual.
Example:
<contact affiliation='University of Wisconsin' email='jsmith@uwisc.edu'>Jane Smith</contact>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation, URI, email
Contains: #PCDATA, Link to other element(s) within the codebook.


Depositor -- Source Document
<depositr> 1.4.4.3 (Generic element A.6.4.3)
Description: The name of the person (or institution) who provided this source document to the archive storing it.
Example:
<depositr abbr='BJS' affiliation='U.S. Department of Justice'>Bureau of Justice Statistics</depositr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Deposit -- Source Document
<depDate> 1.4.4.4 (Generic element A.6.4.4)
Description: The date that the source document was deposited with the archive that originally received it. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<depDate date='1999-01-25'>January 25, 1999</depDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Distribution -- Source Document
<distDate> 1.4.4.5 (Generic element A.6.4.5)
Description: The date that the source document was released for distribution. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<distDate date='1999-01-25'>January 25, 1999</distDate>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.
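
The Distributor Statement mixes repeatable children (distrbtr, contact, depositr, depDate) with one that is not repeatable (distDate). The Python sketch below is one possible way to read such a statement and enforce that single constraint. It assumes the standard-library xml.etree.ElementTree parser, and the helper name read_dist_stmt is hypothetical. The fragment is assembled from the examples above; combining them in one statement is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment built from the example values in this section.
DIST_STMT = """
<distStmt>
  <distrbtr abbr='ICPSR' affiliation='Institute for Social Research' URI='http://www.icpsr.umich.edu'>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>
  <contact affiliation='University of Wisconsin' email='jsmith@uwisc.edu'>Jane Smith</contact>
  <depDate date='1999-01-25'>January 25, 1999</depDate>
  <distDate date='1999-01-25'>January 25, 1999</distDate>
</distStmt>
"""


def read_dist_stmt(xml_text):
    """Return distributors, contacts, and dates from a distStmt (hypothetical helper)."""
    stmt = ET.fromstring(xml_text)
    # distDate is the only child of distStmt listed as Not Repeatable.
    if len(stmt.findall("distDate")) > 1:
        raise ValueError("distStmt may contain at most one <distDate>")
    dist_date = stmt.find("distDate")
    return {
        "distributors": [(d.get("abbr"), d.get("URI")) for d in stmt.findall("distrbtr")],
        "contacts": [(c.text, c.get("email")) for c in stmt.findall("contact")],
        "deposit_dates": [d.get("date") for d in stmt.findall("depDate")],
        "distribution_date": dist_date.get("date") if dist_date is not None else None,
    }


info = read_dist_stmt(DIST_STMT)
print(info["distributors"])
print(info["contacts"])
```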


Series Statement -- Source Document
<serStmt> 1.4.5 (Generic element A.6.5)
Description: Series statement for the source document. The URI attribute is provided to point to a central Internet repository of series information.

Optional
Not Repeatable
Attributes: ID, xml:lang, source, URI
Contains Elements: Series Name -- Source Document, Series Information -- Source Document


Series Name -- Source Document
<serName> 1.4.5.1 (Generic element A.6.5.1)
Description: The name of the data series to which the source document belongs. This will probably be the same as the Series Name for the study or data collection (2.1.5.1).
Example:
<serName abbr='CPS'>Current Population Survey Series</serName>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr
Contains: #PCDATA, Link to other element(s) within the codebook.


Series Information -- Source Document
<serInfo> 1.4.5.2 (Generic element A.6.5.2)
Description: Contains a history of the data series and a summary of those features that apply to the series as a whole. This will probably be the same as the Series Information for the study or data collection (2.1.5.2).
Example:
<serInfo>The Current Population Survey (CPS) is a household sample survey conducted monthly by the Census Bureau to provide estimates of employment, unemployment, and other characteristics of the general labor force, estimates of the population as a whole, and estimates of various subgroups in the population. The entire non-institutionalized population of the United States is sampled to obtain the respondents for this survey series.</serInfo>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Version Statement -- Source Document
<verStmt> 1.4.6 (Generic element A.6.6)
Description: Version statement for the source document.

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Version -- Source Document, Version Responsibility Statement -- Source Document, Notes (Version) -- Source Document


Version -- Source Document
<version> 1.4.6.1 (Generic element A.6.6.1)
Description: Also known as release or edition. If there have been substantive changes in the source document since its creation, this statement should be used. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Remarks: ICPSR distinguishes among the terms "release," "version," and "edition" in the following ways:
  • ICPSR Edition: Used only for intensively processed collections, for which ICPSR has produced a unique edition of the data. This usually involves checking for undocumented codes and consistency checks. Signals that additional intellectual effort has gone into producing the collection.
  • ICPSR Version: Used to indicate that ICPSR has revised the format of a collection or added components to it, in most cases without changing any data values. A study is considered an "ICPSR version" if one or more of these steps has been performed: (1) Converting software-specific system files or export/transport files to raw data; (2) Generating SAS and/or SPSS data definition statements; (3) Reformatting files, e.g., removing blanks to use space more efficiently; (4) Scanning hardcopy documentation; or (5) Reformatting machine-readable documentation, e.g., converting text created in a word-processing package to ASCII text.
  • Release: Used for data collections that are being disseminated exactly as they came from the data depositor (except for the addition of an ICPSR cover and ICPSR front matter).
Example:
<version type='edition' date='1999-01-25'>Second ICPSR Edition</version>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type (release, version, edition), date
Contains: #PCDATA, Link to other element(s) within the codebook.
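
The type attribute of the version element is limited to the three values listed above (release, version, edition). The Python sketch below shows one way to check that restriction when reading a version element. It assumes the standard-library xml.etree.ElementTree parser; the helper name describe_version and the constant ALLOWED_TYPES are hypothetical names introduced for the example.

```python
import xml.etree.ElementTree as ET

# Values listed for the type attribute in the attribute list above.
ALLOWED_TYPES = {"release", "version", "edition"}

VERSION = "<version type='edition' date='1999-01-25'>Second ICPSR Edition</version>"


def describe_version(xml_text):
    """Return (type, date, text) for a version element (hypothetical helper).

    The type attribute may be omitted; when present it must be one of
    the allowed values.
    """
    element = ET.fromstring(xml_text)
    version_type = element.get("type")
    if version_type is not None and version_type not in ALLOWED_TYPES:
        raise ValueError("unexpected version type: " + version_type)
    return version_type, element.get("date"), element.text


details = describe_version(VERSION)
print(details)
```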


Version Responsibility Statement -- Source Document
<verResp> 1.4.6.2 (Generic element A.6.6.2)
Description: Used to indicate the organization or person responsible for the version of the source document.
Example:
<verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes (Version) -- Source Document
<notes> 1.4.6.3 (Generic element A.4)
Description: Used to indicate additional information regarding the version or the version responsibility statement, in particular to indicate what makes a new version different from its predecessor. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>The source codebook was produced from original hardcopy materials using Optical Character Recognition (OCR).</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Bibliographic Citation -- Source Document
<biblCit> 1.4.7 (Generic element A.6.7)
Description: Complete bibliographic reference containing all of the standard elements of a citation that can be used to cite the source document. The "format" attribute is provided to enable specification of the particular citation style used, e.g. APA, MLA, Chicago, etc.
Example:
<biblCit format='MRDF'>Rabier, Jacques-Rene, and Ronald Inglehart. EURO-BAROMETER 11: YEAR OF THE CHILD IN EUROPE, APRIL 1979 [Computer file]. Conducted by Institut Francais D'Opinion Publique (IFOP), Paris, et al. ICPSR ed. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 1981. </biblCit>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, format
Contains: #PCDATA, Link to other element(s) within the codebook.


Holdings Information -- Source Document
<holdings> 1.4.8 (Generic element A.6.8)
Description: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work.
Example:
<holdings location='University of Michigan Graduate Library' callno='inap.' URI='http://www.umich.edu/library/'> Codebook for Current Population Survey, 1999: Annual Demographic File </holdings>

Optional
Repeatable
Attributes: ID, xml:lang, source, format, location, callno, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes -- Source Document
<notes> 1.4.9 (Generic element A.4)
Description: Used to indicate additional information about the source document. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>A machine-readable version of the source codebook was supplied by the Zentralarchiv.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Notes -- Document Description
<notes> 1.5 (Generic element A.4)
Description: Used to indicate additional information about the document description as a whole. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>This Document Description, or header information, can be used within an electronic resource discovery environment.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.
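
Because notes elements appear in several places and share the same attributes, it can help to build them through one small function. The Python sketch below is a minimal illustration that assumes the standard-library xml.etree.ElementTree module; the helper name make_note and its parameter names are hypothetical. The level value "study" is taken from the list in the description above, while the note text is a placeholder written for this example.

```python
import xml.etree.ElementTree as ET


def make_note(text, resp=None, level=None, note_type=None, subject=None):
    """Build a notes element, setting only the attributes that are given.

    Hypothetical helper: type and subject are intended for a controlled
    vocabulary, level names the part of the DTD the note refers to, and
    resp names the author of the note.
    """
    note = ET.Element("notes")
    for name, value in (
        ("type", note_type),
        ("subject", subject),
        ("level", level),
        ("resp", resp),
    ):
        if value is not None:
            note.set(name, value)
    note.text = text
    return note


# Placeholder text and attribute values chosen for illustration only.
note = make_note("Illustrative note text.", resp="Jane Smith", level="study")
serialized = ET.tostring(note, encoding="unicode")
reparsed = ET.fromstring(serialized)
print(serialized)
```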

Study Description

Section 2.0 of the Data Documentation Initiative (DDI) DTD


Study Description's Place within the Document Structure

    Document
          
          |---Document Description
          |---STUDY DESCRIPTION
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials

Role of the Study Description

The Study Description consists of information about the data collection, study, or compilation that the DDI-compliant documentation file describes. This section includes information about how the study should be cited, who collected or compiled the data, who distributes the data, keywords about the content of the data, summary (abstract) of the content of the data, data collection methods and processing, etc. Note that some content of the Study Description's Citation -- e.g., Responsibility Statement -- may be identical to that of the Documentation Citation. This is usually the case when the producer of a data collection also produced the print or electronic codebook for that data collection.

Study Description

The access attribute is used to link to the Access Conditions element describing access and terms of use for the entire dataset.
Required
Repeatable
Attributes: ID, xml:lang, source, access
Contains Elements:
Citation (of Study)
Required
Repeatable
Attributes: ID, xml:lang, source

Study Scope
Optional
Repeatable
Attributes: ID, xml:lang, source

Methodology and Processing (Study Level)
Optional
Repeatable
Attributes: ID, xml:lang, source

Data Access
Optional
Repeatable
Attributes: ID, xml:lang, source

Other Study Description Materials
Optional
Repeatable
Attributes: ID, xml:lang, source


Citation

Section 2.1 of the Study Description (2.0)

of the Data Documentation Initiative (DDI) DTD


Citation's Place within the Study Description

    Document
          |
          |---Document Description
          |---Study Description
          |               |---CITATION
          |               |---Study Scope
          |               |---Methodology
          |               |---Data Access
          |               |---Other Study Description Materials
          |
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials


Citation -- Data Collection

<citation> 2.1 (Generic element A.6)
Description: Citation for the data collection described by the marked-up documentation. This element encodes the bibliographic information describing the data collection, including title information, statement of responsibility, production and distribution information, series and version information, text of a preferred bibliographic citation, and notes (if any). A MARCURI attribute is provided to link to the MARC record for this citation.
Optional
Not Repeatable
Attributes: ID, xml:lang, source, MARCURI
Contains Elements: Title Statement -- Data Collection, Responsibility Statement -- Data Collection, Production Statement -- Data Collection, Distributor Statement -- Data Collection, Series Statement -- Data Collection, Version Statement -- Data Collection, Bibliographic Citation -- Data Collection, Holdings Information -- Data Collection, Notes (Citation) -- Data Collection


Title Statement -- Data Collection
<titlStmt> 2.1.1 (Generic element A.6.1)
Description: Title statement for the data collection.
Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Title -- Data Collection, Subtitle -- Data Collection, Alternative Title -- Data Collection, Parallel Title -- Data Collection, ID Number -- Data Collection


Title -- Data Collection
<titl> 2.1.1.1 (Generic element A.6.1.1)
Description: Contains the full authoritative title of the data collection. The data collection title will in most cases be identical to the title for the marked-up document (1.1.1.1) and the source document (1.4.1.1). A full title should indicate the geographic scope of the data collection as well as the time period covered.
Examples:
<titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>

Required
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.
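
Since the data collection title (2.1.1.1) will in most cases match the source document title (1.4.1.1), a simple comparison can flag codebooks where the two diverge. The Python sketch below assumes the standard-library xml.etree.ElementTree parser and a hypothetical helper named titles_match. The wrapper element names codeBook, docDscr, docSrc, and stdyDscr are assumptions that do not appear in this section, and the fragment is heavily abbreviated: it omits other required content and is meant only to show the two title locations.

```python
import xml.etree.ElementTree as ET

# Abbreviated, illustrative skeleton. The wrapper names codeBook, docDscr,
# docSrc, and stdyDscr are assumed for this sketch; other required
# elements are left out for brevity.
CODEBOOK = """
<codeBook>
  <docDscr>
    <docSrc>
      <titlStmt>
        <titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
      </titlStmt>
    </docSrc>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Domestic Violence Experience in Omaha, Nebraska, 1986-1987</titl>
      </titlStmt>
    </citation>
  </stdyDscr>
</codeBook>
"""


def titles_match(xml_text):
    """Compare the source document title with the data collection title."""
    root = ET.fromstring(xml_text)
    source_title = root.findtext("docDscr/docSrc/titlStmt/titl")
    study_title = root.findtext("stdyDscr/citation/titlStmt/titl")
    if source_title is None or study_title is None:
        raise ValueError("both titles must be present to compare them")
    # Ignore leading and trailing whitespace introduced by formatting.
    return source_title.strip() == study_title.strip()


same = titles_match(CODEBOOK)
print(same)
```

A mismatch is not necessarily an error, because the titles are only expected to agree in most cases; the result is best treated as a prompt for manual review.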


Subtitle -- Data Collection
<subTitl> 2.1.1.2 (Generic element A.6.1.2)
Description: A subtitle is a secondary title used to amplify or state certain limitations of the main title. It may repeat information already in the main title.
Examples:
<titl>Monitoring the Future: A Continuing Study of American Youth, 1995</titl>
<subTitl>A Continuing Study of American Youth, 1995</subTitl>

<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<subTitl>Public Use Microdata Sample</subTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Alternative Title -- Data Collection
<altTitl> 2.1.1.3 (Generic element A.6.1.3)
Description: The alternative title may be the title by which a data collection is commonly referred to or it may be an abbreviation for the title.
Examples:
<titl>Census of Population, 1950 [United States]: Public Use Microdata Sample</titl>
<altTitl>PUMS</altTitl>

<titl>Equality of Educational Opportunity (Coleman) Study (EEOS), 1966</titl>
<altTitl>The Coleman Study</altTitl>
<altTitl>EEOS</altTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Parallel Title -- Data Collection
<parTitl> 2.1.1.4 (Generic element A.6.1.4)
Description: The title translated into another language.
Example:
<titl>Politbarometer West [Germany], Partial Accumulation, 1977-1995</titl>
<parTitl>Politbarometer, 1977-1995: Partielle Kumulation</parTitl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


ID Number -- Data Collection
<IDNo> 2.1.1.5 (Generic element A.6.1.5)
Description: Unique string or number (producer's or archive's number) for the data collection. An "agency" attribute is supplied.
Examples:
<IDNo agency='ICPSR'>6678</IDNo>
<IDNo agency='ZA'>2010</IDNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency
Contains: #PCDATA, Link to other element(s) within the codebook.


Responsibility Statement -- Data Collection
<rspStmt> 2.1.2 (Generic element A.6.2)
Description: Responsibility for the data collection.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Authoring Entity / Primary Investigator -- Data Collection, Other Identifications / Acknowledgments -- Data Collection


Authoring Entity / Primary Investigator -- Data Collection
<AuthEnty> 2.1.2.1 (Generic element A.6.2.1)
Description: The person, corporate body, or agency responsible for the data collection's substantive and intellectual content. Repeat the element for each author, and use the affiliation attribute if available. Invert first and last name and use commas. Remarks: The author in this element should be the individual(s) or organization(s) directly responsible for the intellectual content of the data collection, as distinct from the person(s) or organization(s) responsible for the intellectual content of the marked-up document.

Examples:
<AuthEnty>United States Department of Commerce. Bureau of the Census</AuthEnty>
<AuthEnty affiliation='European Commission'>Rabier, Jacques-Rene</AuthEnty>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Other Identifications / Acknowledgments -- Data Collection
<othId> 2.1.2.2 (Generic element A.6.2.2)
Description: Statements of responsibility not recorded in the title and statement of responsibility areas. Indicate here the persons or bodies connected with the work, or significant persons or bodies connected with previous editions and not already named in the description. For example, the name of the person who cleaned the data collection might be cited here, using the role and affiliation attributes.
Example:
<othId role='processor' affiliation='INRA'><p>Jane Smith</p></othId>

Optional
Repeatable
Attributes: ID, xml:lang, source, type, role, affiliation
Contains: <p>, othId


Production Statement -- Data Collection
<prodStmt> 2.1.3 (Generic element A.6.3)
Description: Production statement for the data collection.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Producer -- Data Collection, Copyright -- Data Collection, Date of Production -- Data Collection, Place of Production -- Data Collection, Software Used in Production -- Data Collection, Funding Agency -- Data Collection, Grant Number -- Data Collection


Producer -- Data Collection
<producer> 2.1.3.1 (Generic element A.6.3.1)
Description: The producer of the data collection is the person or organization with the financial or administrative responsibility for the physical processes whereby the data collection was brought into existence. Use the role attribute to distinguish different stages of involvement in the production process, such as original producer.
Example:
<producer abbr='ICPSR' affiliation='Institute for Social Research'>Inter-university Consortium for Political and Social Research</producer>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Copyright -- Data Collection
<copyright> 2.1.3.2 (Generic element A.6.3.2)
Description: Copyright statement for the data collection.
Example:
<copyright>Copyright(c) ICPSR, 2000</copyright>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Production -- Data Collection
<prodDate> 2.1.3.3 (Generic element A.6.3.3)
Description: Date the data collection was produced (not distributed or archived). The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<prodDate date='1998-07-21'>July 21, 1998</prodDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Place of Production -- Data Collection
<prodPlac> 2.1.3.4 (Generic element A.6.3.4)
Description: Address of the archive or agency that produced the data collection.
Example:
<prodPlac>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</prodPlac>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Software Used in Production -- Data Collection
<software> 2.1.3.5 (Generic element A.6.3.5)
Description: Identifies the software used in creating or storing the data collection. A "version" attribute permits specification of the software version number. The "date" attribute is provided to enable specification of the date (if any) for the software release. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<software version='6.12'>SAS</software>

Optional
Repeatable
Attributes: ID, xml:lang, source, version, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Funding Agency -- Data Collection
<fundAg> 2.1.3.6 (Generic element A.6.3.6)
Description: The source(s) of funds for production of the data collection. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish them.
Example:
<fundAg abbr='NSF'>National Science Foundation</fundAg>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, role
Contains: #PCDATA, Link to other element(s) within the codebook.


Grant Number -- Data Collection
<grantNo> 2.1.3.7 (Generic element A.6.3.7)
Description: The grant/contract number of the project that sponsored the data collection effort. If more than one, indicate the appropriate agency using the "agency" attribute. If different funding agencies sponsored different stages of the production process, use the role attribute to distinguish the grant numbers.
Example:
<grantNo agency='Bureau of Justice Statistics'>J-LEAA-018-77</grantNo>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency, role
Contains: #PCDATA, Link to other element(s) within the codebook.
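
When a Production Statement lists several funding agencies, the agency attribute on grantNo is what ties each grant number to its funder. The Python sketch below groups grant numbers by that attribute. It assumes the standard-library xml.etree.ElementTree parser; the helper name grants_by_agency is hypothetical, and listing these two funding agencies together is an assumption made for the example. The values themselves come from examples in this document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Illustrative fragment; combining these funders in one prodStmt is an
# assumption for the sketch.
PROD_STMT = """
<prodStmt>
  <fundAg abbr='NSF'>National Science Foundation</fundAg>
  <fundAg abbr='BJS'>Bureau of Justice Statistics</fundAg>
  <grantNo agency='Bureau of Justice Statistics'>J-LEAA-018-77</grantNo>
</prodStmt>
"""


def grants_by_agency(xml_text):
    """Map each agency attribute value to its grant numbers (hypothetical helper)."""
    stmt = ET.fromstring(xml_text)
    grouped = defaultdict(list)
    for grant in stmt.findall("grantNo"):
        # A grantNo without an agency attribute is grouped under None.
        grouped[grant.get("agency")].append(grant.text)
    return dict(grouped)


grants = grants_by_agency(PROD_STMT)
funders = [(f.get("abbr"), f.text) for f in ET.fromstring(PROD_STMT).findall("fundAg")]
print(grants)
print(funders)
```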


Distributor Statement -- Data Collection
<distStmt> 2.1.4 (Generic element A.6.4)
Description: Distribution statement for the data collection.

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Distributor -- Data Collection, Contact Person -- Data Collection, Depositor -- Data Collection, Date of Deposit -- Data Collection, Date of Distribution -- Data Collection


Distributor -- Data Collection
<distrbtr> 2.1.4.1 (Generic element A.6.4.1)
Description: The organization designated by the author or producer to generate copies of a particular data collection including any necessary editions or revisions. Names and addresses may be specified, and other archives may be co-distributors. A URI attribute is included to provide a URN or URL to the ordering service or download facility on a website.
Example:
<distrbtr abbr='ICPSR' affiliation='Institute for Social Research' URI='http://www.icpsr.umich.edu'>Ann Arbor, MI: Inter-university Consortium for Political and Social Research</distrbtr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Contact Person -- Data Collection
<contact> 2.1.4.2 (Generic element A.6.4.2)
Description: Names and addresses of individuals responsible for the data collection. May be PIs. Individuals listed as contact persons will be used as resource persons regarding problems or questions raised by the user community. The URI attribute should be used to indicate a URN or URL for the homepage of the contact individual. The email attribute is used to indicate an email address for the contact individual.
Example:
<contact affiliation='University of Wisconsin' email='jsmith@...'>Jane Smith</contact>

Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation, URI, email
Contains: #PCDATA, Link to other element(s) within the codebook.


Depositor -- Data Collection
<depositr> 2.1.4.3 (Generic element A.6.4.3)
Description: The name of the person (or institution) who provided this data collection to the archive storing it.
Example:
<depositr abbr='BJS' affiliation='U.S. Department of Justice'>Bureau of Justice Statistics</depositr>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Deposit -- Data Collection
<depDate> 2.1.4.4 (Generic element A.6.4.4)
Description: The date that the data collection was deposited with the archive that originally received it. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<depDate date='1999-01-25'>January 25, 1999</depDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Distribution -- Data Collection
<distDate> 2.1.4.5 (Generic element A.6.4.5)
Description: The date that the data collection was released for distribution. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<distDate date='1999-01-25'>January 25, 1999</distDate>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Series Statement -- Data Collection
<serStmt> 2.1.5 (Generic element A.6.5)
Description: Series statement for the data collection. The URI attribute is provided to point to a central Internet repository of series information.

Optional
Not Repeatable
Attributes: ID, xml:lang, source, URI
Contains Elements: Series Name -- Data Collection, Series Information -- Data Collection
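
As a sketch of how the two child elements might be combined, the series statement below reuses the Series Name and (abridged) Series Information examples from this section. The optional URI attribute is omitted because no repository address is given here.

```xml
<!-- Illustrative only: serInfo text is abridged from the Series Information example -->
<serStmt>
  <serName abbr='CPS'>Current Population Survey Series</serName>
  <serInfo>The Current Population Survey (CPS) is a household sample survey conducted monthly by the Census Bureau to provide estimates of employment, unemployment, and other characteristics of the general labor force.</serInfo>
</serStmt>
```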


Series Name -- Data Collection
<serName> 2.1.5.1 (Generic element A.6.5.1)
Description: The name of the data series to which the collection belongs.
Example:
<serName abbr='CPS'>Current Population Survey Series</serName>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr
Contains: #PCDATA, Link to other element(s) within the codebook.


Series Information -- Data Collection
<serInfo> 2.1.5.2 (Generic element A.6.5.2)
Description: Contains a history of the data series and a summary of those features that apply to the data series as a whole.
Example:
<serInfo>The Current Population Survey (CPS) is a household sample survey conducted monthly by the Census Bureau to provide estimates of employment, unemployment, and other characteristics of the general labor force, estimates of the population as a whole, and estimates of various subgroups in the population. The entire non-institutionalized population of the United States is sampled to obtain the respondents for this survey series.</serInfo>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Version Statement -- Data Collection
<verStmt> 2.1.6 (Generic element A.6.6)
Description: Version statement for the data collection.

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Version -- Data Collection, Version Responsibility Statement -- Data Collection, Notes (Version) -- Data Collection
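
The sketch below shows one way the version statement might group its three child elements. It assumes the children appear in the order listed under "Contains Elements"; the responsible organization and note text are illustrative values drawn from other examples in this section.

```xml
<!-- Illustrative only: one verStmt describing a single version -->
<verStmt>
  <version type='edition' date='1999-01-25'>Second ICPSR Edition</version>
  <verResp>Inter-university Consortium for Political and Social Research</verResp>
  <notes resp='Jane Smith'>Data for 1998 have been added to this version of the data collection.</notes>
</verStmt>
```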


Version -- Data Collection
<version> 2.1.6.1 (Generic element A.6.6.1)
Description: Also known as release or edition. If there have been substantive changes in the data collection since its creation, this statement should be used. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Remarks: ICPSR distinguishes among the terms "release," "version," and "edition" in the following ways:
  • ICPSR Edition: Used only for intensively processed collections, for which ICPSR has produced a unique edition of the data. This usually involves checking for undocumented codes and performing consistency checks. Signals that additional intellectual effort has gone into producing the collection.
  • ICPSR Version: Used to indicate that ICPSR has revised the format of a collection or added components to it, in most cases without changing any data values. A study is considered an "ICPSR version" if one or more of these steps has been performed: (1) Converting software-specific system files or export/transport files to raw data; (2) Generating SAS and/or SPSS data definition statements; (3) Reformatting files, e.g., removing blanks to use space more efficiently; (4) Scanning hardcopy documentation; or (5) Reformatting machine-readable documentation, e.g., converting text created in a word-processing package to ASCII text.
  • Release: Used for data collections that are being disseminated exactly as they came from the data depositor (except for the addition of an ICPSR cover and ICPSR front matter).
Example:
<version type='edition' date='1999-01-25'>Second ICPSR Edition</version>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type (release, version, edition), date
Contains: #PCDATA, Link to other element(s) within the codebook.


Version Responsibility Statement -- Data Collection
<verResp> 2.1.6.2 (Generic element A.6.6.2)
Description: Used to indicate the organization or person responsible for the version of the data collection.
Example:
<verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes (Version) -- Data Collection
<notes> 2.1.6.3 (Generic element A.6.6.3)
Description: Used to indicate additional information regarding the version or the version responsibility statement for the data collection, in particular to indicate what makes a new version different from its predecessor. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>Data for 1998 have been added to this version of the data collection.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Bibliographic Citation -- Data Collection
<biblCit format='MRDF'> 2.1.7 (Generic element A.6.7)
Description: Complete bibliographic reference containing all of the standard elements of a citation that can be used to cite the data collection. The "format" attribute is provided to enable specification of the particular citation style used, e.g. APA, MLA, Chicago, etc.
Example:
<biblCit>Rabier, Jacques-Rene, and Ronald Inglehart. EURO-BAROMETER 11: YEAR OF THE CHILD IN EUROPE, APRIL 1979 [Computer file]. Conducted by Institut Francais D'Opinion Publique (IFOP), Paris, et al. ICPSR ed. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [producer and distributor], 1981. </biblCit>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, format
Contains: #PCDATA, Link to other element(s) within the codebook.


Holdings Information -- Data Collection
<holdings> 2.1.8 (Generic element A.6.8)
Description: Information concerning either the physical or electronic holdings of the cited work. Attributes include: location--The physical location where a copy is held; callno--The call number for a work at the location specified; and URI--A URN or URL for accessing the electronic copy of the cited work.
Example:
<holdings location='University of Michigan Graduate Library' callno='inap.' URI='http://www.umich.edu/library/'>Data File for Current Population Survey, 1999: Annual Demographic File</holdings>

Optional
Repeatable
Attributes: ID, xml:lang, source, location, callno, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes (Citation) -- Data Collection
<notes> 2.1.9 (Generic element A.4)
Description: Used to indicate additional information regarding the citation for the data collection. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes resp='Jane Smith'>This citation was sent to ICPSR by the agency depositing the data.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Study Scope

Section 2.2 of the Study Description (2.0)

of the Data Documentation Initiative (DDI) DTD


Study Scope's Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |               |---Citation
          |               |---STUDY SCOPE
          |               |---Methodology And Processing (Study Level)
          |               |---Data Access
          |               |---Other Study Description Materials (Encoder-defined)
          |
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials


To comply with the Dublin Core, it is recommended that the following elements in the Study Scope section be used when the appropriate information is available:

DUBLIN CORE    DDI
------------------

Subject        2.2.1.1 keyword (Keywords)
               2.2.1.2 topcClas (Topic Classification) 

Description    2.2.2 abstract (Abstract)

Coverage       2.2.3.1 timePrd (Time Period Covered)
               2.2.3.2 collDate (Date of Collection)
               2.2.3.3 nation (Country)
               2.2.3.4 geogCover (Geographic Coverage)

Study Scope
<stdyInfo> 2.2
Description: This section contains information about the data collection's scope across several dimensions, including substantive content, geography, and time.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: Subject Information, Abstract, Summary Data Description, Notes


Subject Information
<subject> 2.2.1
Description: Subject information describing the data collection's intellectual content.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: Keyword, Topic Classification
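
A minimal sketch of a populated subject block, combining the Keyword and Topic Classification examples from this section. It assumes keywords precede topic classifications, following the order of the "Contains" list.

```xml
<!-- Illustrative only: both child elements are repeatable -->
<subject>
  <keyword>quality of life</keyword>
  <keyword>family</keyword>
  <keyword>career goals</keyword>
  <topcClas vocab='ICPSR Subject Headings'>Social Indicators</topcClas>
  <topcClas vocab='LOC Subject Headings'>Elections -- California</topcClas>
</subject>
```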


Keyword
<keyword> 2.2.1.1
Description: Words or phrases that describe salient aspects of a data collection's content. Can be used for building keyword indexes and for classification and retrieval purposes. A controlled vocabulary can be employed. Maps to Dublin Core Subject. The vocab attribute is provided for specification of the controlled vocabulary in use, e.g., LCSH, MeSH, etc. The vocabURI attribute specifies the location for the full controlled vocabulary.
Examples:
<keyword>quality of life</keyword>
<keyword>family</keyword>
<keyword>career goals</keyword>

Optional
Repeatable
Attributes: ID, xml:lang, source, vocab, vocabURI
Contains: #PCDATA, Link to other element(s) within the codebook.


Topic Classification
<topcClas> 2.2.1.2
Description: The classification field indicates the broad substantive topic(s) that the data cover. Library of Congress subject terms may be used here. The vocab attribute is provided for specification of the controlled vocabulary in use, e.g., LCSH, MeSH, etc. The vocabURI attribute specifies the location for the full controlled vocabulary. Maps to Dublin Core Subject.
Examples:
<topcClas vocab='ICPSR Subject Headings'>Mass Political Behavior and Attitudes</topcClas>
<topcClas vocab='ICPSR Subject Headings'>Social Indicators</topcClas>
<topcClas vocab='LOC Subject Headings'>Public opinion -- California -- Statistics</topcClas>
<topcClas vocab='LOC Subject Headings'>Elections -- California</topcClas>

Optional
Repeatable
Attributes: ID, xml:lang, source, vocab, vocabURI
Contains: #PCDATA, Link to other element(s) within the codebook.


Abstract
<abstract> 2.2.2
Description: An unformatted summary describing the purpose, nature, and scope of the data collection, special characteristics of its contents, major subject areas covered, and what questions the PIs attempted to answer when they conducted the study. A listing of major variables in the study is important here. In cases where a codebook contains more than one abstract (for example, one might be supplied by the data producer and another prepared by the data archive where the data are deposited), the source and date attributes may be used to distinguish the abstract versions. Maps to Dublin Core Description. Inclusion of this element is recommended. Date attribute should follow ISO convention of YYYY-MM-DD.
Example:
<abstract date = '1999-01-28' source='ICPSR'> Data on labor force activity for the week prior to the survey are supplied in this collection. Information is available on the employment status, occupation, and industry of persons 15 years old and over. Demographic variables such as age, sex, race, marital status, veteran status, household relationship, educational background, and Hispanic origin are included. In addition to providing these core data, the May survey also contains a supplement on work schedules for all applicable persons aged 15 years and older who were employed at the time of the survey. This supplement focuses on shift work, flexible hours, and work at home for both main and second jobs.</abstract>

Optional
Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Summary Data Description
<sumDscr> 2.2.3
Description: Information about a study's chronological and geographic coverage and unit of analysis.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: Time Period Covered, Date of Collection, Country, Geographic Coverage, Geographic Unit, Unit of Analysis, Universe, Kind of Data
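
The sketch below illustrates how the summary data description might bring together its child elements, including a start/end pair for the time period and an included/excluded pair for the universe. It assumes the children appear in the order of the "Contains" list. The values are reused from the separate element examples in this section (the country value is an assumption chosen to match the universe example) and do not describe a single real study.

```xml
<!-- Illustrative only: values combined from separate examples -->
<sumDscr>
  <timePrd event='start' date='1998-05-01'>May 1, 1998</timePrd>
  <timePrd event='end' date='1998-05-31'>May 31, 1998</timePrd>
  <collDate event='single' date='1998-11-10'>10 November 1998</collDate>
  <nation>United States</nation>
  <geogCover>State of California</geogCover>
  <geogUnit>state</geogUnit>
  <anlyUnit>individuals</anlyUnit>
  <universe level='study' clusion='I'>The resident population of the United States.</universe>
  <universe level='study' clusion='E'>Persons living in institutions and military barracks.</universe>
  <dataKind>survey data</dataKind>
</sumDscr>
```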


Time Period Covered
<timePrd> 2.2.3.1
Description: The time period to which the data refer. This item reflects the time period covered by the data, not the dates of coding or making documents machine-readable or the dates the data were collected. Also known as span. Use the event attribute to specify "start", "end", or "single" for each date entered. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Maps to Dublin Core Coverage. Inclusion of this element is recommended.
Examples:
<timePrd event='start' date='1998-05-01'>May 1, 1998</timePrd>
<timePrd event='end' date='1998-05-31'>May 31, 1998</timePrd>

Optional
Repeatable
Attributes: ID, xml:lang, source, event, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Date of Collection
<collDate> 2.2.3.2
Description: Contains the date(s) when the data were collected. Use the event attribute to specify "start", "end", or "single" for each date entered to distinguish between, for example, the first day of collection (start), only day of collection (single), and last day of collection (end). The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. Maps to Dublin Core Coverage. Inclusion of this element in the codebook is recommended.
Example:
<collDate event='single' date='1998-11-10'>10 November 1998</collDate>

Optional
Repeatable
Attributes: ID, xml:lang, source, event, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Country
<nation> 2.2.3.3
Description: Indicates the country or countries covered in the file. Attribute "abbr" may be used to match the attributes given to agencies, etc. and to provide an equivalent to the TEI placePart entity, which adds "type" and "full" attributes. Maps to Dublin Core Coverage. Inclusion of this element is recommended.
Example:
<nation abbr='U.K.'>United Kingdom</nation>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr
Contains: #PCDATA, Link to other element(s) within the codebook.


Geographic Coverage
<geogCover> 2.2.3.4
Description: Information on the geographic coverage of the data. Include the total geographic scope of the data, and any additional levels of geographic coding provided in the variables. Maps to Dublin Core Coverage. Inclusion of this element is recommended.
Example:
<geogCover>State of California</geogCover>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA.


Geographic Unit
<geogUnit> 2.2.3.5
Description: Lowest level of geographic aggregation covered by the data.
Example:
<geogUnit>state</geogUnit>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Unit of Analysis
<anlyUnit> 2.2.3.6
Description: Basic unit of analysis or observation that the file describes: individuals, families/households, groups, institutions/organizations, administrative units, etc. The "unit" attribute is included to permit the development of a controlled vocabulary for this element.
Example:
<anlyUnit>individuals</anlyUnit>

Optional
Repeatable
Attributes: ID, xml:lang, source, unit
Contains: #PCDATA, Link to other element(s) within the codebook.


Universe
<universe> 2.2.3.7
Description: A description of the population covered by the data in the file; the group of persons or other elements that are the object of the study and to which the study results refer. Age, nationality, and residence commonly help to delineate a given universe, but any of a number of factors may be involved, such as age limits, sex, marital status, race, ethnic group, nationality, income, veteran status, criminal convictions, etc. The universe may consist of elements other than persons, such as housing units, court cases, deaths, countries, etc. In general, it should be possible to tell from the description of the universe whether a given individual or element (hypothetical or real) is a member of the population under study. Also known as universe of interest, population of interest, and target population. A "level" attribute is included to permit coding of the level to which universe applies, i.e., the study level, the file level (if different from study), or the variable level. The "clusion" attribute provides for specification of groups included (I) in or excluded (E) from the universe.
Example:
For a universe that excludes persons living in institutions or military barracks:
<universe level='study' clusion='I'>The resident population of the United States.</universe>
<universe level='study' clusion='E'>Persons living in institutions and military barracks.</universe>

Optional
Repeatable
Attributes: ID, xml:lang, source, level, clusion
Contains: #PCDATA, Link to other element(s) within the codebook.


Kind of Data
<dataKind> 2.2.3.8
Description: The type of data included in the file: survey data, census/enumeration data, aggregate data, clinical data, event/transaction data, program source code, machine-readable text, administrative records data, experimental data, psychological test, textual data, coded textual, coded documents, time budget diaries, observation data/ratings, process-produced data, etc.
Example:
<dataKind>survey data</dataKind>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 2.2.4 (Generic element A.4)
Description: Used to indicate additional information regarding the scope of a data collection. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>Data on employment and income refer to the preceding year, although demographic data refer to the time of the survey.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Study Level Methodology and Processing

Section 2.3 of the Study Description (2.0)

of the Data Documentation Initiative (DDI) DTD


Methodology and Processing's Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |               |
          |               |---Citation
          |               |---Study Scope
          |               |---METHODOLOGY AND PROCESSING 
          |               |---Data Access
          |               |---Other Study Description Materials
          |
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials

Methodology and Processing
<method> 2.3
Description: This section describes the methodology and processing involved in a data collection.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: Data Collection Methodology, Notes, Data Appraisal, Study Status


Data Collection Methodology
<dataColl> 2.3.1
Description: Information about the methodology employed in a data collection.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: Time Method, Data Collector, Frequency of Data Collection, Sampling Procedure, Major Deviations from the Sample Design, Mode of Data Collection, Type of Research Instrument, Sources Statement, Characteristics of the Data Collection Situation, Actions to Minimize Losses, Control Operations, Weighting, Cleaning Operations
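
As a sketch of how a data collection methodology block might look in practice, the fragment below combines a subset of the child-element examples that follow. It assumes children appear in the order of the "Contains" list and that any of them may be omitted; the values are illustrative and do not describe a single real study.

```xml
<!-- Illustrative only: a partial dataColl using values from the element examples -->
<dataColl>
  <timeMeth>cross-section</timeMeth>
  <dataCollector abbr='SRC' affiliation='University of Michigan'>Survey Research Center</dataCollector>
  <frequenc>monthly</frequenc>
  <sampProc>National multistage area probability sample</sampProc>
  <collMode>telephone interviews</collMode>
  <resInstru>structured</resInstru>
  <cleanOps>Checks for undocumented codes were performed, and data were subsequently revised in consultation with the principal investigator.</cleanOps>
</dataColl>
```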


Time Method
<timeMeth> 2.3.1.1
Description: The time method or time dimension of the data collection. The "method" attribute is included to permit the development of a controlled vocabulary for this element.
Examples:
<timeMeth>panel survey</timeMeth>
<timeMeth>cross-section</timeMeth>
<timeMeth>trend study</timeMeth>
<timeMeth>time-series</timeMeth>

Optional
Repeatable
Attributes: ID, xml:lang, source, method
Contains: #PCDATA, Link to other element(s) within the codebook.


Data Collector
<dataCollector> 2.3.1.2
Description: The entity (individual, agency, or institution) responsible for administering the questionnaire or interview or compiling the data. This refers to the entity collecting the data, not to the entity producing the documentation.
Example:
<dataCollector abbr='SRC' affiliation='University of Michigan'>Survey Research Center</dataCollector>

Optional
Repeatable
Attributes: ID, xml:lang, source, abbr, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Frequency of Data Collection
<frequenc> 2.3.1.3
Description: If the data collected include more than one point in time, indicate the frequency with which the data were collected. The "freq" attribute is included to permit the development of a controlled vocabulary for this element.
Examples:
<frequenc>monthly</frequenc>
<frequenc>quarterly</frequenc>

Optional
Repeatable
Attributes: ID, xml:lang, source, freq
Contains: #PCDATA, Link to other element(s) within the codebook.


Sampling Procedure
<sampProc> 2.3.1.4
Description: The type of sample and sample design used to select the survey respondents to represent the population. May include reference to the target sample size and the sampling fraction.
Examples:
<sampProc>National multistage area probability sample</sampProc>
<sampProc>Simple random sample</sampProc>
<sampProc>Stratified random sample</sampProc>
<sampProc>Quota sample</sampProc>
<sampProc>The 8,450 women interviewed for the NSFG, Cycle IV, were drawn from households in which someone had been interviewed for the National Health Interview Survey (NHIS), between October 1985 and March 1987.</sampProc>
<sampProc>Samples sufficient to produce approximately 2,000 families with completed interviews were drawn in each state. Families containing one or more Medicaid or uninsured persons were oversampled.</sampProc>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Major Deviations from the Sample Design
<deviat> 2.3.1.5
Description: Show correspondence as well as discrepancies between the sampled units (obtained) and available statistics for the population (age, sex-ratio, marital status, etc.) as a whole.
Example:
<deviat>The suitability of Ohio as a research site reflected its similarity to the United States as a whole. The evidence extended by Tuchfarber (1988) shows that Ohio is representative of the United States in several ways: percent urban and rural, percent of the population that is African-American, median age, per capita income, percent living below the poverty level, and unemployment rate. Although results generated from an Ohio sample are not empirically generalizable to the United States, they may be suggestive of what might be expected nationally.</deviat>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Mode of Data Collection
<collMode> 2.3.1.6
Description: The method used to collect the data; instrumentation characteristics.
Examples:
<collMode>telephone interviews</collMode>
<collMode>face-to-face interviews</collMode>
<collMode>mail questionnaires</collMode>
<collMode>computer-aided telephone interviews (CATI)</collMode>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Type of Research Instrument
<resInstru> 2.3.1.7
Description: The type of data collection instrument used. "Structured" indicates an instrument in which all respondents are asked the same questions/tests, possibly with precoded answers. If a small portion of such a questionnaire includes open-ended questions, provide appropriate comments. "Semi-structured" indicates that the research instrument contains mainly open-ended questions. "Unstructured" indicates that in-depth interviews were conducted. The "type" attribute is included to permit the development of a controlled vocabulary for this element.
Example:
<resInstru>structured</resInstru>

Optional
Repeatable
Attributes: ID, xml:lang, source, type
Contains: #PCDATA, Link to other element(s) within the codebook.


Sources Statement
<sources> 2.3.1.8
Description: Sources used for the data collection.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or Data Sources, Origins of Sources, Characteristics of Sources Noted, Documentation/Access to Sources
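
Because the sources statement may hold either plain text or structured child elements, the two alternatives can be sketched as follows. These are two separate illustrations of the same non-repeatable element, not a sequence to be used together; the values are taken from the Data Sources examples in this section.

```xml
<!-- Alternative 1 (illustrative): plain text content only -->
<sources>United States Internal Revenue Service Quarterly Payroll File</sources>

<!-- Alternative 2 (illustrative): structured content using repeatable dataSrc children -->
<sources>
  <dataSrc>"Voting Scores." CONGRESSIONAL QUARTERLY ALMANAC 33 (1977), 487-498.</dataSrc>
  <dataSrc>United States Internal Revenue Service Quarterly Payroll File</dataSrc>
</sources>
```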


Data Sources
<dataSrc> 2.3.1.8.1
Description: Used to list the book(s), article(s), serial(s), and/or machine-readable data file(s)--if any--that served as the source(s) of the data collection.
Examples:
<dataSrc>"Voting Scores." CONGRESSIONAL QUARTERLY ALMANAC 33 (1977), 487-498.</dataSrc>
<dataSrc>United States Internal Revenue Service Quarterly Payroll File</dataSrc>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Origins of Sources
<srcOrig> 2.3.1.8.2
Description: For historical materials, information about the origin(s) of the sources and the rules followed in establishing the sources should be specified. May not be relevant to survey data.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Characteristics of Sources Noted
<srcChar> 2.3.1.8.3
Description: Assessment of characteristics and quality of source material. May not be relevant to survey data.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Documentation/Access to Sources
<srcDocu> 2.3.1.8.4
Description: Level of documentation of the original sources. May not be relevant to survey data.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Sources
<sources> 2.3.1.8.5
No element or attribute declaration here, as this element is simply a recursive declaration within sources 2.3.1.8.


Characteristics of the Data Collection Situation
<collSitu> 2.3.1.9
Description: Used to describe noteworthy aspects of the data collection situation. Include information on factors such as cooperativeness of respondents, duration of interviews, number of call-backs, etc.
Example:
<collSitu>There were 1,194 respondents who answered questions in face-to-face interviews lasting approximately 75 minutes each.</collSitu>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Actions to Minimize Losses
<actMin> 2.3.1.10
Description: Summary of actions taken to minimize data loss. Include information on actions such as follow-up visits, supervisory checks, historical matching, estimation, etc.
Example:
<actMin>To minimize the number of unresolved cases and reduce the potential nonresponse bias, four follow-up contacts were made with agencies that had not responded by various stages of the data collection process.</actMin>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Control Operations
<ConOps> 2.3.1.11
Description: Methods to facilitate data control performed by the primary investigator or by the data archive. Specify any special programs used for such operations. The "agency" attribute may be used to refer to the agency that performed the control operation.
Example:
<ConOps agency='ICPSR'>Ten percent of data entry forms were reentered to check for accuracy.</ConOps>

Optional
Repeatable
Attributes: ID, xml:lang, source, agency
Contains: #PCDATA, Link to other element(s) within the codebook.


Weighting
<weight> 2.3.1.12
Description: The use of sampling procedures may make it necessary to apply weights to produce accurate statistical results. Describe here the criteria for using weights in analysis of a collection. If a weighting formula or coefficient was developed, provide this formula, define its elements, and indicate how the formula is applied to data.
Example:
<weight>The 1996 NES dataset includes two final person-level analysis weights which incorporate sampling, nonresponse, and post-stratification factors. One weight (variable #4) is for longitudinal micro-level analysis using the 1996 NES Panel. The other weight (variable #3) is for analysis of the 1996 NES combined sample (Panel component cases plus Cross-section supplement cases). In addition, a Time Series Weight (variable #5) which corrects for Panel attrition was constructed. This weight should be used in analyses which compare the 1996 NES to earlier unweighted National Election Study data collections.</weight>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Cleaning Operations
<cleanOps> 2.3.1.13
Description: Methods used to "clean" the data collection, e.g., consistency checking, wildcode checking, etc.
Example:
<cleanOps>Checks for undocumented codes were performed, and data were subsequently revised in consultation with the principal investigator.</cleanOps>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 2.3.2 (Generic Element A.4)
Description: Used to indicate additional information about the methodology and processing involved in a collection. Include error notes here. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Examples:
<notes>Undocumented codes were found in this data collection. Missing data are represented by blanks.</notes>
<notes>For this collection, which focuses on employment, unemployment, and gender equality, data from EUROBAROMETER 44.3: HEALTH CARE ISSUES AND PUBLIC SECURITY, FEBRUARY-APRIL 1996 (ICPSR 6752) were merged with an oversample.</notes>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Data Appraisal Information
<anlyInfo> 2.3.3
Description: Information on data appraisal.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: Response Rates, Estimates of Sampling Error, Other Forms of Data Appraisal


Response Rates
<respRate> 2.3.3.1
Description: The percentage of sample members who provided information.
Examples:
<respRate>For 1993, the estimated inclusion rate for TEDS-eligible providers was 91 percent, with the inclusion rate for all treatment providers estimated at 76 percent (including privately and publicly funded providers).</respRate>
<respRate>The overall response rate was 82%, although retail firms with an annual sales volume of more than $5,000,000 were somewhat less likely to respond.</respRate>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Estimates of Sampling Error
<EstSmpErr> 2.3.3.2
Description: Measure of how precisely one can estimate a population value from a given sample.
Example:
<EstSmpErr> To assist NES analysts, the PC SUDAAN program was used to compute sampling errors for a wide-ranging example set of proportions estimated from the 1996 NES Pre-election Survey dataset. For each estimate, sampling errors were computed for the total sample and for twenty demographic and political affiliation subclasses of the 1996 NES Pre-election Survey sample. The results of these sampling error computations were then summarized and translated into the general usage sampling error table provided in Table 11. The mean value of deft, the square root of the design effect, was found to be 1.346. The design effect was primarily due to weighting effects (Kish, 1965) and did not vary significantly by subclass size. Therefore the generalized variance table is produced by multiplying the simple random sampling standard error for each proportion and sample size by the average deft for the set of sampling error computations.</EstSmpErr>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Other Forms of Data Appraisal
<dataAppr> 2.3.3.3
Description: Other issues pertaining to data appraisal. Describe here issues such as response variance, nonresponse rate and testing for bias, interviewer and response bias, confidence levels, question bias, etc.
Example:
<dataAppr>These data files were obtained from the United States House of Representatives, who received them from the Census Bureau accompanied by the following caveats: ''The numbers contained herein are not official 1990 decennial Census counts. The numbers represent estimates of the population based on a statistical adjustment method applied to the official 1990 Census figures using a sample survey intended to measure overcount or undercount in the Census results. On July 15, 1991, the Secretary of Commerce decided not to adjust the official 1990 decennial Census counts (see 56 Fed. Reg. 33582, July 22, 1991). In reaching his decision, the Secretary determined that there was not sufficient evidence that the adjustment method accurately distributed the population across and within states. The numbers contained in these tapes, which had to be produced prior to the Secretary's decision, are now known to be biased. Moreover, the tapes do not satisfy standards for the publication of Federal statistics, as established in Statistical Policy Directive No. 2, 1978, Office of Federal Statistical Policy and Standards. Accordingly, the Department of Commerce deems that these numbers cannot be used for any purpose that legally requires use of data from the decennial Census and assumes no responsibility for the accuracy of the data for any purpose whatsoever. The Department will provide no assistance in interpretation or use of these numbers.''</dataAppr>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Class or Status of the Study
<stdyClas> 2.3.4
Description: Generally used to give the data archive's class or study status number, which indicates the processing status of the study. May also be used as a text field to describe processing status.
Examples:
<stdyClas>ICPSR Class II</stdyClas>
<stdyClas>DDA Class C</stdyClas>
<stdyClas>Available from the DDA. Being processed. </stdyClas>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type
Contains: #PCDATA, Link to other element(s) within the codebook.


Data Access

Section 2.4 of the Study Description (2.0)

of the Data Documentation Initiative (DDI) DTD


Data Access's Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |               |
          |               |---Citation
          |               |---Study Scope
          |               |---Methodology and Processing 
          |               |---DATA ACCESS
          |               |---Other Study Description Materials
          |
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials

Data Access

<dataAccs> 2.4
This section describes access conditions and terms of use for the data collection. In cases where access conditions differ across individual files or variables, multiple access conditions can be specified. The access conditions applying to a study, file, variable group, or variable can be indicated by an IDREF attribute called "access" on the study (2.0), file (3.0), variable group (4.1), or variable (4.2) element.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Data Collection Availability, Data Use Statement, Notes
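
The "access" linkage described above can be illustrated with a short Python sketch. It is a minimal illustration rather than part of the DTD: the root element name codeBook, the wrapper stdyDscr, the ID values A1, A2, F1, and F2, the sample text, and the helper name access_for are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment: two access sections, each referenced by one file.
SAMPLE = """
<codeBook>
  <stdyDscr>
    <dataAccs ID="A1">
      <useStmt><restrctn>Restricted to registered users.</restrctn></useStmt>
    </dataAccs>
    <dataAccs ID="A2">
      <useStmt><conditions>The data are available without restriction.</conditions></useStmt>
    </dataAccs>
  </stdyDscr>
  <fileDscr ID="F1" access="A1"/>
  <fileDscr ID="F2" access="A2"/>
</codeBook>
"""

def access_for(root, element):
    """Return the dataAccs elements named by the element's "access" attribute."""
    by_id = {e.get("ID"): e for e in root.iter("dataAccs")}
    # The attribute may name one ID or several separated by whitespace.
    refs = element.get("access", "").split()
    return [by_id[r] for r in refs if r in by_id]

root = ET.fromstring(SAMPLE)
for f in root.iter("fileDscr"):
    print(f.get("ID"), [a.get("ID") for a in access_for(root, f)])
```

The same lookup would apply to any other element that carries an "access" attribute.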


Data Collection Availability
<setAvail> 2.4.1
Information on availability and storage of the collection. The "media" attribute may be used in combination with any of the subelements. See Location of Data Collection below.
Optional
Repeatable
Attributes: ID, xml:lang, source, media
Contains Elements: Location of Data Collection, Original Archive Where Collection Stored, Availability Status, Extent of Collection, Completeness of Collection Stored, Number of Files, Notes


Location of Data Collection
<accsPlac> 2.4.1.1
Location where the data collection is currently stored. Use the URI attribute to provide a URN or URL for the storage site or the actual address from which the data may be downloaded.
Examples:
<setAvail media='CDROM'>
<accsPlac URI='http://www.icpsr.umich.edu'>Inter-university Consortium for Political and Social Research</accsPlac>
</setAvail>
<setAvail media='online'>
<accsPlac URI='http://www.ssd.gu.se/'>Swedish Social Science Data Service</accsPlac>
</setAvail>

Optional
Repeatable
Attributes: ID, xml:lang, source, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Original Archive Where Collection Stored
<origArch> 2.4.1.2
Archive from which the data collection was obtained; the originating archive.
Example:
<origArch>Zentralarchiv fuer empirische Sozialforschung</origArch>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Availability Status
<avlStatus> 2.4.1.3
Statement of collection availability. An archive may need to indicate that a collection is unavailable because it is embargoed for a period of time, because it has been superseded, because a new edition is imminent, etc. It is anticipated that a controlled vocabulary will be developed for this element.
Example:
<avlStatus>This collection is superseded by CENSUS OF POPULATION, 1880 [UNITED STATES]: PUBLIC USE SAMPLE (ICPSR 6460).</avlStatus>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Extent of Collection
<collSize> 2.4.1.4
Summarizes the number of physical files that exist in a collection, recording the number of files that contain data and noting whether the collection contains machine-readable documentation and/or other supplementary files and information such as data dictionaries, data definition statements, or data collection instruments.
Example:
<collSize>1 data file + machine-readable documentation (PDF) + SAS data definition statements</collSize>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Completeness of Collection Stored
<complete> 2.4.1.5
This item indicates the relationship of the data collected to the amount of data coded and stored in the data collection. Information as to why certain items of collected information were not included in the data file stored by the archive should be provided.
Example:
<complete>Because of embargo provisions, data values for some variables have been masked. Users should consult the data definition statements to see which variables are under embargo. A new version of the collection will be released by ICPSR after embargoes are lifted.</complete>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Number of Files
<fileQnty> 2.4.1.6
Total number of physical files associated with a collection.
Example:
<fileQnty> 5 files</fileQnty>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 2.4.1.7 (Generic element A.4)
Indicate additional information regarding data availability. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes> Data from the Bureau of Labor Statistics used in the analyses for the final report are not provided as part of this collection.</notes>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Data Use Statement
<useStmt> 2.4.2
Information on terms of use for the data collection.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Confidentiality Declaration, Special Permissions, Restrictions, Access Authority, Citation Requirement, Deposit Requirement, Access Conditions, Disclaimer


Confidentiality Declaration
<confDec> 2.4.2.1
This element is used to determine if signing of a confidentiality declaration is needed to access a resource. The "required" attribute is used to aid machine processing of this element, and the default specification is "yes". The "formNo" attribute indicates the number or ID of the form that the user must fill out. The "URI" attribute may be used to provide a URN or URL for online access to a confidentiality declaration form.
Examples:
<confDec formNo='1'>To download this dataset, the user must sign a declaration of confidentiality.</confDec>

<confDec URI='http://www.icpsr.umich.edu/HMCA/CTSform/contents.html'> To obtain this dataset, the user must complete a Restricted Data Use Agreement.</confDec>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, required, formNo, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Special Permissions
<specPerm> 2.4.2.2
This element is used to determine if any special permissions are required to access a resource. The "required" attribute is used to aid machine processing of this element, and the default specification is "yes". The "formNo" attribute indicates the number or ID of the form that the user must fill out. The "URI" attribute may be used to provide a URN or URL for online access to a special permissions form.
Example:
<specPerm formNo='4'>The user must apply for special permission to use this dataset locally and must complete a confidentiality form.</specPerm>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, required, formNo, URI
Contains: #PCDATA, Link to other element(s) within the codebook.
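
As a rough illustration of how the "required" attribute on Confidentiality Declaration and Special Permissions might aid machine processing, the Python sketch below lists the forms a user would still have to complete. It assumes that "no" is the alternative to the default "yes", and it applies the default explicitly because a parser that does not read the DTD will not supply it. The sample specPerm text and the helper names is_required and access_steps are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical useStmt: confDec relies on the default, specPerm opts out explicitly.
SAMPLE = """
<useStmt>
  <confDec formNo="1">To download this dataset, the user must sign a declaration of confidentiality.</confDec>
  <specPerm required="no">No special permission is needed.</specPerm>
</useStmt>
"""

def is_required(element):
    """Treat a missing "required" attribute as the documented default, "yes"."""
    return element.get("required", "yes") == "yes"

def access_steps(use_stmt):
    """Collect (tag, formNo, URI) for each declaration or permission that is required."""
    steps = []
    for tag in ("confDec", "specPerm"):
        for el in use_stmt.iter(tag):
            if is_required(el):
                steps.append((tag, el.get("formNo"), el.get("URI")))
    return steps

use_stmt = ET.fromstring(SAMPLE)
print(access_steps(use_stmt))
```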


Restrictions
<restrctn> 2.4.2.3
Any restrictions on access to or use of the collection such as privacy certification or distribution restrictions should be indicated here. These can be restrictions applied by the author, producer, or disseminator of the data collection. If the data are restricted to only a certain class of user, specify which type.
Examples:
<restrctn> In preparing the data file(s) for this collection, the National Center for Health Statistics (NCHS) has removed direct identifiers and characteristics that might lead to identification of data subjects. As an additional precaution NCHS requires, under Section 308(d) of the Public Health Service Act (42 U.S.C. 242m), that data collected by NCHS not be used for any purpose other than statistical analysis and reporting. NCHS further requires that analysts not use the data to learn the identity of any persons or establishments and that the director of NCHS be notified if any identities are inadvertently discovered. ICPSR member institutions and other users ordering data from ICPSR are expected to adhere to these restrictions.</restrctn>

<restrctn> ICPSR obtained these data from the World Bank under the terms of a contract which states that the data are for the sole use of ICPSR and may not be sold or provided to third parties outside of ICPSR membership. Individuals at institutions that are not members of the ICPSR may obtain these data directly from the World Bank.</restrctn>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Access Authority
<contact> 2.4.2.4 (Generic element A.6.4.2)
Contact person or organization (with full address and telephone number, if available) that controls access to a collection, if different from the data distributor. The "URI" attribute should be used to indicate a URN or URL for the homepage of the contact individual. Similarly, the "email" attribute is used to indicate an email address for the contact individual.
Example:
<contact affiliation='University of Copenhagen' URI='http://www.etc.' email='smith@etc.'>The data are available from the principal investigators, Dr. Smith and Dr. Jones, at the Sociological Institute, Linnesgade 22, 4. DK-1361 Copenhagen K.</contact>
Optional
Repeatable
Attributes: ID, xml:lang, source, affiliation, URI, email
Contains: #PCDATA, Link to other element(s) within the codebook.


Citation Requirement
<citReq> 2.4.2.5
Text of requirement that a data collection should be cited properly in articles or other publications that are based on analysis of the data.
Example:
<citReq>Publications based on ICPSR data collections should acknowledge those sources by means of bibliographic citations. To ensure that such source attributions are captured for social science bibliographic utilities, citations must appear in footnotes or in the reference section of publications.</citReq>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Deposit Requirement
<deposReq> 2.4.2.6
Information regarding user responsibility for informing archives of their use of data through providing citations to the published work or providing copies of the manuscripts.
Example:
<deposReq> To provide funding agencies with essential information about use of archival resources and to facilitate the exchange of information about ICPSR participants' research activities, users of ICPSR data are requested to send to ICPSR bibliographic citations for, or copies of, each completed manuscript or thesis abstract. Please indicate in a cover letter which data were used.</deposReq>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Access Conditions
<conditions> 2.4.2.7
Indicate any additional information that will assist the user in understanding the access conditions of the data collection.
Example:
<conditions>The data are available without restriction. Potential users of these datasets are advised, however, to contact the original principal investigator Dr. J. Smith (Institute for Social Research, The University of Michigan, Box 1248, Ann Arbor, MI 48106), about their intended uses of the data. Dr. Smith would also appreciate receiving copies of reports based on the datasets.</conditions>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Disclaimer
<disclaimer> 2.4.2.8
Information regarding responsibility for uses of the data collection.
Example:
<disclaimer>The original collector of the data, ICPSR, and the relevant funding agency bear no responsibility for uses of this collection or for interpretations or inferences based upon such uses.</disclaimer>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 2.4.3 (Generic element A.4)
Indicate within this item any additional information about access and data use. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>Users should note that this is a beta version of the data. The investigators therefore request that users who encounter any problems with the dataset contact them at the above address.</notes>

Optional
Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Other Study Description Materials

Section 2.5 of the Study Description (2.0)

of the Data Documentation Initiative (DDI) DTD


Other Study Description Materials' Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |               |
          |               |---Citation
          |               |---Study Scope
          |               |---Methodology and Processing 
          |               |---Data Access
          |               |---OTHER STUDY DESCRIPTION MATERIALS
          |
          |---Data Files Description
          |---Variables Description
          |---Other Study-Related Materials

The Role of Other Study Description Materials

  • This section describes other materials related to the study description that primarily describe the content and use of the study, such as appendices, sampling information, weighting details, methodological and technical details, publications based upon the study content, related studies or collections of studies, etc.
  • This section may point to other materials related to the description of the study through use of the generic citation element (A.6), which is available for each element in this section.
  • Note that Section 5.0, Other Study-Related Materials, should be used for materials used in the production of the study or useful in the analysis of the study. The materials in Section 5.0 may be entered as PCDATA (ASCII text) directly into the document (through use of the txt element). That section may also serve as a "container" for other machine-readable materials by providing a brief description of the study-related materials accompanied by the "type" and "level" attributes further defining the materials. Other Study-Related Materials in Section 5.0 may include: questionnaires, coding notes, SPSS/SAS/STATA setups (and others), user manuals, continuity guides, sample computer software programs, glossaries of terms, interviewer/project instructions, maps, database schema, data dictionaries, show cards, coding information, interview schedules, missing values information, frequency files, variable maps, etc.

Other Study Description Materials

<othrStdyMat> 2.5
Description: Other materials relating to the study description.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: Related Material, Related Study, Related Publication, Other References Note


Related Material
<relMat> 2.5.1
Description: Describes materials related to the study description, such as appendices, additional information on sampling found in other documents, etc. Can take the form of bibliographic citations. This element can contain either PCDATA or a citation or both, and there can be multiple occurrences of both the citation and PCDATA within a single element. May consist of a single URI or a series of URIs comprising a series of citations/references to external materials which can be objects as a whole (journal articles) or parts of objects (chapters or appendices in articles or documents).
Examples:
<relMat> Full details on the research design and procedures, sampling methodology, content areas, and questionnaire design, as well as percentage distributions by respondent's sex, race, region, college plans, and drug use, appear in the annual ISR volumes MONITORING THE FUTURE: QUESTIONNAIRE RESPONSES FROM THE NATION'S HIGH SCHOOL SENIORS.</relMat>

<relMat>Current Population Survey, March 1999: Technical Documentation includes an abstract, pertinent information about the file, a glossary, code lists, and a data dictionary. One copy accompanies each file order. When ordered separately, it is available from Marketing Services Office, Customer Service Center, Bureau of the Census, Washington, D.C. 20233. </relMat>

<relMat>A more precise explanation regarding the CPS sample design is provided in Technical Paper 40, The Current Population Survey: Design and Methodology. Chapter 5 of this paper provides documentation on the weighting procedures for the CPS both with and without supplement questions.</relMat>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or Citation structure (See 2.1 of Study Description Section), Link to other element(s) within the codebook.


Related Study
<relStdy> 2.5.2
Description: Information on the relationship of the current data collection to others (e.g., predecessors, successors, other waves or rounds) or to other editions of the same file. This would include the names of additional data collections generated from the same data collection vehicle plus other collections directed at the same general topic. Can take the form of bibliographic citations.
Example:
<relStdy>ICPSR distributes a companion study to this collection titled FEMALE LABOR FORCE PARTICIPATION AND MARITAL INSTABILITY, 1980: [UNITED STATES] (ICPSR 9199).</relStdy>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or Citation structure (See 2.1 of Study Description Section), Link to other element(s) within the codebook.


Related Publication
<relPubl> 2.5.3
Description: Bibliographic and access information about articles and reports based on the data in this collection. Can take the form of bibliographic citations.
Examples:
<relPubl>Economic Behavior Program Staff. SURVEYS OF CONSUMER FINANCES. Annual volumes 1960 through 1970. Ann Arbor, MI: Institute for Social Research.</relPubl>

<relPubl>Data from the March Current Population Survey are published most frequently in the Current Population Reports P-20 and P-60 series. These reports are available from the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402. They also are available on the INTERNET at http://www.census.gov. Forthcoming reports will be cited in Census and You, the Monthly Product Announcement (MPA), and the Bureau of the Census Catalog and Guide.</relPubl>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or Citation structure (See 2.1 of Study Description Section), Link to other element(s) within the codebook.


Other References Note
<othRefs> 2.5.4
Description: Indicate here other pertinent references. Can take the form of bibliographic citations.
Example:
<othRefs>Part II of the documentation, the Field Representative's Manual, is provided in hardcopy form only.</othRefs>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or Citation structure (See 2.1 of Study Description Section), Link to other element(s) within the codebook.


Data Files Description

Section 3.0 of the Data Documentation Initiative (DDI) DTD


Data Files Description's Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |---DATA FILES DESCRIPTION
          |---Variables Description
          |---Other Study-Related Materials

The Role of Data File Description

The File Description consists of information about the particular data file(s) containing numeric and/or numeric + textual information that the DDI-compliant file describes. This section consists of items describing the characteristics and contents of file(s) that comprise the study as described in the Study Description. There may be multiple file descriptions if there are multiple files in the collection.

Data File Description

<fileDscr> 3.0
Description: This section can be repeated for collections with multiple files.

  • The "URI" attribute may be a URN or a URL that can be used to retrieve the file.
  • The "sdatrefs" are summary data description references that record the ID values of all elements within the summary data description section of the Study Description that might apply to the file. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data.
  • The "methrefs" are methodology and processing references that record the ID values of all elements within the study methodology and processing section of the Study Description that might apply to the file. These elements include information on data collection and data appraisal (e.g., sampling, sources, weighting, data cleaning, response rates, and sampling error estimates).
  • The "pubrefs" attribute provides a link to publication/citation references and records the ID values of all citation elements within Section 2.5 or Section 5.0 that pertain to this file.
  • "Access" records the ID values of all elements in Section 2.4 of the document that describe access conditions for this file.
Remarks: When a codebook documents two different physical instantiations of a data file, e.g., logical record length (or OSIRIS) and card-image version, the Data File Description (3.0) should be repeated to describe the two separate files. An ID should be assigned to each file so that in the Variable section (4.0) the location of each variable on the two files can be distinguished using the unique file IDs.
Examples:
<fileDscr ID='card' URI='www.icpsr.umich.edu/cgi-bin/archive.prl?path=ICPSR&amp;num=7728'/>
<fileDscr ID='lrecl' URI='www.icpsr.umich.edu/cgi-bin/archive.prl?path=ICPSR&amp;num=7728'/>
Optional
Repeatable
Attributes: ID, xml:lang, source, URI, sdatrefs, methrefs, pubrefs, access
Contains Elements: File Description, Notes
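
The reference attributes on the Data File Description lend themselves to a simple consistency check. The Python sketch below, offered only as an illustration, reports any ID named in "sdatrefs", "methrefs", "pubrefs", or "access" that does not exist elsewhere in the document. It assumes these attributes hold whitespace-separated ID lists. The root element codeBook, the wrapper stdyDscr, the flattened nesting, the ID values, the sample text, and the helper name dangling_refs are all hypothetical.

```python
import xml.etree.ElementTree as ET

REF_ATTRS = ("sdatrefs", "methrefs", "pubrefs", "access")

# Hypothetical fragment; study-level nesting is flattened for brevity.
# The second file refers to W9, which is deliberately undefined.
SAMPLE = """
<codeBook>
  <stdyDscr>
    <universe ID="U1">Adults aged 18 and over.</universe>
    <weight ID="W1">See variable #3.</weight>
    <dataAccs ID="A1"/>
  </stdyDscr>
  <fileDscr ID="card" sdatrefs="U1" methrefs="W1" access="A1"/>
  <fileDscr ID="lrecl" sdatrefs="U1" methrefs="W1 W9" access="A1"/>
</codeBook>
"""

def dangling_refs(root):
    """Map each fileDscr ID to the reference attributes that name unknown IDs."""
    known = {e.get("ID") for e in root.iter() if e.get("ID")}
    problems = {}
    for f in root.iter("fileDscr"):
        for attr in REF_ATTRS:
            missing = [r for r in f.get(attr, "").split() if r not in known]
            if missing:
                problems.setdefault(f.get("ID"), {})[attr] = missing
    return problems

root = ET.fromstring(SAMPLE)
print(dangling_refs(root))
```

A validating XML parser would normally catch such errors when the attributes are declared as ID references in the DTD; the sketch is useful mainly where the document is processed without validation.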


File Description
<fileTxt> 3.1
Description: Information about the data file.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: File Name, Contents of File, File Structure, File Dimensions, Type of File, Data Format, Place of File Production, Extent of Processing Checks, Processing Status, Missing Data, Software Used to Produce the File, Version Statement


File Name
<fileName> 3.1.1
Description: Contains a short title that will be used to distinguish a particular file/part from other files/parts in the data collection.
Example:
<fileName ID='File1'>Second-Generation Children Data </fileName>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


File Contents
<fileCont> 3.1.2
Description: Abstract or description of the file. A summary describing the purpose, nature, and scope of the data file, special characteristics of its contents, major subject areas covered, and what questions the PIs attempted to answer when they created the file. A listing of major variables in the file is important here. In the case of multi-file collections, this uniquely describes the contents of each file.
Example:
<fileCont>Part 1 contains both edited and constructed variables describing demographic and family relationships, income, disability, employment, health insurance status, and utilization data for all of 1987. </fileCont>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


File Structure
<fileStrc> 3.1.3
Description: Type of file structure. Use the "type" attribute to indicate hierarchical, rectangular, or relational (the default is rectangular).
Remarks: If the file is rectangular, skip to File Dimensions (3.1.4).
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type
Contains Elements: Record Group, Notes


Record or Record Group
<recGrp> 3.1.3.1
Description: Used to describe record groupings if the file is hierarchical or relational. The attribute "recGrp" allows a record group to indicate subsidiary record groups that nest underneath; this allows for the encoding of a hierarchical structure of record groups. The attribute "rectype" indicates the type of record, e.g., "'A' records" or "Household records." "Keyvar" is an IDREF that provides the link to other record types. In a hierarchical study consisting of individual and household records, the "keyvar" on the person record will indicate the household to which it belongs. The "recidvar" is the unique ID of the record group itself.
Example:
<fileStrc type='hierarchical'>
<recGrp rectype='A'>CPS Person-Level Records</recGrp>
</fileStrc>
Optional
Repeatable
Attributes: ID, xml:lang, source, recGrp, rectype, keyvar, recidvar
Contains Elements: Record Label, Record Dimensions


Record Label
<labl> 3.1.3.1.1 (Generic element A.2)
Description: A more descriptive specification of record group. A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record level, the variable group level, or the variable level. A "vendor" attribute is provided to allow for specification of different labels for use with different vendors' software.
Example:
<fileStrc type='hierarchical'><recGrp rectype='A' keyvar='H-SEQ' recidvar='PRECORD'> <labl>Person (A) Record</labl></recGrp></fileStrc>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Record Dimensions
<recDimnsn> 3.1.3.1.2
Description: Information about the physical characteristics of the record. The "level" attribute on this element should be set to "record."
Optional
Not Repeatable
Attributes: ID, xml:lang, source, level
Contains Elements: Variable Quantity (of Record), Record Quantity (of Record), Logical Record Length (of Record)


Variable Quantity (of Record)
<varQnty> 3.1.3.1.2.1
Description: Number of variables on the record.
Example:
<recGrp><recDimnsn level='record'><varQnty>27</varQnty> </recDimnsn></recGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Record Quantity (of Record)
<caseQnty> 3.1.3.1.2.2
Description: Number of records of this type.
Example:
<recGrp><recDimnsn><caseQnty>1011</caseQnty> </recDimnsn></recGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Logical Record Length (of Record)
<logRecL> 3.1.3.1.2.3
Description: Logical record length of record, i.e., number of characters of data in the record.
Example:
<recGrp><recDimnsn><logRecL>27</logRecL> </recDimnsn></recGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 3.1.3.2 (Generic element A.4)
Description: Indicate any additional information regarding this record type. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>The number of arrest records for an individual is dependent on the number of arrests an offender had.</notes>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


File Dimensions
<dimensns> 3.1.4
Description: Dimensions of the overall file.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Overall Case Count, Overall Variable Count, Logical Record Length, Records Per Case, Total Number of Records


Overall Case Count
<caseQnty> 3.1.4.1
Description: Number of cases or observations in the entire file.
Remarks: To be used for rectangular files only.
Example:
<dimensns><caseQnty>205</caseQnty></dimensns>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Overall Variable Count
<varQnty> 3.1.4.2
Description: Number of variables in the entire file.
Remarks: To be used for rectangular files only.
Example:
<dimensns><varQnty>88</varQnty></dimensns>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Logical Record Length
<logRecL> 3.1.4.3
Description: Logical record length of the file, i.e., number of characters.
Remarks: To be used for rectangular files or if all records in a hierarchical file are the same length.
Example:
<dimensns><logRecL>125</logRecL></dimensns>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Records per Case
<recPrCas> 3.1.4.4
Description: Records per case in the file.
Remarks: To be used for card-image data or other files in which there are multiple records per case.
Example:
<dimensns><recPrCas>5</recPrCas></dimensns>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Total Number of Records
<recNumTot> 3.1.4.5
Description: Overall record count in the file.
Remarks: To be used in instances such as files with multiple cards/decks or records per case.
Example:
<dimensns><recNumTot>2400</recNumTot></dimensns>

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.
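
The file dimension elements can be cross-checked against one another. The Python sketch below assumes a card-image file in which every case has the same number of records, so that the total number of records should equal the case count multiplied by the records per case. That relationship is an assumption of this sketch rather than a rule stated by the DTD, and the counts and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical dimensions for a card-image file; the counts are invented.
SAMPLE = """
<dimensns>
  <caseQnty>480</caseQnty>
  <recPrCas>5</recPrCas>
  <recNumTot>2400</recNumTot>
</dimensns>
"""

def counts_agree(dimensns):
    """Return True when caseQnty * recPrCas equals recNumTot."""
    def number(tag):
        return int(dimensns.findtext(tag).strip())
    return number("caseQnty") * number("recPrCas") == number("recNumTot")

dims = ET.fromstring(SAMPLE)
print(counts_agree(dims))
```

A check of this kind would not apply to hierarchical files in which the number of records varies from case to case.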


Type of File
<fileType> 3.1.5
Description: Types of data files include raw data (ASCII, EBCDIC, etc.) and software-dependent files such as SAS datasets, SPSS export files, etc. If the data are of mixed types (e.g., ASCII and packed decimal), state that here. The "charset" attribute allows one to specify the character set used in the file, e.g., US-ASCII, EBCDIC, UNICODE UTF-8, etc.
Remarks: Note that the element Variable Format (4.2.23) permits specification of the data format at the variable level.
Example:
<fileType charset='us-ascii'>ASCII data file</fileType>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, charset
Contains: #PCDATA, Link to other element(s) within the codebook.


Data Format
<format> 3.1.6
Description: Physical format of the data file: Logical record length format, card-image format (i.e., data with multiple records per case), delimited format, free format, etc.
Example:
<format>comma-delimited</format>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Place of File Production
<filePlac> 3.1.7
Description: Indicate whether file was produced at an archive or produced elsewhere.
Example:
<filePlac>Washington, DC: United States Department of Commerce, Bureau of the Census</filePlac>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Extent of Processing Checks
<dataChck> 3.1.8
Description: Indicate here at the file level the types of checks and operations performed on the data file. A controlled vocabulary may be developed for this element in the future. The following examples are based on ICPSR's Extent of Processing scheme:
Examples:
<dataChck>The archive produced a codebook for this collection.</dataChck>
<dataChck>Consistency checks were performed by Data Producer/ Principal Investigator.</dataChck>
<dataChck>Consistency checks performed by the archive.</dataChck>
<dataChck>The archive generated SAS and/or SPSS data definition statements for this collection.</dataChck>
<dataChck>Frequencies were provided by Data Producer/Principal Investigator.</dataChck>
<dataChck>Frequencies provided by the archive.</dataChck>
<dataChck>Missing data codes were standardized by Data Producer/ Principal Investigator.</dataChck>
<dataChck>Missing data codes were standardized by the archive.</dataChck>
<dataChck>The archive performed recodes and/or calculated derived variables. </dataChck>
<dataChck>Data were reformatted by the archive.</dataChck>
<dataChck>Checks for undocumented codes were performed by Data Producer/Principal Investigator.</dataChck>
<dataChck>Checks for undocumented codes were performed by the archive.</dataChck>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Processing Status
<ProcStat> 3.1.9
Description: Processing status of the file. Some data producers and social science data archives employ data processing strategies that provide for release of data and documentation at various stages of processing.
Examples:
<ProcStat>Available from the DDA. Being processed.</ProcStat>
<ProcStat>The principal investigator notes that the data in Public Use Tape 5 are released prior to final cleaning and editing, in order to provide prompt access to the NMES data by the research and policy community.</ProcStat>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Missing Data
<dataMsng> 3.1.10
Description: This element can be used to give general information about missing data, e.g., that missing data have been standardized across the collection, missing data are present because of merging, etc.
Examples:
<dataMsng>Missing data are represented by blanks.</dataMsng>
<dataMsng>The codes "-1" and "-2" are used to represent missing data.</dataMsng>

Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Software Used to Produce the File
<software> 3.1.11 (Generic element A.6.3.5)
Description: Software that created the file. A "version" attribute permits specification of the software version number. The "date" attribute is provided to enable specification of the date (if any) of the software release. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<software version='6.12'>The SAS transport file was generated by the SAS CPORT procedure.</software>
Optional
Repeatable
Attributes: ID, xml:lang, source, date, version
Contains: #PCDATA, Link to other element(s) within the codebook.


Version (of File) Statement
<verStmt> 3.1.12 (Generic element A.6.6)
Description: Version statement for the data file, if one of a multi-file collection.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Version, Version Responsibility Statement, Notes


Version
<version> 3.1.12.1 (Generic element A.6.6.1)
Description: Also known as release or edition. If there have been substantive changes in the file since its creation, this statement should be used. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<version type='edition' date='1999-02-05'>First ICPSR Edition</version>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Version Responsibility Statement
<verResp> 3.1.12.2 (Generic element A.6.6.2)
Description: Used to indicate the organization or person responsible for the version of the file.
Example:
<verResp>Inter-university Consortium for Political and Social Research</verResp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook. (Or alternatively, all of the elements available under the general responsibility statement above.)


Notes
<notes> 3.1.12.3 (Generic element A.4)
Description: Used to indicate additional information regarding the version or version responsibility statement, in particular to indicate what makes a new version different from its predecessor. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>Data for all previously-embargoed variables are now available in this version of the file.</notes>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.

Notes
<notes> 3.2 (Generic element A.4)
Description: Additional information about the data file not covered in other elements. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<notes>There is a restricted version of this file containing confidential information, access to which is controlled by the principal investigator.</notes>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Variables Description

Section 4.0 of the Data Documentation Initiative (DDI) DTD


Variables Description's Place within the Document Structure

    Document
          |
          |---Document Description
          |---Study Description
          |---Data Files Description
          |---VARIABLES DESCRIPTION
          |---Other Study-Related Materials

Variables Description

Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements:
Variable Group
Variable


Variable Group
<varGrp> 4.1
Description: A group of variables that may share a common subject, arise from the interpretation of a single question, or be linked by some other factor.

  • The "type" attribute refers to the general type of grouping of the variables, e.g., subject, multiple response.
  • The "var" reference is used to indicate all the constituent variable IDs in the group.
  • The "varGrp" reference is used to indicate all the subsidiary variable groups which nest underneath the current varGrp. This allows for encoding of a hierarchical structure of variable groups.
  • The "name" is the unique ID for the group.
  • The "sdatrefs" are summary data description references that record the ID values of all elements within the summary data description section of the Study Description that might apply to the group. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data.
  • The "methrefs" are methodology and processing references which record the ID values of all elements within the study methodology and processing section of the Study Description which might apply to the group. These elements include information on data collection and data appraisal (e.g., sampling, sources, weighting, data cleaning, response rates, and sampling error estimates).
  • The "pubrefs" attribute provides a link to publication/citation references and records the ID values of all citation elements within Section 2.5 or Section 5.0 that pertain to this variable group.
  • "Access" records the ID values of all elements in Section 2.4 of the document that describe access conditions for this variable group.

Remarks: Variable groups are created this way in order to permit variables to belong to multiple groups, including multiple subject groups such as a group of variables on sex and income, or to a subject and a multiple response group, without causing overlapping groups. Variables that are linked by use of the same question need not be identified by a Variable Group element because they are linked by a common unique question identifier in the Variable element. Note that as a result of the strict sequencing required by XML, all Variable Groups must be marked up before the Variable element is opened. That is, the mark-up author cannot mark up a Variable Group, then mark up its constituent variables, then mark up another Variable Group.
Specific variable groups, included within the 'type' attribute, are:

  • Section: Questions which derive from the same section of the questionnaire, e.g., all variables located in Section C.
  • Multiple response: Questions where the respondent has the opportunity to select more than one answer from a variety of choices, e.g., what newspapers have you read in the past month (with the respondent able to select up to five choices).
  • Grid: Sub-questions of an introductory or main question but which do not constitute a multiple response group, e.g., I am going to read you some events in the news lately and you tell me for each one whether you are very interested in the event, fairly interested in the event, or not interested in the event.
  • Display: Questions which appear on the same interview screen (CAI) together or are presented to the interviewer or respondent as a group.
  • Repetition: The same variable (or group of variables) repeated for different groups of respondents or for the same respondent at a different time.
  • Subject: Questions which address a common topic or subject, e.g., income, poverty, children.
  • Version: Variables, often appearing in pairs, which represent different aspects of the same question, e.g., pairs of variables (or groups) which are adjusted/unadjusted for inflation, season, or other factors, pairs of variables with/without missing data imputed, and versions of the same basic question.
  • Iteration: Questions that appear in different sections of the data file measuring a common subject in different ways, e.g., a set of variables which report the progression of respondent income over the life course.
  • Analysis: Variables combined into the same index, e.g., the components of a calculation, such as the numerator and the denominator of an economic statistic.
  • Pragmatic: A variable group without shared properties.
  • Record: Variables from a single record in a hierarchical file.
  • File: Variables from a single file in a multifile study.
  • Randomized: Variables generated by CAI surveys produced by one or more random number variables together with a response variable. For example, a random variable X that equals 1 or 2 (at random) could control whether Q.23 is worded "men" or "women": would you favor helping [men/women] laid off from a factory obtain training for a new job?
  • Other: Variables which do not fit easily into any of the categories listed above, e.g., a group of variables whose documentation is in another language.
Optional
Repeatable
Attributes: ID, xml:lang, source, type, var, varGrp, name, sdatrefs, methrefs, pubrefs, access
Contains Elements: Variable Group Label, Variable Group Text, Variable Group Definition, Variable Group Universe, Variable Group Notes
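
To illustrate how the "var" and "varGrp" references allow a variable to belong to more than one group without overlapping markup, the Python sketch below resolves group membership from a small hypothetical fragment. The wrapper element name (dataDscr), the group and variable IDs, the labels, and the helper function are assumptions made for illustration. Both "var" and "varGrp" are treated as space-separated lists of IDs, and all Variable Groups are placed before the first Variable, as the Remarks above require.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical groups and variable IDs, invented for illustration.
SAMPLE = """
<dataDscr>
  <varGrp ID='VG1' type='subject' var='V1 V2' varGrp='VG2'>
    <labl>Sex and Income</labl>
  </varGrp>
  <varGrp ID='VG2' type='section' var='V2 V3'>
    <labl>Section C</labl>
  </varGrp>
  <var ID='V1' name='SEX'/>
  <var ID='V2' name='INCOME'/>
  <var ID='V3' name='TENURE'/>
</dataDscr>
"""

def groups_by_variable(data_dscr):
    """Map each variable ID to the IDs of the groups that list it in "var"."""
    membership = defaultdict(list)
    for grp in data_dscr.iter("varGrp"):
        for var_id in grp.get("var", "").split():
            membership[var_id].append(grp.get("ID"))
    return dict(membership)

root = ET.fromstring(SAMPLE)
membership = groups_by_variable(root)
print(membership)
```

In this fragment the variable V2 is listed by both groups, which is possible because membership is recorded by reference rather than by nesting the variable inside a group.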


Variable Group Label
<labl> 4.1.1 (Generic element A.2)
Description: A short description of the variable group. A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. A "vendor" attribute is provided to allow for specification of different labels for use with different vendors' software.
Examples:
<varGrp><labl>Study Procedure Information</labl></varGrp>
<varGrp><labl>Political Involvement and National Goals</labl></varGrp>
<varGrp><labl level='record'>Household Variable Section</labl></varGrp>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Variable Group Text
<txt> 4.1.2 (Generic element A.3)
Description: Lengthier description of variable group. A "level" attribute is included to permit coding of the level to which the text applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level.
Example:
<varGrp type='subject'><txt>The following five variables refer to respondent attitudes toward national environmental policies: air pollution, urban sprawl, noise abatement, carbon dioxide emissions, and nuclear waste.</txt></varGrp>
Optional
Repeatable
Attributes: ID, xml:lang, source, level
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table. (Optional references to individual variable IDs within the text.)


Variable Group Definition
<defntn> 4.1.3
Description: Rationale for why the variables are grouped in this way.
Example:
<varGrp><defntn>The following eight variables were only asked in Ghana.</defntn></varGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Variable Group Universe
<universe> 4.1.4 (Reference element 2.2.3.7)
Description: The group of persons or other elements that are the object of the variable group and to which any analytic results refer. Age, nationality, and residence commonly help to delineate a given universe, but any of a number of factors may be involved, such as sex, race, income, veteran status, criminal convictions, etc. The universe may consist of elements other than persons, such as housing units, court cases, deaths, countries, etc. In general, it should be possible to tell from the description of the universe whether a given individual or element (hypothetical or real) is a member of the population under study. A "level" attribute is included to permit coding of the level to which the universe applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. The "clusion" attribute provides for specification of groups included (I) in or excluded (E) from the universe.
Remarks: If all the variables described in the data documentation relate to the same population, e.g., the same set of survey respondents, this element and its complement at the variable level (Variable Universe 4.2.12) would be unnecessary. In this case, universe can be fully described at the level of the study (2.2.3.7).
Examples:
<varGrp><universe clusion='I'>Individuals 15-19 years of age. </universe></varGrp>
<varGrp><universe clusion='E'>Individuals younger than 15 and older than 19 years of age.</universe></varGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, level, clusion
Contains: #PCDATA, Link to other element(s) within the codebook.


Variable Group Notes
<notes> 4.1.5 (Generic element A.4)
Description: Used to indicate additional information about the variable group. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Examples:
<varGrp><notes>This variable group was created for the purpose of combining all derived variables.</notes></varGrp>
<varGrp><notes source='archive' resp='John Data'>This variable group and all other variable groups in this data file were organized according to a schema developed by the ad hoc advisory committee.</notes></varGrp>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.

Variable
<var> 4.2
Description: This element describes all of the features of a single variable in a social science data file. This element includes the following attributes:

  • The attribute "name" is a unique ID for the variable. Following the rules of many statistical analysis systems such as SAS and SPSS, names are usually up to eight characters long.
  • "Wgt" indicates whether the variable is a weight.
  • "Wgt-var" is a reference to the weight variable for this variable.
  • "Qstn" is a reference to the question ID for the variable.
  • "Files" is the IDREF identifying the file(s) to which the variable belongs.
  • "Vendor" is the origin of the proprietary format and includes SAS, SPSS, ANSI, and ISO.
  • "Dcml" refers to the number of decimal places in the variable.
  • "Intrvl" (interval) type options are discrete or continuous.
  • "Rectype" refers to the record type to which the variable belongs.
  • The "sdatrefs" are summary data description references which record the ID values of all elements within the summary data description section of the Study Description which might apply to the variable. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data.
  • The "methrefs" are methodology and processing references which record the ID values of all elements within the study methodology and processing section of the Study Description which might apply to the variable. These elements include information on data collection and data appraisal (e.g., sampling, sources, weighting, data cleaning, response rates, and sampling error estimates).
  • The "pubrefs" attribute provides a link to publication/citation references and records the ID values of all citation elements within Section 2.5 or Section 5.0 that pertain to this variable.
  • "Access" records the ID values of all elements in Section 2.4 of the document that describe access conditions for this variable.

Optional
Repeatable
Attributes: ID, xml:lang, source, name, wgt, wgt-var, qstn, files, vendor, dcml, intrvl, rectype, sdatrefs, methrefs, pubrefs, access
Contains Elements: Location, Label, Imputation, Security, Embargo, Response Unit, Analysis Unit, Question, Range of Valid Data Values, Range of Invalid Data Values, Undocumented Codes, Universe, Total Responses, Summary Statistics, Variable Text, Standard Categories, Category Group, Category, Coding Instructions, Version (of Variable) Statement, Concept, Derivation, Variable Format, Notes


Location
<location> 4.2.1
Description: This is an empty element containing only the attributes listed below. Attributes include "StartPos" (starting position of variable), "EndPos" (ending position of variable), "width" (number of columns the variable occupies), "RecSegNo" (the record segment number, deck or card number the variable is located on), and "fileid" (an IDREF link to the fileDscr element for the file that this location is within).

Remarks: The fileid is necessary in cases where the same variable may be coded in two different files, e.g., a logical record length type file and a card image type file. Note that if there is no width or ending position, then the starting position should be the ordinal position in the file, and the file would be described as free-format.

Examples:
<var><location StartPos='55' EndPos='57' width='3' RecSegNo='1' fileid='CARD-IMAGE' ></location></var>
<var files='File2'><location StartPos='25' EndPos='25' width='1' RecSegNo='A'></location></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, StartPos, EndPos, width, RecSegNo, fileid
Empty element.
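
The location attributes are enough to extract a variable's value from a fixed-width record. The Python sketch below uses the attribute values from the first example above and assumes that StartPos and EndPos are 1-based, inclusive column positions, so that width equals EndPos minus StartPos plus one. That interpretation, the sample record, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute values follow the first example above; the record text is invented.
LOCATION = "<location StartPos='55' EndPos='57' width='3' RecSegNo='1'/>"

def extract(record, location):
    """Slice a fixed-width record using 1-based, inclusive column positions."""
    start = int(location.get("StartPos"))
    end = int(location.get("EndPos"))
    width = int(location.get("width"))
    # Assumed relationship: width = EndPos - StartPos + 1.
    if width != end - start + 1:
        raise ValueError("width does not match StartPos and EndPos")
    return record[start - 1:end]

loc = ET.fromstring(LOCATION)
record = " " * 54 + "042" + " " * 23   # hypothetical 80-column card image
value = extract(record, loc)
print(repr(value))
```

The explicit width check is a design choice of the sketch: it surfaces inconsistent markup early instead of silently returning the wrong columns.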


Variable Label
<labl> 4.2.2 (Generic element A.2)
Description: A descriptive phrase which defines the variable. The length of this phrase may depend on the statistical analysis system used (e.g., some versions of SAS permit 40-character labels while some versions of SPSS permit 120 characters). A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. A "vendor" attribute is provided to allow for specification of different labels for use with different vendors' software.
Remarks: Whenever possible this element should be used instead of 4.2.15 (Variable Text, 'txt') in order to facilitate the creation of statistical analysis software labels.
Example:
<var><labl>Why No Holiday-No Money</labl></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Imputation
<imputation> 4.2.3
Description: According to the Statistical Terminology glossary maintained by the National Science Foundation, this is "the process by which one estimates missing values for items that a survey respondent failed to provide," and if applicable in this context, it refers to the type of procedure used.
Example:
<var><imputation>This variable contains values that were derived by substitution.</imputation></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Security
<security> 4.2.4
Description: Provides information regarding levels of access to the variable, e.g., public, subscriber, need to know. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<var><security date='1998-05-10'> This variable has been recoded for reasons of confidentiality. Users should contact the archive for information on obtaining access.</security></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, date
Contains: #PCDATA, Link to other element(s) within the codebook.


Embargo
<embargo> 4.2.5
Description: Provides information on variables which are not currently available because of policies established by the principal investigators and/or data producers. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute. An "event" attribute is provided to specify "notBefore" or "notAfter" ("notBefore" is the default). A "format" attribute is provided to ensure that this information will be machine-processable and specifies a format for the embargo element. The format attribute could be used to specify other conventions for the way that information within the embargo element is set out, if there were agreed-upon, commonly used conventions for encoding embargo information created in the future.
Example:
<var><embargo event='notBefore' date='2001-09-30'>The data associated with this variable will not become available until September 30, 2001, because of embargo provisions established by the data producers.</embargo></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, date, event, format
Contains: #PCDATA, Link to other element(s) within the codebook.
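
Because the "date" and "event" attributes are intended to be machine-processable, an application could evaluate them directly. The Python sketch below shows one possible reading: with "notBefore" the variable is treated as available on or after the date, and with "notAfter" as available up to and including the date. These boundary semantics and the function name are assumptions of the sketch, since the description above does not define them precisely.

```python
import xml.etree.ElementTree as ET
from datetime import date

EMBARGO = (
    "<embargo event='notBefore' date='2001-09-30'>"
    "Embargoed by the data producers.</embargo>"
)

def is_available(embargo, today):
    """Interpret the embargo date against a given day (assumed semantics)."""
    limit = date.fromisoformat(embargo.get("date"))
    # "notBefore" is the documented default for the event attribute.
    event = embargo.get("event", "notBefore")
    if event == "notBefore":
        return today >= limit
    return today <= limit  # "notAfter"

emb = ET.fromstring(EMBARGO)
print(is_available(emb, date(2001, 9, 29)))
print(is_available(emb, date(2001, 9, 30)))
```

The sketch relies on the recommended ISO date form (YYYY-MM-DD); dates written in other forms would need separate handling.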


Response Unit
<respUnit> 4.2.6
Description: Provides information regarding who provided the information contained within the variable, e.g., respondent, proxy, interviewer.
Example:
<var><respUnit> Respondent </respUnit></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Analysis Unit
<anlysUnt> 4.2.7
Description: Provides information regarding whom or what the variable describes.
Example:
<var><anlysUnt> This variable reports election returns at the constituency level. </anlysUnt></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Question
<qstn> 4.2.8
Description: The question element may have mixed content. The element itself may contain text for the question, with the subelements being used to provide further information about the question. Alternatively, the question element may be empty and only the subelements used. The element has a unique question ID attribute which can be used to link a variable with other variables where the same question has been asked. This would allow searching for all variables that share the same question ID, perhaps because the question was asked several times in a panel design. The attributes for this element include:

  • a "qstn" ID, a unique identifier for the question
  • "var", a reference to the IDs of all variables relating to the question
  • "seqNo", the sequence number of the question, and
  • "sdatrefs", summary data description references which record the ID values of all elements within the summary data description section of the Study Description which might apply to the question. These elements include: time period covered, date of collection, nation or country, geographic coverage, geographic unit, unit of analysis, universe, and kind of data.

Example:
<var><qstn ID='Q125'>When you get together with your friends, would you say you discuss political matters frequently, occasionally, or never?</qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, qstn, var, seqNo, sdatrefs
May include mixed #PCDATA content, Link to other element(s) within the codebook.
Contains Elements: Pre-Question Text, Literal Question, Post-Question Text, Forward Progression, Backflow, Interviewer Instructions


Pre-Question Text
<preQTxt> 4.2.8.1
Description: Text describing a set of conditions under which a question might be asked.
Example:
<var><qstn><preQTxt>For those who did not go away on a holiday of four days or more in 1985... </preQTxt></qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Literal Question
<qstnLit> 4.2.8.2
Description: Text of the actual, literal question asked.
Example:
<var><qstn><qstnLit>Why didn't you go away in 1985?</qstnLit></qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Post-Question Text
<postQTxt> 4.2.8.3
Description: Text describing what occurs after the literal question has been asked.
Example:
<var><qstn><postQTxt>The next set of questions will ask about your financial situation.</postQTxt> </qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Forward Progression
<forward> 4.2.8.4
Description: Contains a reference to IDs of possible following questions. The "qstn" IDREF may be used to specify the IDs.
Example:
<var><qstn><forward qstn='Q120 Q121 Q122 Q123 Q124'> If yes, please ask questions 120-124.</forward></qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, qstn
Contains: #PCDATA, Link to other element(s) within the codebook.
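
The "qstn" references on a Forward Progression element can be read programmatically. The following Python sketch is illustrative only and is not part of the DTD: the helper name following_question_ids is hypothetical, and it assumes that multiple IDs are separated by whitespace, as in the example above.

```python
import xml.etree.ElementTree as ET

# The Forward Progression example from this section.
fragment = (
    "<var><qstn><forward qstn='Q120 Q121 Q122 Q123 Q124'>"
    "If yes, please ask questions 120-124.</forward></qstn></var>"
)


def following_question_ids(var_elem):
    """Collect the IDs referenced by every <forward> element under a <var>."""
    ids = []
    for fwd in var_elem.iter("forward"):
        # Assumption: IDs are whitespace-separated, so split() is enough.
        ids.extend(fwd.get("qstn", "").split())
    return ids


next_ids = following_question_ids(ET.fromstring(fragment))
```

The same approach would apply to the "qstn" attribute of the Backflow element, under the same whitespace assumption.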


Backflow
<backward> 4.2.8.5
Description: Contains a reference to IDs of possible preceding questions. The "qstn" IDREF may be used to specify the IDs.
Examples:
<var><qstn><backward qstn='Q12 Q13 Q14 Q15'>For responses on a similar topic, see questions 12-15.</backward></qstn> </var>
<var><qstn><backward qstn='Q143'> </backward></qstn> </var>
Optional
Repeatable
Attributes: ID, xml:lang, source, qstn
Contains: #PCDATA, Link to other element(s) within the codebook.


Interviewer Instructions
<ivuInstr> 4.2.8.6
Description: Specific instructions to the individual conducting an interview.
Example:
<var><qstn><ivuInstr> Please prompt the respondent if they are reticent to answer this question. </ivuInstr></qstn></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Range of Valid Data Values
<valrng> 4.2.9
Description: Values for a particular variable that represent legitimate responses.
Example:
<valrng>
<range UNITS='INT' maxExclusive='95' min='05' max='80'>
</range>
<key>
05 (PSU) Parti Socialiste Unifie et extreme gauche (Lutte Ouvriere) [United Socialists and extreme left (Workers Struggle)]
50 Les Verts [Green Party]
80 (FN) Front National et extreme droite [National Front and extreme right]
95 Would vote blank
</key> </valrng>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Variable Range, Variable Item, Range Key, Notes


Variable Range

<range> 4.2.9 (Generic element A.8)
Description: This is the actual range. The "UNITS" attribute of Range permits the specification of integer/real numbers. The "min" and "max" attributes specify values which are considered part of the range. The "minExclusive" and "maxExclusive" attributes specify values which are not considered part of the range. For example, x < 1 or 10 <= x < 20 would be expressed as <range maxExclusive='1' /><range min='10' maxExclusive='20' />. This is an empty element consisting only of its attributes.
Optional
Repeatable
Attributes: ID, xml:lang, source, UNITS, min, minExclusive, max, maxExclusive
Empty element.
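
The inclusive and exclusive bounds described above can be checked mechanically. The following Python sketch is illustrative only and is not part of the DTD: the helper names in_range and is_valid are hypothetical, values are assumed to be numeric, and an absent attribute is assumed to impose no bound on that side.

```python
import xml.etree.ElementTree as ET


def in_range(range_elem, x):
    """Return True if x satisfies every bound present on a <range> element.

    Assumption (not stated by the DTD): a missing attribute imposes no bound.
    """
    a = range_elem.attrib
    if "min" in a and not x >= float(a["min"]):
        return False  # "min" is inclusive
    if "minExclusive" in a and not x > float(a["minExclusive"]):
        return False  # "minExclusive" is not part of the range
    if "max" in a and not x <= float(a["max"]):
        return False  # "max" is inclusive
    if "maxExclusive" in a and not x < float(a["maxExclusive"]):
        return False  # "maxExclusive" is not part of the range
    return True


# The example from the description: x < 1 or 10 <= x < 20.
# The <valrng> wrapper is added here only so the fragment has a single root.
fragment = "<valrng><range maxExclusive='1' /><range min='10' maxExclusive='20' /></valrng>"
ranges = ET.fromstring(fragment).findall("range")


def is_valid(x):
    """A value is accepted if it falls in at least one of the listed ranges."""
    return any(in_range(r, x) for r in ranges)
```

With these two ranges, 0.5 and 10 are accepted, while 1, 5, and 20 are rejected.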


Variable Item

<item> 4.2.9 (Generic element A.9)
Description: The counterpart to Range; used to encode individual values. This is an empty element consisting only of its attributes. The "UNITS" attribute of Item permits the specification of integer/real numbers.
Optional
Repeatable
Attributes: ID, xml:lang, source, UNITS, VALUE
Empty element.


Range Key

<key> 4.2.9 (Generic element A.10)
Description: This element permits a listing of the category values and labels. While this information is coded separately in the Category element, there may be some value in having this information in proximity to the range of valid and invalid values. A table is permissible in this element.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Notes
<notes> 4.2.9 (Generic element A.4)
Description: Used to indicate additional information regarding the variable range. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<valrng><notes subject='political party'>Starting with Euro-Barometer 2 the coding of this variable has been standardized following an approximate ordering of each country's political parties along a "left" to "right" continuum in the first digit of the codes. Parties coded 01-39 are generally considered on the "left", those coded 40-49 in the "center", and those coded 60-89 on the "right" of the political spectrum. Parties coded 50-59 cannot be readily located in the traditional meaning of "left" and "right". The second digit of the codes is not significant to the "left-right" ordering. Codes 90-99 contain the response "other party" and various missing data responses. Users may modify these codings or part of these codings in order to suit their specific needs. </notes></valrng>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Range of Invalid Data Values
<invalrng> 4.2.10
Description: Values for a particular variable that represent missing data, not applicable responses, etc.
Example:
<invalrng>
<range UNITS='INT' minExclusive='0' min='98' max='99'>
</range>
<key>
0 No answer
98 DK
99 Inappropriate
</key> </invalrng>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Variable Range, Variable Item, Range Key, Notes


Variable Range

<range> 4.2.10 (Generic element A.8)
Description: This is the actual range. The "UNITS" attribute of Range permits the specification of integer/real numbers. For example, x < 1 or 10 <= x < 20 would be expressed as <range maxExclusive='1' /><range min='10' maxExclusive='20' />.
Optional
Repeatable
Attributes: ID, xml:lang, source, UNITS, min, minExclusive, max, maxExclusive
Empty element.


Variable Item

<item> 4.2.10 (Generic element A.9)
Description: The counterpart to Range; used to encode individual values. This is an empty element consisting only of its attributes. The "UNITS" attribute of Item permits the specification of integer/real numbers.
Optional
Repeatable
Attributes: ID, xml:lang, source, UNITS, VALUE
Empty element.


Range Key

<key> 4.2.10 (Generic element A.10)
Description: This element permits a listing of the category values and labels. While this information is coded separately in the Category element, there may be some value in having this information in proximity to the range of valid and invalid values. A table is permissible in this element.
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Notes
<notes> 4.2.10 (Generic element A.4)
Description: Used to indicate additional information regarding the variable range. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<invalrng><notes>Codes 90-99 contain the response "other party" and various missing data responses. </notes></invalrng>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Undocumented Codes
<undocCod> 4.2.11
Description: Values whose meaning is unknown.
Example:
<var><undocCod>Responses for categories 9 and 10 are unavailable.</undocCod></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Universe
<universe> 4.2.12 (Reference element 2.2.3.7)
Description: The group of persons or other elements that are the object of the variable and to which any analytic results refer. Age, nationality, and residence commonly help to delineate a given universe, but any of a number of factors may be involved, such as sex, race, income, veteran status, criminal convictions, etc. The universe may consist of elements other than persons, such as housing units, court cases, deaths, countries, etc. In general, it should be possible to tell from the description of the universe whether a given individual or element (hypothetical or real) is a member of the population under study. A "level" attribute is included to permit coding of the level to which universe applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. The "clusion" attribute provides for specification of groups included (I) in or excluded (E) from the universe.
Examples:
<var><universe clusion='I'>Individuals 15-19 years of age. </universe></var>
<var><universe clusion='E'>Individuals younger than 15 and older than 19 years of age.</universe></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, clusion
Contains: #PCDATA, Link to other element(s) within the codebook.


Total Responses
<TotlResp> 4.2.13
Description: The number of responses to this variable. This element might be used if the number of responses does not match added case counts. It may also be used to sum the frequencies for variable categories.
Example:
<var><TotlResp>There are only 725 responses to this question since it was not asked in Tanzania.</TotlResp></var>
Optional
Not repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Summary Statistics
<sumStat> 4.2.14
Description: One or more statistical measures which describe the responses to a particular variable and may include one or more standard summaries, e.g., minimum and maximum values, etc. This element includes the following attributes:

  • "wgtd" indicates whether the statistic is weighted or not.
  • "weight" is the name of the weight variable, if one is used.
  • "type" is the statistic type and can denote mean, median, mode, valid cases, invalid cases, minimum, maximum, or standard deviation.

Examples:
<var><sumStat type='min'>0</sumStat></var>
<var><sumStat type='max'>9</sumStat></var>
<var><sumStat type='median'>4</sumStat></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, wgtd, weight, type
Contains: #PCDATA, Link to other element(s) within the codebook.
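
Because the element is repeatable and keyed by its "type" attribute, the statistics for a variable can be gathered into a simple lookup. The following Python sketch is illustrative only: the helper name summary_statistics is hypothetical, the three examples above are combined under a single <var> for convenience, and each element is assumed to contain a plain number.

```python
import xml.etree.ElementTree as ET

# The three examples above, placed under one <var> (an assumption made here
# for illustration; the element is repeatable).
fragment = (
    "<var>"
    "<sumStat type='min'>0</sumStat>"
    "<sumStat type='max'>9</sumStat>"
    "<sumStat type='median'>4</sumStat>"
    "</var>"
)


def summary_statistics(var_elem):
    """Map each <sumStat> 'type' value to its numeric content."""
    stats = {}
    for stat in var_elem.findall("sumStat"):
        # Assumption: the element content is a bare number.
        stats[stat.get("type")] = float(stat.text.strip())
    return stats


stats = summary_statistics(ET.fromstring(fragment))
```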


Variable Text
<txt> 4.2.15 (Generic element A.3)
Description: An extended description, beyond that provided in Variable Name and Label, of the variable. A "level" attribute is included to permit coding of the level to which the text applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level.
Example:
<var><txt>Support for European Economic Community Index - constructed from Q. 246 and Q. 248.</txt></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Standard Categories
<stdCatgry> 4.2.16
Description: Standard category group used in a variable, like industry codes, employment codes, or social class codes. The attribute of "date" is provided to indicate the version of the code in place at the time of the study. The attribute of "URI" is provided to indicate a URN or URL that can be used to obtain the electronic form of the category group.
Example:
<var><stdCatgry date='1981' source='producer'>Census of Population, Classified Index of Industries and Occupations </stdCatgry></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, date, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Category Group
<catgryGrp> 4.2.17
Description: A description of response categories that might be grouped together. The attribute "missing" indicates whether this category group contains missing data or not. The attribute "missType" is used to specify the type of missing data, e.g., inap., don't know, no answer, etc. A controlled vocabulary for "missType" will be developed in the future. The "catgry" attribute permits specification of constituent categories in the group. The "catGrp" attribute is used to indicate all the subsidiary category groups which may nest underneath the current category group, thereby permitting the encoding of hierarchical structures of category groups.
Optional
Repeatable
Attributes: ID, xml:lang, source, missing, missType, catgry, catGrp
Contains Elements: Category Group Label, Category Group Text


Category Group Label
<labl> 4.2.17.1 (Generic element A.2)
Description: A short description of the category group. A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. Vendor attribute provided to allow for specification of different labels for use with different vendors' software.
Example:
<var><catgryGrp missing='N' catgry='supervisors, farm workers; farm workers; marine life cultivation workers; nursery workers; animal caretakers, except farm; timber cutting and logging occupations; hunters and trappers' catGrp='Farm occupations, except managerial' ><labl>Other Agricultural and Related Occupations </labl></catgryGrp></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Category Group Text
<txt> 4.2.17.2 (Generic element A.3)
Description: A fuller description of the category group. A "level" attribute is included to permit coding of the level to which the text applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level.
Example:
<var><catgryGrp><txt>When the respondent indicated his political party preference, his response was coded on a scale of 1-99 with parties with a left-wing orientation coded on the low end of the scale and parties with a right-wing orientation coded on the high end of the scale. Categories 90-99 were reserved for miscellaneous responses.</txt></catgryGrp></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Category
<catgry> 4.2.18
Description: A description of a particular response. The attribute "missing" indicates whether this category contains missing data or not. The attribute "missType" is used to specify the type of missing data, e.g., inap., don't know, no answer, etc. A controlled vocabulary for "missType" will be developed in the future. The attribute "country" allows for the denotation of country-specific category values. Users should employ the ISO 3166 standard for the designation of country codes.
Optional
Repeatable
Attributes: ID, xml:lang, source, missing, missType, country
Contains Elements: Category Value, Category Label, Category Text, Category Statistic


Category Value
<catValu> 4.2.18.1
Description: The explicit response.
Example:
<var><catgryGrp><catgry missing='Y' missType='inap'><catValu>9 </catValu> </catgry></catgryGrp></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Category Label
<labl> 4.2.18.2 (Generic element A.2)
Description: A short description of the response. A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. A "vendor" attribute is provided to allow for specification of different labels for use with different vendors' software.
Remarks: Whenever possible this element should be used instead of 4.2.18.3 (Category Text, 'txt') in order to facilitate the creation of statistical analysis software labels.
Examples:
<var><catgryGrp><catgry><labl>Better</labl> </catgry></catgryGrp></var>
<var><catgryGrp><catgry><labl>About the same</labl> </catgry></catgryGrp></var>
<var><catgryGrp><catgry><labl>Inap.</labl> </catgry></catgryGrp></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Category Text
<txt> 4.2.18.3 (Generic element A.3)
Description: A fuller description of the response or an elaboration on the response. A "level" attribute is included to permit coding of the level to which the text applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level.
Example:
<var><catgryGrp><catgry><txt>Inap., question not asked in Ireland, Northern Ireland, and Luxembourg.</txt></catgry></catgryGrp></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, level
Contains: #PCDATA, Link to other element(s) within the codebook.


Category Statistic
<catStat> 4.2.18.4
Description: May include frequencies, percentages, or crosstabulation results which define the category; often appears in a table. The attribute "type" refers to "frequency", "percent", or "crosstab". The URI attribute can be used to link to a table.
Example:
<var><catgryGrp><catgry><catStat type='freq'>256 </catStat></catgry></catgryGrp></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, type, URI
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.
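
The Category subelements (value, label, statistic) can be flattened into one record per category. The following Python sketch is illustrative only: the helper name read_categories is hypothetical, the fragment combines pieces of the separate examples above into a single category for convenience, and the "freq" type value follows the Category Statistic example.

```python
import xml.etree.ElementTree as ET

# Pieces of the examples above, combined into one category (an assumption
# made here for illustration).
fragment = (
    "<var><catgry missing='Y' missType='inap'>"
    "<catValu>9 </catValu><labl>Inap.</labl>"
    "<catStat type='freq'>256 </catStat>"
    "</catgry></var>"
)


def read_categories(var_elem):
    """Return one dict per <catgry> with its value, label, missing flag, and frequency."""
    rows = []
    # iter() finds <catgry> whether or not it is nested inside <catgryGrp>.
    for cat in var_elem.iter("catgry"):
        freq = None
        for stat in cat.findall("catStat"):
            if stat.get("type") == "freq":
                freq = int(stat.text.strip())
        rows.append({
            "value": (cat.findtext("catValu") or "").strip(),
            "label": (cat.findtext("labl") or "").strip(),
            "missing": cat.get("missing") == "Y",
            "freq": freq,
        })
    return rows


rows = read_categories(ET.fromstring(fragment))
```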


Coding Instructions
<codInstr> 4.2.19
Description: Any special instructions to those who converted information from one form to another for a particular variable. This might include the reordering of numeric information into another form or the conversion of textual information into numeric information.
Example:
<var><codInstr>Use the standard classification tables to present responses to the question: What is your occupation? into numeric codes.</codInstr></var>
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA or a table.


Version (of Variable) Statement
<verStmt> 4.2.20 (Generic element A.6.6)
Description: Version statement for the variable, if it has undergone changes.
Optional
Repeatable
Attributes: ID, xml:lang, source
Contains Elements: Version, Version Responsibility Statement, Notes


Version
<version> 4.2.20 (Generic element A.6.6.1)
Description: Also known as release or edition. If there have been substantive changes in the variable since its creation, this statement should be used. The ISO standard for dates (YYYY-MM-DD) is recommended for use with the date attribute.
Example:
<var><verStmt><version type='version' date='1999-01-25'>Second version of V25</version></verStmt> </var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type (release, version, edition), date
Contains: #PCDATA, Link to other element(s) within the codebook.
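
A "date" attribute that follows the recommended YYYY-MM-DD form can be parsed directly. The following Python sketch is illustrative only: the helper name version_date is hypothetical, and treating an unparseable date as absent is a choice made for this example, not a rule of the DTD.

```python
import datetime
import xml.etree.ElementTree as ET

# The Version example from this section.
fragment = (
    "<var><verStmt><version type='version' date='1999-01-25'>"
    "Second version of V25</version></verStmt></var>"
)


def version_date(var_elem):
    """Return the <version> date as a datetime.date, or None if it is absent
    or cannot be parsed as an ISO date."""
    version = var_elem.find("verStmt/version")
    if version is None or version.get("date") is None:
        return None
    try:
        return datetime.date.fromisoformat(version.get("date"))
    except ValueError:
        return None


released = version_date(ET.fromstring(fragment))
```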


Version Responsibility Statement
<verResp> 4.2.20 (Generic element A.6.6.2)
Description: Used to indicate the organization or person responsible for the version of the variable.
Example:
<var><verStmt><verResp>Zentralarchiv fuer Empirische Sozialforschung</verResp></verStmt></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, affiliation
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 4.2.20 (Generic element A.4)
Description: Used to indicate additional information regarding the version or the version responsibility statement, in particular to indicate what makes a new version of a variable different from its predecessor. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<var><verStmt><notes>The labels for categories 01 and 02 for this variable were inadvertently switched in the first version of this variable and have now been corrected.</notes></verStmt></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Concept
<concept> 4.2.21
Description: The general subject to which this variable may be seen as pertaining. This element serves the same purpose as the keywords and topic classification elements, but at the variable level. The "vocab" attribute is provided to indicate the controlled vocabulary, if any, used in the element, e.g., LCSH (Library of Congress Subject Headings), MeSH (Medical Subject Headings), etc. The "vocabURI" attribute specifies the location for the full controlled vocabulary.
Remarks: The actual category reference should be included in the general text.
Examples:
<var><concept>Income</concept></var>
<var><concept vocab='LCSH' vocabURI= 'http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html' source='archive' >SF: 311-312 draft horses</concept></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, vocab, vocabURI
Contains: #PCDATA, Link to other element(s) within the codebook.


Derivation
<derivation> 4.2.22
Description: Used only in the case of a derived variable, this element provides both a description of how the derivation was performed and the command used to generate the derived variable, as well as a specification of the other variables in the study used to generate the derivation. The "var" attribute provides the ID values of the other variables in the study used to generate this derived variable.
Optional
Not Repeatable
Attributes: ID, xml:lang, source, var
Contains Elements: Derivation Description, Derivation Command


Derivation Description
<drvdesc> 4.2.22.1
Description: A textual description of the way in which this variable was derived to display to users.
Example:
<var><derivation><drvdesc> VAR215.01 "Outcome of first pregnancy" (1988 NSFG=VAR611 PREGOUT1) If R has never been pregnant (VAR203 PREGNUM EQ 0) then OUTCOM01 is blank/inapplicable. Else, OUTCOM01 is transferred from VAR225 OUTCOME for R's 1st pregnancy. </drvdesc></derivation></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source
Contains: #PCDATA, Link to other element(s) within the codebook.


Derivation Command
<drvcmd> 4.2.22.2
Description: The actual command used to generate the derived variable. The "syntax" attribute is used to indicate the command language employed (e.g., SPSS, SAS, Fortran).
Example:
<var><derivation><drvcmd syntax='SPSS'>RECODE V1 TO V3 (0=1) (1=0) (2=-1) INTO DEFENSE WELFARE HEALTH. </drvcmd></derivation></var>
Optional
Not Repeatable
Attributes: ID, xml:lang, source, syntax
Contains: #PCDATA, Link to other element(s) within the codebook.


Variable Format
<varFormat> 4.2.23
Description: The technical format of the variable in question. Attributes for this element include: "type," which signifies if the variable is character or numeric; "formatname," which in some cases may provide the name of the particular, proprietary format actually used; "schema," which identifies the vendor or standards body which defined the format among a list which includes SAS, SPSS, IBM, ANSI, ISO, XML-data or other; "category," which describes what kind of data the format represents and includes date, time, currency, or "other" conceptual possibilities; and "URI," which supplies a network identifier for the format definition.
Examples:
<var><varFormat type='numeric' schema='SAS' formatname='DATEw' category='date'>The number in this variable is stored in the form 'ddmmmyy' in SAS format. </varFormat></var>

<var> <varFormat type='numeric' formatname='date.iso8601' schema='XML-Data' category='date' URI='http://www.w3.org/TR/1998/NOTE-XML-data/'> 19541022 </varFormat> </var>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, formatname, schema, category, URI
Contains: #PCDATA, Link to other element(s) within the codebook.


Notes
<notes> 4.2.24 (Generic element A.4)
Description: Used to indicate additional information regarding the variable. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<var><notes>This variable was created by recoding location of residence to Census regions.</notes></var>
Optional
Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Other Study-Related Materials

Section 5.0 of the Data Documentation Initiative (DDI) DTD


Other Study-Related Materials' Place within the Document Structure


    Document
          |
          |---Document Description
          |---Study Description
          |---Data Files Description
          |---Variables Description
          |---OTHER STUDY-RELATED MATERIALS

The Role of Other Study-Related Materials

  • This section allows for the inclusion of other materials that are related to the study as identified and labeled by the DTD users (encoders). The materials may be entered as PCDATA (ASCII text) directly into the document (through use of the "txt" element). This section may also serve as a "container" for other machine-readable materials such as data definition statements by providing a brief description of the study-related materials accompanied by the attributes "type" and "level" defining the material further. The "URI" attribute may be used to indicate the location of the other study-related materials.
  • Other Study-Related Materials may include: questionnaires, coding notes, SPSS/SAS/STATA setups (and others), user manuals, continuity guides, sample computer software programs, glossaries of terms, interviewer/project instructions, maps, database schema, data dictionaries, show cards, coding information, interview schedules, missing values information, frequency files, variable maps, etc.
  • Note that Section 2.5, Other Study Description Materials, should be used for materials that are primarily descriptions of the content and use of the study, such as appendices, sampling information, weighting details, methodological and technical details, publications based upon the study content, related studies or collections of studies, etc. This section, 5.0 Other Study-Related Materials, is intended to include or to link to materials used in the production of the study or useful in the analysis of the study.


Other Study-Related Materials
<otherMat> 5.0 (Generic element A.1)
Description: Other materials related to the study.
Example:
<otherMat type='SAS data definition statements' level='study' URI='http://www.icpsr.umich.edu'><labl>SAS Data Definition Statements for ICPSR 6837</labl></otherMat>

Optional
Repeatable
Attributes: ID, xml:lang, source, type, level, URI
Contains Elements: Label, Text, Notes, Table, Citation


Label
<labl> 5.1 (Generic element A.2)
Description: Short description of the other material. A "level" attribute is included to permit coding of the level to which the label applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level. Vendor attribute provided to allow for specification of different labels for use with different vendors' software.
Example:
<otherMat type='SAS data definition statements' level='study' URI='http://www.icpsr.umich.edu'><labl>SAS Data Definition Statements for ICPSR 6837</labl></otherMat>

Optional
Repeatable
Attributes: ID, xml:lang, source, level, vendor
Contains: #PCDATA, Link to other element(s) within the codebook.


Text
<txt> 5.2 (Generic element A.3)
Description: Lengthier description of other material. A "level" attribute is included to permit coding of the level to which the text applies, i.e., the study level, the file level (if different from study), the record group, the variable group, or the variable level.
Example:
<otherMat URI="http://www.icpsr.umich.edu/.."><txt>This is a PDF version of the original questionnaire provided by the principal investigator.</txt></otherMat>

<otherMat><txt>Glossary of Terms. Below are terms that may prove useful in working with the technical documentation for this study.. </txt></otherMat>

Optional
Repeatable
Attributes: ID, xml:lang, source, level
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Notes
<notes> 5.3 (Generic element A.4)
Description: Used to indicate additional information about the other material. "Notes" sections appear in several places in the DTD. The attributes for notes permit a controlled vocabulary to be developed (type and subject), the level of the DTD to which the note refers to be identified (study, file, variable, etc.), and the author of the note to be indicated (resp).
Example:
<otherMat><txt>This is a PDF version of the original questionnaire provided by the principal investigator.</txt>
<notes>Users should be aware that this questionnaire was modified during the CAI process.</notes></otherMat>

Optional
Not Repeatable
Attributes: ID, xml:lang, source, type, subject, level, resp
Contains: #PCDATA, Link to other element(s) within the codebook, reference to a table.


Table
<table> 5.4
Description: Tables may be inserted in Section 5. In XML editor software, the table capability will be activated in element 5.0. Machine-readable frequency tables, for example, could be appended to the DDI document in this section.


Citation
<citation> 5.5 (Generic element A.6)
Description: The citation for the other material. This element encodes the bibliographic information describing the other material, including title information, statement of responsibility, production and distribution information, series and version information, text of a preferred bibliographic citation, and notes (if any). It uses generic element A.6, found at the end of the DTD. A MARCURI attribute is provided to link to the MARC record for this citation.
Optional
Not Repeatable
Attributes: ID, xml:lang, source, MARCURI
Contains Elements: The full tree for the citation element is omitted for reasons of space. See Section 2.1, Citation of Study.


Other Study-Related Materials
<otherMat> 5.6
Description: Other materials related to the study. Note: This element (5.6) is defined recursively; it is the same element as Other Study-Related Materials (5.0) above.