OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) Technical Committee

The original Call For Participation for this TC may be found at https://www.oasis-open.org/news/announcements/call-for-participation-oasis-lexicographic-infrastructure-data-model-and-api-lexi.

  1. Name of the TC

    Lexicographic Infrastructure Data Model and API (LEXIDMA)

  2. Statement of Purpose

    This committee’s high-level purpose is to create an open, standards-based framework for internationally interoperable lexicographic work. This TC will describe and define standard serialization-independent interchange objects based predominantly on the state of the art in the lexicographic industry. Defining specific serializations, transaction models, standard interfaces, and web services based on the defined objects and object models is also in scope, insofar as it facilitates the high-level purpose set out here. This TC aims to develop the lexicographic infrastructure as part of a broader ecosystem of standards employed in Natural Language Processing (NLP), language services, and the Semantic Web.

    Business Benefits

    The key business benefit that the LEXIDMA deliverables aim for is a simple, modular, and easy-to-adopt data model that will be attractive to all lexicographic industry actors across companies and academia, regardless of geographic location. Adoption of that model will facilitate the exchange of lexicographic and linguistic corpus data globally and also enable effective exchange with adjacent industries such as language services, terminology management, or technical writing. Semantic interoperability of lexicographic data should help the global lexicographic industry to move beyond its current model of creating and curating lexicographic deliverables (most prominently multilingual and monolingual dictionaries) and corpora in linguistically and geographically demarcated silos, and to create a truly global market for lexicographic data exchange across and among languages and locales.

  3. Scope of Work

    The following items belong to the Scope of Work and are expected to be refined as the TC gains additional insights into evolving and culturally diverse lexicographic use cases. Members will gather insights and requirements from consultations with the wider community of industry stakeholders, annual symposia, questionnaires, etc., and use these insights to produce concrete technical deliverables.

    • Define and maintain a serialization-independent Data Model for globally applicable use cases in lexicography.
    • Define and maintain XML, JSON, RDF, and other serializations of the said lexicographic data model, as industry or academic needs arise.
    • Define specific standard Application Programming Interfaces (APIs) and abstract service architectures for various serializations of the lexicographic data model in concert with other related standards and formats (such as TEI, LMF, RDF, JSON-LD, XLIFF, ITS, TBX, etc.) and prominent data models in adjacent industries and verticals, such as terminology management, translation services, web publishing, etc.
    • Define and describe lossless or nearly lossless mappings between the lexicographic data model and its native normative serializations (developed by this committee) on the one hand and other common industry and academic serializations, most prominently Ontolex-Lemon and TEI Lex-0, on the other. Define those mappings both in an abstract way and for specific serializations as the need arises.
    • Define and describe informative best practices and abstract service architecture recommendations with regard to the use of the LEXIDMA TC normative deliverables in the lexicographic industry and adjacent industries, such as terminology management, translation services, web publishing, etc.

  4. Deliverables

    The following are high-priority technical goals that should be addressed by the development of one or more deliverables on the OASIS Standards Track or as Committee Notes within 24 months from TC initiation:

    • Serialization-independent Data Model for Lexicography (DMLex)
    • XML serialization of DMLex
    • JSON serialization of DMLex
    • RDF serialization of DMLex
    • Informative Ontolex-Lemon mapping
    • Informative TEI Lex-0 mapping

    Work on the following may start while the above high-priority deliverables are being addressed, or later, depending on the general sense of urgency for them within the lexicographic industry:

    • Reference architecture
    • APIs with various bindings

  5. IPR Mode

    This TC will operate under the Non-Assertion IPR mode as defined in Section 10.3 of the OASIS IPR Policy document.

  6. Audience

    The expected audience for the work of the LEXIDMA TC includes but is not limited to:

    • Lexicographers
    • Terminologists
    • Multilingual content and software architects and strategists, multilingual content publishers
    • NLP services architects and developers
    • Owners and managers of lexicographic content
    • Software providers for lexicography, corpus management, etc., including producers of language technology components
    • Technical communicators employing lexicographic tools or linguistic corpora in the process of multilingual publishing of their content
    • Translation service providers and freelance translators who need to use lexicographic tools or products in order to deliver their services

  7. Language

    English (UK spelling)