Subject: DITA-aware Importer Implemented for XIRUSS-T
I have implemented a basic DITA-aware importer for the XIRUSS-T system and provided enough packaging and documentation that you should be able to run it if you have ANT or Eclipse installed.

The XIRUSS-T system is a GPL-licensed open-source project for demonstration and educational purposes *only*. It is not a product and is expressly not appropriate for production use.

This code is relevant to the DITA project for at least the following reasons:

- It demonstrates (I hope) why having schemas bound to namespaces is important. Without this, content management systems (and any other generalized XML-aware system) cannot reliably associate document-type-specific processing with documents, because there is no other way to know for sure that a given document is governed by a particular document type unless a human provides that association on a case-by-case basis. In the case of XIRUSS-T it is the use of DITA-specific namespaces that allows my code to reliably and automatically bind DITA documents to the DITA-aware importer in order to import maps and documents that use conref= as complete compound documents.

- It provides an example of generic, DITA-aware content management functionality that takes direct advantage of the DITA architecture mechanism. I haven't had a chance to test it yet, but the DITA importer should handle any document that derives from the DITA map or topic document types.

- It provides an open-source sandbox with which others can experiment with more sophisticated or specialized DITA-aware processing. For example, it would probably be useful to build additional modules that support the processing of related links and other sophisticated linking features of DITA. Likewise, if you've created a task-specific specialized DITA-based document type, it should be fairly clear how to quickly specialize the existing code to support your unique import requirements.

The overall XIRUSS-T project site is http://xiruss-t.sourceforge.net.
From there you can download the code distribution. To try it, just download the Zip file, unpack it somewhere (e.g., into a directory called "xiruss-t") and either do "ant xirussRunner" to start the XIRUSS-T server or set up an Eclipse package per the instructions on the Web site.

When you start the server it automatically imports a bunch of files, including a couple of DITA maps that in turn refer to a bunch of topics, one of which uses a conref= attribute. The result is that all the files rooted at the maps are imported and show up in the repository with the appropriate dependencies captured.

NOTE: This uses my modified namespace-based DITA schemas and instances. However, all I've changed is to add namespace declarations. Otherwise the documents are pure DITA.

The XIRUSS system provides a general importer framework, and that framework is used to implement the DITA-specific importer. The main classes involved are:

- com.innodata.xiruss.bos.xml.dita.DitaBosMember
- com.innodata.xiruss.bos.xml.XmlBosMember (of which DitaBosMember is a subclass)
- com.innodata.xiruss.bos.BosMemberFactory, which constructs DitaBosMember objects for XML documents that use one of the DITA namespaces I've defined (either DITA/map or DITA/base)

It is the DitaBosMember class that encapsulates knowledge of DITA topicref elements and conref= attributes.

To see the results of the import, start the server and then open a Web browser to "http://localhost:9090/". You will see three links: resources, branches, and Repository dump.

If you select branches and then the "dita stuff" branch, you will see two snapshots. The first snapshot reflects the import of "simple.ditamap", the second the import of "hierarchy.ditamap". If you navigate to a snapshot and then a version, you can see the content of a file as the browser renders it (i.e., as an XML file).

If you go to the repository dump you can see all the information and meta-information actually stored in the repository.
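
As a rough illustration of the namespace-based dispatch described above for BosMemberFactory, here is a minimal, self-contained sketch. It is not the actual XIRUSS-T code: the class name, the importerFor method, the importer labels, and the namespace URIs are all hypothetical stand-ins (the real DITA/map and DITA/base namespace names are the ones declared in the distribution's schemas). It only shows the idea that the root element's namespace, and nothing else, selects the document-type-specific handler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class NamespaceDispatchSketch {
    // Hypothetical namespace URIs, used only for this illustration.
    static final String DITA_MAP_NS = "urn:example:dita:map";
    static final String DITA_BASE_NS = "urn:example:dita:base";

    // Namespace -> importer label. A real factory would construct
    // importer objects here instead of returning a label.
    static final Map<String, String> IMPORTERS = Map.of(
        DITA_MAP_NS, "dita",
        DITA_BASE_NS, "dita");

    /** Chooses an importer label from the root element's namespace. */
    public static String importerFor(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getNamespaceURI()
        Element root = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
        String ns = root.getNamespaceURI();
        // Documents in no namespace, or an unknown one, fall back to
        // generic XML handling; Map.of rejects null keys, so map null to "".
        return IMPORTERS.getOrDefault(ns == null ? "" : ns, "generic-xml");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(importerFor("<map xmlns='urn:example:dita:map'/>"));
        System.out.println(importerFor("<map/>"));
    }
}
```

In this sketch a "map" element with no namespace is treated as generic XML, which is the point made earlier: without the namespace, nothing in the document reliably says it is a DITA map.
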
The main repository page shows all the resources in the repository, all the versions (each version is associated with exactly one resource; a resource may have many versions), the branches, and the repository schema registry, which maps namespaces to schema instances. You should see the two DITA-related namespaces (which I made up for this experiment) mapped to map.xsd and ditabase.xsd (you'll have to navigate to the resource and then the version to see the original filename).

If, through the repository dump, you navigate to the "dita stuff" branch and then to one of the snapshots, you can see the results of the dependencies established during import, which are used to reflect the "where-used" information. For example, if you find the entry for "organizing.xml" you'll see that it's used by "changingtheoil.xml", reflecting the conref= I created in changingtheoil. Likewise, if you go to the second snapshot within the "dita stuff" branch, you'll see that each of the topics is shown as being used by hierarchy.ditamap or simple.ditamap.

Note that in the second snapshot some files appear to be in the repository twice. This is because on import you have to explicitly indicate that a given file is in fact a new version of an existing resource. I haven't done this for the import of hierarchy.ditamap. [In practice this would be done either through use-case- or business-rule-specific heuristics, such as filename matching or CVS-like conventions, or through an interactive user interface for doing imports.]

If you navigate to a version you will see all the properties of that version. For XML documents this includes XML-specific properties such as the root element type, namespaces used, and governing schema, if any. You will also see any dependencies from the version to other resources (in XIRUSS, dependencies are always from versions to resources).
For example, if you go to the version for hierarchy.ditamap you will see that there is one use-by-reference dependency for each topic referenced from the map, as well as a governed-by dependency to its governing schema.

Also, for each version you can see the source bytes and, if the version is a text object, the text.

Finally, if you examine any XML versions that include links, you will notice that the link address URLs have been rewritten to point to resources in the repository, reflecting the location of the target as imported. URLs of the form "res_00000026~onSnapshot" are relative URLs that are resolved relative to a specific snapshot. For example, to resolve a resource to a version on snapshot snap_00000007 you would use this fully-qualified URL:

http://localhost:9090/snap_00000007/res_00000026~onSnapshot

(Try this in almost any XML editor; it should just work, including fetching the schema. Unfortunately I haven't yet implemented rewriting of stylesheet PIs to point into the repository, so browsers may complain when you try to view some XML files.)

Cheers,

Eliot

--
W. Eliot Kimber
Professional Services
Innodata Isogen
9030 Research Blvd, #410
Austin, TX 78758
(512) 372-8122
eliot@innodata-isogen.com
www.innodata-isogen.com