UIMA Teleconference September 28, 2007

1. Continuation of the PE metadata discussion

Thomas' summary doc: we covered up to point 7 in the last telco.

Issue 8 (5.5.1.5 page 58): Do we agree that extensions should be done in the explicit fashion via an extended ID and an EObject?

Thilo: we will have to use this in Apache, and it seems obscure to the user why they have to go through this extra syntax.

Thomas: intended for private extensions, e.g. for developers of component repositories who need to add component data that the Apache implementation won't choke on.

Meta-issue: do we even want to distinguish extensions from the base standard?
Meta-meta-issue: is it the responsibility of the spec to even say anything about extensions?

Approaches the spec can adopt re: extensions:
1) Nothing
2) Extend standard items
3) XMI extension
4) Extend the class model

Action: improve write-up, include in future discussion / vote

Issue 9 (5.5. page 52): The compliance point in this document is slightly reworded compared to the spec (I corrected a typo and more explicitly distinguished components from frameworks). Do we agree that this form should be used? "A UIMA component may be required to publish Processing Element Metadata that conforms to this specification and a UIMA framework may be required to be able to consume metadata in this form."

All in the telco agreed to accept this compliance point, but we need at least one more vote to achieve a majority (no quorum in meeting).

2. Discussion of Section 5.8 Aggregate Analytic Descriptor

1 (5.8 page 72) Discussion of potential correction: "An Aggregate Analytic is a composition of two or more constituent analytics". There may be use cases where even an aggregate of only one analytic makes sense. Consider the following example: a primitive PE is sufficient for a given analytic need, but it needs to be (statically) reconfigured to do the job (e.g. in an extensive way by setting many switches, or in a significant way by providing a rule or model file to a generic annotator). Since PE descriptors do not allow configuration values to be set, there are two options for this: a) making a copy of the PE descriptor and changing the default values, or b) wrapping the PE in an aggregate that contains the required settings and only a single delegate. Let's discuss whether this is a valid use case, whether b) is a valid way to address it, and whether we have any best-practice recommendations for choosing a) or b). If b) is a valid way, we should change the wording to "one or more" and explain the use case that can lead to single-delegate aggregates. Also note: creating an aggregate provides another "level" for specifying behavioral metadata.

Add a paragraph to the spec describing the use case; suggest using an aggregate of size 1. Alternative: configured PE descriptor, but so far we have chosen not to implement this (redundant, and it would have to be kept aligned with the aggregate descriptor anyway).

2 (5.8.1.3, page 73) Do we agree that mechanisms to specify inter-component dependencies beyond what is discussed in the chapter on "Composition" (input/output capabilities) are not going to be in the scope of this version of the spec?

Answer: yes

3 Section 5.8.2.2. page 74/75 seems to imply that only wsdlUrl needs to be supported. This means that to build an aggregate with two delegates, 5 files need to be provided/configured (aggregate, 2 wsdl docs, 2 delegate descriptors). Inclusion of "native" delegates is a very important use case. It should be supported in an easy-to-use way. Shouldn't we allow for a more direct way, or leave that to specific framework implementations?

Adam: I think this raises a fundamental issue about the scope of the UIMA standard: Is the standard only about service interoperability, or do we also say something about a standard format for "component descriptor files"?
We touched on this a bit in the previous call as well. If the standard is about service interoperability, then deployed UIMA services would have to support a getMetaData call that returns the PE Metadata in the standard format. However, an implementation of a primitive component in some framework would not actually need to have a descriptor file on disk that contained the standard metadata. For example, Apache UIMA could continue to use its own descriptor format, and a service wrapper could do the translation to the standard metadata format. A consequence of this, as it relates to the point about aggregate descriptors, is that there is no "primitive descriptor" that the aggregate could point to instead of the wsdl file. Aggregate descriptors are about composing UIMA services, and so the only meaningful definition of a delegate is a wsdl file. (There is still the possibility of allowing an aggregate descriptor to point to another aggregate descriptor, in order to define a hierarchical aggregate, which still eventually bottoms out in wsdl files, but it's not clear this is necessary.) The advantages of defining this as a service interoperability specification only are that it would be a simpler spec and simpler to implement. The disadvantages are that it doesn't do much to encourage common tooling for editing UIMA standard descriptor files, or component repositories that would expect packaged components to include standard descriptor files as part of their packaging.

An alternative view for the specification would say that we are concerned not only with service interoperability but also with common descriptor files. This could better facilitate common tooling and repository development.
One way to do that would be for the spec to define a standard UIMA "Primitive Descriptor" file, which is an XML file with two top-level parts: the descriptive section (in standard PE metadata format) and the implementation section (which is not defined by the spec and varies between frameworks). Then, an Aggregate Descriptor could point to a Primitive Descriptor instead of a WSDL file. Such an Aggregate would only be runnable in a particular framework, but at least we'd have more agreement across frameworks about what descriptor files look like and where to look to find the standard UIMA metadata that must be common across frameworks. Apache UIMA could continue to support its current descriptor format but would also need to support the standard one (and ideally provide converters to translate existing descriptors to the new format).
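
To make the two-part "Primitive Descriptor" idea concrete, here is a minimal sketch in Python. It is only an illustration: the element names (primitiveDescriptor, metadata, implementation, annotatorClass), the framework attribute, and both helper functions are hypothetical and are not taken from the spec or from Apache UIMA. The point it shows is that common tooling could read the standard section without understanding the framework-specific one.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor layout; none of these element or attribute
# names are defined by the UIMA spec or by Apache UIMA.
PRIMITIVE_DESCRIPTOR = """\
<primitiveDescriptor>
  <metadata>
    <name>ExampleAnnotator</name>
    <version>1.0</version>
  </metadata>
  <implementation framework="apache-uima">
    <annotatorClass>org.example.ExampleAnnotator</annotatorClass>
  </implementation>
</primitiveDescriptor>
"""


def read_standard_metadata(descriptor_xml):
    """Return the descriptive section as a dict, ignoring the implementation section."""
    root = ET.fromstring(descriptor_xml)
    metadata = root.find("metadata")
    if metadata is None:
        raise ValueError("descriptor has no standard metadata section")
    return {child.tag: (child.text or "").strip() for child in metadata}


def implementation_framework(descriptor_xml):
    """Return the framework named on the implementation section, or None if absent."""
    root = ET.fromstring(descriptor_xml)
    implementation = root.find("implementation")
    return None if implementation is None else implementation.get("framework")
```

Under these assumptions, a repository or editor would only need read_standard_metadata, while a framework would check implementation_framework to decide whether it can actually run the component.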