Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
Unless I misread, I do not think you specified wildcards or an "any" value,
e.g. "*" in an LDAP search filter. Where would wildcards fit into your search
schemes? There have been a couple of conversations in HE regarding a simple
and generic search syntax, so this conversation is quite timely.

On Tue, Apr 5, 2011 at 10:09 AM, Gary Cole <gary.cole@oracle.com> wrote:
> Anil, after reflecting on this briefly, one thing may be misleading about
> the way I ranked schemes. See inline below.
>
> On Apr 5, 2011, at 9:38 AM, Gary Cole wrote:
>
>> Anil, I think I can net this out based on my experience. (I've done this
>> many times with many different back-ends.)
>>
>> 1) The biggest simplification you can make is constraining logical
>> operators to 'AND' alone.
>> - 'NOT' and 'OR' can be both complex and inefficient.
>> - While people often think that they want 'NOT', they usually need only a
>> NOT_EQUAL (NE or !=) matching-operator.
>> - Most of the complexity and processing burden lies in evaluating
>> arbitrarily nested clauses.
>> - If you support only 'AND', then there's no need for a client to specify
>> a logical operator, which simplifies both syntax and semantics.
>>
>> RECOMMENDATIONS:
>> 1A) If you can get by with 'AND' alone, then by all means do so! This
>> saves you the most.
>> 1B) Next best approach is to support INNER-ANDS and one level of
>> OUTER-ORS.
>> -- Any nesting of ANDs and ORs can be put into this form.
>> -- DBMSs do this wherever possible in order to optimize evaluation.
>> -- Technically, a client could get the same result by issuing a
>> separate query for each ORed set of clauses.
>> 1C) Add an NE operator ("!=" or "<>") rather than supporting a logical
>> operator 'NOT'.
>>
>> 2) As far as matching-operators:
>> - EQUALS is most necessary and most efficient.
>> - GT, GTE, LT, LTE are often helpful and are usually efficient.
>> - STARTS_WITH is very commonly desired for string-valued attributes (and
>> is efficient).
>> - CONTAINS and ENDS_WITH are sometimes desired, but are usually
>> inefficient.
>>
>> RECOMMENDATIONS:
>> 2A) If you can get by with 'EQUALS' alone, then do so. Otherwise, specify
>> GT, GTE, LT, LTE (and perhaps NE).
>> 2B) Don't require CONTAINS or ENDS_WITH unless you truly need them.
>> 2C) May not need STARTS_WITH; a GTE will do roughly the same thing for
>> string values.
>>
>> 3) With EQUALS, GT, GTE, LT, LTE, you must decide whether comparison is
>> always LEXICAL or is ARITHMETIC for numeric values:
>> - Lexically, "15" < "5".
>> - Arithmetically, 15 > 5.
>> Reviewing the approaches I've seen:
>> 3A) Comparison is arithmetic when comparing a specified value to an
>> attribute with a numeric syntax.
>> 3B) Comparison is always lexical (i.e., we treat everything as a string).
>> 3C) Define separate operators for lexical and arithmetic comparisons and
>> validate (e.g., throw errors when an arithmetic operator is applied to a
>> string-valued attribute).
>>
>> Collecting these into schemes ranked by levels of simplicity (which is a
>> new thought exercise for me):
>> 0. One attribute per query (no AND) and EQUALS is the only matching
>> operator.
>> 1. AND and EQUALS only. (AND is implicit).
>> 2. AND and EQUALS only. (AND is explicit).
>> 2. AND, EQUALS, GT, GTE, LT, LTE.
>> 3. AND, EQUALS, GT, GTE, LT, LTE, NE.
>> 4. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH.
>> 5. AND, EQUALS, GT, GTE, LT, LTE, NE, STARTS_WITH, ENDS_WITH, CONTAINS.
>
>
>> 10. INNER ANDS and one level of OUTER ORs (plus all matching operators)
>
> Adding support for OUTER ORs increases complexity by at least one order of
> magnitude.
>
>> 100. Nestable AND, OR and NOT (plus all matching operators).
>
> Adding support for arbitrarily nested ANDs, ORs and NOTs increases
> complexity by at least two orders of magnitude (and by more if the back-end
> does not natively support such constructs).
>
>>
>> Details below after my signature.
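Gary's point 1B says any nesting of ANDs and ORs can be rewritten as inner ANDs under a single outer OR (disjunctive normal form), and that a client could get the same result by issuing one AND-only query per ORed set of clauses. The sketch below illustrates both ideas. It assumes a hypothetical in-memory query model (`Clause`, `And`, `Or`) rather than SPML's XML schema, leaves out NOT (per 1C), and evaluates only EQUALS.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical query model for illustration only (not SPML's XML schema).
@dataclass(frozen=True)
class Clause:
    attr: str
    op: str      # e.g. "EQUALS"; this sketch only evaluates EQUALS
    value: str

@dataclass(frozen=True)
class And:
    children: tuple

@dataclass(frozen=True)
class Or:
    children: tuple

def to_dnf(expr):
    """Rewrite nested And/Or into a list of conjunctions: inner ANDs, one outer OR."""
    if isinstance(expr, Clause):
        return [[expr]]
    child_dnfs = [to_dnf(child) for child in expr.children]
    if isinstance(expr, Or):
        # An OR of ORs just pools the alternatives.
        return [conj for dnf in child_dnfs for conj in dnf]
    # An AND distributes over the ORs beneath it. The number of conjunctions
    # is the product of the children's counts, so it can grow exponentially.
    return [[c for conj in combo for c in conj] for combo in product(*child_dnfs)]

def search_and_only(records, conjunction):
    """Stand-in for a provider that accepts only ANDed EQUALS clauses."""
    return [r for r in records
            if all(r.get(c.attr) == c.value for c in conjunction)]

def search_with_client_side_or(records, expr):
    """Issue one AND-only query per conjunction and union the results by id."""
    found = {}
    for conjunction in to_dnf(expr):
        for record in search_and_only(records, conjunction):
            found[record["id"]] = record
    return list(found.values())
```

For example, `(dept=sales OR dept=support) AND region=west` becomes two AND-only queries: `dept=sales AND region=west` and `dept=support AND region=west`. The exponential growth in the AND branch is one concrete reason why supporting ORs adds so much cost.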
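Gary's point 3 turns on whether "15" compares below or above "5". The sketch below contrasts approaches 3A, 3B and 3C. The `SCHEMA` mapping and its attribute names are hypothetical. Lexical comparison is case-insensitive (via `str.casefold`), which matches the case-insensitive matching Anil asks for later in the thread.

```python
# Hypothetical schema: attribute name -> syntax. Not taken from any SPML target.
SCHEMA = {"employeeNumber": "integer", "uid": "string"}

def _sign(x, y):
    """Return -1, 0 or 1, like a classic compare function."""
    return (x > y) - (x < y)

def compare_lexical(stored: str, specified: str) -> int:
    """3B: always compare as (case-insensitive) strings."""
    return _sign(stored.casefold(), specified.casefold())

def compare_by_syntax(attr: str, stored: str, specified: str) -> int:
    """3A: arithmetic if the attribute has a numeric syntax, otherwise lexical."""
    if SCHEMA.get(attr) == "integer":
        return _sign(int(stored), int(specified))
    return compare_lexical(stored, specified)

def arithmetic_gt(attr: str, stored: str, specified: str) -> bool:
    """3C: a separate arithmetic operator that refuses non-numeric attributes."""
    if SCHEMA.get(attr) != "integer":
        raise ValueError(f"arithmetic GT applied to non-numeric attribute {attr!r}")
    return int(stored) > int(specified)
```

Under 3B, `compare_lexical("15", "5")` is -1. Under 3A, the same values compare as 1 for `employeeNumber` and as -1 for `uid`. Whichever approach a profile picks, it has to state that choice explicitly, because clients will otherwise get different results from different providers.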
>>
>> Gary
>>
>> <Details>
>>
>> Supporting multiple, arbitrarily nested logical operators is complex and
>> very expensive:
>> - Client must build (and must structure properly) a more complex request
>> - Server must parse (and must evaluate properly) a more complex request
>> - Implementing NOT on some back-ends (or for nested clauses)
>> is very complex for (and imposes an inordinate processing burden on) the
>> server.
>> - Implementing OR on some back-ends is complex and imposes a significant
>> processing burden on the server.
>> - Clients can send inefficiently structured queries, which may tempt the
>> server to optimize queries, adding more complexity.
>>
>> Operators for matching deserve some thought:
>> - EQ is always necessary and is simple semantically once you specify
>> case-sensitivity (in this case, case-insensitive).
>> ** Must specify case-sensitivity **
>> - GT is very helpful, especially for ordering results or retrieving in
>> chunks.
>> ** Must specify whether comparison is lexical and when (if ever)
>> comparison is arithmetic. **
>> - LT is used less often than GT, but if you're supporting GT, it doesn't
>> add much difficulty.
>> - GTE turns out to be helpful when ordering results or retrieving in
>> chunks. (Again, if you're supporting GT, GTE doesn't add much effort.)
>> - LTE is the same. If you're supporting GT/GTE, might as well support
>> LTE.
>> - 'STARTS_WITH' is very commonly used with strings. Implementations are
>> generally efficient.
>> - 'CONTAINS' is next-most-frequently-requested, but implementation is
>> usually inefficient.
>> - 'ENDS_WITH' is sometimes requested, but implementation is almost as bad
>> as CONTAINS.
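Gary's recommendation 2C notes that a GTE can stand in for STARTS_WITH on string values. By itself, GTE returns every value at or after the prefix, so the scan still has to stop once values no longer begin with it. Pairing GTE with an LT on the prefix's successor makes the match exact. The sketch below assumes values are already case-folded and sorted, as a case-insensitive index would be. The helper names are hypothetical.

```python
from bisect import bisect_left

def prefix_upper_bound(prefix: str) -> str:
    """Smallest string that sorts after every string beginning with `prefix`.
    Assumes the last character is not the maximum Unicode code point."""
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def starts_with_via_range(sorted_values, prefix):
    """STARTS_WITH(prefix) evaluated as: value GTE prefix AND value LT upper bound."""
    if not prefix:
        return list(sorted_values)
    lo = bisect_left(sorted_values, prefix)                      # first value GTE prefix
    hi = bisect_left(sorted_values, prefix_upper_bound(prefix))  # first value not LT bound
    return sorted_values[lo:hi]
```

Given the sorted list `["adams", "smith", "smithers", "smyth", "snow"]`, the prefix "smith" becomes the range `GTE "smith" AND LT "smiti"`, which selects "smith" and "smithers". Any back-end with an ordered index can serve a range like this efficiently, which is why STARTS_WITH is cheap while CONTAINS and ENDS_WITH are not.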
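The Details note that GT and GTE help when retrieving results in chunks. One common pattern is keyset-style paging: each request asks for entries whose sort key is GT the last key already seen, up to a page size. The sketch below assumes a hypothetical `fetch_page` call standing in for a provider query, and assumes that `uid` is unique.

```python
def fetch_page(records, after, limit):
    """Hypothetical provider query: uid GT `after`, ordered by uid, at most `limit` rows."""
    matching = [r for r in records if after is None or r["uid"] > after]
    matching.sort(key=lambda r: r["uid"])
    return matching[:limit]

def fetch_in_chunks(records, limit):
    """Client loop: resume each request from the last uid already seen."""
    after = None
    while True:
        page = fetch_page(records, after, limit)
        if not page:
            return
        yield page
        after = page[-1]["uid"]
```

If the sort key were not unique, GT could skip entries that share a key across a chunk boundary. A client would then need GTE plus a tiebreaker or de-duplication, which is one reason GTE turns out to be useful alongside GT.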
>>
>> </Details>
>>
>>
>> On Apr 5, 2011, at 6:32 AM, John, Anil wrote:
>>
>>> A correction to my original e-mail below:
>>>
>>> "...authoritative sources of data that have existing processes in place
>>> for Attribute Management and as such Updates/Deletes etc are *NOT*
>>> permitted" [via the SPML interface]
>>>
>>> Yes, this would be the minima needed to support conformance to the
>>> profile. If you support more, that would be a good differentiator for the
>>> implementation.
>>>
>>> As to support for logical operators beyond 'AND' and the set of
>>> matching-operators, this is where I need a bit of help. Ideally I would
>>> like to have support for AND/OR/NOT combined with (=)/(>)/(<)/(>=)/(<=)
>>> applied to a specific subset of case-insensitive attributes, but I don't
>>> have a sense of how expensive the operations are to implement. Would
>>> appreciate some feedback on that point.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>> ________________________________________
>>> From: Gary Cole [gary.cole@oracle.com]
>>> Sent: Monday, April 04, 2011 6:03 PM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> Thanks! I think I begin to understand. Let me play it back to you to be
>>> sure I have it right. The goal is to simplify search() and constrain it so
>>> that, rather than being open-ended, search becomes simpler to implement and
>>> to test. The first dimension to constrain is that of supported logical
>>> operators: AND would be the only operator. Another dimension to constrain
>>> would be the set of queryable attributes: here a provider would need a way
>>> to advertise the subset of attributes on which it supports search.
>>>
>>> Do I have that right? If so, let me ask a few more questions.
>>>
>>> You didn't mention constraining the set of matching-operators; would you
>>> want to limit/require certain of these? I assume that you'd want equals
>>> (=).
>>> Would you also want greater-than (>), less-than (<),
>>> greater-than-or-equal-to (>=), less-than-or-equal-to (<=)? What about
>>> startsWith, endsWith, contains? Could you get by with just equals and
>>> startsWith? (The operators endsWith and contains tend to be rather
>>> expensive and inefficient.) For matching-operators on alphabetic values,
>>> there's the further issue of case-sensitivity. Would you prefer to
>>> specify case-insensitive matching or case-sensitive matching?
>>>
>>> Would this be specifying (in effect) minima? That is, you would not
>>> object if a provider supported more in the way of search than you require,
>>> as long as the specified behavior is supported in a standard way, right?
>>>
>>> On Apr 4, 2011, at 9:01 AM, John, Anil wrote:
>>>
>>> My perspective on this is being driven by the need to implement a
>>> batch/occasionally-connected interface to an Attribute Provider (AP) that
>>> uses SPML as the interface specification. The motivator for the “Read Only”
>>> portion is that AP is fronting authoritative sources of data that have
>>> existing processes in place for Attribute Management and as such
>>> Updates/Deletes etc are permitted.
>>>
>>> The assumption in this case is that the “Attribute Contract” that is
>>> supported by the AP is known and fixed, i.e. there is a finite set of
>>> attributes that are exposed via this interface and are advertised via the
>>> listTargets operation.
>>>
>>> At the same time, one of the items that came out of the Burton Group
>>> discussions around SPML was that implementing a provider that supported
>>> all permutations of the ‘and’, ‘or’ and ‘not’ operators combined with all
>>> attributes was non-trivial, which seems to have ended with little to no
>>> support and *no way to verify what support existed* in individual products.
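One way a constrained profile could make support verifiable is for the provider to advertise exactly which attributes and matching-operators it evaluates, reject anything outside that set, and let a test suite probe every advertised combination. The sketch below assumes a hypothetical `SearchCapabilities` record. SPMLv2's listTargets response is not claimed to carry such an element, and the names and query shape (an implicitly ANDed list of clauses) are illustrative only.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class SearchCapabilities:
    """Hypothetical advertisement of what an AND-only provider can evaluate."""
    attributes: frozenset
    operators: frozenset

class UnsupportedQuery(Exception):
    """Raised instead of silently evaluating only part of a query."""

def validate(caps, clauses):
    """`clauses` is a list of (attribute, operator, value), implicitly ANDed."""
    problems = []
    for attr, op, _value in clauses:
        if attr not in caps.attributes:
            problems.append(f"attribute {attr!r} is not searchable")
        if op not in caps.operators:
            problems.append(f"operator {op!r} is not supported")
    if problems:
        raise UnsupportedQuery("; ".join(problems))

def conformance_probes(caps, sample_value="test"):
    """One single-clause query per advertised (attribute, operator) pair,
    so a test suite can exercise every advertised combination."""
    return [[(attr, op, sample_value)]
            for attr, op in product(sorted(caps.attributes), sorted(caps.operators))]
```

For the thread's example ("allow only 'and' operations on attributes X, Y and Z"), the provider would advertise `{"X", "Y", "Z"}` with `{"EQUALS"}`. A query on attribute W, or one using GT, would then be rejected with an explicit error.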
>>>
>>> So, what I’d like to see in this read-only profile is a mechanism that
>>> limits Search to a combination of specific operators and attributes,
>>> i.e. I will allow only queries that use specific operators combined with
>>> specific clauses. E.g. allow only ‘and’ operations on attributes X, Y
>>> and Z.
>>>
>>> The two use cases that are expected to be enabled by this are:
>>>
>>> 1) Ability to query the AP to retrieve attributes of multiple users
>>> all in one shot, potentially for provisioning use cases
>>> 2) Ability to do a one-way sync from the AP to a local system,
>>> i.e. the AP will always be the master system that will overwrite the
>>> local store
>>>
>>> The key here is to make sure that the profile itself is constrained
>>> enough that it is implementable and testable.
>>>
>>> I am not sure if I answered your question to the level of detail you are
>>> looking for, but I am hoping that there is general interest in such a
>>> capability.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>>
>>>
>>> From: Gary Cole [mailto:gary.cole@oracle.com]
>>> Sent: Tuesday, March 29, 2011 2:44 PM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Re: [provision] RE: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> In what ways would you constrain search? Would these constraints be
>>> minima, maxima or both?
>>>
>>> A provider constrains the set of object-classes for which the provider
>>> supports search in the listTargetsResponse.
>>>
>>> Search as specified in SPMLv2 is at heart a subset of LDAP search:
>>> - scope: 'pso', 'oneLevel' or 'subTree'
>>> - operators: 'and', 'or', and 'not'
>>> - clauses: dependent on profile or provider, but examples show
>>> <attribute-name> = <attribute-value>.
>>>
>>> The DSML profile looks especially LDAP-like, since DSML is basically
>>> LDAP wrapped in a bunch of XML tags.
>>> Nothing in the specification of the base protocol or of the DSML profile
>>> requires a provider to support fancier search than it wishes to provide.
>>> The provider simply returns an error if:
>>>
>>> • The provider cannot evaluate an instance of {QueryClauseType} that
>>> the instance of {SearchQueryType} contains.
>>>
>>> • The open content of the instance of {SearchQueryType} is too
>>> complex for the provider to evaluate.
>>>
>>> In short, it's entirely up to the provider how fancy to get with
>>> search(). We figured that market pressures would incent each implementer
>>> to support search appropriately in its provider. For example, SIM
>>> supported search on every type of object. OIM 9.x supports DSML search
>>> for users.
>>>
>>> Gary
>>>
>>> On Mar 29, 2011, at 8:39 AM, John, Anil wrote:
>>>
>>>
>>> Gary,
>>>
>>>
>>> Search gives you by default the equivalent of a batch lookup.
>>>
>>> So it does, and a constrained-search profile would meet the functionality
>>> we are looking for (based on your description below). Our perspective was
>>> shaped in a lot of ways by the reluctance of product vendors to implement
>>> anything beyond the basic operations on an SPML provider. Search is an
>>> optional capability (and so is batch, but we thought that it would be
>>> easier to make a case for it).
>>>
>>> I would be interested to get the perspective of folks who are product
>>> implementors to see what would be "easier" for them to implement going
>>> forward. At the end of the road, we are looking for something that will
>>> exist in real life within products and not just as shelf-ware.
>>>
>>> Regards,
>>>
>>> - Anil
>>>
>>> ________________________________________
>>> From: Gary Cole [gary.cole@oracle.com]
>>> Sent: Tuesday, March 29, 2011 9:20 AM
>>> To: John, Anil
>>> Cc: Smith, Thomas C.; OASIS PSTC
>>> Subject: Batch Lookup (was "Re: ReadOnlyProfile")
>>>
>>> Anil,
>>>
>>> On Mar 29, 2011, at 7:43 AM, John, Anil wrote:
>>>
>>> We also need to re-read the specs to see if there is overlap between
>>> lookup() and search on what we need to accomplish.
>>>
>>> Remind me again please what you need to accomplish. I may be able to
>>> help.
>>>
>>> For instance, I may be able to clarify something about your
>>> requirements for "SPML Operations on an Attribute Service". You
>>> originally thought that you needed "batch pull" capabilities because
>>> SAML Attribute Query could not answer the following questions:
>>> * "Give me the unique IDs of all users with Attribute X"
>>> * "For all users (whose unique IDs I just got), give me a listing of
>>> attributes for each (in one shot)"
>>>
>>> SPML's Search Capability (section 3.6.7.1 of the main spec) gives you
>>> all of that in one shot. You can request one search() and in that
>>> request use the 'returnData' attribute to specify how much information
>>> you want back for each matching object: nothing, identifier-only,
>>> data (which would include all schema-defined attributes) or
>>> everything, which would add capability-specific data to schema-defined
>>> data. Another parameter allows you to specify which capabilities
>>> interest you. In your case, you would specify "returnData='data'", so
>>> that you would get all of the attributes. Or you could take the
>>> default, which is 'everything'. Unless you have capability-specific
>>> data, 'everything' is equivalent to 'data'. A client can also specify
>>> a maximum limit on the number of matching objects to return.
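To make Gary's description of search() concrete, the sketch below models one logical search result that a provider may hand back in chunks, with the client calling iterate() until the result is exhausted. It also shows how the return-data level controls what comes back per matching object. Everything here is a hypothetical in-memory stand-in: real SPMLv2 exchanges XML requests and responses. The `InMemoryProvider` class, `max_select` parameter, token format and string values for the return-data levels simply mirror the thread's wording and are not the specification's schema.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class PSO:
    """Hypothetical stand-in for a provisioning service object."""
    id: str
    data: dict
    capability_data: dict = field(default_factory=dict)

def project(pso, return_data="everything"):
    """Shape one match per the return-data levels described in the thread."""
    if return_data == "nothing":
        return {}
    result = {"id": pso.id}
    if return_data in ("data", "everything"):
        result["data"] = dict(pso.data)
    if return_data == "everything":
        result["capabilityData"] = dict(pso.capability_data)
    return result

class InMemoryProvider:
    """Hypothetical provider: search() returns a first chunk plus a token."""
    def __init__(self, psos, chunk_size):
        self._psos = psos
        self._chunk_size = chunk_size
        self._open = {}                       # token -> remaining results
        self._tokens = itertools.count(1)

    def search(self, predicate, return_data="everything", max_select=None):
        matches = [project(p, return_data) for p in self._psos if predicate(p)]
        if max_select is not None:
            matches = matches[:max_select]
        return self._chunk(matches)

    def iterate(self, token):
        return self._chunk(self._open.pop(token))

    def _chunk(self, results):
        head, rest = results[:self._chunk_size], results[self._chunk_size:]
        if not rest:
            return head, None
        token = f"it-{next(self._tokens)}"
        self._open[token] = rest
        return head, token

def search_all(provider, predicate, **kwargs):
    """Client side: one logical result, possibly spread over iterate() calls."""
    results, token = provider.search(predicate, **kwargs)
    while token is not None:
        chunk, token = provider.iterate(token)
        results.extend(chunk)
    return results
```

From the client's point of view, `search_all` plays the role of the "batch lookup" Anil wanted: a single query returns every matching user's attributes, and chunking is only a transport detail.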
>>>
>>> The provider may send all of the matching objects in a single
>>> SearchResult, or the provider may break the results into chunks that
>>> the requestor can iterate. Logically, it's still part of a single
>>> search result, although a series of iterate() requests may be
>>> necessary to return all matching objects.
>>>
>>> So, please help me understand: what would a batch operation add to
>>> this? Search gives you by default the equivalent of a batch lookup.
>>>
>>> Gary
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe from this mail list, you must leave the OASIS TC that
>>> generates this mail. Follow this link to all your TCs in OASIS at:
>>> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php