Using Bit Strings to optimize storage and retrieval of integer list types that have a length or maxLength facet

I have a few mandatory, limited-size, vocabulary-based, multi-valued properties that are part of my Base Document Type. In order to optimize document retrieval, I would like to minimize the number of tables that need to be joined. Normally, each of these multi-valued properties would be represented as a collection fragment table and therefore require an add'l table join on retrieval.

I have defined these limited-size vocabularies to have one-up integer IDs that represent a bit index. The Base Document Type properties that lookup into these vocabularies are defined as lists of integers with a maxLength facet. It would be ideal if Nuxeo provided an option to optimize storage and retrieval of these length-restricted, multi-valued integer properties using Bit strings. CMIS query “quantified comparison predicate” and “quantified in predicate” constructs could be translated into bitwise operators. NXQL queries could also be translated similarly.

Any thoughts or feedback regarding this optimization approach/technique?

0 votes

1 answers

2279 views

ANSWER



Nice idea yes.

Don't you mean xs:minInclusive and xs:maxInclusive instead of xs:maxLength though? Or xs:enumeration for strings?

However automatically turning those into bit strings (if there are less than 32 or 64 values) would be a bit harsh, I'd think of using some separate configuration flag (maybe in the repository config) to turn that on. For many users (and some kinds of indexing) having the values explicitly stored in the columns is still needed.

But no plans to work on this for now…

Another optimization that's more likely to make it into an upcoming version of Nuxeo is to use array types for databases that support it (PostgreSQL, maybe Oracle) to store the a of values in a single row, and use specialized indexing and operators (<@ in PostgreSQL) to get to the values.

0 votes



ron
The xs:maxLength constraint applies to the xs:list itself. As you suggest, it would also make sense to constrain the integers in the list using the xs:maxInclusive constraint. In this case, these two constraints would have the same value, the maximum entries in the list and greatest ID value.

The array-based optimization you proposed is a more generic solution that may approach the performance of using bit strings. Does a JIRA exist for your proposed optimization?

07/18/2012

Understood. No, we don't have a JIRA issue for the array optimisation yet.
07/18/2012