Using Bit Strings to optimize storage and retrieval of integer list types that have a length or maxLength facet
I have a few mandatory, limited-size, vocabulary-based, multi-valued properties that are part of my Base Document Type. In order to optimize document retrieval, I would like to minimize the number of tables that need to be joined. Normally, each of these multi-valued properties would be represented as a collection fragment table and therefore require an add'l table join on retrieval.
I have defined these limited-size vocabularies to have one-up integer IDs that represent a bit index. The Base Document Type properties that lookup into these vocabularies are defined as lists of integers with a maxLength facet. It would be ideal if Nuxeo provided an option to optimize storage and retrieval of these length-restricted, multi-valued integer properties using Bit strings. CMIS query “quantified comparison predicate” and “quantified in predicate” constructs could be translated into bitwise operators. NXQL queries could also be translated similarly.
Any thoughts or feedback regarding this optimization approach/technique?
Nice idea yes.
Don't you mean xs:minInclusive
and xs:maxInclusive
instead of xs:maxLength
though? Or xs:enumeration
for strings?
However automatically turning those into bit strings (if there are less than 32 or 64 values) would be a bit harsh, I'd think of using some separate configuration flag (maybe in the repository config) to turn that on. For many users (and some kinds of indexing) having the values explicitly stored in the columns is still needed.
But no plans to work on this for now…
Another optimization that's more likely to make it into an upcoming version of Nuxeo is to use array types for databases that support it (PostgreSQL, maybe Oracle) to store the a of values in a single row, and use specialized indexing and operators (<@
in PostgreSQL) to get to the values.
The array-based optimization you proposed is a more generic solution that may approach the performance of using bit strings. Does a JIRA exist for your proposed optimization?