BLOBS in postgresql?

As the DBA for our postgresql server, I noticed that the database held BLOBS for use by the nuxeo service. Where and how are these blobs used? Why is nuxeo storing blobs in the database server, when the files themselves are stored locally or via S3? I see that these blobs have ID numbers, but these numbers show up only in the nxp_logs_extinfo table, and no where else. There's no reference to these AFAIK anywhere within the schema.

Are these blobs artifacts of an older version or possibly misconfigured system? Is there a way to determine what they actually are or mean?

The blob entries look like this:

nuxeo=# select loid,substring(encode(data,'escape') for 60) from pg_catalog.pg_largeobject ;
  loid  |                                   substring
--------+--------------------------------------------------------------------------------
 217768 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
 217769 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217770 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217771 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217772 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217773 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
 217774 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 217775 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298409 | \254\355\000\x05sr\000\x17java.util.LinkedHashSet\330l\327Z\225\33
 298410 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298411 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298412 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298413 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298414 | \254\355\000\x05sr\000\x11java.util.HashMap\x05\x07\332\301\303\x16`\321\x03\0
 298415 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
 298416 | \254\355\000\x05sr\000\x17java.util.LinkedHashMap4\300N\\\x10l\300\37
(16 rows)

Here's the correlation to the “documents”. I don't understand this, but it does look like some kind of internal object created by Nuxeo on behalf of users:

nuxeo=# select log_extinfo_blob,mapkey, log_event_category,log_event_comment,log_doc_path,log_doc_type,log_date from nxp_logs_extinfo,nxp_logs_mapextinfos,nxp_logs where discriminator = 'BLOB' and log_extinfo_id = info_fk and nxp_logs.log_id = log_fk ;
 log_extinfo_blob |      mapkey       | log_event_category | log_event_comment |                           log_doc_path
                        | log_doc_type  |        log_date
------------------+-------------------+--------------------+-------------------+------------------------------------------
------------------------+---------------+-------------------------
           217772 | workflowVariables | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217773 | data              | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217774 | nodeVariables     | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:31:01.374
           217775 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:31:01.385
           217768 | actors            | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217769 | workflowVariables | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217770 | nodeVariables     | Routing            |                   | /task-root/Task2556.1461083427150
                        | RoutingTask   | 2016-04-19 18:30:27.521
           217771 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/04/19
/ParallelDocumentReview | DocumentRoute | 2016-04-19 18:30:27.671
           298409 | actors            | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298410 | workflowVariables | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298411 | nodeVariables     | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:08.527
           298412 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:08.55
           298413 | workflowVariables | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298414 | data              | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298415 | nodeVariables     | Routing            |                   | /task-root/Task2556.1474371187953
                        | RoutingTask   | 2016-09-20 13:33:17.962
           298416 | workflowVariables | Routing            |                   | /document-route-instances-root/2016/09/20
/ParallelDocumentReview | DocumentRoute | 2016-09-20 13:33:17.993
(16 rows)
0 votes

2 answers

724 views

ANSWER



Ok thanks for the data and indeed there are blobs stored, which is something I wasn't expecting.

The blobs come from the workflow audit. The workflow logs audit event properties at each step, and some of these properties are not simple strings or integers and have to be stored as serialized Java objects.

1 votes



I'll open a ticket for improvement of this, we should store that as JSON in a text field. https://jira.nuxeo.com/browse/NXP-20975
11/09/2016

Thanks. I was going to suggest JSON as alternative, especially since newer PG handles json natively (or should I say, intelligently)
11/09/2016

FYI I discovered this while doing a dump of the database by a non-nuxeo user. I wanted a non-nuxeo user to ensure the user had only read-only access to the data. PG handles blobs a bit archaically, and the grants had to be handled especially.
11/09/2016

We probably won't use native JSON as this goes through a Hibernate/JPA abstraction layer and has to work on several different SQL databases. Also keep in mind that the default for recent versions of Nuxeo is to store the audit data in Elasticsearch and not the SQL database.
11/09/2016


There shouldn't be any blob in a PostgreSQL database used by Nuxeo unless you use nuxeo-core-binarymanager-sql. Please give details about the blobs you see with examples if possible.

1 votes



I updated my question with the relevant data.
11/09/2016