Lazy binaries from cloud providers (S3) are read prematurely

We're using the cloud plugin to read blobs from S3.

We noticed a lot of metadata queries hitting our S3 bucket. For example, if we ask the document model for one attachment and load the files schema, the cloud provider goes and reads the metadata for all attached files.

This is forced by the BinaryBlobProvider in its readBlob method (line 73), where it asks for the LazyBinary's length, making it less lazy than it could be.
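To make the problem concrete, here is a minimal sketch of the pattern we mean (simplified stand-in classes, not the actual Nuxeo source): building the blob eagerly asks the lazy binary for its length, and each such call translates into a metadata request against the S3 bucket, even for attachments we never open.

```java
// Simplified illustration only -- stand-in classes, not the real Nuxeo code.

// A "lazy" binary whose length is only known after a metadata call to S3.
class LazyBinary {
    private final String digest;
    private Long cachedLength; // null until fetched

    LazyBinary(String digest) {
        this.digest = digest;
    }

    long getLength() {
        if (cachedLength == null) {
            // The expensive part: one metadata (HEAD) request per binary.
            System.out.println("S3 metadata request for " + digest);
            cachedLength = 42L; // placeholder for the real size
        }
        return cachedLength;
    }
}

// The pattern described above: the blob resolves the length up front,
// so just materializing the files schema forces one S3 call per attachment.
class EagerBlob {
    final LazyBinary binary;
    final long length;

    EagerBlob(LazyBinary binary) {
        this.binary = binary;
        this.length = binary.getLength(); // metadata fetched even if nobody asks
    }
}
```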

We tried reading attachments both from a document model and from the document.get operation, with the same result.

Is there any other operation that fetches a Blob without going through readBlob, which calls getLength()?


ANSWER



Hi,

What Nuxeo version are you using? There have been fixes very recently to improve the way we deal with lazy binaries and the fetching of the length metadata in some cases (NXP-18369).

To answer your last question: at this time getLength() is always called, but usually this hits the local cache of S3 files, so it doesn't need to use the network as often as one could fear. There could still be improvements, though; tell me if you still see the issue with a version of Nuxeo marked as fixed in the above ticket.
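As a rough illustration of that cache-first behaviour (again just a sketch with assumed names, not the Nuxeo implementation): the length lookup consults a local cache keyed by digest and only falls back to a network request on a miss, so repeated getLength() calls for the same binary pay the S3 cost once.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a cache-first length lookup keyed by the binary's digest.
class CachingLengthResolver {
    private final Map<String, Long> localCache = new HashMap<>();

    long getLength(String digest) {
        // Serve from the local cache when possible; only a miss hits the network.
        return localCache.computeIfAbsent(digest, this::fetchLengthFromS3);
    }

    private long fetchLengthFromS3(String digest) {
        System.out.println("network call: HEAD " + digest);
        return 42L; // placeholder size
    }
}
```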




Yeah, we noticed that the local caches were using up space and recently expanded the server disks for that; we then moved to having the clients fetch the binaries themselves. I'm on 7.3 and will try 10, but I'll also try changing the binary provider. It has a reference to the binary, so instead of caching the length there I'll just change it to get the length from the binary only when asked.
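In case it helps, this is the kind of change I have in mind, as a rough sketch with made-up class names rather than an actual patch: readBlob stops resolving the length, and the blob simply delegates getLength() to the binary it holds, so the S3 metadata request only happens if something actually asks for the size.

```java
// Sketch of the proposed change -- illustrative names, not the real Nuxeo classes.

// Stand-in for the provider's binary handle; getLength() is the call that
// would trigger the S3 metadata request.
interface RemoteBinary {
    long getLength();
}

// The blob keeps a reference to the binary and defers the length lookup.
class DeferredLengthBlob {
    private final RemoteBinary binary;

    DeferredLengthBlob(RemoteBinary binary) {
        this.binary = binary;
    }

    long getLength() {
        // Only now do we touch S3 metadata, and only for this one binary.
        return binary.getLength();
    }
}

class LazierBlobProvider {
    DeferredLengthBlob readBlob(RemoteBinary binary) {
        // No getLength() here, so listing attachments stays cheap.
        return new DeferredLengthBlob(binary);
    }
}
```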

What's the process for submitting a patch to the Nuxeo dev team?

12/14/2015

The preferred way is for you to send us pull requests via GitHub. For patches that add non-trivial functionality and aren't just small fixes, we'll ask you to sign a small contributor agreement.
12/14/2015