Nuxeo and S3 storage blob provider integration support

Greetings-

Thanks so much for the support with my random questions; it's been helpful so far. While deploying Nuxeo in AWS, we're exploring integrating S3 as a blob provider store for Nuxeo.

I'm asking how to get Nuxeo to store my assets in S3; right now it seems to be putting them in MongoDB's NoSQL store. Is this the desired behavior? Is it only possible to do one or the other (S3 or MongoDB), or is it possible to do both (store in both S3 AND MongoDB)?

Do I need to mount the bucket to an OS directory with something like s3fuse, at the path defined in nuxeo.conf for repository.binary.store? I've been reading the source here: https://github.com/nuxeo/cloudbinarymanager/blob/master/src/main/java/org/nuxeo/ecm/core/storage/sql/S3BinaryManager.java and it looks like it interacts with the AWS API to use the bucket, so I can only assume that mounting it with an external FUSE mount isn't needed.

Any direction is appreciated here.

Current setup:

  • Nuxeo-Platform: v8.10 (6 nodes, 3 AZs)
  • Amazon-s3-Online-Storage plugin (manually installed): 1.7.3
  • MongoDB: v3.2 (3 nodes, 1 primary, 2 secondaries, 3 AZs)
  • RDS PostgreSQL: v9.5.4 (2 nodes, 1 primary, 1 secondary, 2 AZs)
  • ElastiCache Redis: v2.8.23 (3 nodes, 3 AZs)
  • Elasticsearch: v2.3.5 (13 nodes: 3 master, 3 ingest, 3 client, 4 data; 3 AZs)

nuxeo.conf contents:

##-----------------------------------------------------------------------------
## S3 Bucket settings
##-----------------------------------------------------------------------------
# set default binary manager
nuxeo.core.binarymanager=org.nuxeo.ecm.core.storage.sql.S3BinaryManager
# s3 endpoint
nuxeo.s3storage.endpoint=https://s3-us-west-2.amazonaws.com
# s3 bucket name
nuxeo.s3storage.bucket=some-valid-bucket-name
# s3 region
nuxeo.s3storage.region=us-west-2
# s3 sub directory name (Tier)
nuxeo.s3storage.bucket_prefix=nuxeo/ci/
# Files retrieved from S3 are cached locally for speed.
# bytes or with the standard KB, MB, GB or TB suffixes,
nuxeo.s3storage.cachesize=100MB
# maximum number of files in the cache
nuxeo.s3storage.cachecount=10000
# minimum age (in seconds) a file should have before being eligible for purge (the age is the time since last file access).
nuxeo.s3storage.cacheminage=3600
# configure downloads to be directly served to the user from S3 without going through Nuxeo
# nuxeo.s3storage.directdownload=true
# expire time is expressed in seconds (the default is one hour) and determines how long the generated S3 URLs are valid.
# Having short-lived URLs is better for security, but too short an expiration time could be problematic
# nuxeo.s3storage.directdownload.expire=3600
# internal S3 connection pool settings
# default 50 max connections
nuxeo.s3storage.connection.max=50
nuxeo.s3storage.connection.retry=3
nuxeo.s3storage.connection.timeout=50000
nuxeo.s3storage.socket.timeout=50000

##-----------------------------------------------------------------------------
## Clustering settings
##-----------------------------------------------------------------------------
repository.clustering.enabled=true
repository.clustering.id=167773811
repository.clustering.delay=1
repository.clustering.invalidation=redis
nuxeo.db.validationQuery=SELECT 1
### If clustering is activated, set
repository.binary.store=/opt/nuxeo/binaries

/opt/nuxeo/current/nxserver/config/default-repository-config.xml:

<component name="default-repository-config">
  <extension target="org.nuxeo.ecm.core.blob.BlobManager" point="configuration">
    <blobprovider name="default">
      <class>org.nuxeo.ecm.core.storage.sql.S3BinaryManager</class>
      <property name="path">/opt/nuxeo/binaries</property>
      <property name="key"></property>
    </blobprovider>
  </extension>
  <extension target="org.nuxeo.ecm.core.storage.mongodb.MongoDBRepositoryService"
      point="repository">
    <repository name="default" label="label.default.repository">
   <server>mongodb://primarydb01.somecompanyurl.internal,secondarydb01.somecompanyurl.internal,secondarydb02.somecompanyurl.internal/?maxPoolSize=200</server>
      <dbname>nuxeo</dbname>
      <fulltext disabled="false" searchDisabled="false" />
      <cache enabled="true" maxSize="1000" concurrencyLevel="10" ttl="10" />
      <clustering id="167773811" enabled="true">
        <invalidatorClass>org.nuxeo.ecm.core.redis.contribs.RedisDBSClusterInvalidator</invalidatorClass>
      </clustering>
    </repository>
  </extension>
</component>

server.log entries after importing an asset (note the ID):

2017-04-03 15:34:56,240 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:e55408da-a06b-48aa-814a-ab36395ed16d:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-03 15:34:56,444 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:e55408da-a06b-48aa-814a-ab36395ed16d:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-03 15:34:56,575 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:e55408da-a06b-48aa-814a-ab36395ed16d:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-03 15:34:56,595 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:e55408da-a06b-48aa-814a-ab36395ed16d:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-03 15:34:56,618 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:e55408da-a06b-48aa-814a-ab36395ed16d:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]

Selecting by that ID in MongoDB to find the asset:

mongos> db.default.find({ "ecm:fulltextJobId" :"e55408da-a06b-48aa-814a-ab36395ed16d" })
{ "_id" : ObjectId("58deaf60eb4c4b0b6a4f5ad2"), "ecm:racl" : [ "Administrator", "members" ], "icon" : "/icons/image.gif", "dc:creator" : "Administrator", "ecm:parentId" : "ada5c892-ad99-498d-a919-466deac42ea5", "ecm:ancestorIds" : [ "00000000-0000-0000-0000-000000000000", "e464b908-3607-4ed4-8f23-69b914a00a8e", "2fc5d556-93dc-4392-8901-674de94a7e31", "ada5c892-ad99-498d-a919-466deac42ea5" ], "dc:modified" : ISODate("2017-04-03T19:34:56.635Z"), "ecm:minorVersion" : NumberLong(0), "dc:lastContributor" : "Administrator", "content" : { "data" : "3d4bf419cd8101dcbb3475fa4f7b7138", "name" : "tumblr_ni1rrfEuwn1rsxqqio1_500.gif", "mime-type" : "image/gif", "length" : NumberLong(1962696) }, "ecm:name" : "somethingelse", "ecm:majorVersion" : NumberLong(0), "ecm:lifeCyclePolicy" : "default", "size" : NumberLong(1962696), "dc:created" : ISODate("2017-04-03T19:34:56.043Z"), "dc:title" : "somethingelse", "ecm:primaryType" : "Picture", "ecm:id" : "e55408da-a06b-48aa-814a-ab36395ed16d", "ecm:lifeCycleState" : "project", "ecm:mixinTypes" : [ "Thumbnail" ], "views" : [ { "filename" : "Thumbnail_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "width" : NumberLong(83), "description" : "Thumbnail size", "title" : "Thumbnail", "content" : { "data" : "2dadd7007b09362037c9497bbdedb93b", "name" : "Thumbnail_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "mime-type" : "image/jpeg", "length" : NumberLong(4912), "digest" : "2dadd7007b09362037c9497bbdedb93b" }, "height" : NumberLong(100), "info" : { "colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(83), "format" : "JPEG", "height" : NumberLong(99) } }, { "filename" : "Small_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "width" : NumberLong(468), "description" : "Small size", "title" : "Small", "content" : { "data" : "1e89002b9f3bbee19af3ee7eec801045", "name" : "Small_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "mime-type" : "image/jpeg", "length" : NumberLong(154654), "digest" : "1e89002b9f3bbee19af3ee7eec801045" }, "height" : NumberLong(560), "info" : { 
"colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(468), "format" : "JPEG", "height" : NumberLong(560) } }, { "filename" : "Medium_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "width" : NumberLong(495), "description" : "Medium size", "title" : "Medium", "content" : { "data" : "1f7c81b5e56561c43ecd424137ceba50", "name" : "Medium_tumblr_ni1rrfEuwn1rsxqqio1_500.jpg", "mime-type" : "image/jpeg", "length" : NumberLong(173691), "digest" : "1f7c81b5e56561c43ecd424137ceba50" }, "height" : NumberLong(592), "info" : { "colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(495), "format" : "JPEG", "height" : NumberLong(592) } }, { "filename" : "FullHD_tumblr_ni1rrfEuwn1rsxqqio1_500.", "width" : NumberLong(495), "description" : "Full HD size", "title" : "FullHD", "content" : { "data" : "1f7c81b5e56561c43ecd424137ceba50", "name" : "FullHD_tumblr_ni1rrfEuwn1rsxqqio1_500.", "mime-type" : "image/jpeg", "length" : NumberLong(173691) }, "height" : NumberLong(592), "info" : { "colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(495), "format" : "JPEG", "height" : NumberLong(592) } }, { "filename" : "OriginalJpeg_tumblr_ni1rrfEuwn1rsxqqio1_500.", "width" : NumberLong(495), "description" : "Original jpeg image", "title" : "OriginalJpeg", "content" : { "data" : "1f7c81b5e56561c43ecd424137ceba50", "name" : "OriginalJpeg_tumblr_ni1rrfEuwn1rsxqqio1_500.", "mime-type" : "image/jpeg", "length" : NumberLong(173691) }, "height" : NumberLong(592), "info" : { "colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(495), "format" : "JPEG", "height" : NumberLong(592) } } ], "dc:contributors" : [ "Administrator" ], "ecm:fulltextSimple" : "icons image gif administrator administrator 3d4bf419cd8101dcbb3475fa4f7b7138 tumblr ni1rrfeuwn1rsxqqio1 500 gif image gif somethingelse somethingelse thumbnail tumblr ni1rrfeuwn1rsxqqio1 500 jpg thumbnail size thumbnail 2dadd7007b09362037c9497bbdedb93b thumbnail tumblr ni1rrfeuwn1rsxqqio1 500 jpg image jpeg 
2dadd7007b09362037c9497bbdedb93b srgb jpeg small tumblr ni1rrfeuwn1rsxqqio1 500 jpg small size small 1e89002b9f3bbee19af3ee7eec801045 small tumblr ni1rrfeuwn1rsxqqio1 500 jpg image jpeg 1e89002b9f3bbee19af3ee7eec801045 srgb jpeg medium tumblr ni1rrfeuwn1rsxqqio1 500 jpg medium size medium 1f7c81b5e56561c43ecd424137ceba50 medium tumblr ni1rrfeuwn1rsxqqio1 500 jpg image jpeg 1f7c81b5e56561c43ecd424137ceba50 srgb jpeg fullhd tumblr ni1rrfeuwn1rsxqqio1 500 full hd size fullhd 1f7c81b5e56561c43ecd424137ceba50 fullhd tumblr ni1rrfeuwn1rsxqqio1 500 image jpeg srgb jpeg originaljpeg tumblr ni1rrfeuwn1rsxqqio1 500 original jpeg image originaljpeg 1f7c81b5e56561c43ecd424137ceba50 originaljpeg tumblr ni1rrfeuwn1rsxqqio1 500 image jpeg srgb jpeg srgb gif administrator", "ecm:fulltextBinary" : "", "ecm:fulltextJobId" : "e55408da-a06b-48aa-814a-ab36395ed16d", "info" : { "colorSpace" : "sRGB", "depth" : NumberLong(8), "width" : NumberLong(495), "format" : "GIF", "height" : NumberLong(592) } }
mongos>

However, I never see the S3 bucket being used for storage.

ANSWER



Your S3 configuration seems correct.

For instance, your document contains "content" : { "data" : "3d4bf419cd8101dcbb3475fa4f7b7138", "name" : "tumblr_ni1rrfEuwn1rsxqqio1_500.gif", "mime-type" : "image/gif", "length" : NumberLong(1962696) }, and the data value is the key under which the binary itself is stored in the S3 bucket.

You should make sure you're checking the right bucket.
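
To make the lookup concrete, here is a minimal sketch that collects every blob "data" value from a document like the one in the question and prints the object keys to search for. It assumes the plugin builds each key as the configured nuxeo.s3storage.bucket_prefix followed by the digest; that composition, and the helper names collect_blob_digests and expected_s3_keys, are illustrative assumptions, not something confirmed in this thread.

```python
def collect_blob_digests(node):
    """Recursively collect the 'data' value of every 'content' sub-document."""
    digests = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "content" and isinstance(value, dict) and "data" in value:
                digests.append(value["data"])
            else:
                digests.extend(collect_blob_digests(value))
    elif isinstance(node, list):
        for item in node:
            digests.extend(collect_blob_digests(item))
    return digests


def expected_s3_keys(document, bucket_prefix=""):
    """Return de-duplicated keys, ASSUMING key = bucket_prefix + digest."""
    keys = []
    for digest in collect_blob_digests(document):
        key = bucket_prefix + digest
        if key not in keys:
            keys.append(key)
    return keys


# Trimmed-down copy of the MongoDB document shown in the question.
doc = {
    "content": {"data": "3d4bf419cd8101dcbb3475fa4f7b7138",
                "name": "tumblr_ni1rrfEuwn1rsxqqio1_500.gif"},
    "views": [
        {"title": "Thumbnail",
         "content": {"data": "2dadd7007b09362037c9497bbdedb93b"}},
        {"title": "Medium",
         "content": {"data": "1f7c81b5e56561c43ecd424137ceba50"}},
        {"title": "FullHD",
         "content": {"data": "1f7c81b5e56561c43ecd424137ceba50"}},
    ],
}

# The prefix is the nuxeo.s3storage.bucket_prefix value from the question.
for key in expected_s3_keys(doc, bucket_prefix="nuxeo/ci/"):
    print(key)
```

Each printed key can then be looked up in the bucket named by nuxeo.s3storage.bucket. Several picture views share one digest in the question's document, so fewer objects than views are expected.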

That's the thing: I've looked at /all/ the buckets under our account and I don't find that value anywhere. I looked at the bucket, under the bucket prefix I specified in nuxeo.conf, and there is nothing there. This is on 8.10. I do see in the logs that the s3.binarymanager is initialised at startup.
04/07/2017

You should activate DEBUG logs for org.nuxeo.ecm.core.storage.sql.S3BinaryManager. You should then see logs like "storing blob 3d4bf419cd8101dcbb3475fa4f7b7138 to S3", followed by either "blob 3d4bf419cd8101dcbb3475fa4f7b7138 is already in S3" or "stored blob 3d4bf419cd8101dcbb3475fa4f7b7138 to S3 in 123ms". On the reading side, you should see "fetching blob 3d4bf419cd8101dcbb3475fa4f7b7138 from S3" followed by "fetched blob 3d4bf419cd8101dcbb3475fa4f7b7138 from S3 in 123ms".
04/08/2017
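
Once DEBUG is active, a small script can pick those messages out of a busy log. The sketch below matches only the message shapes quoted in the comment above; the exact wording in a given plugin version is an assumption, and the names PATTERNS and s3_events are made up for illustration.

```python
import re

# Message shapes quoted in the comment above. The exact wording emitted by
# a given plugin version is an ASSUMPTION.
PATTERNS = {
    "storing": re.compile(r"storing blob (\w+) to S3"),
    "already_stored": re.compile(r"blob (\w+) is already in S3"),
    "stored": re.compile(r"stored blob (\w+) to S3 in \d+ms"),
    "fetching": re.compile(r"fetching blob (\w+) from S3"),
    "fetched": re.compile(r"fetched blob (\w+) from S3 in \d+ms"),
}


def s3_events(lines):
    """Return (event_name, digest) for each line that matches a pattern."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events.append((name, match.group(1)))
                break  # at most one event per log line
    return events


# Illustrative input: bare messages only, not real log output.
sample = [
    "storing blob 3d4bf419cd8101dcbb3475fa4f7b7138 to S3",
    "stored blob 3d4bf419cd8101dcbb3475fa4f7b7138 to S3 in 123ms",
    "a line that has nothing to do with the binary manager",
]

for event, digest in s3_events(sample):
    print(event, digest)
```

An open file object for server.log can be passed in place of the sample list. If importing an asset yields no events at all, that would suggest the S3 write path is never reached.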

Thank you for the response, and sorry for the delay in getting back to you. I added DEBUG for the package you mentioned; see below. I'm also including the output that is consistently produced when I import a random asset. Thoughts?

<category name="org.nuxeo.ecm.core.storage.sql.S3BinaryManager">
  <priority value="DEBUG" />
</category>

<category name="org.nuxeo">
  <priority value="WARN" />
</category>

2017-04-11 16:57:05,782 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:c06e0039-fce1-4185-a3ec-0e8c130ad7a8:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-11 16:57:05,847 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:c06e0039-fce1-4185-a3ec-0e8c130ad7a8:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-11 16:57:05,866 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:c06e0039-fce1-4185-a3ec-0e8c130ad7a8:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-11 16:57:05,887 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:c06e0039-fce1-4185-a3ec-0e8c130ad7a8:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]
2017-04-11 16:57:05,905 INFO [Nuxeo-Work-pictureViewsGeneration-1:default:c06e0039-fce1-4185-a3ec-0e8c130ad7a8:file:content:pictureView] [org.nuxeo.ecm.automation.core.impl.OperationServiceImpl]

04/11/2017

Sorry, you'll have to debug using breakpoints to learn more. The call tree to look for when you do a CoreSession.saveDocument, which is what happens when a Blob is saved, is:

setValueComplex(T, Field, Object) : void - org.nuxeo.ecm.core.storage.BaseDocument
setValueBlob(T, Blob) : void - org.nuxeo.ecm.core.storage.BaseDocument
writeBlob(Blob, Document) : String - org.nuxeo.ecm.core.blob.BlobManagerComponent
writeBlob(Blob, Document) : String - org.nuxeo.ecm.core.storage.sql.S3BinaryManager
writeBlob(Blob, Document) : String - org.nuxeo.ecm.core.blob.binary.BinaryBlobProvider
getBinary(Blob) : Binary - org.nuxeo.ecm.core.blob.binary.AbstractBinaryManager
getBinary(InputStream) : Binary - org.nuxeo.ecm.core.blob.binary.CachingBinaryManager
storeFile(String, File) : void - org.nuxeo.ecm.core.storage.sql.S3BinaryManager.S3FileStorage

Set a breakpoint on each of those and find where execution deviates from what is supposed to happen.

04/11/2017