How are duplicate files stored when using Amazon S3?
I have been told that Nuxeo does NOT store multiple copies of the same document, and just uses links. I understand that Nuxeo VCS has a duplication checker.
We are using Nuxeo as the DM, with a PostgreSQL database running on the Amazon cloud and Amazon S3 for storage.
Under this configuration, does Nuxeo still store just one copy of the document, or does it store multiple copies?
What about a multi-tenant environment using the same DM and Amazon instance? If two users upload the same document, does each user get a complete copy, or is there just one copy shared by multiple users?
We have been told various versions of how this works, and would like to find out the real answer!
Yes, Nuxeo uses deduplication for every content storage backend. This is true for the standard filesystem-based storage, the Amazon S3 storage, and the RDBMS-based storage.
You can even plug your own storage backend if needed, and the (simple) BinaryManager APIs it needs to implement will make it automatically deduplicate content.
Deduplication is global to a given repository. Since the standard multi-tenant configuration uses a single repository, if several users upload the same document, only one copy will occupy space in S3.
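To illustrate the principle, here is a minimal Python sketch of content-addressed storage, the general technique behind this kind of deduplication: blobs are keyed by a digest of their content, so identical uploads collapse into a single stored copy. This is a hypothetical toy (the `ContentStore` class and its methods are invented for this example), not the actual Nuxeo BinaryManager or S3 code.

```python
import hashlib

class ContentStore:
    """Toy content-addressed store (illustrative only, not Nuxeo code).

    Blobs are keyed by the SHA-256 digest of their bytes, so two
    identical uploads resolve to the same key and share one copy.
    """

    def __init__(self):
        self._blobs = {}  # digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same content again writes to the same key,
        # so only one physical copy ever exists.
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


# Two users upload the same document:
store = ContentStore()
key_a = store.put(b"quarterly-report.pdf contents")
key_b = store.put(b"quarterly-report.pdf contents")
assert key_a == key_b          # both users reference the same blob
assert len(store._blobs) == 1  # only one copy occupies storage
```

In a real backend the digest would be the object key in S3 (or the filename on disk), and the document records in the database would each hold a reference to that key, which is why deduplication works per repository regardless of which user performed the upload.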