Massive import (1 000 000 documents) without ACLs using Nuxeo 5.6 and REST

Environment

  • OS: Windows Server 2008 R2 Service Pack 1
  • Java: JDK 1.7, 64-bit
  • Nuxeo server: heap memory = 999 MB
  • Nuxeo base
    • contents: 100 000 documents
    • full-text search index disabled
    • documents created without ACLs

Scenario

a) After creating 20 000 more documents:

  • this warning is logged: '2014-02-07 15:36:59,241 WARN [org.nuxeo.ecm.core.event.tx.PostCommitSynchronousRunner] PostCommitListeners are too slow'

  • the document creation rate is 12 100 docs/hour, down from 25 000 docs/hour at the beginning.

b) Then, after creating 40 000 more documents:

  • the document creation rate drops to 3 500 docs/hour
  • a “java heap space out of memory” error occurs
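
To put these rates in perspective, here is a rough calculation of how long a 1 000 000 document import would take at each of them. It is only a sketch: it assumes the rate stays constant for the whole import, which the measurements above show is not the case, and the function name `hours_needed` is just for illustration.

```python
# Hours needed to import a target number of documents at a constant rate.
TARGET_DOCS = 1_000_000

# Rates reported in this post, in documents per hour.
rates = {
    "at the beginning": 25_000,
    "after 20 000 more docs": 12_100,
    "after 40 000 more docs": 3_500,
    "desired minimum": 20_000,
}


def hours_needed(total_docs: int, docs_per_hour: float) -> float:
    """Return the import duration in hours, assuming a constant rate."""
    if docs_per_hour <= 0:
        raise ValueError("docs_per_hour must be positive")
    return total_docs / docs_per_hour


for label, rate in rates.items():
    hours = hours_needed(TARGET_DOCS, rate)
    print(f"{label}: {hours:.1f} h ({hours / 24:.1f} days)")
```

At 3 500 docs/hour the import would take close to 12 days, against 50 hours at the desired minimum of 20 000 docs/hour.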

Questions

1) How can we avoid the 'PostCommitListeners are too slow' message and maintain a speed of at least 20 000 docs/hour, so that 1 000 000 docs can be imported in a reasonable time?

2) How can we improve the document creation speed, given that our software can supply Nuxeo with 500 000 docs/hour?

3) How can we avoid the “java heap space out of memory” error?

Why does Nuxeo use so much memory when

  • we only create documents one by one
  • and check whether each document exists before creating it?

ANSWER



You might want to take a look at Nuxeo's bulk importer to achieve the desired result, or write your own custom importer for even more control over the import process, transactions, etc.
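
As an illustration of what controlling transactions in a custom importer can mean, one common pattern is to group documents into batches and commit once per batch rather than once per document. The sketch below is generic and is not the Nuxeo bulk importer's API: `create`, `commit`, and `batch_size` are hypothetical placeholders for whatever your importer uses to create a document and close a transaction, and whether this helps with the slowdown described in the question would need to be measured.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def import_in_batches(
    docs: Iterable[T],
    create: Callable[[T], None],   # hypothetical: creates one document
    commit: Callable[[], None],    # hypothetical: commits the current transaction
    batch_size: int = 500,         # assumed value, to be tuned by measurement
) -> int:
    """Create documents batch by batch, committing once per batch.

    Returns the number of documents handed to `create`.
    """
    count = 0
    for batch in batched(docs, batch_size):
        for doc in batch:
            create(doc)
        commit()
        count += len(batch)
    return count


if __name__ == "__main__":
    # Stand-ins that only record calls, so the sketch runs without a server.
    created: List[int] = []
    commits: List[int] = []
    total = import_in_batches(range(1050), created.append,
                              lambda: commits.append(1), batch_size=500)
    print(total, "documents in", len(commits), "commits")
```

Because the documents are consumed lazily, only one batch is held in memory on the importer side at a time.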
