Nuxeo cluster shared nuxeo.tmp.dir causing problems due to nuxeo-launcher jar naming contention
According to this answer, it is a best practice for nodes in a Nuxeo cluster to share their nuxeo.tmp.dir. When doing so, must each node in the cluster have its own tmpdir on the binary store filesystem? I am encountering nuxeo-launcher jar file naming collisions that cause NFS stale file handle errors when multiple servers in a cluster share their tmpdir and I simultaneously invoke nuxeoctl operations (using Ansible) on all nodes in the cluster.
In cluster mode it's not recommended at all to share nuxeo.tmp.dir: there are many libraries we don't control which could have a problem with it. This means in turn that you can't leverage the NXP-9361 no-copy optimizations…
On the other hand if the only problems you have are due to nuxeo-launcher jar file naming then we could fix this on our end and allow tmp sharing. Please open a JIRA ticket.
Edit: the simplest and surest way is probably to have a shared filesystem but make each node point its nuxeo.tmp.dir to a different subdirectory in it.
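The per-node subdirectory layout suggested above could be set up along these lines. This is only a sketch: the function names, the shared mount point, and the idea of using the host name as the subdirectory name are assumptions for illustration, not something Nuxeo prescribes. Only the nuxeo.tmp.dir property name comes from the discussion.

```shell
#!/usr/bin/env bash
# Sketch: give each cluster node its own tmp subdirectory on a shared filesystem.
# Function names and paths are hypothetical; adjust them to your own layout.

# Print the per-node tmp directory for a given shared root and node name.
node_tmp_dir() {
  local shared_root="$1" node_name="$2"
  # Strip a trailing slash from the root so the result has no double slash.
  printf '%s/%s\n' "${shared_root%/}" "$node_name"
}

# Set nuxeo.tmp.dir in the given conf file, replacing any existing
# uncommented nuxeo.tmp.dir entry so that exactly one remains.
set_nuxeo_tmp_dir() {
  local conf_file="$1" tmp_dir="$2" scratch
  scratch="$(mktemp)" || return 1
  # Keep every line except previous nuxeo.tmp.dir settings.
  # grep exits non-zero when nothing is left, which is fine here.
  grep -v '^nuxeo\.tmp\.dir=' "$conf_file" > "$scratch" || true
  printf 'nuxeo.tmp.dir=%s\n' "$tmp_dir" >> "$scratch"
  # Overwrite in place so the conf file keeps its owner and permissions.
  cat "$scratch" > "$conf_file"
  rm -f "$scratch"
}
```

On each node this might be called as `dir="$(node_tmp_dir /mnt/nuxeo-shared/tmp "$(hostname -s)")"`, followed by `mkdir -p "$dir"` and `set_nuxeo_tmp_dir /path/to/nuxeo.conf "$dir"`, where both paths are placeholders. With Ansible, the same effect can be had by templating the value from the inventory host name.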
In cluster mode, do you recommend nuxeo.tmp.dir be set to a cluster-node-unique directory on the shared file system in order to take advantage of NXP-9361? By default, java.io.tmpdir = nuxeo.tmp.dir, right?
Or in cluster mode, should java.io.tmpdir and nuxeo.tmp.dir be set independently? NXP-9361 says java.io.tmpdir should be on the shared file system. Should it be set to a cluster-node-unique directory there and nuxeo.tmp.dir be local?
nuxeoctl, the launcher in the tmp dir (which is there to allow the launcher to update itself) should be named
$RANDOM is randomly generated by bash and should be collision-free (although mktemp would be better). Is that not the case for you? Please open a ticket if you have enough info for us to track this down.
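To illustrate why mktemp is the safer choice: bash's $RANDOM only yields an integer between 0 and 32767, so two nodes writing to the same directory at the same moment can pick the same name. A minimal sketch of the mktemp approach follows. The function name and the "nuxeo-launcher." file prefix are assumptions for illustration, and this is not the actual nuxeoctl code.

```shell
#!/usr/bin/env bash
# Sketch: copy a launcher jar to a uniquely named file in a tmp directory.
# copy_launcher_unique and the "nuxeo-launcher." prefix are hypothetical,
# not taken from nuxeoctl.

copy_launcher_unique() {
  local source_jar="$1" tmp_dir="$2" target
  # mktemp creates the file itself and picks another name if one is taken,
  # so two callers never receive the same path.
  target="$(mktemp "${tmp_dir%/}/nuxeo-launcher.XXXXXX")" || return 1
  # Copy over the freshly created empty file; clean up if the copy fails.
  cp "$source_jar" "$target" || { rm -f "$target"; return 1; }
  # Print the resulting path for the caller.
  printf '%s\n' "$target"
}
```

The caller would capture the printed path, for example `launcher="$(copy_launcher_unique "$jar" "$tmp")"`, and remove the file when done. The template has no .jar suffix because a portable mktemp template must end in the X placeholders.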
nuxeo.tmp.dir should be ok.
Since this configuration has the temp directory on the shared file system, I would expect the NXP-9361 optimization to be fully functional. Do you agree? In general, this seems like a safer configuration than trying to share a common nuxeo.tmp.dir across all nodes. What are your thoughts?
Making nuxeo.tmp.dir point to different parts of a shared filesystem depending on the node is a good way to solve the issue.
Even if nuxeo-launcher naming collisions were fixed, it seems risky for multiple nodes to share nuxeo.tmp.dir.