What does a typical zero-downtime update scenario look like for a Nuxeo cluster?
Greetings-
While evaluating the Nuxeo Platform in AWS I've come across a few questions. I'm mostly curious about the following:
Is it possible to deploy Nuxeo updates with zero downtime in a clustered environment (think blue/green deployments)?
How does a single Nuxeo cluster behave when servers running different versions of the Nuxeo app share a database at the same time (think rolling deployments)? In other words, can Nuxeo Platform handle two versions of Nuxeo running against a particular version of the database(s) at the same time?
What happens if a Nuxeo update that is run against one node in a Nuxeo cluster includes a schema update, and that schema is changed while an older Nuxeo app running on another server is accessing it? Are these changes backward compatible?
Is it or will it be possible to have Nuxeo initiate an update on /all/ nodes by responding to an event on the message bus when a node in a given cluster is updated to a given patch? This stems from the current perception that each individual node must be updated on its own.
Zero-downtime upgrades mostly come down to properly orchestrating the shutdown, upgrade, and restart of nodes, and coordinating those steps with the load balancer.
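To make that sequence concrete, here is a minimal sketch of a rolling upgrade loop. It is not Nuxeo tooling: `rolling_upgrade` and the four callables (`drain`, `upgrade`, `is_healthy`, `enable`) are hypothetical names standing in for whatever your load balancer and deployment scripts actually provide.

```python
from typing import Callable, List


def rolling_upgrade(
    nodes: List[str],
    drain: Callable[[str], None],       # hypothetical: remove node from the load balancer
    upgrade: Callable[[str], None],     # hypothetical: stop, install new version, restart
    is_healthy: Callable[[str], bool],  # hypothetical: health check after restart
    enable: Callable[[str], None],      # hypothetical: put node back in the load balancer
) -> List[str]:
    """Upgrade nodes one at a time and stop at the first failed health check.

    Returns the nodes that were upgraded and put back in rotation.
    """
    done: List[str] = []
    for node in nodes:
        drain(node)
        upgrade(node)
        if not is_healthy(node):
            # Leave the failed node out of rotation; the remaining
            # nodes keep serving traffic on the old version.
            break
        enable(node)
        done.append(node)
    return done
```

Only one node is out of rotation at any time, so the cluster keeps serving requests, and stopping at the first unhealthy node limits the damage of a bad release.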
If a shared database is used by several nodes with different software versions, then this will work only if the configurations don't conflict with each other. This is true of any system, not only Nuxeo. In the case of schema changes, adding or removing fields from a schema is not an issue: a node that doesn't know a field will ignore it on read even if it exists in the database, and a node that expects a field but does not see it in the database will read a null instead.
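The read behavior described above can be illustrated with a toy model. This is not Nuxeo code and does not reflect its storage layer; `read_properties` and the field names are made up purely to show the "ignore unknown, null for missing" rule.

```python
from typing import Any, Dict, Iterable, Optional


def read_properties(
    stored: Dict[str, Any], known_fields: Iterable[str]
) -> Dict[str, Optional[Any]]:
    """Return only the fields this node's schema knows about.

    Fields present in storage but unknown to the node are ignored;
    fields the node expects but storage lacks come back as None.
    """
    return {name: stored.get(name) for name in known_fields}


# A row written by a newer node that added a "reviewer" field.
row = {"title": "Contract", "reviewer": "alice"}

old_node_view = read_properties(row, ["title"])              # ignores "reviewer"
new_node_view = read_properties(row, ["title", "reviewer"])  # sees both
# A node expecting a field that was never written reads None for it.
other_view = read_properties({"title": "Contract"}, ["title", "reviewer"])
```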
About your last question, we're working on having better automation for what you describe. It's doable “by hand” today but isn't completely automated and integrated into an easy-to-use tool. But an external orchestrator is usually the best solution for this, as it's usually the component that knows about software versions and that expects to update them.
'If a shared database is used by several nodes with different software versions, then this will work only if the configurations don't conflict with each other'
– How often will Nuxeo patches require a configuration update to, say, nuxeo.conf or some other configuration file that would cause a given node's configuration to diverge from how the other nodes are set up? I do know that Nuxeo will regenerate all of its configuration at restart if you have it set in nuxeo.conf.
Finally, for 'a node that expects a field but does not see it in the database will read a null instead'
– Is this null guaranteed to be handled in all cases? I realize that is a bold assertion to make.
As for the orchestrator, I am okay with this. I just wanted to make sure I wasn't reinventing the wheel if work was already planned or under way to make this a bit cleaner.
Reading nulls for undefined properties is regular Nuxeo behavior and is always handled. With SQL storage, for instance, when there's no data in a table's row for that document, we don't even create the row in some cases.