Batch uploading and attaching a document fails in a clustered configuration -- how can it be fixed?


I have been following the “Blob Upload for Batch Processing” directions with some success in development, but I have run into a severe problem when attempting to use it in a production configuration on a two-node cluster.

I believe the root of the problem is described in that documentation: “The files attached to the batch are stored on a temporary disk storage […] until the batch is executed or dropped.” In other words, batch-uploaded files are only available on the node that received them. So given the following configuration:

  • load balancer
  • node A
  • node B

If one goes through the load balancer to access the batch processing and document modification API as documented, then one is quite likely to upload the file to one node (say, node A) and attempt to use it from the other (node B), which results in an error because node B does not have access to that file.
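To make the failure mode concrete, here is a minimal, purely illustrative Python simulation (not Nuxeo code) of two nodes behind a round-robin balancer, each with its own node-local temporary batch storage:

```python
class Node:
    """One application node with node-local temporary batch storage."""

    def __init__(self, name):
        self.name = name
        self.local_batches = {}  # batch id -> list of uploaded blobs

    def upload(self, batch_id, blob):
        self.local_batches.setdefault(batch_id, []).append(blob)

    def execute(self, batch_id):
        if batch_id not in self.local_batches:
            # The batch lives on the other node's disk, so this node
            # cannot see it -- this is the error observed in production.
            raise FileNotFoundError(
                f"{self.name}: batch {batch_id} not found in local storage")
        return self.local_batches.pop(batch_id)


class RoundRobinBalancer:
    """Routes each successive request to the next node in turn."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.counter = 0

    def route(self):
        node = self.nodes[self.counter % len(self.nodes)]
        self.counter += 1
        return node


balancer = RoundRobinBalancer([Node("node A"), Node("node B")])

# First request (the upload) lands on node A...
balancer.route().upload("batch-1", b"file contents")

# ...the second request (the execute) lands on node B, which has no
# such batch, so the call fails.
try:
    balancer.route().execute("batch-1")
except FileNotFoundError as err:
    print(err)
```

The simulation is just the two-node scenario above in code form: any routing scheme that does not keep the upload and the execute on the same node will hit the same error.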

As a temporary workaround we can send every request directly to a single node's address, but this is certainly not an ideal solution. What else can we do? We are using S3 binary storage on an EC2 instance, so sharing a directory isn't as straightforward as a shared NFS mount, and it's not even clear that the temporary batch storage should be shared anyway.
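The workaround amounts to building every batch-related URL against one node's base address instead of the load balancer's. A minimal sketch, assuming a hypothetical node address and placeholder endpoint paths (not the exact API paths):

```python
from urllib.parse import urljoin


class PinnedBatchClient:
    """Builds all batch-related URLs against one fixed node address,
    bypassing the load balancer for the whole upload/execute sequence."""

    def __init__(self, node_base_url):
        # e.g. a direct node address such as "http://node-a.internal:8080/"
        # (hypothetical hostname for illustration).
        self.base = node_base_url

    def url(self, path):
        return urljoin(self.base, path)


client = PinnedBatchClient("http://node-a.internal:8080/")

# Both the upload and the later attach/execute calls resolve to the
# same node, so the batch's temporary files are always reachable.
print(client.url("api/v1/upload/batch-1/0"))
```

The obvious downside, as noted above, is that this node becomes a single point of failure for batch operations.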




Have you set up session affinity at the load balancer level? If you're not using a session cookie for these uploads, maybe you can add a custom header and do affinity on that?
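The idea is that the balancer hashes a custom header's value, so every request carrying the same value lands on the same node. A minimal sketch of that routing logic, with a made-up header name (`X-Batch-Affinity`) used purely for illustration:

```python
import hashlib

NODES = ["node A", "node B"]


def route_by_header(headers):
    """Deterministically map an affinity header value to a node, the way
    a header-based sticky-session rule at the load balancer would."""
    key = headers.get("X-Batch-Affinity")
    if key is None:
        # No affinity key supplied: fall back to any node.
        return NODES[0]
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[digest[0] % len(NODES)]


# All requests for the same batch carry the same header value,
# so they are all routed to the same node.
headers = {"X-Batch-Affinity": "batch-1"}
print(route_by_header(headers) == route_by_header(headers))
```

The client would set this header to the batch id (or any stable token) on the upload and on every subsequent call that references the batch.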

We have the load balancer set up for user logins, but this is a sessionless direct call to the REST API. We are considering just using IP affinity for the time being unless there is a better way.