Exception from Chronicle and java deadlock

Hi guys. Recently I faced an issue with Nuxeo on production. I'm still trying to reproduce it locally, but at the time without success. It started with the following messages in logs:

2021-03-18T08:28:17,702 WARN  [http-nio-0.0.0.0-9030-exec-32] [TableStoreWriteLock] Couldn't acquire write lock after 15000ms for the lock file:/var/lib/nuxeo/data/stream/audit/audit/P-00/metadata.cq4t, overriding the lock. Lock was held by PID 234
2021-03-18T08:28:17,702 WARN  [http-nio-0.0.0.0-9030-exec-32] [TableStoreWriteLock] Forced unlock for the lock file:/var/lib/nuxeo/data/stream/audit/audit/P-00/metadata.cq4t

With the following exception:

net.openhft.chronicle.core.StackTrace: null
at net.openhft.chronicle.queue.impl.table.AbstractTSQueueLock.forceUnlock(AbstractTSQueueLock.java:56) ~[chronicle-queue-5.17.2.jar:?]
at net.openhft.chronicle.queue.impl.single.TableStoreWriteLock.lock(TableStoreWriteLock.java:57) ~[chronicle-queue-5.17.2.jar:?]
at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.writingDocument(SingleChronicleQueueExcerpts.java:354) ~[chronicle-queue-5.17.2.jar:?]
at net.openhft.chronicle.wire.MarshallableOut.writeDocument(MarshallableOut.java:95) ~[chronicle-wire-2.17.7.jar:?]
at org.nuxeo.lib.stream.log.chronicle.ChronicleLogAppender.append(ChronicleLogAppender.java:298) ~[nuxeo-stream-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.audit.listener.StreamAuditEventListener.writeEntry(StreamAuditEventListener.java:132) ~[nuxeo-platform-audit-core-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.audit.listener.StreamAuditEventListener.lambda$writeEntries$1(StreamAuditEventListener.java:123) ~[nuxeo-platform-audit-core-10.10-HF10.jar:?]
at java.util.ArrayList.forEach(ArrayList.java:1257) [?:1.8.0_242]
at org.nuxeo.ecm.platform.audit.listener.StreamAuditEventListener.writeEntries(StreamAuditEventListener.java:123) [nuxeo-platform-audit-core-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.audit.listener.StreamAuditEventListener.afterCompletion(StreamAuditEventListener.java:108) [nuxeo-platform-audit-core-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.audit.listener.StreamAuditEventListener.handleEvent(StreamAuditEventListener.java:88) [nuxeo-platform-audit-core-10.10-HF10.jar:?]
at org.nuxeo.ecm.core.event.impl.EventServiceImpl.fireEvent(EventServiceImpl.java:242) [nuxeo-core-event-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.ui.web.auth.NuxeoAuthenticationFilter.sendAuthenticationEvent(NuxeoAuthenticationFilter.java:224) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.ui.web.auth.NuxeoAuthenticationFilter.logAuthenticationAttempt(NuxeoAuthenticationFilter.java:257) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.ui.web.auth.NuxeoAuthenticationFilter.doAuthenticate(NuxeoAuthenticationFilter.java:315) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.ui.web.auth.NuxeoAuthenticationFilter.doFilterInternal(NuxeoAuthenticationFilter.java:572) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.nuxeo.ecm.platform.ui.web.auth.NuxeoAuthenticationFilter.doFilter(NuxeoAuthenticationFilter.java:429) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.exceptionhandling.NuxeoExceptionFilter.doFilter(NuxeoExceptionFilter.java:67) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.encoding.NuxeoEncodingFilter.doFilter(NuxeoEncodingFilter.java:64) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:712) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:459) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:384) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:312) [catalina.jar:9.0.14]
at org.nuxeo.ecm.restapi.server.APIServlet.service(APIServlet.java:50) [nuxeo-rest-api-server-10.10-HF10.jar:?]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741) [servlet-api.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53) [tomcat-websocket.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.elasticsearch.ElasticSearchFilter.doFilter(ElasticSearchFilter.java:55) [nuxeo-elasticsearch-core-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoThreadTrackerFilter.doFilter(NuxeoThreadTrackerFilter.java:43) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoStandbyFilter.doFilter(NuxeoStandbyFilter.java:67) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.core.management.jtajca.internal.Log4jWebFilter.doFilter(Log4jWebFilter.java:69) [nuxeo-core-management-jtajca-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.ui.web.rest.FancyURLFilter.doFilter(FancyURLFilter.java:120) [nuxeo-platform-ui-web-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoThreadTrackerFilter.doFilter(NuxeoThreadTrackerFilter.java:43) [nuxeo-platform-web-coat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]mmon-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoStandbyFilter.doFilter(NuxeoStandbyFilter.java:67) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.core.management.jtajca.internal.Log4jWebFilter.doFilter(Log4jWebFilter.java:69) [nuxeo-core-management-jtajca-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.ui.web.rest.FancyURLFilter.doFilter(FancyURLFilter.java:120) [nuxeo-platform-ui-web-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoRequestControllerFilter.doFilter(NuxeoRequestControllerFilter.java:134) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.requestcontroller.filter.NuxeoCorsCsrfFilter.doFilter(NuxeoCorsCsrfFilter.java:255) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.exceptionhandling.NuxeoExceptionFilter.doFilter(NuxeoExceptionFilter.java:67) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.nuxeo.ecm.platform.web.common.encoding.NuxeoEncodingFilter.doFilter(NuxeoEncodingFilter.java:64) [nuxeo-platform-web-common-10.10-HF10.jar:?]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:9.0.14]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:9.0.14]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199) [catalina.jar:9.0.14]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [catalina.jar:9.0.14]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490) [catalina.jar:9.0.14]
at org.apache.catalina.valves.rewrite.RewriteValve.invoke(RewriteValve.java:276) [catalina.jar:9.0.14]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [catalina.jar:9.0.14]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92) [catalina.jar:9.0.14]
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:668) [catalina.jar:9.0.14]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74) [catalina.jar:9.0.14]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [catalina.jar:9.0.14]
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408) [tomcat-coyote.jar:9.0.14]
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) [tomcat-coyote.jar:9.0.14]
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:834) [tomcat-coyote.jar:9.0.14]
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417) [tomcat-coyote.jar:9.0.14]
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-coyote.jar:9.0.14]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:9.0.14]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]

And finally even the deadlock:

"http-nio-0.0.0.0-9030-exec-42":
  waiting to lock monitor 0x00007fa1d40eac98 (object 0x00000005d11797d8, a org.nuxeo.lib.stream.log.chronicle.ChronicleRetentionListener),
  which is held by "http-nio-0.0.0.0-9030-exec-2"
"http-nio-0.0.0.0-9030-exec-2":
  waiting to lock monitor 0x00007fa2200017a8 (object 0x00000005d117a030, a net.openhft.chronicle.queue.impl.WireStorePool),
  which is held by "AuditLogWriterPool-00,in:35998,inCheckpoint:35998,out:0,lastRead:1615833806658,lastTimer:1615885427776,wm:211790568696184833,loop:73444949,timer"
"AuditLogWriterPool-00,in:35998,inCheckpoint:35998,out:0,lastRead:1615833806658,lastTimer:1615885427776,wm:211790568696184833,loop:73444949,timer":
  waiting to lock monitor 0x00007fa1d40eac98 (object 0x00000005d11797d8, a org.nuxeo.lib.stream.log.chronicle.ChronicleRetentionListener),
  which is held by "http-nio-0.0.0.0-9030-exec-2"

Previously I've seen the warning like in the first block and thought they were related to the extremely slow IO the system had at the moment. But it is just a guess. Looks like the write lock for the audit log was missed somehow and Nuxeo was unable to release it.

Any ideas/suggestions?

2 votes

1 answers

1134 views

ANSWER



Hello,

Do you know if there is any pending works in the queue? The only walkaround coming to my mind is to remove the audit folder from Chronicle Queue, at the expense of losing some audit history:

$ cd /var/lib/nuxeo/data/stream/audit/
$ tar cvfz audit.tgz audit/
$ tar tvfz audit.tgz
$ rm -rf audit/
0 votes