nuxeo-platform-ocr and image PDFs
I have installed the nuxeo-platform-ocr plugin ( https://github.com/nuxeo/nuxeo-platform-ocr#readme ) and it is working very nicely, but I am not able to run the OCR on image PDFs (PDFs whose pages are scanned images).
Is there any plugin to do this?
Ruben Bahntje Ushuaia - Argentina
Great to learn that you could install this addon successfully despite the list of non-trivial dependencies to build from source :)
To make it work on PDF files, you would first need to extract the image files (e.g. JPEG files) embedded in them. If you are a Java developer, this should be doable with PDFBox ( http://pdfbox.apache.org/ ), e.g. you can take a class from the PDFBox source tree as an example.
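To make the extraction step concrete, here is a rough, dependency-free sketch in plain Java. Note that it does not use PDFBox: it relies only on the fact that an image stored in a PDF with the DCTDecode filter is a complete JPEG file, so it can be carved out by walking the JPEG markers from start-of-image to end-of-image. The class name `JpegCarver`, its methods and the output file names are made up for this illustration. It will miss images stored with other filters (CCITT fax, JBIG2, Flate) and will not work on encrypted files, which is why PDFBox remains the more robust route for real documents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nuxeo or PDFBox.
class JpegCarver {

    private static int u(byte b) {
        return b & 0xFF; // read a byte as an unsigned value
    }

    /** Returns every byte range in data that parses as a complete JPEG file. */
    static List<byte[]> extractJpegs(byte[] data) {
        List<byte[]> images = new ArrayList<>();
        int i = 0;
        while (i + 2 < data.length) {
            // SOI marker (FF D8) immediately followed by another marker (FF ..)
            if (u(data[i]) == 0xFF && u(data[i + 1]) == 0xD8 && u(data[i + 2]) == 0xFF) {
                int end = findEnd(data, i + 2);
                if (end > 0) {
                    images.add(Arrays.copyOfRange(data, i, end));
                    i = end;
                    continue;
                }
            }
            i++;
        }
        return images;
    }

    /** Walks the JPEG segments from p (just past SOI); returns the offset after EOI, or -1. */
    private static int findEnd(byte[] d, int p) {
        while (p + 1 < d.length) {
            if (u(d[p]) != 0xFF) return -1;            // expected a marker here
            int marker = u(d[p + 1]);
            if (marker == 0xD9) return p + 2;          // EOI: end of image
            if (marker == 0xFF) { p++; continue; }     // fill byte before a marker
            if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD7)) {
                p += 2;                                // markers without a length field
                continue;
            }
            if (p + 3 >= d.length) return -1;
            int len = (u(d[p + 2]) << 8) | u(d[p + 3]); // length includes its own 2 bytes
            if (len < 2) return -1;
            p += 2 + len;                              // skips EXIF thumbnails as well
            if (marker == 0xDA) {
                // After SOS comes entropy-coded data: skip until the next real marker.
                while (p + 1 < d.length) {
                    if (u(d[p]) == 0xFF) {
                        int n = u(d[p + 1]);
                        boolean restart = n >= 0xD0 && n <= 0xD7;
                        if (n != 0x00 && n != 0xFF && !restart) break;
                        p += (n == 0xFF) ? 1 : 2;      // stuffed byte, fill byte or RSTn
                    } else {
                        p++;
                    }
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JpegCarver scanned.pdf");
            return;
        }
        List<byte[]> images = extractJpegs(Files.readAllBytes(Paths.get(args[0])));
        for (int k = 0; k < images.size(); k++) {
            // Output names are arbitrary; the files are written to the working directory.
            Files.write(Paths.get("image-" + k + ".jpg"), images.get(k));
        }
        System.out.println("Extracted " + images.size() + " JPEG image(s)");
    }
}
```

Run it as `java JpegCarver scanned.pdf` and the resulting `image-N.jpg` files can then be handled like any other image document. Walking the segment lengths rather than searching for the first FF D9 avoids cutting an image short at a thumbnail embedded in its metadata.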
The source code of the OCR plugin is not too complicated to dive into, and I can probably assist you on the nuxeo-dev mailing list or, better, directly through the inline review system on a pull request on GitHub.