I have a (mostly) working Apache Tika implementation in a SpringBoot service.
From the third-party application, I Base64-encode the binary PDF content, then URL-encode the result (just to be sure...), and send that to the service. The service reverses both steps to get the PDF binary back after transmission. Once the binary file is reconstructed in the service, I extract the text content with the Tika library and return it as the service's response.
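For reference, the encode/decode round trip described above looks roughly like this in Java. This is a minimal sketch, not the actual service code: the class and method names are made up for illustration, and it assumes Java 8 (for java.util.Base64) and UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Base64;

// Hypothetical helper; names are illustrative, not taken from the real service.
final class PdfTransportCodec {

    // Sender side: raw bytes -> Base64 text -> URL-encoded text.
    static String encode(byte[] raw) {
        String base64 = Base64.getEncoder().encodeToString(raw);
        try {
            return URLEncoder.encode(base64, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Service side: undo the two steps in reverse order.
    static byte[] decode(String wire) {
        try {
            String base64 = URLDecoder.decode(wire, "UTF-8");
            return Base64.getDecoder().decode(base64);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

The URL-encoding step matters because Base64 text can contain '+', '/' and '=', which are escaped to %2B, %2F and %3D on the wire.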
However, I found a "problem" PDF that gives me an odd Spring message. (Extracting the PDF in standalone Tika works fine.)
The error is a 400 Bad Request, with the exception "org.springframework.http.converter.HttpMessageNotReadableException".
I tried setting a breakpoint on the very first line of the controller, but the 400 happens before execution gets that far.
The Charles HTTP Proxy reports a "broken pipe".
Again, most files, including PDFs, work fine, so I'm baffled about what's going on with this one. This particular "problem" PDF does have lots of symbols, equations, and graphs, but it works in standalone Tika, so I'm not sure what to try next.
I did upgrade SpringBoot to 1.5.15.RELEASE, but that made no difference.
Are you able to upgrade to Spring Boot 2.0.x? Maybe this will get you better/different results.
I think many Spring Boot 1.5 apps can be migrated to Spring Boot 2.0 without too many modifications.
So your problem happens after you encode the file and send it to the endpoint? That would mean either the problem occurs while you are preparing the request, or the endpoint is lacking in some way. I doubt that upgrading to Spring Boot 2 would change anything.
If the solution is a change on the Spring side, it's probably adding a message converter or looking into that area, because that's where the exception comes from.
posted 11 months ago
Thanks for the replies!
As I mentioned, most of the PDFs work fine in Spring Boot. The Apache Tika framework is the "converter". The PDF binary data arrives encoded as text, so Spring shouldn't be coughing up a hairball on this request.
Additionally, which I didn't mention initially, the exact same logic works perfectly when using the Spark Java REST framework.
Thus, it's not the encoding.
I tried to upgrade to SpringBoot 2.0.4.RELEASE this morning, but my project would no longer build. I'll work on that and see if that helps.
This problem may be related to the size of the request, as the same thing happens with a huge text file (in Spring only).
posted 11 months ago
Turns out I was sending a redundant Content-Type request header that declared the body as URL-encoded.
Since I was already URL-encoding the Base64-encoded file myself, this extra request header made some files not extract correctly.
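To illustrate one way an extra decoding pass can break such a payload, here is a small sketch. It rests on an assumption this thread doesn't confirm: that the form-style Content-Type caused the body to be URL-decoded once before the service's own URL-decode ran. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Base64;

// Hypothetical demo; it does not reproduce Spring's actual request handling.
final class DoubleDecodeDemo {

    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Text the service would end up with if the payload were URL-decoded twice.
    static String decodeTwice(byte[] raw) {
        String base64 = Base64.getEncoder().encodeToString(raw);
        String wire = urlEncode(base64);   // what the client sends
        String once = urlDecode(wire);     // assumed: done by an intermediate layer
        return urlDecode(once);            // done again by the service itself
    }

    // True if the basic Base64 decoder accepts the text.
    static boolean isValidBase64(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For the bytes 0xFB 0xFF 0x00 the Base64 text is "+/8A". The first decode restores it, but the second turns the '+' into a space, and the Base64 decoder then rejects the text. If something like this was happening, it would only bite payloads whose Base64 form contains a '+'.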