Display Word attachments

 
Greenhorn
Posts: 10
I would like to display the contents of Microsoft Word attachments to the user. In the code below I am trying to read the doc contents with POI, but I get the following error:
BASE64Decoder: Error in encoded stream: needed 4 valid base64 characters but only got 3 before EOF, the 10 most recent characters were: "//////////"
Thank you for the reply.


 
Marshal
Posts: 28226
And was that exception thrown by that code? If so, which line? There's nothing there which is obviously doing Base64 decoding.
 
Sheriff
Posts: 22784
This is not the solution, but a side note which may help you prevent some errors when you get the base64 problem out of the way.
You only return the first paragraph this way. Any following paragraph will be ignored. I think you want to use a StringBuilder, add all paragraphs to that, then call toString() on it to return its contents as a string.
This construct will call processPart for each part, but throw away the return values. Again, use a StringBuilder to add all contents.
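The StringBuilder suggestion can be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the class name ParagraphJoiner and the method joinParagraphs are hypothetical, and the list of strings stands in for whatever paragraph texts the extractor returns.

```java
import java.util.Arrays;
import java.util.List;

final class ParagraphJoiner {

    // Appends every paragraph to one StringBuilder instead of returning
    // from inside the loop after the first paragraph.
    static String joinParagraphs(List<String> paragraphs) {
        StringBuilder text = new StringBuilder();
        for (String paragraph : paragraphs) {
            text.append(paragraph);
            // Add a line break only if the paragraph does not already end with one.
            if (!paragraph.endsWith("\n")) {
                text.append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical paragraph texts standing in for extractor output.
        List<String> paragraphs = Arrays.asList("First paragraph", "Second paragraph\n");
        System.out.print(joinParagraphs(paragraphs));
    }
}
```

The same pattern applies to the loop over the message parts: append the return value of each processPart call to a StringBuilder rather than discarding it.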
 
vignesh krishnan
Greenhorn
Posts: 10
Hey Rob, thanks for the note. I appended the contents of pg and processPart(pt) to a StringBuilder and then returned the result of toString().
I encountered two errors on the line: HWPFDocument hwpfDoc = new HWPFDocument(p.getInputStream());

The first error is the one that's now showing up as the exception.
It seems POI is not able to read the InputStream of the part.
Is there an alternative way to fetch the InputStream from the Word attachment and pass it to the WordExtractor, or any other way to display the contents of a Word attachment?
Thanks for the reply.
 
Paul Clapham
Marshal
Posts: 28226
It looks like you're trying to create an HWPFDocument from the attachment, and that fails because the attachment isn't something that HWPFDocument can handle. Perhaps the attachment is corrupted in some way, or not even a valid Word document?

Anyway the way that I would display a Word attachment to a user is this: I would just copy the contents of the attachment and let the user open it with Word. I'm sure that as a user I wouldn't want all the formatting and other non-text information just thrown away.
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, could you provide some details on how to copy the contents of the attachment to a file (in this case, a Word file)?
 
Paul Clapham
Marshal
Posts: 28226
You've got the InputStream right there. Just copy the bytes from there to a file. That's all. You should be able to deal with the details of that, no?
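For readers who want the details, here is one way to copy the bytes, offered as a sketch rather than the only approach. It uses only the JDK; the class name AttachmentSaver and the method saveAttachment are made up for this example, and the ByteArrayInputStream in main stands in for the stream you would get from the message part.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AttachmentSaver {

    // Copies the raw bytes of the stream to the target file, with no
    // conversion to text, and returns the number of bytes written.
    static long saveAttachment(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample bytes standing in for the attachment stream.
        byte[] sample = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};
        Path target = Files.createTempFile("attachment", ".doc");
        try (InputStream in = new ByteArrayInputStream(sample)) {
            long written = saveAttachment(in, target);
            System.out.println(written + " bytes written to " + target);
        }
    }
}
```

Files.copy works on bytes from start to finish, so nothing in this path can reinterpret the document as characters.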
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, I am able to save the Word attachment from the InputStream, but the generated file contains lots of unformatted data along with the formatted text.
Microsoft Word cannot open the file and reports that it is corrupt (I had to use WordPad). Is there any way to generate a properly formatted file? Or does POI support generating a formatted file from the InputStream?
 
Paul Clapham
Marshal
Posts: 28226
If it's not a valid Word document, then it isn't surprising that you couldn't read it using the HWPF software.

And if it is a Word document, then it would naturally contain data other than the text. That's the whole point of using Word instead of Notepad: Word allows you to use different fonts and embed images and a thousand other non-text things.

Anyway your first step is to find out whether you really have a valid Word document (use your regular mail client instead of JavaMail to read it). If it is, then your next step is to make sure you aren't corrupting it yourself by converting the bytes from the InputStream into chars.
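To illustrate the byte-to-char problem, here is a small sketch that pushes the start of a .doc file through a String and back. The class and method names are invented for this example, UTF-8 is an assumed charset (other charsets damage the data in other ways), and the eight bytes are the signature that OLE2 files such as .doc begin with.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharConversionDemo {

    // Decodes the bytes as text and encodes them again, which is what
    // happens when binary data passes through a Reader/Writer or a String.
    static byte[] roundTripThroughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The eight-byte signature at the start of an OLE2 (.doc) file.
        byte[] header = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        byte[] mangled = roundTripThroughString(header);
        System.out.println("original: " + Arrays.toString(header));
        System.out.println("mangled:  " + Arrays.toString(mangled));
        System.out.println("unchanged after round trip: " + Arrays.equals(header, mangled));
    }
}
```

Byte sequences that are not valid in the charset get replaced during decoding, so the bytes that come back out no longer match the original file. Plain ASCII text survives the round trip, which is why the damage can go unnoticed with simple test data.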
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, I downloaded the Word file from Gmail and it is valid. My question is how to properly fetch the Word contents from the InputStream of the Message part.
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, the Word attachment is in my Gmail inbox and it contains just plain text and no other data. My question is how to properly fetch the contents of a Word file from the InputStream of the Message part, so that the file does not contain any garbage data along with the original content. I am using the code below to fetch Word attachments.





 
Paul Clapham
Marshal
Posts: 28226
Well, that can't be right. Word files never just contain text and nothing else. At the very least there's always a whole lump of control information at the beginning before the actual text appears.

But anyway, your code is okay, except for not closing the file after you finish writing data to it; you should definitely do that. Have you compared the output of your code to the result of displaying the data from the mail client?
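On the point about closing the file: one common way to guarantee it in Java is try-with-resources. The following is a sketch under the assumption that the copy is done with a manual read/write loop; the class name ClosingCopy and the method writeAndClose are hypothetical and not taken from the posted code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClosingCopy {

    static void writeAndClose(InputStream in, Path target) throws IOException {
        // try-with-resources closes the output stream when the block ends,
        // even if read() or write() throws.
        try (OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Write only the bytes actually read in this pass.
                out.write(buffer, 0, read);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for the attachment stream.
        byte[] sample = {1, 2, 3, 4, 5};
        Path target = Files.createTempFile("attachment", ".doc");
        writeAndClose(new ByteArrayInputStream(sample), target);
        System.out.println(Files.size(target) + " bytes in " + target);
    }
}
```

If the stream is wrapped in a buffered stream and never closed or flushed, the last buffered bytes may not reach the disk, which is one way to end up with a truncated file.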
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, yes, Word files contain data structures holding information about the content along with the original content. By plain text, I meant that the file does not contain images or other embedded objects, just text. The good news is that, on fetching some more Word attachments from other messages, I was able to get doc files such as resumes with proper formatting. It's just doc files other than resumes that showed up with garbage content.
 
vignesh krishnan
Greenhorn
Posts: 10
@paul, Thanks for all the help.
 