Display Word attachments

vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
I would like to display the contents of Microsoft Word attachments to the user. In the code below I am trying to read the .doc contents with POI, but I get the following error:
BASE64Decoder: Error in encoded stream: needed 4 valid base64 characters but only got 3 before EOF, the 10 most recent characters were: "//////////"
Thank you for the reply.



Learn as if you will live forever
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18911
    

And was that exception thrown by that code? If so, which line? There's nothing there which is obviously doing Base64 decoding.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19762
    

This is not the solution, but a side note which may help you prevent some errors when you get the base64 problem out of the way.
You only return the first paragraph this way. Any following paragraph will be ignored. I think you want to use a StringBuilder, add all paragraphs to that, then call toString() on it to return its contents as a string.
This construct will call processPart for each part, but throw away the return values. Again, use a StringBuilder to add all contents.
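A minimal sketch of the StringBuilder approach, assuming the paragraph texts are already available as strings (whatever the POI extractor returns). The class and method names are made up for illustration and are not taken from the original code:

```java
import java.util.List;

/** Sketch: collect the text of every paragraph instead of returning only the first. */
class ParagraphJoiner {

    // 'paragraphs' is a hypothetical stand-in for the extractor output,
    // one String per paragraph of the document.
    static String joinAll(List<String> paragraphs) {
        StringBuilder text = new StringBuilder();
        for (String paragraph : paragraphs) {
            text.append(paragraph); // keep every paragraph, not just the first
        }
        return text.toString();
    }
}
```

The same pattern applies to the loop over the message parts: append the return value of each processPart call to one StringBuilder and return its toString() at the end.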


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
Hey Rob, thanks for the note. I now append the contents of pg and processPart(pt) to a StringBuilder and return the result of toString().
There are two errors that I encountered on the line: HWPFDocument hwpfDoc = new HWPFDocument(p.getInputStream());

The first error is the one that's now showing up as the exception.
It seems POI is not able to read the InputStream of the part.
Is there an alternative way to fetch the InputStream from the Word attachment and pass it to the WordExtractor, or any other way to display the contents of a Word attachment?
Thanks for the reply.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18911
    

It looks like you're trying to create an HWPFDocument from the attachment, and that fails because the attachment isn't something that HWPFDocument can handle. Perhaps the attachment is corrupted in some way, or not even a valid Word document?

Anyway the way that I would display a Word attachment to a user is this: I would just copy the contents of the attachment and let the user open it with Word. I'm sure that as a user I wouldn't want all the formatting and other non-text information just thrown away.
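One way to check whether the attachment is a binary Word document at all is to look at its first bytes. The sketch below assumes the attachment has already been read into a byte array. It relies on the fact that the binary .doc format handled by HWPFDocument is an OLE2 compound file, which starts with a fixed 8-byte signature. A .docx file is a ZIP archive instead, starts with "PK", and would fail this check. The class and method names are hypothetical:

```java
import java.util.Arrays;

/** Sketch: test whether data starts with the OLE2 compound file signature. */
class DocSignature {

    // First 8 bytes of every OLE2 compound file (binary .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean looksLikeOle2(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false; // too short to carry the signature
        }
        byte[] head = Arrays.copyOf(data, OLE2_MAGIC.length);
        return Arrays.equals(head, OLE2_MAGIC);
    }
}
```

A matching signature only shows that the container format is right. It does not prove the file is an intact Word document.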
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, could you provide some details on how to copy the contents of the attachment to a file? In this case, a Word file.
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18911
    

You've got the InputStream right there. Just copy the bytes from there to a file. That's all. You should be able to deal with the details of that, no?
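A minimal sketch of that copy, assuming Java 7 or later and an InputStream obtained from the message part. The class name, method name and target path are placeholders:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch: write an attachment stream to disk byte for byte. */
class AttachmentSaver {

    // Returns the number of bytes written. The input stream is closed afterwards.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            // Files.copy moves raw bytes; no character conversion is involved.
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

It could be called as AttachmentSaver.save(p.getInputStream(), Paths.get("attachment.doc")), where the file name is only an example.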
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, I am able to get the Word file attachment from the InputStream, but the generated file contains lots of unformatted data along with the formatted text.
Microsoft Word cannot open the file and reports that it is corrupt (I had to use WordPad). Is there any way to generate a formatted file? Or does POI support generating a formatted file from the InputStream?
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18911
    

If it's not a valid Word document, then it isn't surprising that you couldn't read it using the HWPF software.

And if it is a Word document, then it would naturally contain data other than the text. That's the whole point of using Word instead of Notepad: Word allows you to use different fonts and embed images and a thousand other non-text things.

Anyway your first step is to find out whether you really have a valid Word document (use your regular mail client instead of JavaMail to read it). If it is, then your next step is to make sure you aren't corrupting it yourself by converting the bytes from the InputStream into chars.
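To illustrate the second point, here is a small sketch of what happens when binary data takes a detour through a String. It assumes UTF-8 as the charset and uses made-up class and method names. The actual conversion in the original code may differ, but most charsets damage arbitrary bytes in a similar way:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: show that bytes -> chars -> bytes is not safe for binary data. */
class CharRoundTrip {

    // Decodes the bytes as UTF-8 text and encodes the text again.
    static byte[] throughString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // True only if the round trip returns exactly the original bytes.
    static boolean survives(byte[] raw) {
        return Arrays.equals(raw, throughString(raw));
    }
}
```

Plain ASCII text survives the round trip. The first bytes of a binary Word file (0xD0 0xCF 0x11 0xE0) do not, because they are not valid UTF-8 and are replaced during decoding. A file written from such a String is corrupt even though the attachment itself was fine.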
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, I downloaded the Word file from Gmail and it is valid. My question is how to properly fetch the Word contents from the InputStream of the Message part.
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, the Word file attachment is in my Gmail inbox and it contains just plain text and no other data. My question is how to properly fetch the contents of a Word file from the InputStream of the Message part, so that the file does not contain any garbage data along with the original content. I am using the code below to fetch Word attachments.





Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18911
    

Well, that can't be right. Word files never just contain text and nothing else. At the very least there's always a whole lump of control information at the beginning before the actual text appears.

But anyway, your code is okay, except that it doesn't close the file after you finish writing data to it. You should definitely do that. Have you compared the output of your code to the result of displaying the data from the mail client?
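A sketch of a copy loop that closes the file reliably, assuming Java 7 or later for try-with-resources. The class and method names are invented for illustration and this is not the original poster's code:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Sketch: copy a stream to a file and always close both ends. */
class ClosingCopy {

    static void copyAndClose(InputStream in, File target) throws IOException {
        try (InputStream source = in;
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int read;
            // read() returns -1 at end of stream; write only the bytes actually read
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } // both streams are closed here, even if an exception was thrown
    }
}
```

On older Java versions the same effect needs a finally block that calls close() on the output stream.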
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, yes, Word files contain data structures with information about the content along with the content itself. By plain text I meant that the file does not contain images or other objects, just text. The good news is that after fetching more Word attachments from other messages, I was able to get .doc files such as resumes with proper formatting. It's just the .doc files other than resumes that showed up with garbage content.
vignesh krishnan
Greenhorn

Joined: Feb 03, 2010
Posts: 10
@paul, Thanks for all the help.
 
 