MS-Word Docs Access

Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Hi All,
I have two problems regarding Word doc files.

1. I want to read the word count of a file. There are two kinds of word count: one is in the summary properties of the doc file, and the other is shown under Tools > Word Count when the file is open in Word.
I want the same count that Tools > Word Count shows. The problem is that these two counts in Word can differ from each other.

2. Is there a way to check whether temporary files are present in the directory? Word creates a temp file when a document is opened, and some of these files remain even after the document is closed. So I want to check two things: first, whether the file is currently open, and second, whether a temp file is present.


Please answer if anybody has a solution.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
As to #1, that's a document property, so it's covered by your other question.

You can use the XWPFWordExtractor class to extract all text, and then count the words.
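
To make the counting step concrete, here is a minimal sketch. It assumes the text has already been extracted (for example via XWPFWordExtractor's getText() method) and that a "word" is simply a run of non-whitespace characters. The class and method names are made up for illustration, and Word's own counting rules may well differ from this definition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: counts words in already-extracted text.
final class WordCounter {
    // Assumption: a "word" is any maximal run of non-whitespace characters.
    private static final Pattern WORD = Pattern.compile("\\S+");

    static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        Matcher matcher = WORD.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}
```

You would pass the extractor's text to countWords and compare the result with what Word reports.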

As to #2, I don't fully understand what you're asking. What is "the directory" - the temp directory? The current working directory? Some other directory?
To check whether there's any particular file in it you'd need to know the name of the file; I don't think you would know that for some temporary file from some other process.
What is the purpose of wanting to do this?


Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Let me try the first one. For the second one, the directory is the working directory, and the files are the ones that are created automatically when the document is opened.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
the files are which are automatically created when the document will open.

Which application automatically creates files - Word? The POI library doesn't.

Why does it matter what Word does when it opens a file? Or are you trying to open a file using POI at the same time as Word?
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
The XWPFWordExtractor class will work in the same manner as what we have already done with HWPFDocument.
But that reads the word count from the summary properties. Nobody has given a solution for the Tools > Word Count value shown when the file is open, and the two counts are different.
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
No, I am not trying to open the file with POI.
But in general, when you open a Word document, Word creates a temporary hidden file and deletes it as soon as the document is closed. There may be cases where this file is not deleted automatically. What can we do in that case?
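
For what it's worth, a check like this can be sketched in plain Java. The sketch assumes Word's usual naming: an owner (lock) file whose name starts with "~$" next to the open document, and scratch files named like "~WRL0001.tmp". Those patterns are an assumption about Word's behaviour, not something POI provides, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: lists files that look like Word owner/temp files.
final class WordTempFiles {
    // Assumption: owner files start with "~$"; scratch files look like "~WR....tmp".
    static boolean looksLikeWordTempFile(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return fileName.startsWith("~$")
                || (lower.startsWith("~wr") && lower.endsWith(".tmp"));
    }

    // Returns the regular files in the directory whose names match the patterns above.
    static List<Path> findLeftovers(Path directory) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && looksLikeWordTempFile(name)) {
                    result.add(entry);
                }
            }
        }
        return result;
    }
}
```

A "~$" file may mean the document is open right now, or that it was left behind after a crash; the file name alone cannot tell these two cases apart.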
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
Why do you need to do anything in that case? If Word has written the file to disk (and closed it), then the word count should have been updated during writing.

Are you saying the word count is still incorrect, even after Word has closed the file?
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Yes, that's what I want to explain.
You can try it on your system: first look at the file's Properties > Summary > Advanced > Word Count,

and then open the doc and check the count through Tools > Word Count.

There are many cases in which the counts differ, and the difference can be more than 1000.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
Well, if Word isn't even consistent with itself, then there's nothing much POI can do about it.
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
But my problem is that I have to match the count shown by the Tools > Word Count option.
Is it possible to calculate the word count ourselves, by reading the document through the input stream we use and then counting the words?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
You can always open the document, iterate through all the Ranges, count the words contained in them, and see if that provides a more accurate count.
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Any suggestions on how to proceed this way?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
The source of the XWPFWordExtractor class gives some clues: http://svn.apache.org/repos/asf/poi/trunk/src/ooxml/java/org/apache/poi/xwpf/extractor/XWPFWordExtractor.java
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
This is not helping me a lot, but thanks.

Is there no way to sync up the two counts?

Thanks
Kushagra
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
This is not helping me a lot.

Well, it shows how to access the textual contents of a file. Once you have that, it shouldn't be hard to count the words contained in that text.
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Hi Ulf,
I used the code from that file, but it gives me the following exception:
org.openxml4j.exceptions.InvalidOperationException: Can't open the specified file:

I got this exception at this line:
POIXMLTextExtractor extractor = new XWPFWordExtractor(POIXMLDocument.openPackage(args));
This is the same exception that I got before.
So what should I do now?
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Here args is the path of the file.

Thanks
Kushagra
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Hi Ulf,

These are my observations on why my word count differs from the count shown when the document is open:

1. Document Properties
2. Personal Information
3. Headers, Footers & Watermarks
4. Comments
5. Revisions
6. Versions
7. Annotations
8. Custom XML
9. Hidden Text

The word count in the summary information includes all of the above.
If we can control all of these, then we can reconcile the word counts and they will become equal.
So is there a way to handle all of the above?

So far, with the help of POI, I have found a way to handle only a few of them.

Thanks
Kushagra
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Can we resolve an issue that actually comes from the Microsoft side itself?

Thanks
Kushagra
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Can we do anything about this with MS-Word 2003, if I only want to process MS-Word 2003 files?


Thanks
Kushagra
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
I don't understand either of your last two questions; can you rephrase them, possibly providing more details?
[ December 23, 2008: Message edited by: Ulf Dittmer ]
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Actually my requirement is to process both versions of MS Office.

That's why I am asking about MS-Word 2003 too.

Thanks
Kushagra
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 41574
I don't think you can have a single code base that can process both the XML and the binary formats; you'll need to use either the XWPF or the HWPF classes, and those are specific to one format.

On the Excel side there's some glue code (in the org.apache.poi.ss package) that unifies the APIs for both formats, but as far as I know, there's nothing comparable for Word documents.
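
If the code has to accept both kinds of file, one option is to look at the first few bytes and then hand the file to the matching classes. The sketch below assumes the standard signatures: OLE2 compound files (the binary .doc format) start with D0 CF 11 E0 A1 B1 1A E1, and ZIP containers (which is what .docx files are) start with "PK" 03 04. The class and enum names are made up for illustration. A ZIP signature only says "some ZIP file", not "definitely a Word document".

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses the container format from the leading bytes.
final class WordFormatSniffer {
    enum Format { OLE2_BINARY, OOXML_ZIP, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2_BINARY; // candidate for the HWPF classes
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.OOXML_ZIP;   // candidate for the XWPF classes
        }
        return Format.UNKNOWN;
    }

    static Format detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // Read up to 8 bytes; a single read() call may return fewer.
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return detect(Arrays.copyOf(header, total));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Passing a binary .doc file to the XML-based classes is one plausible cause of a "Can't open the specified file" exception, so a check like this before opening may help narrow that down.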
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Hi All,

I have a few findings. We can open the MS Word document in a few ways:
1. Runtime.getRuntime().exec("path of winword.exe"+ "file name");
In this scenario I want to open the document with visible=false, as I don't want the user to see that I opened the document. After that I want to close the same doc.

2. Use JACOB (Java/COM Bridge)
In this scenario I want to open the Word doc with visible=false as well, and then calculate the word count if possible.

So which of these should I follow, given that I simply want to read the word count that Tools > Word Count calculates?
The word count from the properties is a little different from the actual one inside the document, and the user wants the actual word count.

Any suggestions, or any other way to resolve this issue, would help me out.
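
For the first option, a rough sketch of how the command could be built is below. It passes the program and the document as separate arguments, which avoids the missing space and the quoting problems of a single concatenated string. Both paths are placeholders supplied by the caller, and the class name is made up for illustration. Launching Word like this opens a normal visible window and does not give the word count back; as far as I know, hiding the window and reading the count would need COM automation (for example the Word object model's Visible property and ComputeStatistics method through JACOB).

```java
// Hypothetical helper: prepares (but does not start) a Word process.
final class WordLauncher {
    // Both arguments are placeholders supplied by the caller.
    static ProcessBuilder buildCommand(String winwordPath, String documentPath) {
        // Separate arguments: no manual quoting or space handling is needed.
        return new ProcessBuilder(winwordPath, documentPath);
    }
}
```

Calling start() on the returned ProcessBuilder would launch Word, and Process.destroy() could end it later, though killing Word that way may itself leave temp files behind.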


Thanks
kushagra
Kushagra Bindal
Ranch Hand

Joined: Oct 15, 2008
Posts: 156
Hi All,

Or is there a way to access Word through JNI?

I need a solution for Windows as well as for Mac.

So please suggest accordingly.

Thanks
Kushagra
 