Java And OpenDocument Files

Arthur Buliva
Ranch Hand

Joined: Mar 08, 2006
Posts: 101
Hi Friends,

Now, let's say you have a file; let's call it document.odt.

I have noticed that .odt files are not stand-alone files like .txt or .dat files. Rather, they are like zip archives that contain other files defining the document.

Before I get off track, let me get to my question.

document.odt contains some text, say "Hello World". I want to edit this text through Java and save the resulting text. How can this be achieved?

Thanks and regards.
Joe Ess
Bartender

Joined: Oct 29, 2001
Posts: 8996
    

OpenOffice has a Java API.


Arthur Buliva
Ranch Hand

Joined: Mar 08, 2006
Posts: 101
And how can I use the API to open/edit files through a TextArea on my Java application?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42912
    
Accessing ODF files through the OO Java API and using the data in GUI elements are separate activities. You'll need to dig into the API to figure out how to access the data that you need for the GUI. The AccessingFileFormats wiki page links to a number of articles about the OO Java API and the ODF file format.
Arthur Buliva
Ranch Hand

Joined: Mar 08, 2006
Posts: 101
Dear Dittmer,

I am simply lost in the array of suggestions in those pages...
Jesper de Jong
Java Cowboy
Saloon Keeper

Joined: Aug 16, 2005
Posts: 14428
    

The OpenOffice Java API
  • OpenOffice can read a number of file formats, and makes them accessible through its API. A starting point might be this article and of course the OO developer site
  • Some introductory information about the OO file format can be found here and here - basic Java code for reading OO files is here
  • Reading an OpenOffice file is not as simple as reading a plain text file, simply because OpenOffice contains a lot more features than a plain text editor.
    [ October 31, 2007: Message edited by: Jesper Young ]

    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    So what's your suggestion on how to open them?
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    Either use the OO Java API to convert the file to some other format that you do know how to open, or, based on the articles linked above, write code that opens and processes the files.

    From my cursory look at the articles, OO files appear to be zipped-up XML files. Since Java has APIs for dealing with ZIP and XML files, getting at the actual contents shouldn't be too hard. Making sense of those is a different matter, of course. I'd recommend starting with a simple document, to see if you can manage to extract whatever information you need.
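
As a first experiment along those lines, here is a minimal sketch that only lists what is inside the package, using java.util.zip.ZipFile. The class and method names (OdtListing, entryNames) are made up for illustration and are not from any of the linked articles.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists the entries of a ZIP-based file such as an .odt.
class OdtListing {

    // Returns the entry names in the order the ZIP directory reports them.
    static List<String> entryNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

Calling OdtListing.entryNames("document.odt") on a simple document should show names such as content.xml and styles.xml, which gives a feel for the package layout before any XML is parsed.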
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    I guess this leads to the next question, my good people.

    How do I use the Java API to zip/unzip file packages?
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    Examples of using the ZIP API -as well as just about all other java.* classes- can be found at the Developer's Almanac: http://www.exampledepot.com/egs/java.util.zip/pkg.html
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101


    from the reference you gave me has solved my most immediate problem. Thanks!
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    This is a sample output:



    and the code



    retrieves only the first entry, whereas I want the file called content.xml (if not the entire .odt file) extracted into a specified folder. What do I need to change in the code?
    Joanne Neal
    Rancher

    Joined: Aug 05, 2005
    Posts: 3742
        
    You need a loop. ZipInputStream.getNextEntry() returns null when there are no more entries, so you can use that as the loop controller.
    If you only want to output certain files, then you need to check the name of each ZipEntry object before you write it out. Check the API docs to see if there is a method that gets the name of the ZipEntry object.
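
A minimal sketch of that loop, assuming the goal is to pull one named entry out of the archive. ZipEntry.getName() is the method that returns the entry name; the class and method names (SingleEntryReader, readEntry) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: finds one entry by name in a zipped stream.
class SingleEntryReader {

    // Returns the bytes of the entry called 'wanted', or null if there is no such entry.
    static byte[] readEntry(InputStream zipped, String wanted) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            // getNextEntry() returns null once all entries have been visited.
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.getName().equals(wanted)) {
                    continue; // not the entry we are after
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = zin.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return buffer.toByteArray();
            }
        }
        return null;
    }
}
```

For the document in question this would be called as readEntry(new FileInputStream("document.odt"), "content.xml").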


    Joanne
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    My redo of the code is




    What could be the issue here? It produces lots of nulls and empty folders.
    Joanne Neal
    Rancher

    Joined: Aug 05, 2005
    Posts: 3742
        
    You create a new FileOutputStream object every time through the loop, but you only close the last one after you exit the loop. Put the out.close() call inside the loop.
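
One way to make sure every stream gets closed is a try-with-resources block inside the loop (Java 7 or later). The sketch below is not the code from the post above; it deliberately handles only top-level entries (names without a "/") to keep the focus on where the stream is opened and closed. FlatExtractor and extractTopLevel are hypothetical names.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: writes the top-level file entries of a zipped stream to a directory.
class FlatExtractor {

    // Returns the number of files written into targetDir (which must already exist).
    static int extractTopLevel(InputStream zipped, File targetDir) throws IOException {
        int written = 0;
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().contains("/")) {
                    continue; // simplification: nested paths are ignored in this sketch
                }
                // One output stream per entry, closed automatically at the end of this iteration.
                try (OutputStream out = new FileOutputStream(new File(targetDir, entry.getName()))) {
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zin.read(chunk)) != -1) {
                        out.write(chunk, 0, n);
                    }
                }
                written++;
            }
        }
        return written;
    }
}
```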
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    After doing that it renders the files as folders/directories instead of just files.
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    "After doing that it renders the files as folders/directories instead of just files."

    It looks as if the code is supposed to extract the files that are part of the ODF file and write them to disk, each in the physical directory where it would logically be inside the ODF file. Doesn't it do that? If not, what does it do, and where does it go wrong?
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    Yes, this is what this code is supposed to do:

    From the list of files inside the ODT file, which are

    mimetype
    Configurations2/statusbar/
    Configurations2/accelerator/current.xml
    Configurations2/floater/
    Configurations2/popupmenu/
    Configurations2/progressbar/
    Configurations2/menubar/
    Configurations2/toolbar/
    Configurations2/images/Bitmaps/
    content.xml
    styles.xml
    meta.xml
    Thumbnails/thumbnail.png
    settings.xml
    META-INF/manifest.xml

    I need to extract the files as they are in the odt file. For instance, thumbnail.png is to be extracted in a folder called Thumbnails.

    So far, the code



    returns



    Sorry if I am going back on my progress, but I hope it's in the best interest of clarity here.

    Thanks.
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    There are a couple of problems with the code.

    Firstly, you're creating the FileOutputStream (FOS) before creating the directories. That won't work, because the FOS constructor tries to access the file.

    Secondly, if the name does not contain a "/", then it's a top-level file, and no directories should be created.

    Thirdly, if the name ends with a "/", then it's an empty directory, and you should not open a FOS and try to copy bytes.

    Lastly, if the name contains a "/" somewhere in the middle, then it's a file, and mkdirs should only be called with the part up to the last "/".

    You can list the entry names your code should be expecting via "jar tf filename.odt"
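
Put together, the four points above could look roughly like the following. This is a sketch under the assumption that the archive is read with ZipInputStream; OdtExtractor and extractAll are hypothetical names, and the check against entry names that point outside the target directory is an extra precaution that was not part of the list.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts a whole ZIP-based package, keeping its directory layout.
class OdtExtractor {

    static void extractAll(InputStream zipped, File targetDir) throws IOException {
        makeDirs(targetDir);
        String root = targetDir.getCanonicalPath() + File.separator;
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                File target = new File(targetDir, name);
                // Extra precaution (an addition): reject names that would land outside targetDir.
                if (!(target.getCanonicalPath() + File.separator).startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + name);
                }
                if (name.endsWith("/")) {
                    // Directory entry: create it, open no stream, copy no bytes.
                    makeDirs(target);
                    continue;
                }
                int lastSlash = name.lastIndexOf('/');
                if (lastSlash >= 0) {
                    // File in a subdirectory: create only the part up to the last "/".
                    makeDirs(new File(targetDir, name.substring(0, lastSlash)));
                }
                // The directories exist by now, so the output stream can be opened.
                try (OutputStream out = new FileOutputStream(target)) {
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zin.read(chunk)) != -1) {
                        out.write(chunk, 0, n);
                    }
                }
            }
        }
    }

    private static void makeDirs(File dir) throws IOException {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Cannot create directory " + dir);
        }
    }
}
```

A top-level name such as mimetype has no "/", so lastIndexOf returns -1 and no directory is created for it.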
    [ November 17, 2007: Message edited by: Ulf Dittmer ]
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    Eureka!

    Thanks Dittmer



    Has solved my problem
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101


    Has crudely but successfully 'ripped' up the xml file. Now it is to 'reverse engineer' this class to save the changes.
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    I think you'd be better off with a real XML parser. Scanning an XML file like you showed above gets tricky in the presence of nested elements. Something like the following would do the trick (it doesn't extract the files, just reads them in-place, but it's easy to adapt to your situation).
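
The code originally posted at this point is not preserved here. As a stand-in, here is a minimal sketch of the same idea: open content.xml directly inside the package and let a DOM parser handle the nesting. It assumes the text lives in text:p elements in the namespace urn:oasis:names:tc:opendocument:xmlns:text:1.0; the names OdtParagraphs and paragraphs are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads paragraph text from content.xml without extracting it to disk.
class OdtParagraphs {

    // Namespace of the ODF text elements (text:p and friends).
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    static List<String> paragraphs(String odtPath)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> result = new ArrayList<>();
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry content = zip.getEntry("content.xml");
            if (content == null) {
                throw new IOException("No content.xml in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(content)) {
                Document doc = builder.parse(in);
                NodeList paras = doc.getElementsByTagNameNS(TEXT_NS, "p");
                // Index starts at 0 so the first paragraph is not skipped.
                for (int i = 0; i < paras.getLength(); i++) {
                    result.add(paras.item(i).getTextContent());
                }
            }
        }
        return result;
    }
}
```

getTextContent() flattens any nested spans inside a paragraph, which is exactly the case that hand-written string scanning tends to get wrong.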

    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    How is this class used?

    This is what I am getting:

    Is there something I am missing?
    Ulf Dittmer
    Marshal

    Joined: Mar 22, 2005
    Posts: 42912
        
    The for loop should run from 0, not 1. Silly typo.
    Arthur Buliva
    Ranch Hand

    Joined: Mar 08, 2006
    Posts: 101
    Great


    is what I get.

    Now, my main aim in going through the unzipping process is the editing part. I am building a simple online .odt editor. On saving, I was thinking of working with the content.xml file, editing whatever I want, then zipping up the entire package and giving the resulting file an .odt extension. That way I would achieve the intended result.
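
For anyone trying the same plan, here is a rough sketch of the re-zipping step. It copies every entry from the original package and swaps in new bytes for content.xml, so nothing has to be unpacked to disk. Two assumptions are baked in: the original entry order is kept, and the mimetype entry is written uncompressed, which is what the ODF packaging rules ask for as far as I know. OdtRepacker and repack are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: rebuilds an .odt package with a replaced content.xml.
class OdtRepacker {

    static void repack(InputStream originalOdt, byte[] newContentXml, OutputStream target)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(originalOdt);
             ZipOutputStream zout = new ZipOutputStream(target)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                String name = entry.getName();
                // Use the edited bytes for content.xml, the original bytes for everything else.
                byte[] data = name.equals("content.xml") ? newContentXml : readAll(zin);

                ZipEntry copy = new ZipEntry(name);
                if (name.equals("mimetype")) {
                    // Assumption: mimetype is stored uncompressed.
                    // STORED entries need size and CRC set before they are written.
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    copy.setMethod(ZipEntry.STORED);
                    copy.setSize(data.length);
                    copy.setCompressedSize(data.length);
                    copy.setCrc(crc.getValue());
                }
                zout.putNextEntry(copy);
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Reads the remainder of the current entry into memory.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

Writing the result to a FileOutputStream whose name ends in .odt takes care of the renaming step. Whether an office suite accepts the file still depends on the edited content.xml being valid.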
     