Read large xml file for searching

Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
Hi,

Can someone please help me? I have a very large XML file (over 12 MB). I have a search form where a user enters some data, and I need to search for that content in the XML file and display the results on the page in a datatable. As the file is huge, what is the best approach to searching it, given that any number of users may be searching at the same time? The XML file is stored in a specific folder and never changes.

Is it OK to read the file data into a static HashSet, keep it in memory, and search the HashSet? Or is there a better way?

Thanks


MB
Sun Certified Programmer for Java2 Platform
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42276
Can you cache it as a DOM Document in memory? Then you could run XPath or XQuery queries over it without having to do any file I/O or parsing.
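
To make the suggestion above concrete, here is a minimal sketch using the standard JAXP DOM and XPath APIs. The document layout (`catalog`, `book`, `author`, `title`) and the class and method names are assumptions for illustration only; the real file's structure was not posted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; element names are assumed, not taken from the real file.
class XmlSearch {

    // Parse the XML text into an in-memory DOM tree (done once, not per search).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Return the <title> text of every <book> whose <author> equals the search term.
    static List<String> titlesByAuthor(Document doc, String author) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the user's input as an XPath variable rather than concatenating
            // it into the expression, so quotes in the input cannot alter the query.
            xpath.setXPathVariableResolver(
                    name -> "author".equals(name.getLocalPart()) ? author : null);
            NodeList hits = (NodeList) xpath.evaluate(
                    "/catalog/book[author = $author]/title", doc, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                titles.add(hits.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }
}
```

The list returned by `titlesByAuthor` could then back the datatable on the results page.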


Mike Boota
Ranch Hand

Joined: Jul 18, 2002
Posts: 82
But what if many users are accessing the file? Do I have to load it into memory for each request, or can it be loaded once at server startup?

thanks
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42276
Caching means that you'd load the document once, and then keep it in memory. Subsequent accesses are faster that way.
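
A rough sketch of that load-once idea, in plain Java with no framework. The class name `XmlCache` and the `loadCount` counter are made up for illustration; the counter exists only to show that parsing happens a single time.

```java
import java.io.InputStream;
import java.util.concurrent.Callable;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical cache: parses the XML on first use and then reuses the result.
class XmlCache {
    private final Callable<InputStream> source; // how to open the XML, e.g. from a file
    private Document cached;                     // guarded by "this"
    private int loadCount;                       // guarded by "this"; illustration only

    XmlCache(Callable<InputStream> source) {
        this.source = source;
    }

    // Parse on the first call only; later calls return the same Document.
    synchronized Document get() {
        if (cached == null) {
            try (InputStream in = source.call()) {
                cached = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(in);
                loadCount++;
            } catch (Exception e) {
                throw new IllegalStateException("Could not load XML", e);
            }
        }
        return cached;
    }

    // How many times the XML was actually parsed.
    synchronized int loadCount() {
        return loadCount;
    }
}
```

In a web application, one instance could be created at startup and shared, for example `new XmlCache(() -> Files.newInputStream(Paths.get("/path/to/data.xml")))`, where the path is a placeholder. Since the file never changes, no invalidation logic is needed.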
Rahul Bhattacharjee
Ranch Hand

Joined: Nov 29, 2005
Posts: 2308
Originally posted by Mike Boota:
But what if many users are accessing the file? Do I have to load it into memory for each request, or can it be loaded once at server startup?


As already suggested, you can read the XML into a DOM and then use XPath to get the required values. Once it is loaded, all XPath queries can be directed to the same DOM. There is no need to load it again and again.

One thing to note: a file of about 12 MB would take at least 18 MB of RAM (possibly a few MB more).

So if the XML holds user-specific information and production will have many more users, the XML will grow and take a lot of memory. In that case you might have to think of something else.
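
Here is a sketch of several requests sharing one DOM. One caveat, as far as I know: the DOM specification does not require implementations to be thread-safe, and the `XPath` javadoc says `XPath` objects are not thread-safe either, so this example locks on the shared document and creates a fresh `XPath` per query. The class name, method names, and the thread pool standing in for simultaneous users are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical searcher: one DOM, loaded once, queried by many threads.
class SharedDomSearcher {
    private final Document doc; // the single tree shared by every request

    SharedDomSearcher(String xml) {
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Evaluate an XPath expression against the shared tree; returns its string value.
    String evaluate(String expression) {
        synchronized (doc) { // serialize access, since DOM thread safety is not guaranteed
            try {
                return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
            } catch (Exception e) {
                throw new IllegalStateException("XPath query failed", e);
            }
        }
    }

    // Simulate several simultaneous users running the same query.
    static List<String> searchFromManyThreads(
            SharedDomSearcher searcher, String expression, int users) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < users; i++) {
                Callable<String> task = () -> searcher.evaluate(expression);
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get()); // wait for each search to finish
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("Search task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Locking the whole document is the simplest safe choice. If it turns out to be a bottleneck, copying the searchable values into an immutable map at startup would allow lock-free reads.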

