JavaRanch » Java Forums » Engineering » XML and Related Technologies
XML vs Database

ashish bhardwaj
Greenhorn

Joined: Jan 27, 2008
Posts: 3
I am looking for views on this:

We have 10 TB (terabytes) of data stored across multiple disks. The metadata (data describing the data, such as file name, location, author, description, etc.) can run into gigabytes, say 5 GB. For a web-based application, should the metadata be stored in XML files or in a database such as Oracle or MySQL?

Since the data will grow in the future, scalability is required. Which approach will give better performance?

Thanks
Ashish
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

You want to extract data randomly from that 5 GB of metadata? Then don't make it a single XML document. That's completely unscalable. If it's amenable to being made into SQL data then I would do that. If it's less structured than that, then I don't know.
ashish bhardwaj
Greenhorn

Joined: Jan 27, 2008
Posts: 3
Hi Paul,
It will be like this: a user wants to find data matching particular criteria, e.g. all files generated between a specified start date and end date, then extract the required data and analyse it to produce statistics, generate plots, etc.

Will the database approach give good performance?
Since the XML file will be large, I can't use DOM. Is a SAX parser scalable, and does it give good performance?



Thanks
Ashish
Nitesh Kant
Bartender

Joined: Feb 25, 2007
Posts: 1638

Performance is probably not the only criterion for deciding whether to use a database. A database provides a whole lot of other things: transactions, management capabilities, the power of SQL or a similar query language, stored procedures, and the list goes on.
If you prefer XML, there are many open-source XML databases available today.
My personal opinion is that, for any big application, storing data in XML and trying to read it yourself will eventually bring you to the point where you think "maybe I should have used an XML database".
From what you have described, it looks like you are building some sort of reporting software that will definitely run many different types of queries against the data, so writing code to read the XML without an XML database would be a huge effort. It makes the problem hugely complex when the data runs to terabytes, or even gigabytes.
I would suggest (as Paul did) going with a database if possible; otherwise, at the very least look at a good XML database.
Do not venture into handling the XML files yourself!

[ January 29, 2008: Message edited by: Nitesh Kant ]

Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18570
    

Originally posted by ashish bhardwaj:
It will be like this: a user wants to find data matching particular criteria, e.g. all files generated between a specified start date and end date, then extract the required data and analyse it to produce statistics, generate plots, etc.

Will the database approach give good performance?
If your data model can be described as a reasonably small collection of tables, then a database would be a good approach. Databases have indexes and views, which mean you don't have to read the entire database to find one piece of information.
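To illustrate the index point, here is a minimal Java sketch (Java 16+ for records) that compares a full scan with a lookup through a sorted index on creation date. It is only an in-memory analogy and assumes a hypothetical `FileMeta` record holding file name, author, and creation date. A real database typically keeps something like a B-tree index on disk, but the idea is the same: the range lookup touches only the matching keys instead of every record.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class DateIndexSketch {
    // Hypothetical metadata record; field names are illustrative only.
    record FileMeta(String fileName, String author, LocalDate created) {}

    // Full scan: examines every record, much like streaming an entire XML document.
    static List<FileMeta> scan(List<FileMeta> all, LocalDate from, LocalDate to) {
        List<FileMeta> hits = new ArrayList<>();
        for (FileMeta m : all) {
            if (!m.created().isBefore(from) && !m.created().isAfter(to)) {
                hits.add(m);
            }
        }
        return hits;
    }

    // Sorted index on creation date (an in-memory stand-in for a database index).
    static NavigableMap<LocalDate, List<FileMeta>> buildIndex(List<FileMeta> all) {
        NavigableMap<LocalDate, List<FileMeta>> index = new TreeMap<>();
        for (FileMeta m : all) {
            index.computeIfAbsent(m.created(), d -> new ArrayList<>()).add(m);
        }
        return index;
    }

    // Range lookup: subMap(from, true, to, true) is an inclusive view over the
    // sorted keys, so only dates inside the range are visited.
    static List<FileMeta> lookup(NavigableMap<LocalDate, List<FileMeta>> index,
                                 LocalDate from, LocalDate to) {
        List<FileMeta> hits = new ArrayList<>();
        for (List<FileMeta> bucket : index.subMap(from, true, to, true).values()) {
            hits.addAll(bucket);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<FileMeta> all = List.of(
            new FileMeta("a.dat", "ashish", LocalDate.of(2007, 12, 1)),
            new FileMeta("b.dat", "paul", LocalDate.of(2008, 1, 10)),
            new FileMeta("c.dat", "nitesh", LocalDate.of(2008, 1, 20)),
            new FileMeta("d.dat", "ashish", LocalDate.of(2008, 2, 5)));
        LocalDate from = LocalDate.of(2008, 1, 1);
        LocalDate to = LocalDate.of(2008, 1, 31);
        var index = buildIndex(all);
        System.out.println(scan(all, from, to).size());     // 2
        System.out.println(lookup(index, from, to).size()); // 2
    }
}
```

Both methods return the same answer; the difference is how much data each one has to touch, which is what matters once the metadata is measured in gigabytes.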
Since the XML file will be large, I can't use DOM. Is a SAX parser scalable, and does it give good performance?
The SAX method won't use up all your memory, but you still have to read the entire XML document even if you want to extract only one data element. And if you have a query that can't easily be answered by a single sequential scan of the document, then you have some hard work to do (work that almost certainly would have been one line of SQL).
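The SAX trade-off can be seen in a small Java sketch using the JDK's built-in `javax.xml.parsers` API. It assumes a hypothetical document layout, `<files><file name="..." author="..." created="yyyy-MM-dd"/>...</files>`, and counts matching files per author for a date range. Memory use stays flat, but every `<file>` element is parsed even when only a few match, and each new kind of query means writing another handler. Against a hypothetical `file_meta` table, the same question would be roughly one line of SQL: `SELECT author, COUNT(*) FROM file_meta WHERE created BETWEEN ? AND ? GROUP BY author`.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDate;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxDateFilter {
    // Assumed (hypothetical) layout:
    // <files><file name="..." author="..." created="yyyy-MM-dd"/>...</files>
    static Map<String, Integer> countByAuthor(InputStream xml, LocalDate from, LocalDate to)
            throws Exception {
        Map<String, Integer> counts = new TreeMap<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attrs) {
                // The default factory is not namespace-aware, so match on qName.
                if (!"file".equals(qName)) {
                    return;
                }
                LocalDate created = LocalDate.parse(attrs.getValue("created"));
                // Every <file> element is visited, matching or not: SAX keeps memory
                // flat but cannot jump straight to the requested date range.
                if (!created.isBefore(from) && !created.isAfter(to)) {
                    counts.merge(attrs.getValue("author"), 1, Integer::sum);
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<files>"
            + "<file name='a.dat' author='ashish' created='2007-12-01'/>"
            + "<file name='b.dat' author='paul' created='2008-01-10'/>"
            + "<file name='c.dat' author='ashish' created='2008-01-20'/>"
            + "</files>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // prints {ashish=1, paul=1}
        System.out.println(countByAuthor(in, LocalDate.of(2008, 1, 1), LocalDate.of(2008, 1, 31)));
    }
}
```

A query such as "files whose author also appears in some other date range" would need either a second pass over the stream or extra in-memory state, which is exactly the hard work described above.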
 