File APIs for Java Developers
Manipulate DOC, XLS, PPT, PDF and many others from your application.
http://aspose.com/file-tools
The moose likes JDBC and the fly likes Large amount of data Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login


Win a copy of Spring in Action this week in the Spring forum!
JavaRanch » Java Forums » Databases » JDBC
Bookmark "Large amount of data" Watch "Large amount of data" New topic
Author

Large amount of data

Zein Nunna
Ranch Hand

Joined: Mar 31, 2005
Posts: 245
Hello guys,

I have been curious as to how someone would store large amounts of data in a database.

I'm looking to store 'comments' my users may send to me, these could be upto 1000-1500 characters each, which I would have thought makes database operations pretty cumbersome. There would be approximatley 500+ comments a week (maybe going on to 500 a day, if I'm informed correctly).

In the worst case, I'd write to files, but if there's something better out there I'd rather use that.

How useful would something like 'blob' be for this type of operation?

I'm looking to use a MySQL DB with Tomcat.

Your thoughts would be much appreciated.
Thanks in advance.
Zein

PS: I know it what I've doesnt sound like 'Large amounts of data' indicated in the subject line, but it is hoped I'll be heading in that direction. Someday. :P
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42276
    
  64
My first reaction was "don't worry, this isn't even remotely a large data set", until I read the last sentence. Even so, 500 comments a day at 1K a piece shouldn't give MySQL trouble for a long time to come.

A blob wouldn't be the right data type, though, as it's for binary data. Regular text/varchar should do nicely.


Ping & DNS - my free Android networking tools app
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30753
    
156

Zein,
The individual fields aren't that large. It's just that there could be a lot of them. I like your disclaimer about the future by the way

Using a CLOB (character large object) would be overkill since each field isn't that large. One piece of advice to protect yourself as the database grows: create an index on the fields you typically use in the where clause of the query if they aren't the primary key. (like the user id and date). This will prevent the database having to access the memory blocks containing the larger portions of the data unless it is looking for that specific record.


[Blog] [JavaRanch FAQ] [How To Ask Questions The Smart Way] [Book Promos]
Blogging on Certs: SCEA Part 1, Part 2 & 3, Core Spring 3, OCAJP, OCPJP beta, TOGAF part 1 and part 2
Zein Nunna
Ranch Hand

Joined: Mar 31, 2005
Posts: 245
Hi guys,

Thank you for your comments. I suppose technology has moved on and I'm still stuck in the stone age :P :shocked: :P . I can see how the disclaimer is funny in most peoples world :P! Hehee, had me cracking up too now that I put into perspective.

Anyway, for the time being, the approach of just using a database to store the data, I suppose would work fine. What I really want to do is build a reasonably robust system, that doesn�t fold after 6-12 or even 18 months.

I�m aware that requirements will change over time, and if the database is going to form the backbone of the site, I would have thought it needs to be pretty robust. The last thing I�d want to do is go hammer and chisel and try modifying the database dramatically.

As it stands I can envisage requirements like, 'double the data fields legth to 2-3K', �I want to be able to colour my comment�, underline etc., possibly even add a picture (which would inevitably lead to video clips) coming my way. I don�t want to compromise the speed of the website too much when I'm coding these in.

Jeanne, I take your point when it comes to the �growth of data�. To clarify you're saying the 'large data' sets (in my world anyway) should be kept in separate tables, with ID's that access them as and when required. Wouldn�t this approach mean I end up breaking some database construction (normalisation) rules?

Also if the database is large, doesn�t this have an impact on all transactions coming to the database? Or is it, as suggested by Jeanne, just the data blocks that are being accessed.

Oh finally, if I have text that is underlined, coloured etc. that can�t be stored as plain text (I would have thought )? What data type would that have to be? Or is that an external file?

Thanks in advance for your thoughts :thumb: :thumb:
Regards
Zein
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18652
    
    8

Originally posted by Zein Nunna:
Oh finally, if I have text that is underlined, coloured etc. that can�t be stored as plain text (I would have thought )? What data type would that have to be? Or is that an external file?
How you store that data is going to depend on what tool you are providing to the users to mark up the text. You have a web application (you mentioned Tomcat) so I suppose right now you are just using an HTML textarea. If you want to provide formatting capabilities then you should first look at tools that do that. Then you would have to see what they produced and how they affect your database design.
Zein Nunna
Ranch Hand

Joined: Mar 31, 2005
Posts: 245
Thanks Paul,

I see what you're saying about formatting the text. I haven't considered a tool as yet. What I mentioned in the previous post was mere speculation. Great info for the future.

Regards
Zein
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30753
    
156

Zein,
It's good you are thinking ahead. You can always change the field to a CLOB later if need be for the "double the data fields length" requirement. For formatting, consider a UBB style formatting like we have here. I used that at work with things like [red]test[/red] and it works great. The nice part is that you can store the formatting as text and then just reply it in Java later. For true binary data like images and video clips, you will need to add another column to your table anyway of a BLOB type. So you don't need to design it in now.

I wasn't saying they should be in separate tables, although that isn't a bad idea. I was saying that is should be possible to get the rest of the row without having to access that memory area. If you have a lot of fields that you access in different ways, the separate table option sounds good. If you always get the whole row with all its potential attachments, you j ust need to have an index with the "lookup" column(s).

> Wouldn�t this approach mean I end up breaking some database construction
> (normalisation) rules?
I'd worry if you were de-normalizing and putting everything in one table. Splitting up tables rarely causes problems. When it does cause problems, it was because things were taken too far and everything is in a separate table.
Zein Nunna
Ranch Hand

Joined: Mar 31, 2005
Posts: 245
Thanks for your advice guys.

I've made enough mistakes where I've just coded the requirements at hand and not looked into the future at all. So I was hoping to avoid my past mistakes.

I'll take your input on board.

Regards
Zein
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Large amount of data