• Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Large amount of data

 
Zein Nunna
Ranch Hand
Posts: 245
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello guys,

I have been curious as to how someone would store large amounts of data in a database.

I'm looking to store 'comments' my users may send to me, these could be upto 1000-1500 characters each, which I would have thought makes database operations pretty cumbersome. There would be approximatley 500+ comments a week (maybe going on to 500 a day, if I'm informed correctly).

In the worst case, I'd write to files, but if there's something better out there I'd rather use that.

How useful would something like 'blob' be for this type of operation?

I'm looking to use a MySQL DB with Tomcat.

Your thoughts would be much appreciated.
Thanks in advance.
Zein

PS: I know it what I've doesnt sound like 'Large amounts of data' indicated in the subject line, but it is hoped I'll be heading in that direction. Someday. :P
 
Ulf Dittmer
Rancher
Posts: 42967
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
My first reaction was "don't worry, this isn't even remotely a large data set", until I read the last sentence. Even so, 500 comments a day at 1K a piece shouldn't give MySQL trouble for a long time to come.

A blob wouldn't be the right data type, though, as it's for binary data. Regular text/varchar should do nicely.
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34205
341
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Zein,
The individual fields aren't that large. It's just that there could be a lot of them. I like your disclaimer about the future by the way

Using a CLOB (character large object) would be overkill since each field isn't that large. One piece of advice to protect yourself as the database grows: create an index on the fields you typically use in the where clause of the query if they aren't the primary key. (like the user id and date). This will prevent the database having to access the memory blocks containing the larger portions of the data unless it is looking for that specific record.
 
Zein Nunna
Ranch Hand
Posts: 245
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi guys,

Thank you for your comments. I suppose technology has moved on and I'm still stuck in the stone age :P :shocked: :P . I can see how the disclaimer is funny in most peoples world :P! Hehee, had me cracking up too now that I put into perspective.

Anyway, for the time being, the approach of just using a database to store the data, I suppose would work fine. What I really want to do is build a reasonably robust system, that doesn�t fold after 6-12 or even 18 months.

I�m aware that requirements will change over time, and if the database is going to form the backbone of the site, I would have thought it needs to be pretty robust. The last thing I�d want to do is go hammer and chisel and try modifying the database dramatically.

As it stands I can envisage requirements like, 'double the data fields legth to 2-3K', �I want to be able to colour my comment�, underline etc., possibly even add a picture (which would inevitably lead to video clips) coming my way. I don�t want to compromise the speed of the website too much when I'm coding these in.

Jeanne, I take your point when it comes to the �growth of data�. To clarify you're saying the 'large data' sets (in my world anyway) should be kept in separate tables, with ID's that access them as and when required. Wouldn�t this approach mean I end up breaking some database construction (normalisation) rules?

Also if the database is large, doesn�t this have an impact on all transactions coming to the database? Or is it, as suggested by Jeanne, just the data blocks that are being accessed.

Oh finally, if I have text that is underlined, coloured etc. that can�t be stored as plain text (I would have thought )? What data type would that have to be? Or is that an external file?

Thanks in advance for your thoughts :thumb: :thumb:
Regards
Zein
 
Paul Clapham
Sheriff
Pie
Posts: 20976
31
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Zein Nunna:
Oh finally, if I have text that is underlined, coloured etc. that can�t be stored as plain text (I would have thought )? What data type would that have to be? Or is that an external file?
How you store that data is going to depend on what tool you are providing to the users to mark up the text. You have a web application (you mentioned Tomcat) so I suppose right now you are just using an HTML textarea. If you want to provide formatting capabilities then you should first look at tools that do that. Then you would have to see what they produced and how they affect your database design.
 
Zein Nunna
Ranch Hand
Posts: 245
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Paul,

I see what you're saying about formatting the text. I haven't considered a tool as yet. What I mentioned in the previous post was mere speculation. Great info for the future.

Regards
Zein
 
Jeanne Boyarsky
author & internet detective
Marshal
Posts: 34205
341
Eclipse IDE Java VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Zein,
It's good you are thinking ahead. You can always change the field to a CLOB later if need be for the "double the data fields length" requirement. For formatting, consider a UBB style formatting like we have here. I used that at work with things like [red]test[/red] and it works great. The nice part is that you can store the formatting as text and then just reply it in Java later. For true binary data like images and video clips, you will need to add another column to your table anyway of a BLOB type. So you don't need to design it in now.

I wasn't saying they should be in separate tables, although that isn't a bad idea. I was saying that is should be possible to get the rest of the row without having to access that memory area. If you have a lot of fields that you access in different ways, the separate table option sounds good. If you always get the whole row with all its potential attachments, you j ust need to have an index with the "lookup" column(s).

> Wouldn�t this approach mean I end up breaking some database construction
> (normalisation) rules?
I'd worry if you were de-normalizing and putting everything in one table. Splitting up tables rarely causes problems. When it does cause problems, it was because things were taken too far and everything is in a separate table.
 
Zein Nunna
Ranch Hand
Posts: 245
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks for your advice guys.

I've made enough mistakes where I've just coded the requirements at hand and not looked into the future at all. So I was hoping to avoid my past mistakes.

I'll take your input on board.

Regards
Zein
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic