I can probably think of 100 alternatives to this, most of them irrelevant to the program you are presumably trying to write.
My point is that there is not enough information here to advise on a good way to do this (as opposed to just another way to do this); it will depend on lots of things. How often do you expect to do this? Are the strings all equally likely to be needed? Is this part of a program that is likely to need to optimize memory use? Or I/O? Or CPU? Is it going to run for a long time, or does your program do this once and then go off and do something else most of the time?
Yes, good point, I thought I would omit the details for clarity...
It's an opening book for my chess engine. Each entry contains a 64-bit Zobrist key of the position together with
a small selection of possible moves and weights.
I can wrap each entry into 1 number of about 30 digits, or less in hexadecimal.
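If the entries are going to live in a file anyway, one possible alternative to a 30-digit number is a fixed-width binary record, which makes every entry the same size on disk. This is only a sketch under assumptions that are not in the post above: `BookRecord` is a hypothetical name, and the cap of 4 moves per position with a 2-byte move code and a 2-byte weight each is made up for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical fixed-width layout for one opening-book entry (an assumption, not the poster's format). */
final class BookRecord {
    static final int MAX_MOVES = 4;                      // assumed cap on moves stored per position
    static final int SIZE = Long.BYTES + MAX_MOVES * 4;  // 8-byte key + 4 x (2-byte move, 2-byte weight) = 24

    final long key;        // 64-bit Zobrist key of the position
    final short[] moves;   // engine-specific move codes; 0 marks an unused slot (assumed convention)
    final short[] weights; // weight for the move in the same slot

    BookRecord(long key, short[] moves, short[] weights) {
        if (moves.length != MAX_MOVES || weights.length != MAX_MOVES) {
            throw new IllegalArgumentException("expected exactly " + MAX_MOVES + " move slots");
        }
        this.key = key;
        this.moves = moves.clone();
        this.weights = weights.clone();
    }

    /** Packs the entry into SIZE bytes, big-endian (the ByteBuffer default). */
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putLong(key);
        for (int i = 0; i < MAX_MOVES; i++) {
            buf.putShort(moves[i]);
            buf.putShort(weights[i]);
        }
        return buf.array();
    }

    /** Reverses toBytes(); expects at least SIZE bytes. */
    static BookRecord fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long key = buf.getLong();
        short[] moves = new short[MAX_MOVES];
        short[] weights = new short[MAX_MOVES];
        for (int i = 0; i < MAX_MOVES; i++) {
            moves[i] = buf.getShort();
            weights[i] = buf.getShort();
        }
        return new BookRecord(key, moves, weights);
    }
}
```

With a layout like this, entry number n starts at byte n * 24, so no parsing is needed to find where a record begins.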
So the opening book file(s) will only be read at most 12 times (during the start of the game), say once a minute for 12 minutes.
But I would still like it to perform quickly, say under 1 second, just to keep things fast.
I have also just been looking at RandomAccessFile in Java, and this might be a good way to do it.
Also, I don't really want to load the whole file into the Java program, because I need as much memory as I can get for other parts of the program that use big arrays.
I think you'd be better off using a database rather than files for this. Databases are designed to quickly look up records in a large dataset.
While I agree that this is a major use of databases, I can see wanting to avoid a general-purpose relational database management system in this case.
RDBMSs are built, necessarily, for the general case; they occupy large amounts of memory and take up extra processing power in order to make things flexible. They are good at what they do in general, and it is possible that this would be a good, or at least a possible, solution to this problem. But I would worry about saddling my heap with the objects generated by the RDBMS, which I could not control, for a chess-playing program.
A chess-playing program is one of these things that occupies all your available memory and processing power and screams for more. I would be careful about putting an RDBMS in one; if I did, I would be careful to abstract all use of it so I could replace it with a special-purpose equivalent with a minimum of trouble.
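To make "abstract all use of it" concrete, here is a minimal sketch of the kind of seam I mean. The names `OpeningBook` and `MapOpeningBook` are hypothetical, and returning the entry as a raw `byte[]` is just an assumption to keep the example short. The engine would only ever see the interface; whether a database, a random-access file, or a plain map sits behind it is a detail that can be swapped later.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical seam between the engine and whatever stores the book. */
interface OpeningBook {
    /** Returns the stored entry for this Zobrist key, or null if the position is not in the book. */
    byte[] lookup(long zobristKey);
}

/** Trivial in-memory implementation, useful mainly for tests and small books. */
final class MapOpeningBook implements OpeningBook {
    private final Map<Long, byte[]> entries = new HashMap<>();

    void put(long zobristKey, byte[] entry) {
        entries.put(zobristKey, entry.clone());
    }

    @Override
    public byte[] lookup(long zobristKey) {
        byte[] entry = entries.get(zobristKey);
        return entry == null ? null : entry.clone();
    }
}
```

An RDBMS-backed or file-backed class would implement the same `lookup` method, so replacing one with the other should not touch the search code.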
I've not done anything significant with random-access files in Java, but from reading the runtime javadoc it appears they may suit your case. You will need some way to translate your key into the position you want to seek to, and of course you want to minimize seeks. If it were me, I would run tests on multiple seeks in files of different sizes, preferably on the most likely target OS, to try to determine whether splitting the book into different files makes sense.
I would guess that opening a file would be expensive compared to seeking in one that was open, and that reading would be less expensive than either of those.
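One way to turn a key into a seek position, sketched here under assumptions of my own: the file holds fixed-width records sorted by key (compared as unsigned 64-bit values), each record being an 8-byte big-endian key followed by a 16-byte payload. `BookFile` and both sizes are hypothetical. A binary search over such a file keeps the file open and needs roughly log2(n) seeks, so about 20 for a million entries.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Binary search over a sorted file of fixed-width records (assumed layout, see constants). */
final class BookFile {
    static final int PAYLOAD_BYTES = 16;                        // assumed size of the moves/weights part
    static final int RECORD_BYTES = Long.BYTES + PAYLOAD_BYTES; // 8-byte key + payload

    /** Returns the payload stored for key, or null if the key is not in the file. */
    static byte[] find(RandomAccessFile file, long key) throws IOException {
        long lo = 0;
        long hi = file.length() / RECORD_BYTES - 1;   // index of the last record; -1 if the file is empty
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            file.seek(mid * RECORD_BYTES);            // jump straight to record number mid
            long found = file.readLong();             // readLong() reads 8 bytes, big-endian
            int cmp = Long.compareUnsigned(found, key);
            if (cmp == 0) {
                byte[] payload = new byte[PAYLOAD_BYTES];
                file.readFully(payload);              // payload follows the key directly
                return payload;
            }
            if (cmp < 0) {
                lo = mid + 1;                         // wanted key is in the upper half
            } else {
                hi = mid - 1;                         // wanted key is in the lower half
            }
        }
        return null;
    }
}
```

Only one record is in memory at a time, which fits the wish not to load the whole file. Whether this is fast enough in practice is something to measure on the target OS rather than assume.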
A relational database is not necessarily a huge piece of software that uses massive amounts of disk space and / or processing power.
You could use something like HSQLDB or Apache Derby, both small relational database systems that you can even run embedded in your application (which means that the database server runs in the same JVM as your application, not as a separate process that you have to connect to).
Certainly there are smaller and larger RDBMSs, and I have not made any survey of which ones are and are not large and so forth. I have some points that I still think are relevant here, however:
1. Any RDBMS is general purpose, and in order to maintain general-purpose flexibility, a system usually has to use more CPU cycles, memory, etc., in comparison with special-purpose code.
2. An RDBMS that is regarded as "small" is usually being compared to other RDBMS, not to doing the same job for a specific purpose with code crafted for that purpose.
3. The purpose for which the OP wants this is VERY limited for an RDBMS, and it does not seem difficult to fulfill the purpose without an RDBMS.
4. If you use any RDBMS, you lose *some* control over the use of CPU and memory that you can keep better if you craft the code for your specific purpose.
5. The program the OP is writing has EXTREME needs in both CPU and memory use. So it makes sense to examine carefully any commitment made in either of these areas at the outset.
As I said, some RDBMS *might* fulfill what he needs, but I would make more sure than usual that I could detach the entire RDBMS and replace it with special-purpose code if I ever expected the program to, for instance, play tournament chess at any level.