aspose file tools*
The moose likes Developer Certification (SCJD/OCMJD) and the fly likes search algorithm: clarity and efficiency Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Certification » Developer Certification (SCJD/OCMJD)
Bookmark "search algorithm: clarity and efficiency" Watch "search algorithm: clarity and efficiency" New topic
Author

search algorithm: clarity and efficiency

Titus Campbell
Greenhorn

Joined: Sep 17, 2002
Posts: 10
On the instructions under "Marking Criteria", it mentions that the search algorithm used will account for a fairly large chunk of the total score.
I'm confused. Since the db.db file of flights is not sorted, the only possible search algorithm is a record-by-record search of the entire file. This is clear but nowhere near efficient. I could sort the db and/or create one or more indexes on the file. This would require a major overhaul of the Data class and seems, to me, out of the intended scope of the project.
Thoughts anyone???
Thomas Fly
Ranch Hand

Joined: Sep 09, 2002
Posts: 164
You're right
Still you have to compare each record to the search criteria and determine a match. I suppose there are really stupid ways to do that which could cost you points, but fortunately I couldn't think of any.


Fly by Night Consultants<br /> <blockquote><font size="1" face="Verdana, Arial">quote:</font><hr><i>I climbed on the back of a giant albatross<br />which flew through a crack in the cloud<br />to a place where happiness reigned...<br />all year 'round<br />the music played ever so loudly!</i><p><a href="http://thomasfly.com/songs/Traffic/Hole_in_my_Shoe_qt.htm" target="_blank" rel="nofollow">Hole in My Shoe</a><hr></blockquote>
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Titus,

... This would require a major overhaul of the Data class and seems, to me, out of the intended scope of the project.

You definitely don't want to do that. Efficiency doesn't necessarily translate to speed in this case. What's more important is simplicity and clarity. You should be able to write criteriaFind() with between 50 and 75 lines of code. Just keep it simple.
Hope this helps,
Michael Morris


Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction. - Ernst F. Schumacher
John Smith
Ranch Hand

Joined: Oct 08, 2001
Posts: 2937

I'm confused. Since the db.db file of flights is not sorted, the only possible search algorithm is a record-by-record search of the entire file. This is clear but nowhere near efficient.

Here are two things that Sun might be looking for in terms of efficiency:
1. If one (or more) fields in your criteria is "any", do not do any field comparisons for that field.
2. Avoid the following:

Instead, use an algorithm that only checks the fields in the search criteria.
Eugene.
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hey Eugene,
You been hangin' out at a different forum? Glad to see a post from you again.
Best Regards,
Michael Morris
John Smith
Ranch Hand

Joined: Oct 08, 2001
Posts: 2937
Hey Michael,
I got a bit bored discussing the same topics here, but Java Ranch is such a good place and I am forever bound to it.
Eugene.
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    
    6

between 50 and 75 lines of code

Wow that's alot of lines of code. Does that include creating the criteria string and any supporting classes?
Mark


Perfect World Programming, LLC - Two Laptop Bag - Tube Organizer
How to Ask Questions the Smart Way FAQ
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Mark,
I used a seperate interface and class for criteriaFind()'s implementation. I must admit that there is some fluff in my code for the sake of clarity. It can certainly be done in less than 50-75 lines of code. I was just trying to give an outside ballpark estimate on what may be acceptable.
Michael Morris
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
Hi guys,
well my search implementation is generic and can take the any no of columns in any order and which is there in the underlying db.. before doing the search I'm making sure that the column supplied is there in the description and its not specified to manytimes in the same search criteria which serves no purpose ..
1.validating the column name is not required but its a simple thing to do.. i have implemented it is this a overkill am doing ??
2. if the caller of this method forgets to specify a value for the field he wants to search am just giving it empty string ..
//instead i can throw an exception back
3, i have implemented both the AND and OR way of searching the underlying db right now the client can't specify whether he wants to perform a AND or a OR in the supplied columns .. future enhancements and overloading the criteria method can be done to achieve this effect ..right now am using the AND based search method
not required .. but serves a basic purpose of future search's
4, it takes .. 2 seconds for the whole search operation to complete for 1000 clients searching at the same times supplying 5 column and values to match both in AND and OR ..
both jumps out of the method which is comparing a row in the table as soon as it hits a failure in the case of AND..and as soon as it succeeds in the case of OR so it might be the first column or the last column

now your thoughts on this please
cheers,
parthi.
[ September 18, 2002: Message edited by: parthiban subramaniam ]

Even crazy and silly looking problems are sometimes real.
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
hi guys
nuthin from your guys so far for the above post .. am in the right track
cheers,
parthi.
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi parthi,

1.validating the column name is not required but its a simple thing to do.. i have implemented it is this a overkill am doing ??

Do you mean the field (schema)name? If you do, then it is required. Here is the quote from the instructions:
In the event of an invalid field name being provided as part of the criteria the behavior of this method is the same as if no records matched correctly specified criteria.

2. if the caller of this method forgets to specify a value for the field he wants to search am just giving it empty string ..
//instead i can throw an exception back

You should just return an empty DataInfo[] if any part of the criteria is invalid.

3, i have implemented both the AND and OR way of searching the underlying db right now the client can't specify whether he wants to perform a AND or a OR in the supplied columns .. future enhancements and overloading the criteria method can be done to achieve this effect ..right now am using the AND based search method
not required .. but serves a basic purpose of future search's

That's probably OK. I am having trouble undestanding the benefit of an OR unless it relates to the same field. For example: "Carrier='SpeedyAir',Carrier='PromptAir". Or possibly more complex groupings like: "(Carrier='SpeedyAir',Origin='SFO'),(Carrier='BeethAir',Origin='LAX')". Maybe you should elaborate on this.

4, it takes .. 2 seconds for the whole search operation to complete for 1000 clients searching at the same times supplying 5 column and values to match both in AND and OR ..

That's plenty fast. You shouldn't concern yourself too much with perfomance anyway.
Hope this helps,
Michael Morris
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
Hi Michael,
what i meant by validating the colum name is ..if the cirteria string was supplied with a invalid collum name .. I'm throwing an exception back ..idealy this shuld be dealt by the devloper cos its an error ..it serves no purpose to return a no match when the collum specified in the criteria itself is not there in the underlying db .. this is a basic fucntionality which any data base provides in its sql manipulation .. goin by the way u sugest .. just returnig a empty datainfo[] makes my life easy but serves no purpose to the sollutin we are providing .. what shall i do .. stick to the specifications or justify my implementation ..
OR provides you to search for records which have match for any values in the specified criteria meaning .. "Carrier='SpeedyAir',Origin='SFO'" will return any records which has its Carrier value as 'SpeedyAir' or Origin's vlue is 'SFO'

thanx for your comments

hungry for more
cheers,
parthi.
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451
Hi Parthi,

... goin by the way u sugest .. just returnig a empty datainfo[] makes my life easy but serves no purpose to the sollutin we are providing .. what shall i do .. stick to the specifications or justify my implementation ..

What I suggested is the minimum requiremet for the project. This is the way I did it and scored perfect on the search assesment. Having said that, there is certainly nothing wrong with throwing an exception on a malformed criteria and I would concede to you a better way to do it. It's all a matter of how important the individual candidate feels about such matters. If you do decide to throw an exception, be sure to document it and give your reasoning.
Hope this helps,
Michael Morris
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
cheers Michael
will keep bugging you guys

cheers,
parthi.
Pete Lyons
Ranch Hand

Joined: Aug 18, 2002
Posts: 109
I think adding the OR ability is:
-beyond the requirements, which is bad
-inherently more complicated, less clear, and harder to maintain then the simple AND search that is described in the requirements documents
-Not that useful. Think about it, people buying tickets are going to know the exact origin and destination airports 98% (or more) of the time, which already reduces the data set significantly. Much more useful would be the ability to sort the columns in the results table by clicking the column header so you could search for all SFO-->ATL flights and sort by price ascending. (and even that I did not do)
-while flashy, it only increases your chances of making a mistake (do you have extensive unit tests for all of those OR combinations?)
-From what I gather, the grading works by starting with all points and subtracting for errors/weaknesses. It doesn't seem like they give you extra points for extra features.
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
Extra features ... who wants to provide extra features .. they are bugs in a way
i just want to learn some new things and get a pass on this exam
cheers pete

parthi.
Padmaja Prasad
Ranch Hand

Joined: Nov 14, 2001
Posts: 76
Hi,
my implementation of criteriaFind look like this:
I have one public Interface (FBNHelperInterface) defines two String arrays like this:

The class FBNQueryParser implements this interface. Its constructor maps QUERY_FLD_NAME to FBN_SCHEMA in FLD_SCHEMA_MAP. Its getQueryMap(String query) method simply returns a HashMap containing Coulmn Name(FBN Schema corresponding to the field name in the query � obtained using getFBNSchema() method) and the value (supplied in the query).
The CriteriaFinder class uses ArrayList to store the values of matched records. Its criteriaFind(String criteria) first call the FBNQueryParser�s getQueryMap() method.
For each record in the database, it constructs another map (recordMap) having "FiledName - value" pair for each record. If the entrySet of both queryMap & recordMap matches, then that record will be added to the resulting ArrayList.
Does my search criteria looks very vague?? I am not validating any column names. If the column names supplied in the query String does not match the items in QUERY_FLD_NAMES, then it simply returns a null DataInfo[] object. So if my criteria is �Origin airport = �SFO�� it will return only Null object. Do I need to include additional code to tackle this condition??
Please suggest me. Your opinion is much more important to me.
Thanks
Padmaja
[ September 20, 2002: Message edited by: Padmaja Prasad ]
[ September 20, 2002: Message edited by: Padmaja Prasad ]
[ September 20, 2002: Message edited by: Padmaja Prasad ]
Padmaja Prasad
Ranch Hand

Joined: Nov 14, 2001
Posts: 76
Suggestions please......
John Smith
Ranch Hand

Joined: Oct 08, 2001
Posts: 2937

Suggestions please......

I don't think it is a good idea to base your search algorithm on the hardcoded schema and field names, as in your code. In fact, you don't need to define them at all, -- use FieldInfo instead to identify the field indexes that you need to check. That way, if database schema changes, not a single line of code in your criteriaFind() method need to change.
Eugene.
Michael Morris
Ranch Hand

Joined: Jan 30, 2002
Posts: 3451

Originally posted by Eugene Kononov:
... That way, if database schema changes, not a single line of code in your criteriaFind() method need to change.

I absolutely agree. Should the server expand to offer more than one database table, you'll be ready for any schema.
Michael Morris
Padmaja Prasad
Ranch Hand

Joined: Nov 14, 2001
Posts: 76
Thanks Eugene.
I have one more doubt.The criteria supplied will be of the form "origin = 'SFO', Destination = 'DEN'". but the FiledInfo stores the fields as "Origin airport", "Destination airport". So somewhere I have to map query fields with database schema. Or can I change the criteria as "Origin airport = 'SFO', Destination airport = 'DEN'"?
Thank you once again, your help is very much appreciated.
Padmaja
parthiban subramaniam
Ranch Hand

Joined: May 15, 2002
Posts: 116
Hi,
the field names which are mentioned in the instructions.html are just examples .. which gives you an idea how the search needs to behave..
i agree with michael and eugene try not to hardcode anything cos it makes the whole thing bound to it and not generic.
in a long run this will help cos your code can adapt to changes.
am stuck at lock
cheers,
parthi.
[ September 20, 2002: Message edited by: parthiban subramaniam ]
Padmaja Prasad
Ranch Hand

Joined: Nov 14, 2001
Posts: 76
Thank you so much for the replies.
I'll figure out the soultion in different way.
Thanks
Padmaja
Richard Yip
Greenhorn

Joined: Nov 24, 2002
Posts: 11
Hello,
I have just started working on the criteriaFind task and this thread helps me understand the task better.
From all the ideas gathered, I have decided break the functionality of criteriaFind into two task.
The first task is the parsing algorithm which breaksdown the criteria string into a search critera object.
The second task is the searching algorithm. This uses the search criteria object in searching the records.
Here the details of the two algorithm.
My assumption for the criteraString is that it has no duplicate field.

Parse Algorithm
1. Iterate the criteria string character by character.
2. At each field token, iterate the fieldInfo schema.
3. If match, create a map with the field token as the key and break the fieldInfo iteration.
4. Get the next token (i.e. value token).
5. put value token into the map.
6. put the map into a vector.
Search Algorithm
1. Iterate the list of records.
2. At each record, create a copy of the parsed vector. Each record will make changes to this copy.
3. Iterate the list of record fields.
4. At each record field,
5. Iterate the parsed vector.
6. Compare the record field with parsed vector field.
7. If the field matches, compare the values.
8. If the values matches, delete the matching field/value pair in the copy of the parsed vector. This will indicate that one search criteria is successful.
9. Check if parse vector is empty. If empty (i.e. all search criteria is successful, positive record found), add record to record list.
10. Break out of record.

This is a genric find that is capable of searching any field/value combination of records.
I will be implementing the two algo soon. Any comments from you guys?

Richard
Peter den Haan
author
Ranch Hand

Joined: Apr 20, 2000
Posts: 3252
Originally posted by Richard Yip:
From all the ideas gathered, I have decided break the functionality of criteriaFind into two task.
The first task is the parsing algorithm which breaksdown the criteria string into a search critera object.
The second task is the searching algorithm. This uses the search criteria object in searching the records.
Exactly my implementation. I had a criteriaFind(String criteria) which would do the parsing, and at the end call an overloaded private criteriaFind(String[] fieldCriteria) that did the actual search. The second method expected an array element for each field in the database, and a null String would indicate the absence of any criteria for that field.
Some will dislike the overloading, and I'm not sure I'd do it exactly the same now, but it worked well and was easy to understand. Between them they are about 70 lines of code, the bulk of which is dedicated to parsing the criteria string using a StringTokenizer. The syntax is enforced strictly and the parser throws exceptions when unexpected syntax is encountered.
Please note that you would only have to synchronize the second method (task), which improves the concurrency slightly.
My assumption for the criteraString is that it has no duplicate field.
What happens if it does?
1. Iterate the criteria string character by character.
Would a StringTokenizer simplify matters? I split on "=',", returning tokens.
5. put value token into the map.
6. put the map into a vector.
Here our implementations start to diverge. Two remarks.
  • Although you say vector, I hope you're coding against the List interface rather than any particular implementation.
  • Avoid Vector -- use ArrayList instead. The last time I explained why is here.
  • Although a Map allows you to iterate through those fields that actually have criteria against them, I would contend that for realistic field counts it is no more efficient than the naive method, which is to have a String[] array with one element for each database field, each element containing the desired value for that field, with the absence of a criterion denoted by a null value.
  • Search Algorithm
    1. Iterate the list of records.
    2. At each record, create a copy of the parsed vector. Each record will make changes to this copy.
    3. Iterate the list of record fields.
    4. At each record field,
    5. Iterate the parsed vector.
    6. Compare the record field with parsed vector field.
    7. If the field matches, compare the values.
    8. If the values matches, delete the matching field/value pair in the copy of the parsed vector. This will indicate that one search criteria is successful.
    9. Check if parse vector is empty. If empty (i.e. all search criteria is successful, positive record found), add record to record list.
    10. Break out of record.
    Whoa. What with the cloning of Vectors and nested iterations, this is really seriously convoluted and suboptimal.
    Algorithm Ia. Start with a Map mapping field names to their required values. This Map contains only those fields that are actually part of the criteriaFind string. Iterate over all records.
    1. Go to the next record in the iteration. If there is no next record, you are done.
    2. Set match to true.
    3. Start at the first database field (from FieldInfo[]).
    4. See if criteriaMap.contains(fieldName). If not, go to 6.
    5. See if criteriaMap.get(fieldName).equals(currentRecordFieldValue). If not, set match to false and break out of the field loop (7).
    6. If there are more database fields, go to the next field and loop back to 4.
    7. If match is true, add the current record to the results List.
    8. Loop back to 1.
    Algorithm Ib is an optimised version of Ia. The difference is that your input Map maps field positions to field values, rather than field names. Iterate over all records.
    1. Go to the next record in the iteration. If there is no next record, you are done.
    2. Set match to true.
    3. Iterate over the map entries (map.entrySet().iterator()).
    4. See if currentRecord[entry.getKey()].equals(entry.getValue()) (note: this is pseudocode). If not, set match to false and break out of the field loop (6).
    5. If there are more map entries, go to the next entry and loop back to 4.
    6. If match is true, add the current record to the results List.
    7. Loop back to 1.
    The advantage of this method is that you loop only over those fields that actually have any criteria against them.
    Algorithm II. Start with a String array that contains the desired value (if any) for each field. Iterate over all records.
    1. Go to the next record in the iteration. If there is no next record, you are done.
    2. Set match to true.
    3. Start at the first database field (from the value array; you don't need FieldInfo[]).
    4. If the corresponding criterion is not null, see whether field value and criterion are equal. If they are not equal, set match to false and break out of the field loop (6).
    5. If there are more fieldsd, go to the next field position and loop back to 4.
    6. If match is true, add the current record to the results List.
    7. Loop back to 1.
    The advantage of this method is simplicity and efficiency.
    - Peter
    [ January 11, 2003: Message edited by: Peter den Haan ]
    Richard Yip
    Greenhorn

    Joined: Nov 24, 2002
    Posts: 11
    Peter,
    Thanks for the quick reply. Your analysis on the algos has broaden my understanding of the find tasks. It is definately much more simple and elegant than my existing algos.
    I will take into consideration of your recommendations when I rehash my algos.
    Thanks Peter.

    Richard
    Chandra Peri
    Ranch Hand

    Joined: Sep 18, 2002
    Posts: 46
    Richard,
    I've used JDK1.4 for my implementation. The new package java.util.regex was very useful. In your case, it'd replace your parsing algo.
    Waiting for the result..
    Chandrakumar
    Richard Yip
    Greenhorn

    Joined: Nov 24, 2002
    Posts: 11
    Hello,
    I have made some changes in my algorithm, code and tested the criteria find. It is working fine. Here's my updated algorithm for criteriaFind.
    Duplicated fields and its value in the criteria string will be ignored during parsing of the criteria string.
    Invalid fields and its value will be ignored during parsing of the criteria string.

    Parse Algorithm
    0. Set boolean isKey to false. Create key and value Strings. Create searchCriteria using a HashMap.
    1. Create a tokenizer st1 using the criteria string and strip all quotations; i.e.both double and single and commas.
    2. Iterate the tokenizer st1.
    3. Create a tokenizer st2 using each token of tokenizer st1 and strip all "=".
    4. Iterate the tokenizer st2.
    5. If token is a valid Field, assign token to key and set isKey to true.
    6. If token is not a valid Field, do task 7 to 9
    7. If isKey is true, assign token to value.
    8. If searchCriteria does not contain key, put the key/value pair into searchCriteria.
    9. Set isKey to false;
    10. End Iterator for the tokenizer st2.
    11. End Iterator for the tokenizer st1.

    Search Algorithm
    0. Create a dataInfoList using an ArrayList, Get the searchCriteria.
    1. Iterate the list of records.
    2. Skip deleted records.
    3. Get the number count of criteria that the record must fullfilled before it is considered a match from searchCriteria.
    4. Iterate the record's fields.
    5. If searchCriteria.containsKey(record's field), do task 6 to 8.
    6. Get the matching field's value from the searchCriteria.
    7. If the two value matches, decrement number count; i.e. one criteria is fullfilled.
    8. If number count is zero, add record to dataInfoLIst; i.e. all criteria fullfilled, record found.
    9. End Iterator for the record's fields.
    10. End Iterator for the list of records.

    Thanks Peter for the great help in streamlining my algos.

    Richard
    aadhi agathi
    Ranch Hand

    Joined: Apr 29, 2002
    Posts: 263

    Eugene.[/qb]<hr></blockquote>
    Eugene,
    i am confused about the second point. i will extract the field,Value from the record (goes into HashMap) , matching the fied,Value of criteriaString passed (goes into HashMap) and compare it . i dont understand why this is not advisable. maybe that the 1st and 2nd options are referring the same "ANY" usage.


    split the field, Value of searchString into HashMap

    for each record
    for each field
    if field macthes searchField
    add into HashMap
    compare searchMap and recordMap
    if equals add to result.
    nextRecord


    please comment and i am sorry for this ->this

    [ January 28, 2003: Message edited by: aadhi agathi ]
    [ January 29, 2003: Message edited by: aadhi agathi ]

    Aadhi
    aadhi agathi
    Ranch Hand

    Joined: Apr 29, 2002
    Posts: 263
    any thoughts???
    John Smith
    Ranch Hand

    Joined: Oct 08, 2001
    Posts: 2937

    i am confused about the second point. i will extract the field,Value from the record (goes into HashMap) , matching the fied,Value of criteriaString passed (goes into HashMap) and compare it . i dont understand why this is not advisable. maybe that the 1st and 2nd options are referring the same "ANY" usage

    If your criteria contains an "any" qualifier for a particular field, it doesn't need to be a part of the criteria. It's that simple.
    Eugene.
    aadhi agathi
    Ranch Hand

    Joined: Apr 29, 2002
    Posts: 263
    thanks a lot Eugene!
    by any chance you have a chance to look at my implementation of criteriaFind in the other post. please let me know, if this way of handling "ANY" is ok. or do i really handle do i "really" handle any here??
    [ January 29, 2003: Message edited by: aadhi agathi ]
     
    I agree. Here's the link: http://aspose.com/file-tools
     
    subject: search algorithm: clarity and efficiency
     
    Similar Threads
    search algorithm
    document in the final submission jar!!!
    Assignment submission + essay exam done
    Create record on deleted position
    The locking in the create() operation