public void depositData (String data) { //stores this data }

public String[] getSortedData () { //returns all data that is currently there in a sorted String array. } }

Other considerations. >used in multithreaded environment add method will be called roughly 5-10 times every second, while get called once every 10 second. > Should be able to perform for very large number of records, say a million. > there is enough memory available to run this application.

Would appreciate a detailed description of approach you would follow, including choice of data structure to hold data and sorting, synchronization issues.

Sorted in alphabetic order, they are just strings.

Dave Wingate
Ranch Hand

Joined: Mar 26, 2002
Posts: 262

posted

0

If - data items need to be returned sorted alpha - you're going to be working with a large number of records - you'll be calling both the add and get methods frequently - calling get does not remove elements from your collection - you don't plan to also provide a method like getFirstOrderdElements(int numElements)

A heap is not the way to go after all; time complexities for your operations backed by a heap would be: depositData: O(lg n) since we need only insert the new element into the heap getStoredData: O(n + n lg n) we must both copy our heap (to not destroy data) and then remove elements from the heap in order

An linked list would be the way to go; time complexities for your operations backed by a linked list would be either:

(unordered) depositData: O(1) since we would just insert new daa at end of list getStoredData: O(n + n lg n) since we must sort prior to returing a copy of the data in array form

or

(ordered) depositData: O(lg n) since we must use binary search to find proper insert location getStoredData: O(n) since all we have to do is return a copy of the data in array form

Since you plan to call the two methods with equal frequency, I'd recommend an ordered list for better performance .... because when we combine time complexities for both operations: O(1 + n + n lg n) > O(n + lg n)

You could further improve performance by allowing getStoredData to return a reference to the underlying data structure:

(ordered) depositData: O(lg n) since we must use binary search to find proper insert location getStoredData: O(1) if we don't have to make a copy of our data [ January 05, 2007: Message edited by: Dave Wingate ]

An linked list would be the way to go; time complexities for your operations backed by a linked list would be either:

(unordered) depositData: O(1) since we would just insert new daa at end of list getStoredData: O(n + n lg n) since we must sort prior to returing a copy of the data in array form

or

(ordered) depositData: O(lg n) since we must use binary search to find proper insert location getStoredData: O(n) since all we have to do is return a copy of the data in array form

Insertion into an ordered linked list is O(n), binary search assumes random access.

Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter

Dave Wingate
Ranch Hand

Joined: Mar 26, 2002
Posts: 262

posted

0

Yep ... thanks for the correction ... so using either an array based list or a linked list would result in these time complexities:

(ordered - array-based list) depositData: O(lg n + n) since we must use binary search to find proper insert location and then call the insert operation which over time amortizes to O(n) (described in API) getStoredData: O(1) if we don't have to make a copy of our data

(ordered - linked list) depositData: O(n + 1) since we may have to search the entire list for the proper insertion point, but insertion takes a constant amount of time for a linked list getStoredData: O(1) if we don't have to make a copy of our data

Although not quite as good as I led you to believe in my previous post, the ordered linked list implementation is still more efficient than the ordered array list or any unordered list structure.

Note that the implication of insert operatin having O(n) instead of O(lg n) means that the orrdered linked list is a viable option only if you really are going to call the "get" method about as frequently as you call the "insert" method. Insert of O(n) is a tough price to pay for very large collections. If it turns out that you will be calling "insert" much more frequently than you call "get," you should consider using a heap as I originally mentioned.

[ January 06, 2007: Message edited by: Dave Wingate ] [ January 06, 2007: Message edited by: Dave Wingate ]

If you don't have duplicates in the collection, I would use a TreeSet.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com