# Having trouble choosing a data structure

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

I don't think I am posting this twice. I timed out before I submitted it and it seemed to vanish when I logged back in. Since I have to type the question again, maybe at least I will be more succinct.

I am trying to implement something like this

if (x,y) in datalist

add v to list pointed to by (x,y)

compute average of v's on the list

else

add a new (x,y) item and point it to v

One trick is x,y and v are doubles.

This is what I came up with-

I'm a bit concerned about having to loop through the list every time I do anything. I don't like there is no good return value for when the point does not have an average. I can add hasValue as a method, but then I'm looping through the list even more!

BTW the real application has another layer

(x,y)->z

z->v

for each (x,y) there may be several z's each z has a list of v's all with the same (x,y,z) values. I need to compute a bunch of things based on z and avg of v(z) for each fixed (x,y).

I am trying to implement something like this

if (x,y) in datalist

add v to list pointed to by (x,y)

compute average of v's on the list

else

add a new (x,y) item and point it to v

One trick is x,y and v are doubles.

This is what I came up with-

I'm a bit concerned about having to loop through the list every time I do anything. I don't like there is no good return value for when the point does not have an average. I can add hasValue as a method, but then I'm looping through the list even more!

BTW the real application has another layer

(x,y)->z

z->v

for each (x,y) there may be several z's each z has a list of v's all with the same (x,y,z) values. I need to compute a bunch of things based on z and avg of v(z) for each fixed (x,y).

Stephan van Hulst

Bartender

Posts: 5406

52

posted 4 years ago

All of this sounds a lot like a means to an end. Why do we care about x, y and v and how they are stored?

What do they represent and what do you need them for?

What do they represent and what do you need them for?

*The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.*

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

x, y and z are experimental conditions under which the value v is measured.

it may be that v is measured multiple times under the same conditions, in which case, one would use the average value of v.

z and v at a constant (x,y) can be used to derive the value w.

to obtain the value of w at values of (x,y) that were not measured, a regression equation can be computed w = f(x,y)

That equation is then used to predict w for an arbitrary (x,y).

So I broke that down into

a data structure containing

a dictionary keyed on (x,y)

regression coefficients

and methods to regress on (w,x,y)

and methods to predict w for a given (x,y)

(x,y) dictionary containing

a dictionary keyed on z

w

methods to compute and return w

(z) dictionary containing

a list of v for a given z

avg v

methods to compute and return the avg v

so I'd need to say, for example, add the element(x,y,z,v). If the (x,y) was seen before, then add(z,v) to the existing element, otherwise create a new one. Adding (z,v) means that if z was seen before (in a specific (x,y) context) then add v to the existing element, otherwise create a new one.

If I wanted a prediction for (x,y), if the regression parameters existed, I'd use them, otherwise, I would query each (x,y) for w, which it might need to compute. If it did, it would need to get the avg z's, which would be computed if not already available. The values are computed each time anything is added to the object, so everything should always be up to date.

Also, this is all needing to either be retrieved from or displayed in a table in a GUI.

it may be that v is measured multiple times under the same conditions, in which case, one would use the average value of v.

z and v at a constant (x,y) can be used to derive the value w.

to obtain the value of w at values of (x,y) that were not measured, a regression equation can be computed w = f(x,y)

That equation is then used to predict w for an arbitrary (x,y).

So I broke that down into

a data structure containing

a dictionary keyed on (x,y)

regression coefficients

and methods to regress on (w,x,y)

and methods to predict w for a given (x,y)

(x,y) dictionary containing

a dictionary keyed on z

w

methods to compute and return w

(z) dictionary containing

a list of v for a given z

avg v

methods to compute and return the avg v

so I'd need to say, for example, add the element(x,y,z,v). If the (x,y) was seen before, then add(z,v) to the existing element, otherwise create a new one. Adding (z,v) means that if z was seen before (in a specific (x,y) context) then add v to the existing element, otherwise create a new one.

If I wanted a prediction for (x,y), if the regression parameters existed, I'd use them, otherwise, I would query each (x,y) for w, which it might need to compute. If it did, it would need to get the avg z's, which would be computed if not already available. The values are computed each time anything is added to the object, so everything should always be up to date.

Also, this is all needing to either be retrieved from or displayed in a table in a GUI.

posted 4 years ago

Whoa there. My advice: slow down and take things one at a time.

Let's look at the first: "a dictionary keyed on (x,y)":

Anything with a "key" in it suggests a java.util.Map.

The questions you then need to ask yourself are:

1. (Which you've already answered yourself) Do you need fast access to a specific x,y value?

2. Do the Map keys need to be

3. Do you need to get ranges of values for x,y? If so, you'll probably need a NavigableMap.

In general, the fastest type of Map for direct access to a key is a HashMap, but it is

The only other wrinkle you need to consider is this: Do you need

It might also be worth mentioning that Java already has a class that does some of what you want, called Point2D.Double, so you could either extend it for your DataPoint class or (possibly better) have your class

HIH

Winston

Jon Swanson wrote:So I broke that down into

a data structure containing

a dictionary keyed on (x,y)

regression coefficients

and methods to regress on (w,x,y)...

Whoa there. My advice: slow down and take things one at a time.

Let's look at the first: "a dictionary keyed on (x,y)":

Anything with a "key" in it suggests a java.util.Map.

The questions you then need to ask yourself are:

1. (Which you've already answered yourself) Do you need fast access to a specific x,y value?

2. Do the Map keys need to be

*ordered*? If so, you'll need a SortedMap.

3. Do you need to get ranges of values for x,y? If so, you'll probably need a NavigableMap.

In general, the fastest type of Map for direct access to a key is a HashMap, but it is

*unordered*, so if the answer to either 2 or 3 is true, the normal choice is a TreeMap, although there is also ConcurrentSkipListMap.

The only other wrinkle you need to consider is this: Do you need

*more*than one value to be stored for any given key? If so, then each key needs to be associated with a java.util.Collection of some sort. If duplicate values are allowed for any given key, this should probably be a java.util.List; if not, it should be a java.util.Set.

It might also be worth mentioning that Java already has a class that does some of what you want, called Point2D.Double, so you could either extend it for your DataPoint class or (possibly better) have your class

*include*one. The nice thing about this is that the

`equals()`and

`hashCode()`methods (which you'll need if you want to put your DataPoint in a HashMap) are already defined, so you don't have to write them again.

HIH

Winston

"Leadership is nature's way of removing morons from the productive flow" - Dogbert

Articles by Winston can be found here

Campbell Ritchie

Sheriff

Posts: 48424

56

posted 4 years ago

Have you considered an Observation object? You can create a Map which takes the two fields

Don’t use Dictionary; it is “legacy” code.

Your Observation or Condition class, whichever you use as a key, must have correctly-overridden equals(Object) and hashCode() methods.

If your Condition class implements the Comparable<Condition> interface or if you can create a Comparator<Condition>, you can use a SortedMap, which might make retrieval easier.

`x`and`y`as its key and the`v`, or the whole observation, as its value. That is probably easiest if you have a Condition class and have that as a field of the Observation class, rather than separate`x`and`y`. Obviously you can use a List of Observations as the value instead, in which case you add to the List in the Map.Don’t use Dictionary; it is “legacy” code.

Your Observation or Condition class, whichever you use as a key, must have correctly-overridden equals(Object) and hashCode() methods.

If your Condition class implements the Comparable<Condition> interface or if you can create a Comparator<Condition>, you can use a SortedMap, which might make retrieval easier.

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

Thanks for the advice. I first thought in terms of a HashMap. But x and y are doubles. I couldn't find any examples were double were used in anything other than arrays and ArrayLists. My concern is that unlike integers or strings, operations on doubles can result in round-off that makes them unequal. So I am used to comparing doubles if (|a - b| < threshold) abEqual = true; and I have trouble fitting that into my understanding of maps,

I hadn't come across Observation objects, that gives me something to pursue. I can define equals and comparator, less clear how to define hashCode().

I need to know if (x,y) is in the list I've created. I do need multiple items per object, I'm pretty comfortable with having a list of these.

I looked into Point2D.Double. but I will also have the case of (x,y,z) keys and Point3D.Double seemed to be missing a bunch of stuff Point2D.double implements and it seemed easier at that point to make my own class, since none of the geometry methods apply to my case.

I hadn't come across Observation objects, that gives me something to pursue. I can define equals and comparator, less clear how to define hashCode().

I need to know if (x,y) is in the list I've created. I do need multiple items per object, I'm pretty comfortable with having a list of these.

I looked into Point2D.Double. but I will also have the case of (x,y,z) keys and Point3D.Double seemed to be missing a bunch of stuff Point2D.double implements and it seemed easier at that point to make my own class, since none of the geometry methods apply to my case.

posted 4 years ago

It's not really anything to do with Maps specifically; it has to do with whether two

I think, if it were me, I would construct each of my '

1. Both objects have the same number of dimensions.

2. Both objects have the same threshold.

3. The difference between the values of each dimension is less than the threshold.

That means that:

(a) An x,y point

(b)

But it

HIH

Winston

Jon Swanson wrote:My concern is that unlike integers or strings, operations on doubles can result in round-off that makes them unequal. So I am used to comparing doubles if (|a - b| < threshold) abEqual = true; and I have trouble fitting that into my understanding of maps...

It's not really anything to do with Maps specifically; it has to do with whether two

`DataPoint`s are "equal" or not. In your case, I suspect that you need an

`equals()`method that returns true if two of your objects are "close enough", although that may be a bit tricky to dovetail with a

`hashCode()`method.

I think, if it were me, I would construct each of my '

`DataPoint`' objects (and you're right,

`Point2.Double`probably isn't appropriate in this case)

*with*a threshold, and have

`equals()`return true if, and only if,

*all*of the following are true:

1. Both objects have the same number of dimensions.

2. Both objects have the same threshold.

3. The difference between the values of each dimension is less than the threshold.

That means that:

(a) An x,y point

__cannot__be "equal to" an x,y,z point (although you could perhaps add methods to convert between the two).

(b)

`DataPoint`s with different thresholds can never be "equal" (if they could be, the function would not be symmetric).

But it

*does*allow you to create a

`hashCode()`for the object, by including the threshold value in the hash. You'd probably want to include the number of dimensions too.

HIH

Winston

Articles by Winston can be found here

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

I have been thinking over the responses. Thanks to everyone that answered. I am thinking we got a little afield of what I am trying to implement.

I need to display a table with 7 columns. As per suggestions, I won't worry about the table now. I think what matters is I need to come up with something that makes this process efficient-

Seven empty columns- a b c d e f g

User enters a b c d and optionally e

If e is entered, use it, otherwise calculate it by grouping the data in groups such that a, b, and c are all the same within a group and only d varies

e = f(d) and h = f'(d) within a group.

Now make new groups where a and b are the same within the group, but c varies. Note, the original values of d no longer matter, I am using e and f for which there is one value per (a,b,c)

f = f''(c,e,h) g = f'''(c,e,h)

Then calculate the final result. Note, the original values of c not longer matter, I am using f and g, for which there is one value per (a,b).

i = f''''(a,b,f,g)

The result of i is used to make a graph.

I tried to create a class heirarchy, so

class M

list of class A

class A

a

b

list of class B objects

f

g

class B

c

list of class C objects

e

h

class C

d

And at that led to this sort of logic

M.add(a,b,c,d[,e])

in M

if M has A with (a,b)

A.add(c,d[,e])

else

new A

compute i

in A

if A has B with c

B.add(d[,e])

else

new B

compute f and g

in B

C.add(d)

if not e compute e

compute h

Are we still talking about this type of structure, but making a change like-

class M

map with keys (a,b) of class A objects

class A

map with key (c) of class B objects

f

g

class B

List of class C objects

e

h

class C

d

then say

M.add(a,b,c,d[,e])

in M

if M.map(a,b)

map(a,b).add(c,d[,e])

else

map(a,b) = new A

compute i

in A

if A.map(c)

A.map(c).add(d[,e])

else

map(c) = new B

compute f and g

in B

C.add(d)

if not e compute e

compute h

or is something more extensive being suggested? If not it seems to be mostly figuring out how to use double precision numbers as keys to a map.

doubles as keys in hash maps

I need to display a table with 7 columns. As per suggestions, I won't worry about the table now. I think what matters is I need to come up with something that makes this process efficient-

Seven empty columns- a b c d e f g

User enters a b c d and optionally e

If e is entered, use it, otherwise calculate it by grouping the data in groups such that a, b, and c are all the same within a group and only d varies

e = f(d) and h = f'(d) within a group.

Now make new groups where a and b are the same within the group, but c varies. Note, the original values of d no longer matter, I am using e and f for which there is one value per (a,b,c)

f = f''(c,e,h) g = f'''(c,e,h)

Then calculate the final result. Note, the original values of c not longer matter, I am using f and g, for which there is one value per (a,b).

i = f''''(a,b,f,g)

The result of i is used to make a graph.

I tried to create a class heirarchy, so

class M

list of class A

class A

a

b

list of class B objects

f

g

class B

c

list of class C objects

e

h

class C

d

And at that led to this sort of logic

M.add(a,b,c,d[,e])

in M

if M has A with (a,b)

A.add(c,d[,e])

else

new A

compute i

in A

if A has B with c

B.add(d[,e])

else

new B

compute f and g

in B

C.add(d)

if not e compute e

compute h

Are we still talking about this type of structure, but making a change like-

class M

map with keys (a,b) of class A objects

class A

map with key (c) of class B objects

f

g

class B

List of class C objects

e

h

class C

d

then say

M.add(a,b,c,d[,e])

in M

if M.map(a,b)

map(a,b).add(c,d[,e])

else

map(a,b) = new A

compute i

in A

if A.map(c)

A.map(c).add(d[,e])

else

map(c) = new B

compute f and g

in B

C.add(d)

if not e compute e

compute h

or is something more extensive being suggested? If not it seems to be mostly figuring out how to use double precision numbers as keys to a map.

doubles as keys in hash maps

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

I think the suggestions about DataPoint objects and Observation objects is starting to sink in. Am I right in thinking that rather than putting the methods that calculate the intermediate objects in DataPoint, they should go int a separate object? That way DataPoint would contain nothing but a constructor to set the value, dimensions, and threshold and methods for equals and hashCode. The other object, that I will call dataValue would contain a list of values and the methods for generating and returning the intermediate values that get computed? If I have that straight I'll see if I can extend that to the third layer of nesting I actually have. I have to think a little more about where all the values that computed from the data get computed. Once I have more than one layer, I am not completely clear on how to keep the key and value cleanly separate.

Am I also right that when I am looking to see if a DataPoint is in my map, I need to do something like:

That looks pretty clean, until I then need to see if the value, which would contain a map and data has an entry on its map.

Am I also right that when I am looking to see if a DataPoint is in my map, I need to do something like:

That looks pretty clean, until I then need to see if the value, which would contain a map and data has an entry on its map.

Jon Swanson

Ranch Hand

Posts: 216

posted 4 years ago

I am still unable to see the route that allows me to use maps when my data is nested. I sorely wish I was writing in C and could use structures or Python and use dictionaries.

I have several layers of data. In the end, I need to compute, store and retreive one (f,g) pair for each (a,b). I have a lot of (a,b). I have been trying to make this work-

It all seems very convoluted and since I need to add all sorts of getters and setters and updaters to make it really do something, I was hoping to get some comment that I am going down the right road and not missing some simpler straight-forward solution to handling the data. This information is retreived and displayed in a table. I'm becoming concerned I'll be able to wrap a table model around all this, as well.

I have several layers of data. In the end, I need to compute, store and retreive one (f,g) pair for each (a,b). I have a lot of (a,b). I have been trying to make this work-

It all seems very convoluted and since I need to add all sorts of getters and setters and updaters to make it really do something, I was hoping to get some comment that I am going down the right road and not missing some simpler straight-forward solution to handling the data. This information is retreived and displayed in a table. I'm becoming concerned I'll be able to wrap a table model around all this, as well.