# Don't know which algorithm to use

Joseph Tulowiecki

Greenhorn

Posts: 25

posted 4 years ago

OK. I'm trying to figure out which algorithm I can use to efficiently accomplish this task in a computer program I'm writing:

I start out with a set of numbers 1234567....n.

I have to get the two digit combinations of this number. Then the 3 digit combinations.. etc.

For example the number might be 1234

The 1st row would look like 1,2 1,3 1,4 2,3 2,4 3,4

Then the 3 digit combinations... etc.

This is the issue I'm running into. After I calculate a row of combinations of x digits. I then have to test each combination to see if it meets certain unrelated conditions. After each test, it will return True or False about the specific combination. So for example, Lets say 1,2 = true 1,3 = false 1,4 = false. 2,3 = false 2,4 = true 3,4 = false. I record all of these values before preceeding to the combinations of 3 digit sets. And here is why:

Each value in the following row can not contain, as a subset, any combination that is marked as true in the previous row. So for instance, the following row in this situation can not contain 1,2,4 because 2,4 in the previous row is marked as true. I could test every possible combination but it would be extremely inefficient because testing each combination is a greedy algorithm in itself, so my goal is to only test the absolutely necessary combinations and this is how I'm trying to go about doing it. Does anyone have any kind of advice on what algorithm may be useful to accomplish this task?

I start out with a set of numbers 1234567....n.

I have to get the two digit combinations of this number. Then the 3 digit combinations.. etc.

For example the number might be 1234

The 1st row would look like 1,2 1,3 1,4 2,3 2,4 3,4

Then the 3 digit combinations... etc.

This is the issue I'm running into. After I calculate a row of combinations of x digits. I then have to test each combination to see if it meets certain unrelated conditions. After each test, it will return True or False about the specific combination. So for example, Lets say 1,2 = true 1,3 = false 1,4 = false. 2,3 = false 2,4 = true 3,4 = false. I record all of these values before preceeding to the combinations of 3 digit sets. And here is why:

Each value in the following row can not contain, as a subset, any combination that is marked as true in the previous row. So for instance, the following row in this situation can not contain 1,2,4 because 2,4 in the previous row is marked as true. I could test every possible combination but it would be extremely inefficient because testing each combination is a greedy algorithm in itself, so my goal is to only test the absolutely necessary combinations and this is how I'm trying to go about doing it. Does anyone have any kind of advice on what algorithm may be useful to accomplish this task?

Mike Simmons

Ranch Hand

Posts: 3090

14

posted 4 years ago

One thought is, consider using a Set<Set<Integer>> (probably using HashSets) to contain all the combinations that have been marked true. This will facilitate testing quickly.

In order to generate, say, all possible 5-element sets, I would first consider all possible 4-element sets, eliminating any that contain forbidden combinations*, and then consider all the possibilities that can be achieved by adding one more element. I would only consider new elements (numbers) that are greater than those already in the set, in order to avoid redundant checking. And when checking for forbidden combinations, I would only check those combinations that include the new (fifth) element. Anything that does not include the new element should have

* This process is recursive - to eliminate all forbidden 4-element sets, you will need to eliminate all forbidden 3-element sets. Which requires the elimination of all two-element sets. Which requires elimination of all one-element sets.

That's just a rough idea, and there are certainly still details to work out. And there may well be further optimizations to make. But that's a start, at least.

In order to generate, say, all possible 5-element sets, I would first consider all possible 4-element sets, eliminating any that contain forbidden combinations*, and then consider all the possibilities that can be achieved by adding one more element. I would only consider new elements (numbers) that are greater than those already in the set, in order to avoid redundant checking. And when checking for forbidden combinations, I would only check those combinations that include the new (fifth) element. Anything that does not include the new element should have

*already*been checked when generating 4-element combinations. It's important to eliminate these early, and not keep checking what happens if we add an element to a combination that is already forbidden.* This process is recursive - to eliminate all forbidden 4-element sets, you will need to eliminate all forbidden 3-element sets. Which requires the elimination of all two-element sets. Which requires elimination of all one-element sets.

That's just a rough idea, and there are certainly still details to work out. And there may well be further optimizations to make. But that's a start, at least.

Joseph Tulowiecki

Greenhorn

Posts: 25

Joseph Tulowiecki

Greenhorn

Posts: 25

posted 4 years ago

Oh I just thought of something though. I need to generate the 1 digit combos, then the 2 digit combos, then the three all the way up to n-1. This would work if I was going from n-1 to 1. I'm thinking, and I haven't analyzed this too much yet I could be wrong, but I don't think I could get this to work by going the other way because it seems like I would have to actually generate the set first and then test if the previous forbidden sets are subsets of the new set. Am I right here? I'm going to look at this a little more this is just an initial thought.

Joseph Tulowiecki

Greenhorn

Posts: 25

posted 4 years ago

Maybe if I reversed what your saying and just added values to the non forbidden sets and then check for subsets, it will still be way slower than I wanted it to be but at least it cuts out some of the unnecessary generation of new sets. This might be as fast as it can be done. What do you think, any ideas? Wait a minute.. recursion though I could be wrong again and maybe I can make this work.. ok ill stop posting till i look into this a little more thanks again!

Joseph Tulowiecki

Greenhorn

Posts: 25

Mike Simmons

Ranch Hand

Posts: 3090

14

posted 4 years ago

Excellent, good to hear.

On second thought, the Set<Set<Integer>> can be represented much more efficiently than I suggested. If there are no more than 32 elements, you can just use a Set<Integer>. If there are no more than 64 elements, use a Set<Long>. For more, use a Set<BigInteger> or Set<BitSet>. In each of these, every possible element that could be in a set is represented by one bit. And that bit is either 0, indicating the element is not a member of the set, or 1, indicating it is a member. You can do other bit twiddling tricks to perform other operations - for example if A and B are ints representing setA and setB, then if (A & B) == A that tells you that A is a subset of B. This sort of thing should take a lot less space than a HashSet of HashSets of Integers.

You can either keep a set of all forbidden sets, or a set of all allowed sets. I would choose whichever is smaller. If there are a lot of forbidden rules, it may be easier to reverse things and only keep track of allowed things. But I will generally assume that it's easier to use a forbidden set - if that's incorrect, then you may have to reverse some of what I say, accordingly.

Note that you don't actually have to keep a set of

It does seem like this technique could still be quite computationally intensive. It may be worthwhile trying to find other ways to do this more efficiently. I'm not sure what other improvements may be possible. But it may also be instructive to implement this algorithm anyway. Maybe it will turn out to be good enough, or maybe there

I also have the feeling this may be suitable for a rules engine and the Rete algorithm. But I'm not sure; I don't understand that very well. Perhaps others here may offer suggestions in this area.

It could also be useful to know more about the "certain unrelated conditions" that you are using to test if a row is forbidden. If we knew what these conditions were, there may be other mathematical properties to be exploited.

On second thought, the Set<Set<Integer>> can be represented much more efficiently than I suggested. If there are no more than 32 elements, you can just use a Set<Integer>. If there are no more than 64 elements, use a Set<Long>. For more, use a Set<BigInteger> or Set<BitSet>. In each of these, every possible element that could be in a set is represented by one bit. And that bit is either 0, indicating the element is not a member of the set, or 1, indicating it is a member. You can do other bit twiddling tricks to perform other operations - for example if A and B are ints representing setA and setB, then if (A & B) == A that tells you that A is a subset of B. This sort of thing should take a lot less space than a HashSet of HashSets of Integers.

You can either keep a set of all forbidden sets, or a set of all allowed sets. I would choose whichever is smaller. If there are a lot of forbidden rules, it may be easier to reverse things and only keep track of allowed things. But I will generally assume that it's easier to use a forbidden set - if that's incorrect, then you may have to reverse some of what I say, accordingly.

Note that you don't actually have to keep a set of

*all*forbidden items. If {1,2} is forbidden, then there's no need to also put {1,2,4} in the forbidden set - you can already exclude any supersets of {1,2,4} based on the fact that they will also contain {1,2}, and that's enough to be forbidden.It does seem like this technique could still be quite computationally intensive. It may be worthwhile trying to find other ways to do this more efficiently. I'm not sure what other improvements may be possible. But it may also be instructive to implement this algorithm anyway. Maybe it will turn out to be good enough, or maybe there

*is*no shorter way to do this. Or maybe, in the course of implementing the algorithm, you will learn new things about the problem, and thus get new ideas.I also have the feeling this may be suitable for a rules engine and the Rete algorithm. But I'm not sure; I don't understand that very well. Perhaps others here may offer suggestions in this area.

It could also be useful to know more about the "certain unrelated conditions" that you are using to test if a row is forbidden. If we knew what these conditions were, there may be other mathematical properties to be exploited.