Stephan van Hulst wrote:Well, the example input you gave will just result in a ridiculously large product. Your memory complexity is in Ө(m^n), where m is the number of elements in the largest set, and n is the number of sets.
Stephan van Hulst wrote:Even if you used an optimized algorithm that could construct the product in a reasonable amount of time, you still need to keep 34,828,517,376 vectors in memory. If we assume that a reference is 4 bytes large, you need around 130 GB to store just the references to the vectors, and then we haven't even considered the contents of the vectors.
Stephan van Hulst wrote:What you want is simply not feasible.
I hate signatures!
. . . Stephan said assuming 4 bytes per reference, which makes 1,129,718,145,924, which is approx. 1130GB, even more unmanageable than he thought initially. Of course, in order to store than many references, you would need a 64‑bit file system, and that will double the size of the collection. Since that amount of storage is only feasible in a linked list or deeply nested arrays, that will increase the storage requirement yet again. I shall let you work out how long it would take to iterate a 280GB linked list; have you got a few days to spare?Punit Jain wrote:. . . In my dataset: m=9, and n=12. Hence m^n would be: 2,82,429,536,481.
How did you calculate 130 GB?
Piet Souris wrote:I don't know your requirements, but must you store the full Cartesian Product? For any quadruple (x, y, z, w) it is easy to calculate the corresponding cp.
Campbell Ritchie wrote:. . . Stephan said assuming 4 bytes per reference, which makes 1,129,718,145,924, which is approx.
Campbell Ritchie wrote:Since that amount of storage is only feasible in a linked list or deeply nested arrays, that will increase the storage requirement yet again. I shall let you work out how long it would take to iterate a 280GB linked list; have you got a few days to spare?
Stephan van Hulst wrote:The Big Theta expression I gave expresses an order of growth, it's not a function you can use to calculate the actual memory requirements.
The input you used has a set of 9 elements, a set of 6 elements a set of 8 elements, and repeats this 4 times.
The number of times that the inner loop is executed is (9*8*6)^4. That's also the number of vectors stored. Multiply by 4 to get the number of bytes, and divide by 1024^3 to get the number of gigabytes.
Punit Jain wrote:But eventually, I will need the full cartesian product. In order to have all the combinations, I will have to have all the sets. Can you please elaborate?
Paul Clapham wrote:
Punit Jain wrote:But eventually, I will need the full cartesian product. In order to have all the combinations, I will have to have all the sets. Can you please elaborate?
Do you need the full cartesian product, in memory, all at the same time?
Remember that there are about 35,000,000,000 vectors in that full cartesian product. We've already established that they don't all fit into memory. So perhaps you don't need them all at the same time. Maybe you don't actually expect to use all 35 billion of them, so you could just calculate the ones you actually use, at the time you use them.
Piet Souris wrote:I don't know your requirements...
Paul Clapham wrote:
Piet Souris wrote:I don't know your requirements...
I don't know your requirements either. I only know that the fact is that you can't store all of the data in memory, and you don't have enough time to store it all somewhere else. So computing relevant subsets as they are needed might be a good strategy. Which subsets are the relevant ones, I can't tell.
Punit Jain wrote:Okay. my bad I should have written this earlier. I don't have exact requirements. I just picked a random problem wrote some code and now trying to optimize it for my own understanding.
Paul Clapham wrote:
Punit Jain wrote:Okay. my bad I should have written this earlier. I don't have exact requirements. I just picked a random problem wrote some code and now trying to optimize it for my own understanding.
Yes, I suspected that was the case. Working on "random problems" is a useful way to improve your skills, but when you don't have any requirements then you can't really ask what you should do.
Stephan van Hulst wrote:You still haven't given requirements. Questions about performance are meaningless as long as it's unknown what a correct program should output.
I hate signatures!
Punit Jain wrote:
Piet Souris wrote:I don't know your requirements, but must you store the full Cartesian Product? For any quadruple (x, y, z, w) it is easy to calculate the corresponding cp.
But eventually, I will need the full cartesian product. In order to have all the combinations, I will have to have all the sets. Can you please elaborate?
I hate signatures!
Stephan van Hulst wrote:
prefixes.forEach(prefix >
forEachCombination(remainingInput, smallerCombination > {
List<T> combination = smallerCombination;
combination.add(0, prefix);
action.accept(combination);
})
);
}
[/code]
Liutauras Vilda wrote:@OP If that's for an interview, avoid such writings as: if (input.length <= 0), how that can ever be less than 0?
Piet Souris wrote:
Well, I wondered why you would need the full Cartesian Product.
Piet Souris wrote:
Can you give a use case where you need to have available the full Cartesian Product?
Piet Souris wrote:
But what I meant was: if you have N lists, with sizes S(1), ...., S(N), then you can convert easily each long L into the Ntuple (i1, i2, ..., iN), and the corresponging product would then be simply List(1).get(i1) + ... +List(N).get(iN).
so you can't have a more compact representation then just the input lists.
Punit Jain wrote:
Piet Souris wrote:
Well, I wondered why you would need the full Cartesian Product.
What I was thinking was, I will have all the combinations in memory and whenever I need one, I will get it from memory but it's inappropriate.
I can imagine circumstances under which that would be a reasonable requirement. I have a model checker for the Dining Philosophers problem in a procedural language, and that reduces the state of the philosophers and their forks to a single int. You convert it back with the remainder operator (called modulo across the Channel). If you want elements A₁, B₂, C₃ and D₄, and the sizes are A, B, C, D = 10 11 12 13, you can calculate the index with 4 + 3 × 13 + 2 × 13 × 12 + [1 × ]13 × 12 × 11. That will be out of 13 × 12 × 11 × 10 = 17160 discrete indices. As long as the number datatype you are using is large enough that you don't suffer an overflow error, you can reverse the procedure and calculate element number 2,037,655 in constant time. If we have input vectors size 7 8 9 10 11 12 13, you would use 2,037,655 % 13, 2,037,655 ÷ 13 % 12, 2,037,655 ÷ 13 ÷ 12 % 11, etc. as the indices, the first result on the right and the last result on the left. Those indices would multiply to give 13! ÷ 6! = 8648640 different combinations.Paul Clapham wrote:. . . I need element number 2,037,655 or something like that. Which all seems quite improbable to me.
Punit Jain wrote:Okay. I am totally lost here. So its recursively calling forEachCombination for the number of sets and returning every combination to the main. But how exactly its working?
Also, is it possible to convert it to an iterative solution?
In my solution, instead of storing all the combinations into combinations vector, I should just return every combination?
Piet Souris wrote:I had this in my library, but I never checked for efficiency:
(@Stephan: that was from before I promised to practise ? super T, ? extends T )
Stephan van Hulst wrote:You can convert recursive solutions to iterative solutions by making the recursion stack explicit. However, the code for that usually isn't pretty and the recursive solution is much easier to understand. If you're not comfortable with recursion, I strongly recommend that you practice because it's an incredibly important tool.
Stephan van Hulst wrote:How are you going to return every combination? [/quote
I mean when I am making combinations, I will put my action in there to perform a task for the combinations. For example
Stephan van Hulst wrote:
You could pass an action from the main method that you want to perform on every combination, but because your algorithm walks through the tree of partial combinations breadthfirst, it still needs to keep all partial combinations in memory, which again will crash your application. The fact of the matter is that you can't solve this problem in Ө(n) space complexity (n being the number of sets in the input, the size of those sets being of no consequence) without using recursion or a stack machine.
I love a woman who dresses in stainless steel ... and carries tiny ads:
Devious Experiments for a Truly Passive Greenhouse!
https://www.kickstarter.com/projects/paulwheaton/greenhouse1
