This week's book giveaway is in the OCAJP 8 forum. We're giving away four copies of OCA Java SE 8 Programmer I Study Guide and have Edward Finegan & Robert Liguori on-line! See this thread for details.

in a company, employee are allowed to go for lunch between 1:00pm and 2:00pm,
and each people have only 20 minutes for staying in the cafeteria,
John and Jack are work in the same company,
what is the probability that both people are in the cafeteria (even 1 second counts) ?

Disclaimer: been decade and a half since I worked on probability

Its more complicated than that. This is the approach I could come up with in 5 minutes. I have no clue if it is right, so scrutinize as you please

The total number of possibilities for having lunch is 40.

0-20
1-21
2-22
...
40-60

If A and B have lunch, they can have lunch at any of these time slots. There are 3 distinct slots

Slot 0-20 and 40-60

There is a 19/40 chance of lunch intersecting (Assuming that 0-20 and 20-40 is an invalid intersection)

Slot 1-21 through 20-40

The probability of A and B meeting each other increases as we move from one slot to the next. If A arrives at 1-21, B will meet A for the slots 0-20 through 20-40 (20 chances). If A arrives at 2-22, B will meet A from 0-20 through 21-41 (21 chances). So the chances of not meeting A reduces with each try - 19,18,17... until A arrives at 20-40. It is hard to miss A here. If B arrives at 0-20 or 40-60, then B will miss A. Missing A is a linearly decreasing function till this point.

Slot 21-41 through 40-60

The opposite happens this time. When A comes at 22-42, B can miss A on slots 0-20 | 1-21 | 2-22. B cannot come at 43 because he/she cannot have lunch. From this point, the chances of a miss is a linear increasing function 2,3,4,5...19

So the summation N(N+1)/2 - 2 for N=19 gives you the total misses for all combinations. There are 40 possibilities for each attempt.

Not sure how to proceed from there. Do I multiply the individual probabilities ? Or is this approach going to fall like a house of cards

First person takes their lunch at 1pm + x, where x in [0, 40]
Second person takes their lunch at 1pm + y, where y in [0, 40]

And I'm going to assume that x and y have an even probability distribution over that range (unlikely in practice, but we don't have anything else to go on).

The lunchtimes overlap if:
- for x <= 20, y in [0, x + 20], so probability of overlap is (x + 20)/40
- for x >= 20, y in [x - 20, 40], so probability of overlap is (60 - x)/40

Now you need to average that over the range of x. By symmetry, the two conditions above provide the same contribution anyway, so we just need to calculate one of them.

The total number of possibilities for having lunch is 40.

I don't think that's the right approach. The question specifically said to take seconds into account as well.

It's also possible that a person will take less than 20 minutes (we don't know if that was allowed by the original question, though).

Ahh I see. I just noticed the "even 1 second counts" quote. In that case every slot, say 0-20, can be broken down to 60*20 = 1200 seconds. The slots would now be categorized by seconds. 0-1200, 1-1201 and so on. But for the units, the approach would then remain the same and should be valid.

It's also possible that a person will take less than 20 minutes

I agree but disagree. I agree that a person could take less than 20 minutes, but the minimum lunch time is debatable. No on can have lunch in 1 second for example Assuming that A and B stay for exactly 20 minutes would simplify the problem.

Deepak Bala wrote: . . . The total number of possibilities for having lunch is 40.

0-20
1-21
2-22
...
40-60 . . .

That makes 41, not 40. It's like indices in an array.

Thanks for pointing that. Yep I am pretty sure there are other boundary mistakes in my post. I didnt double check any of the numbers. Just wanted to jot down an initial approach quickly.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

Matthew was right in the first place. I don't know why the rest of you are still talking about it. Seriously, counting discrete possibilities like that is just using an integer approximation to a continuous function - why bother?

If Jack shows up at 12:00, he has a 50% chance of meeting John. If he shows up later, his chance of meeting John increases linearly, up to a maximum of 100% if Jack shows up at 12:20 exactly. (Technically there is a minuscule chance that John shows up at exactly 12:00 or 12:40 and misses Jack, but this is too small a chance to be counted.) If Jack shows up after 12:20, his chances go back down again linearly, to 50% if he shows up at 12:40.

So, if Jack arrives between 12:00 and 12:20, he goes from a 50% chance of meetup to 100% chance, and the average is 75%. If Jack arrives between 12:20 and 12:40, he goes from 100% to 50%, and the average for this period is also 75%. Overall, the chance of meetup is 75%.

That's what Matthew's integrals are telling us. (Prior to the f(x) stuff.)

----

Although, since this is an interview question, I think it's worth exploring some other possibilities that avoid the simplifying assumptions that most of us have made.

Aside from some people eating in less than 20 minutes, there could be clustering effects. Maybe most people try to show up early each day, to get the best choice of food. Or maybe John or Jack despise lines, and show up late in order to avoid them. Maybe John and Jack are friends, and make a point of having lunch together often. Maybe they despise each other and choose completely different times for that reason.

Now all that sounds like it makes it impossible to determine the answer. And it does, without more data. But in an interview, we can ask questions, and imagine solutions based on gathering more data.

In particular, it seems to me that any company with such rigid rules about employee lunch times may well be tracking the times somehow. Perhaps with an electronic badging system that logs lunch start and stop times for each employee? There may well be a database or log file somewhere with real data in it. In which case, for a particular pair of employees Jack and John, we could write a query or program to determine how often they actually do have lunch together. This would factor in most all the unknowns that we've been avoiding. Although it would be worthwhile to look at historical trends as well - perhaps they used to be friends, but now avoid each other. Whatever. The point is, an answer based on gathering real data is likely to be far more accurate than an answer based on assumptions we make just because they simplify the problem.

We don't know which of these approaches the interviewer may be interested in. If I were the interviewer, I think they would both be relevant.

What are they serving for lunch? If it is not tasty, then employees might not stay for 20 minutes, maybe they stay for 2 minutes.

Also, to get the actual probability that Jack and John are in the cafeteria at the same time, you would need to know how many employees will be going to the cafeteria for lunch, i.e. sample space.

good point. How many people does the fire marshal allow in the cafeteria at any given time? If the second person arrives and the cafeteria is full, they have to wait until someone leaves - possibly more than one if there is a queue of people already waiting to get in.

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187

posted

0

We need the sample space to do the probability equations properly. This is true even without all of the assumptions mentioned.

If you were playing Roulette and wanted to know the probability of the ball landing in the #7 slot on a single spin, you would need to know how many slots there are on the wheel.

If John and Jack are the only employees that use the cafeteria, that is fine, the sample space is 2.

If 100 employees are using the cafeteria and we want know the probability of John and Jack in the cafeteria at the same time, that is fine, the sample space is 100. The probability value will be different than the one with a sample space of 2.

The question above as written does not contain enough information to actually get an accurate probability value, i.e. perform the correct probability calculations.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

I disagree - the other employees are irrelevant, unless they influence the behavior of Jack and John. (E.g. by overcrowding the cafeteria as Fred suggests.). But, if we can assume that both Jack and John choose their start times randomly in an even distribution from 12:00 to 12:40, and if we can assume that both Jack and John stay for exactly 20 minutes - then the other employees do not matter. The question was about whether Jack and John would be present in the cafeteria at the same time on a given day. Other employees are irrelevant.

If the question had been whether Jack would encounter any other employees, or whether Jack would encounter John first (before another employee did) - then those other employees would be relevant. As it is though, they aren't.

Chrix Wu wrote:in a company, employee are allowed to go for lunch between 1:00pm and 2:00pm,

If they are allowed to go for lunch between 1pm and 2pm, then they could conceivably leave their desk at 2pm to go to the cafeteria, thereby increasing the time range.

Chrix Wu wrote:what is the probability that both people are in the cafeteria (even 1 second counts) ?

Doesn't ask about whether they are there simultaneously, or even whether they are there at lunchtime. They could be cooks and be there from 12pm through to 3pm. They could be the pencil-pushers who are counting every second that an employee spends in the cafeteria. They could be painters who are there for the entire day except for a 40 minute window.

Assuming that the 20 minutes lunch time must fall in the interval 1 pm(inclusive) to 2 pm(exclusive)

Then each employee's lunch must start in the interval 1 pm(inclusive) to 1:40 pm(exclusive)

That is, in an interval of 40 * 60 seconds.

They are in the cafeteria at the same time if the difference of the start times of their lunch is less than 20 * 60 seconds

This resolves to finding the probability of the difference of two random numbers in the interval [0..40*60] being less than 20*60

I'm still waiting for the run to complete, this machine is an old P4-2.4 ... ok, it comes up with 74.98%. I would guess that a rigorous solution using probability theory might result in 75%.

Most appropriately, this was done during my lunch break

luck, db
There are no new questions, but there may be new answers.

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187

posted

0

Probabilities are only indications of how likely events are; they're NOT guarantees.

Probability = John and Jack in cafeteria at same time / number of ways employees can be present in cafeteria at same time

Just ran it again at home up to Integer.MAX_VALUE and still got 74.98. I changed the code to use BigDecimal to rule out floating point errors, and ran it though 10 iterations.

The final code:The output:

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187

posted

0

Wonderful! Did you use all of your twenty minutes for lunch to write this code?

Including the time it took to run on a P4-2.4, probably

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187

posted

0

lol

Using the possible time slots as the key measure to determine the "probability" that Jack and John will be in cafeteria at the same time would strongly depend upon a narrow and strict constraint that employees are in the cafeteria for all of their allotted 20 minutes. However, in real-life, some employees may come in for 5 minutes, some will come in for 10 minutes and some may come in for 15 or 19 minutes, some may come in for 2 minutes.

The equation to determine a real-world "statistical" probability would be different, and would require quantitative data reflecting employee usage of the cafeteria.

x = John and Jack in cafeteria at same time
y = number of ways employees can be present in cafeteria at same time

P(x) = n(x) / n(y)

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

I believe many people have already acknowledged the real-life considerations that make this much more complex. What I'm not clear on though: when you talk about the importance of knowing the total number of employees, do you believe it has importance even given the simplifying assumptions stated above by several people, namely:

(a) Each employee picks a start time randomly in the range 12:00-12:40 with a uniform probability distribution, and

(b) Each employee stays for exactly 20 minutes of lunch.

Sure, there are various reasons these will not be true in the real world, we all agree on that. But my question is, if they were true (or were acceptably close to being true), would you still insist on the importance of knowing the number of employees?

Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 44550

34

posted

0

I would have thought the probability of the two meeting is independent of the number of employees.

an I think that is the ultimate point of questions like this. They are less about "getting the right answer" than how you approach the problem, what assumptions you make, whether you realize you are making any assumptions, etc.

IF i asked a question like this, I would want the interviewee to talk through the problem, ask these very questions, sketch out an approach, etc. Anyone who just says "The answer is X" gets lower marks than someone who never comes up with an answer, but picks away at the requirements and unknowns.

Jimmy Clark
Ranch Hand

Joined: Apr 16, 2008
Posts: 2187

posted

0

To get the "statistical" probability of two specific employees being in the cafeteria in real-life, you would need to gather data. You would collect employee names, frequencies, durations, etc. With this data you could then determine the "probability" of two specific employees being in the cafeteria at the same time, using known "statistical" probability equations.

x = John and Jack in cafeteria at same time
y = number of ways employees can be present in cafeteria at same time

P(x) = n(x) / n(y)

The equation above would be the final one. There would be other required equations to get the values for x and y.

So, it is not so much about the number of employees, it is the number of combinations possible that matters, e.g. the y value, and here you will know the # of employees.

Aside, in terms of the math puzzles above with the time solts, this is not "statistical" probability and the number of employees or any other data would not matter. All you need is the time slots and the unrealistic constraints mentioned. Not very helpful if I need to make multi-million dollar business decisions about my employees cafeterias.

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

So I guess that was "no".

Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3028

10

posted

0

Anyway, even for the "real world" version of the problem, I had already talked about the importance of gathering actual data. This is not a new idea. And frankly, the total number of employees STILL DOES NOT MATTER. (At least not directly - not as something we have to measure or think about.*) Those other employees aren't the "sample space" - they're irrelevant. The sample space is the total number of days for which you have records of when people went to lunch. Other than that, all you need to know is, how many times did Jack and John actually meet? The probability is then estimated as

P = (number of days when actually J&J met for lunch, according to records) / (number of days in the records)

Well, we might want to limit this to some range of recent history. Certainly exclude any records before J&J were hired, or before current cafeteria rules were in effect. And it would be interesting to know if P changes significantly if we only look at the last month, or if we include their whole history.

* The important thing here is that by measuring the end result - did they meet or not? - we implicitly include all the more complicated effects that might have contributed to this outcome. Even if there is cafeteria overcrowding, for example, its effect has already been included in the final outcome, which is what we're measuring.