Suppose I have a bag of 5 balls numbered 1-5. If I draw 3 balls at random with replacement, I believe the probability of getting 3 different balls (zero duplicates) would be: (5/5) x (4/5) x (3/5) = 60/125 = 0.48 Is that correct? Then the probability of getting at least a duplicate (1 or more) would be: 1 = 0.48 = 0.52 Correct? Now, if I run this experiment a million times (drawing 3 balls at random with replacement), how can I calculate the expected number of duplicates? That is, the expected (or average) number of draws of the 3 that will be a duplicate of a precious draw. I wrote a little simulator to tally the number of duplicates. This is what it came up with: #Draws #Duplicates 2 . . . 0.3335 3 . . . 0.8887 4 . . . 1.5922 5 . . . 2.3954 7 . . . 4.1755 10 . . . 7.9519
Your first part looks good (up to a small typo in the second equation). To calculate the expected number of duplicates \(\bar{N}\), the calculation is similar. But instead of calculating just one probability, you need to perform a weighted sum over the probabilities of getting various numbers of duplicates: \(\bar{N}=0*P(\textrm{zero duplicates})+1*P(\textrm{one duplicate})+2*P(\textrm{two duplicates})\). Note that \(P(\textrm{one duplicate})+P(\textrm{two duplicates})=P(\textrm{at least one duplicate})\), so we can rewrite the sum as: \(\bar{N}=P(\textrm{at least one duplicate})+P(\textrm{two duplicates})\). As you calculated, the probability of at least one duplicate is 0.48. The probability of two duplicates is easier to find, so I'll leave it to you. More generally, you might want to find \(\bar{N}\) for D draws and B balls. Following the above reasoning, we get: \(\bar{N}=\displaystyle\sum_{n=1}^{D-1}P(\textrm{at least n duplicates})\) where \(P(\textrm{at least n duplicates})\) depends on B. Something similar to your above technique will work for any specific D and B, but finding a symbolic solution that works for all D and B is a nifty problem that I'll leave as a brain teaser. Please Register or Log in to view the hidden image!
Thanks for the response. Unfortunately, you left as a brain teaser the exact question I was trying to get answered. I've been working on this for an hour or two and can't make any progress. If you know the answer, I'd really appreciate having it. For the simple case of drawing 3 balls from a set of 5, I worked out this table, where D = a duplicate draw N = a non-duplicate draw. Result . . . Calculation . . . . . . Probability . . . Comments NNN . . . (5/5) x (4/5) x (3/5) . . . 0.48 . . . Zero duplicates NND . . . (5/5) x (4/5) x (2/5) . . . 0.32 . . . 1 duplicate (3rd draw) NDN . . . (5/5) x (1/5) x (4/5) . . . 0.16 . . . 1 duplicate (2nd draw) NDD . . . (5/5) x (1/5) x (1/5) . . . 0.04 . . . 2 duplicates (2nd & 3rd draws) Dxx . . . . (0/5) . . . . . . . . . . . . . . . . 0.00 . . . Duplicate on 1st draw (not possible) I believe these numbers are correct, but I cannot see a pattern. BTW: Is there a way to display tables in this forum? I tried tabs and multiple spaces, but they get edited out. Most forums have a table icon. I don't see one here. I did work out a general formula for the probability of zero duplicates (P0) in K draws of N balls. I can't seem to get LaTex to work, so bear with me: P0(N,K) = N/N x (N-1)/N x (N-2)/N . . . (N-K+1)/N . . . . . . . . = N!/(N-K)!(N^K) = permut(N,K)/(N^K) For the case of N=%, K=3 (above), this gives us: P0(5,3) = 5!/(5-3)!(5^3) = (5x4x3)/(5^3) = 60/125 = 0.48 That's as far as I got.
Huh! Looking at the problem more closely, when I thought I had the solution earlier I was actually overcounting some terms. I don't know how to solve this problem! Maybe rpenner will show up and give us the answer. Please Register or Log in to view the hidden image!
I ran some simulations to obtain some data that I could use to check any formulas anyone might come up with. And I found a way to emulate tables with this forum software. This table shows results for N=2, 3, 5, 10, & 35,000. The first 4 values are included because they lend themselves to manual solutions. The last value, 35,000, is the one I am interested in. The K column is the number of balls drawn with replacement. The E(Dups) column shows the results of the simulation program. It drew K balls from a set if N millions of times and averages the number of duplicates. Here's the data. Code: N=2 N=3 N=5 N=10 N=35,000 K E(Dups) K E(Dups) K E(Dups) K E(Dups) K E(Dups) 1 0 1 0 1 0 1 0 1 0 2 0.4999 2 0.3332 2 0.2000 2 0.0999 10 0.0013 3 1.2500 3 0.8888 3 0.5598 3 0.2900 50 0.0355 4 2.1248 4 1.5927 4 1.0481 5 0.9048 100 0.1410 5 3.0626 5 2.3952 5 1.6381 7 1.7832 150 0.3178 6 4.1312 6 3.2645 6 2.3104 10 3.4868 200 0.5677 7 5.0157 7 4.1757 7 3.0488 12 4.8241 250 0.8882 8 6.0080 10 7.0521 10 5.5368 15 7.0590 300 1.2776 10 8.0020 15 12.0069 25 20.0188 20 11.2160 500 3.5478 15 13.0001 20 17.0009 50 45.0001 50 40.0522 1,000 14.1338 20 18.0000 50 47.0000 100 95.0000 100 90.0003 10,000 1301.5429
I don't know if you're still working on this, but if you are, I had one thought that might help. If you draw D balls including a total of C colors, then all but one of each color you draw will be a duplicate. Therefore, the expected number of duplicates is just D minus the expected number of colors, so you can solve your problem by finding the expected number of colors in a given set of draws. Still a tough problem, though!