Duplicate probabilities

Discussion in 'Physics & Math' started by Jennifer Murphy, Nov 14, 2015.

  1. Jennifer Murphy Registered Senior Member

    Messages:
    239
    Suppose I have a bag of 5 balls numbered 1-5. If I draw 3 balls at random with replacement, I believe the probability of getting 3 different balls (zero duplicates) would be:

    (5/5) x (4/5) x (3/5) = 60/125 = 0.48​

    Is that correct?

    Then the probability of getting at least a duplicate (1 or more) would be:

    1 = 0.48 = 0.52​

    Correct?

    Now, if I run this experiment a million times (drawing 3 balls at random with replacement), how can I calculate the expected number of duplicates? That is, the expected (or average) number of draws of the 3 that will be a duplicate of a precious draw.

    I wrote a little simulator to tally the number of duplicates. This is what it came up with:

    #Draws #Duplicates
    2 . . . 0.3335
    3 . . . 0.8887
    4 . . . 1.5922
    5 . . . 2.3954
    7 . . . 4.1755
    10 . . . 7.9519
     
  2. Google AdSense Guest Advertisement



    to hide all adverts.
  3. Fednis48 Registered Senior Member

    Messages:
    725
    Your first part looks good (up to a small typo in the second equation).

    To calculate the expected number of duplicates \(\bar{N}\), the calculation is similar. But instead of calculating just one probability, you need to perform a weighted sum over the probabilities of getting various numbers of duplicates:

    \(\bar{N}=0*P(\textrm{zero duplicates})+1*P(\textrm{one duplicate})+2*P(\textrm{two duplicates})\).

    Note that \(P(\textrm{one duplicate})+P(\textrm{two duplicates})=P(\textrm{at least one duplicate})\), so we can rewrite the sum as:

    \(\bar{N}=P(\textrm{at least one duplicate})+P(\textrm{two duplicates})\).

    As you calculated, the probability of at least one duplicate is 0.48. The probability of two duplicates is easier to find, so I'll leave it to you.

    More generally, you might want to find \(\bar{N}\) for D draws and B balls. Following the above reasoning, we get:

    \(\bar{N}=\displaystyle\sum_{n=1}^{D-1}P(\textrm{at least n duplicates})\)

    where \(P(\textrm{at least n duplicates})\) depends on B. Something similar to your above technique will work for any specific D and B, but finding a symbolic solution that works for all D and B is a nifty problem that I'll leave as a brain teaser.

    Please Register or Log in to view the hidden image!

     
  4. Google AdSense Guest Advertisement



    to hide all adverts.
  5. Jennifer Murphy Registered Senior Member

    Messages:
    239
    Thanks for the response. Unfortunately, you left as a brain teaser the exact question I was trying to get answered. I've been working on this for an hour or two and can't make any progress. If you know the answer, I'd really appreciate having it.

    For the simple case of drawing 3 balls from a set of 5, I worked out this table, where
    D = a duplicate draw
    N = a non-duplicate draw.​

    Result . . . Calculation . . . . . . Probability . . . Comments
    NNN . . . (5/5) x (4/5) x (3/5) . . . 0.48 . . . Zero duplicates
    NND . . . (5/5) x (4/5) x (2/5) . . . 0.32 . . . 1 duplicate (3rd draw)
    NDN . . . (5/5) x (1/5) x (4/5) . . . 0.16 . . . 1 duplicate (2nd draw)
    NDD . . . (5/5) x (1/5) x (1/5) . . . 0.04 . . . 2 duplicates (2nd & 3rd draws)
    Dxx . . . . (0/5) . . . . . . . . . . . . . . . . 0.00 . . . Duplicate on 1st draw (not possible)

    I believe these numbers are correct, but I cannot see a pattern.

    BTW: Is there a way to display tables in this forum? I tried tabs and multiple spaces, but they get edited out. Most forums have a table icon. I don't see one here.

    I did work out a general formula for the probability of zero duplicates (P0) in K draws of N balls. I can't seem to get LaTex to work, so bear with me:

    P0(N,K) = N/N x (N-1)/N x (N-2)/N . . . (N-K+1)/N
    . . . . . . . . = N!/(N-K)!(N^K) = permut(N,K)/(N^K)

    For the case of N=%, K=3 (above), this gives us:

    P0(5,3) = 5!/(5-3)!(5^3) = (5x4x3)/(5^3) = 60/125 = 0.48

    That's as far as I got.
     
  6. Google AdSense Guest Advertisement



    to hide all adverts.
  7. Fednis48 Registered Senior Member

    Messages:
    725
    Huh! Looking at the problem more closely, when I thought I had the solution earlier I was actually overcounting some terms. I don't know how to solve this problem! Maybe rpenner will show up and give us the answer.

    Please Register or Log in to view the hidden image!

     
  8. Jennifer Murphy Registered Senior Member

    Messages:
    239
    I ran some simulations to obtain some data that I could use to check any formulas anyone might come up with. And I found a way to emulate tables with this forum software.

    This table shows results for N=2, 3, 5, 10, & 35,000. The first 4 values are included because they lend themselves to manual solutions. The last value, 35,000, is the one I am interested in. The K column is the number of balls drawn with replacement. The E(Dups) column shows the results of the simulation program. It drew K balls from a set if N millions of times and averages the number of duplicates.

    Here's the data.

    Code:
        N=2          N=3           N=5          N=10           N=35,000
     K E(Dups)    K E(Dups)     K E(Dups)    K E(Dups)        K   E(Dups)
     1    0       1    0        1    0       1    0           1      0
     2  0.4999    2  0.3332     2  0.2000    2  0.0999       10    0.0013
     3  1.2500    3  0.8888     3  0.5598    3  0.2900       50    0.0355
     4  2.1248    4  1.5927     4  1.0481    5  0.9048      100    0.1410
     5  3.0626    5  2.3952     5  1.6381    7  1.7832      150    0.3178
     6  4.1312    6  3.2645     6  2.3104   10  3.4868      200    0.5677
     7  5.0157    7  4.1757     7  3.0488   12  4.8241      250    0.8882
     8  6.0080   10  7.0521    10  5.5368   15  7.0590      300    1.2776
    10  8.0020   15 12.0069    25 20.0188   20 11.2160      500    3.5478
    15 13.0001   20 17.0009    50 45.0001   50 40.0522    1,000   14.1338
    20 18.0000   50 47.0000   100 95.0000  100 90.0003   10,000 1301.5429
    
     
  9. Fednis48 Registered Senior Member

    Messages:
    725
    I don't know if you're still working on this, but if you are, I had one thought that might help.

    If you draw D balls including a total of C colors, then all but one of each color you draw will be a duplicate. Therefore, the expected number of duplicates is just D minus the expected number of colors, so you can solve your problem by finding the expected number of colors in a given set of draws. Still a tough problem, though!
     

Share This Page