Variation on dupicate birthday problem

Discussion in 'Physics & Math' started by Jennifer Murphy, Oct 29, 2015.

  1. Jennifer Murphy Registered Senior Member

    Messages:
    239
    A Taipei game reports a game number for each game it generates. I got curious about how often it generated duplicate games. So I set up a spreadsheet to keep track of each game number and check for duplicates. The game numbers are mostly 4 and 5 digits, such as 27025, 5403, 16548, 23512, 30182, 28534, etc...

    I got the first duplicate game number on game #112 (7962). I am now up to 188 games and have not gotten a second duplicate.

    I decided to calculate the expected number of duplicates. In the classic birthday problem, the probability of no duplicates is calculated first. The odds of no duplicates is a group of 5 people is

    365/365 x 364/365 x 363/365 x 362/365 x 361/365 = 0.972864​

    So the odds of one or more duplicates is 1=0.972864 or 0.02713. It turns out that it only takes 23 people for the odds to greater than 50%.

    To solve my problem, I need to know the number of possible game numbers. Since I don't know that, I need way a way to calculate it. Two ways occurred to me: (1) keep track of the largest game number and (2) calculate the average game number. Of these, the latter should be the better estimate.

    If I generate 1,000 random integers from 1 to 100,000, I would expect the average to be very close to half of that or 50,000.

    The average game number of the 188 games is 17,449. This suggests that the maximum game number is close to 34,898. Using the formula above, the odds of one or more duplicates when generating random numbers on that range is 39.63%.

    When I saw that, I was initially surprised that I had only 1 duplicate and wondered if I had done the math right. So I wrote a little simulation to generate random integers on (1, 34898) and check for duplicates. In 6 simulations, I got my first duplicate at game number 234, 336, 196, 77, 383, and 145. So I guess I did the math right.

    Does anyone disagree?

    I see now that my error (I think) was in confusing the 39.63% odds of having one or more duplicates in 188 draws so far with a 39.63% chance of a duplicate on the next draw. That would be the number of game numbers drawn so far divided by the total number of game numbers. I have drawn 188 game numbers with one duplicate, so 187 game numbers have been drawn so far. The odds of a duplicate on the next draw, then, would be:

    (188-1)/34898 = 0.5358%​

    Any errors there?

    Now I would like to calculate the expected number of duplicates after K draws. Can someone help me understand how to do that?

    Thanks
     

Share This Page