Statistics problem: finding the error

Discussion in 'Physics & Math' started by Vern, Dec 21, 2005.

  1. Vern Registered Senior Member

    Messages:
    695
    I'm trying to test the accuracy of the odds predicted for a series of events, for example, the probability of winning horse races. In the example below, there are 20 races. The odds were calculated from probabilities where the sum of the reciprocals correctly equals one.

    The number of horses at each predicted odds value is counted and stored, the number of wins at each odds value is counted and stored, and then I test the accumulated deviation between predicted and actual wins.

    If this is done correctly, it seems to me that the accumulated deviation should finally reach zero at some odds number, once both the predicted and actual counts are exhausted. However, I can't make it do so. The more races I test it with, the more accumulated deviation I get.

    Code:
    Calculated probability:
    Odds to one      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
    Qty these odds   1  7 12 24 17 16 14  7  4  6  4  3  2  1  3  3  2  1  3 59
    Predicted wins   0  2  3  5  3  2  2  0  1  0  0  1  0  0  0  0  0  0  0  3
    Actual Wins      0  0  1  4  2  2  2  0  0  1  1  1  0  0  0  0  2  0  1  3
    Accum deviation  0 -2 -4 -5 -6 -6 -6 -6 -7 -6 -5 -5 -5 -5 -5 -5 -3 -3 -2 -2
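As a check on the table above, the Accum deviation row is a running total of (Actual Wins minus Predicted wins), bin by bin. A short Python sketch with the rows copied from the table:

```python
from itertools import accumulate

# Rows copied from the 20-race table (odds bins 1 through 20).
predicted = [0, 2, 3, 5, 3, 2, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3]
actual    = [0, 0, 1, 4, 2, 2, 2, 0, 0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 1, 3]

# Running total of (actual - predicted), one bin at a time.
accum_deviation = list(accumulate(a - p for a, p in zip(actual, predicted)))

print(accum_deviation)
print("predicted total:", sum(predicted), "actual total:", sum(actual))
# predicted total: 22 actual total: 20
```

The last value, -2, is simply 20 actual wins minus 22 predicted wins, so the running total can only finish at zero when the Predicted wins row adds up to the number of races.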
     
  3. Vern Registered Senior Member

    Messages:
    695
    I think I'm missing something in calculating the probabilities. There are only 20 races in the sample above, but there are 22 predicted winners. Clearly not possible. But I can't figure out whether the error is in calculating the odds, or in calculating the number of winners the odds predict. The number of winners should be:
    Qty at these odds / (these odds + 1).
    The odds of this horse winning based upon point scores should be:
    Total point score of all horses / point score of this horse.

    Which is wrong?
     
  5. DaleSpam TANSTAAFL Registered Senior Member

    Messages:
    1,723
    Could you provide a little more information? I know probability, but I don't know horse races. How many horses per race? What exactly are the "odds to one" row and the "qty these odds" row? Exactly where did you get that info, i.e. is this a simulation or actual race data that you are trying to analyze?

    -Dale
     
  7. Vern Registered Senior Member

    Messages:
    695
    Hi; thanks for your response and your help. The sample shown is 20 races. In horse racing, the program carries the "Morning Line", which is the racing steward's idea of what the odds on each horse should be. The morning line is not statistically correct, because the sum of the reciprocals will not add up to one. I want to convert the ML odds to real odds and test the accuracy of the track steward's predictions.

    First, to convert to real odds:
    ML = Morning Line.
    HS = Horse Point Score.
    TS = total of point scores of all horses in a race.
    Odds = each horse's converted real odds minus 1 (the 1 is subtracted to match the way the track pays the odds: 2 * odds + 2 on a $2 bet).

    This is what I am doing; it is almost working, but is off somewhere:
    Convert each horse's ML to point scores.
    HS = ( 1 / ML ) * 100
    TS = the sum of HS of all horses in a race.
    Odds = (TS / HS) - 1
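The conversion steps above can be sketched in Python. The function name and the five-horse morning line are hypothetical, and ML is assumed to be entered as a number of odds to one (so 5-2 would be 2.5). The arithmetic follows HS = (1 / ML) * 100, TS = sum of HS, Odds = (TS / HS) - 1 as written, and the sketch shows that the converted probabilities, 1 / (Odds + 1) = HS / TS, add up to one within a race.

```python
def convert_morning_line(ml_odds):
    """Convert morning-line odds-to-one into odds whose implied
    probabilities sum to one, following HS = 100 / ML, TS = sum(HS),
    Odds = TS / HS - 1."""
    hs = [100.0 / ml for ml in ml_odds]   # HS = (1 / ML) * 100
    ts = sum(hs)                          # TS = total of HS in the race
    return [ts / h - 1.0 for h in hs]     # Odds = (TS / HS) - 1

# Hypothetical race: 2-1, 3-1, 4-1, 5-1 and 8-1 on the morning line.
ml = [2, 3, 4, 5, 8]
odds = convert_morning_line(ml)
probs = [1.0 / (o + 1.0) for o in odds]   # implied win probability of each horse
print(round(sum(probs), 10))              # 1.0
```

One design point: this recipe makes each horse's probability proportional to 1 / ML. The usual implied probability of odds of ML to one is 1 / (ML + 1); normalizing that instead would give short-priced horses relatively less weight. Which one tracks the stewards better is something the daily tests could show.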

    I count and store the odds numbers in containers from one to 20. Everything less than one goes in one; everything more than 20 goes in 20. In the sample, this is from 20 races. The number of horses per race varies, min 4 max 14.

    The predicted number of times that each odds number wins should be, it seems to me:
    PW = Count / (OV + 1)
    Where
    OV = Odds value from one to 20.
    Count = sum of the horses assigned each particular odds value.
    PW = Predicted wins based upon the Count stored in each odds value.
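A sketch of the counting and the PW formula, with PW left unrounded. How fractional odds between containers are assigned isn't stated, so truncation (4.7 goes in container 4) is an assumption here, as are the function names.

```python
def container_for(odds, low=1, high=20):
    """Container number (OV) for a horse's converted odds.
    Assumption: fractional odds are truncated; anything below 1 goes
    in 1 and anything above 20 goes in 20, as described."""
    return min(max(int(odds), low), high)

def count_containers(all_odds, low=1, high=20):
    """Count = number of horses assigned to each odds value."""
    counts = [0] * (high - low + 1)
    for odds in all_odds:
        counts[container_for(odds, low, high) - low] += 1
    return counts

def predicted_wins(counts, low=1):
    """PW = Count / (OV + 1) for each container, left unrounded."""
    return [count / (ov + 1) for ov, count in enumerate(counts, start=low)]

# Qty row of the 20-race table from the first post.
qty = [1, 7, 12, 24, 17, 16, 14, 7, 4, 6, 4, 3, 2, 1, 3, 3, 2, 1, 3, 59]
print(round(sum(predicted_wins(qty)), 2))  # 23.49
```

The unrounded total is already about 23.5 against 20 races, before any rounding. One likely contributor is that every horse in a container is treated as if its odds were exactly OV, and container 20 also holds every horse longer than 20-1.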

    I think my problem is in the decimal remainder between the Odds Values. For example, when the Count in OV(1) is divided by OV + 1 (that is, by 2), there is a decimal remainder. How does that decimal apply to the next higher odds value?

    In the sample posted, I am multiplying the decimal portion by (OV + 1) before accumulating it into the next higher OV value. I've tried dropping it, leaving it alone, passing it through the subsequent division by (OV + 1), and several other things I can't remember.
     
  8. Vern Registered Senior Member

    Messages:
    695
    Well, I got it working, but I don't know exactly which fix fixed it. I kept the decimal portion of each predicted win and applied it to the next higher odds value without trying to adjust it for the new odds level. It seems to work, but it still doesn't seem right to me.
    Code:
    Calculated probability:
    Odds to one        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    Qty these odds     3   8  21  49  77 105 129  82  97  72  80  48  37  25  11   5   9   7   8  90
    Predicted wins     1   3   5  10  13  15  16   9   9   7   7   3   3   2   0   1   0   1   0   0
    Actual Wins        2   3   5  16  18  16  13  12   7   4   4   2   0   1   0   0   1   0   1   2
    Accum deviation    1   1   1   7  12  13  10  13  11   8   5   4   1   0   0  -1   0  -1   0   2
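The fix Vern describes (keep the fractional part and add it, unadjusted, to the next container's Count / (OV + 1)) amounts to rounding down with a carried remainder. A sketch, with a hypothetical function name; applied to containers 1 through 19 of the Qty row above, it reproduces the posted Predicted wins row. Container 20 is left out because it is open-ended (the table shows 0 there).

```python
import math

def predicted_wins_with_carry(qty):
    """qty[i] is the number of horses in odds container OV = i + 1.
    Each container's Count / (OV + 1) is added to the fraction left over
    from the previous container; the whole part becomes the predicted wins
    and the fractional part is carried forward unchanged."""
    predicted = []
    carry = 0.0
    for ov, count in enumerate(qty, start=1):
        expected = count / (ov + 1) + carry
        whole = math.floor(expected)
        predicted.append(whole)
        carry = expected - whole
    return predicted

# Qty row for containers 1-19 of the table above.
qty = [3, 8, 21, 49, 77, 105, 129, 82, 97, 72, 80, 48, 37, 25, 11, 5, 9, 7, 8]
print(predicted_wins_with_carry(qty))
# [1, 3, 5, 10, 13, 15, 16, 9, 9, 7, 7, 3, 3, 2, 0, 1, 0, 1, 0]
```

Because the fraction is carried forward unchanged, the predicted wins through any container always equal the whole part of the running total of Count / (OV + 1), so at most one win is ever held back between containers. The same sketch also reproduces the Predicted wins row of the December 22 table for its first 19 containers.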
    
     
  9. Vern Registered Senior Member

    Messages:
    695
    The number 2 remaining in odds column 20 is due to the grouping of everything above 20 into that container.
     
  10. DaleSpam TANSTAAFL Registered Senior Member

    Messages:
    1,723
    OK, I think I see what you are doing. The key point is your calculation of the actual probabilities by generating a point score for each horse. It looks like your model is working decently well with this set of data. What you will want to do is try it on other sets of data and see if it still works well.

    You might think of allowing a fractional number of predicted wins rather than rounding and applying the remainder to the next odds value. So, let's say that you predict 1.7 wins. You will never get zero deviation, but if your model is right you will probably get only a -0.3 deviation. In a similar manner, you may leave your odds values as raw probabilities. You can then "bin" your probabilities to do your analysis. But that might make it more difficult to really analyze the steward's performance. I guess the question is whether you really want to analyze the steward, or want to use the steward as input to a model to analyze the races.

    -Dale
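Dale's suggestion of fractional predicted wins can be sketched by adding each horse's own probability, 1 / (Odds + 1), to its container instead of using 1 / (OV + 1) for everyone in it. The function name, the truncation rule for containers, and the two races are hypothetical.

```python
from itertools import accumulate

def expected_vs_actual(races, low=1, high=20):
    """races: list of races; each race is a list of (odds, won) pairs using
    the converted odds, with won True for the single winner.
    Expected wins stay fractional: each horse adds its own probability."""
    n = high - low + 1
    expected = [0.0] * n
    actual = [0] * n
    for race in races:
        for odds, won in race:
            b = min(max(int(odds), low), high) - low  # assumed rule: truncate, clamp to 1..20
            expected[b] += 1.0 / (odds + 1.0)          # this horse's own probability
            actual[b] += int(won)
    running = list(accumulate(a - e for a, e in zip(actual, expected)))
    return expected, actual, running

# Two hypothetical races whose probabilities each sum to one.
races = [
    [(1.0, False), (7 / 3, True), (4.0, False)],   # p = 0.5, 0.3, 0.2
    [(1.5, False), (1.5, False), (4.0, True)],     # p = 0.4, 0.4, 0.2
]
expected, actual, running = expected_vs_actual(races)
print(round(sum(expected), 10), sum(actual))  # 2.0 2
print(running[-1])  # zero apart from floating-point rounding
```

Assuming every race has exactly one winner (no dead heats), each race contributes exactly one expected win and one actual win, so the running deviation ends at zero without carrying any decimals between containers.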
     
  11. Vern Registered Senior Member

    Messages:
    695
    Hi Dale; thanks for your input. Of course, the real motive is to find those tracks where the stewards are good at predictions and play overlays based upon their predictions. I'm about ready to save up twenty bucks and give it a try.


    I'm testing the model every day. Following is all races that ran on the 22nd of December.
    Code:
    Calculated probability:
    Odds to one        1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
    Qty these odds     9  46  87  79  63  47  51  48  16  25  34  28  22  19  12  24  16   9   7 321
    Predicted wins     4  15  22  16  10   7   6   6   1   3   3   2   1   2   0   2   1   0   0   0
    Actual Wins        7  15  23  14   3  11   7   6   2   3   4   2   0   1   1   2   0   0   1   5
    Accum deviation    3   3   4   2  -5  -1   0   0   1   1   2   2   1   0   1   1   0   0   1   6
    
     
    Last edited: Dec 24, 2005
  12. Dinosaur Rational Skeptic Valued Senior Member

    Messages:
    4,885
    Work on Blackjack, which has the lowest house edge of any game. The house take for horse racing is about the worst.

    If you want to beat the horses (which is not going to happen), you must work to develop an algorithm which accurately predicts each horse's probability of winning. Place and show betting are not worth analyzing.

    You do not look for the horse most likely to win. You try to find a horse which gives you a positive expectation. A horse unlikely to win might provide the best expectation of profit from betting on him.

    Ignore maiden races and other races for low quality horses.
     
  13. Dinosaur Rational Skeptic Valued Senior Member

    Messages:
    4,885
    Some of the events run by the country club set have hand books instead of a parimutuel system. An old-fashioned hand book can be beaten.

    I do not understand why you pay any attention to the morning line. This is probably the worst source of data to use. It is just somebody's guess with very little analysis. The Tout Sheets are better.
     
  14. Vern Registered Senior Member

    Messages:
    695
    I don't know the reason, but since I've been playing with this model, the Morning Line low-odds horse has won between 30% and 34% of its races on a daily average. I chose the morning line for this discussion because it is the most publicly available free data set.

    What I really hoped to find out from this thread is how to apply the decimal portion of a predicted set of happenings at one odds level to the next higher odds level.
     
  15. Dinosaur Rational Skeptic Valued Senior Member

    Messages:
    4,885
    Vern: Note that 30-34% is approximately a probability of winning equal to one third. This is two to one odds. The morning line pick is usually the betting favorite, who rarely pays as much as two to one (a $6.00 payoff).

    With the statistics you mentioned, you should not bet on the morning line at less than about five to two odds.

    Buy a Racing Form (or better, The Morning Telegraph) and start analyzing the Past Performance data. This is the starting point for analysis of horse races.
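The break-even arithmetic behind the 30-34% figures can be checked directly: a win probability p breaks even at odds of (1 - p) / p to one, and a $2 win bet at odds o returns 2 * o + 2 when it wins (the $6.00 payoff at 2-1). The function names are hypothetical.

```python
def fair_odds(p):
    """Odds-to-one at which a bet that wins with probability p breaks even."""
    return (1.0 - p) / p

def expected_profit_per_2_dollars(p, odds):
    """A $2 win bet at odds-to-one pays 2 * odds + 2 when it wins."""
    return p * (2.0 * odds + 2.0) - 2.0

for p in (0.30, 0.34):
    print(p, round(fair_odds(p), 2))
# 0.3 2.33
# 0.34 1.94
print(round(expected_profit_per_2_dollars(0.30, 2.5), 2))  # 0.1
```

At a 30% strike rate the break-even price is 7-3 (about 2.33 to one), so five to two leaves only a small margin; at 34% it drops to a little under two to one.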
     
  16. CANGAS Registered Senior Member

    Messages:
    1,612
    In horse racing and in greyhound racing, nothing crooked ever happens. Insider information and doping are never factors. No race is ever fixed.

    There is a vast amount of ocean front property in Montana that is on sale for a bargain price.

    There is really a tooth fairy.

    If you drop a lead brick it will fall UP.

    Everything in this thread so far is such a deeply insightful analysis of betting on races that you can turn 20 bucks into a small fortune on the first try.
     
  17. Dinosaur Rational Skeptic Valued Senior Member

    Messages:
    4,885
    Cangas: Nothing I posted suggests that you can make money at the track, where the house edge is worse than in any game I know, except for some one-armed bandits.

    Because of some addicted friends, I happen to have studied the subject quite a bit. It is interesting that a parimutuel system can theoretically provide bets which favor the player over the house. In practice, I doubt that it ever happens. If the house edge were 5%, or even better 1.4% (like craps), a parimutuel system probably would provide some horses with a positive expectation. The betting public does not do a very good job of handicapping horses, and they are the ones who control the odds.
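How a parimutuel pool turns the public's bets into odds, and why a poorly handicapped horse can still be a positive-expectation bet, can be sketched in a few lines. The pool sizes, the 17% takeout, and the 25% "true" chance are hypothetical figures chosen for illustration, and breakage (rounding payoffs down) is ignored.

```python
def parimutuel_odds(bets_by_horse, takeout):
    """Odds-to-one in a simple win pool: the house keeps the fraction
    `takeout` and the rest is paid to the winning tickets."""
    pool = sum(bets_by_horse)
    net = pool * (1.0 - takeout)
    return [(net - bet) / bet for bet in bets_by_horse]

# Hypothetical three-horse win pool with a hypothetical 17% takeout.
odds = parimutuel_odds([5000.0, 3000.0, 2000.0], takeout=0.17)
print([round(o, 2) for o in odds])                    # [0.66, 1.77, 3.15]
# Implied probabilities add up to 1 / (1 - takeout), not to 1.
print(round(sum(1.0 / (o + 1.0) for o in odds), 2))   # 1.2
# If the third horse really wins 25% of the time, each $1 bet is worth:
print(round(0.25 * (odds[2] + 1.0) - 1.0, 4))         # 0.0375
```

The takeout is what pushes the implied probabilities above one; a bet only shows a positive expectation when the public has underbet a horse by more than that margin.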

    Your sarcasm was noted, but I do not think that many races are fixed. In my youth, I was very friendly with a bookie (Louie) who would accept bets from jockeys, owners, trainers, anybody. He ran a big enough book that other bookies laid off on him when they had too much action on a horse. Louie said that there were not enough fixed races for the insiders to make up for the bad handicapping they did on races that were not fixed.

    He would not accept bets on races at one particular minor track a few hundred miles from his location. This suggested to me that there are very few fixed races at the major tracks.

    Two or three times a year, Louie and a few other bookies ran hand books at events run by the country club set. He explained how you could beat hand books due to their giving different odds on the same horse at different times prior to the race. He did not care what happened at those events. It was a nostalgia trip for him. He had been booking bets prior to the existence of parimutuel systems, and enjoyed a day chumming with some of his old time friends who felt the same way. His regular booking operation paid off at track odds, and handled 50 to 100 times as much money as was bet at the horsey set events.
     
  18. Vern Registered Senior Member

    Messages:
    695
    My interest is still in the arithmetic. How do you apply the decimal portion of the accumulated deviation between predicted wins and actual wins to the next higher odds level? There must be some statistically correct way. I don't know the equation, but would like to find out what it is.
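Once fractional expected wins are kept, one common way to judge whether a leftover deviation means anything is to compare it with the spread that chance alone would produce, roughly the square root of the sum of p(1 - p) over the horses in a container. A sketch; the function name is hypothetical, and it treats horses as independent trials, which is only an approximation.

```python
import math

def deviation_z_score(probs, wins):
    """probs: win probability of each horse that fell in one container;
    wins: how many of those horses actually won.
    Approximation: each horse is treated as an independent yes/no trial.
    Horses from the same race are not independent (only one can win),
    so this somewhat overstates the spread."""
    expected = sum(probs)
    spread = math.sqrt(sum(p * (1.0 - p) for p in probs))
    return (wins - expected) / spread

# Container 4 of the 20-race table: 24 horses, 4 winners. Assume for
# illustration that every one of them was exactly 4-1 (p = 0.2).
z = deviation_z_score([0.2] * 24, 4)
print(round(z, 2))  # -0.41
```

A result within about plus or minus 2 is roughly what chance alone tends to produce, so a deviation of -0.8 wins in that container is well inside the noise.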

    I don't expect to make any money, but playing overlays can produce some fun times at relatively little expense.
     
