As I mentioned in my introductory post, win probability is calculated by comparing the current game situation to past games where the same situation occurred. For example, if you punch in Blue losing a scout and Red losing a demoman at the mid-fight, the calculator reports the probability that Blue caps mid is 55% and the probability that Blue wins the entire round is 61%.
It arrives at those numbers by looking at the historical data. In all the matches in the dataset, it finds all the mid-fights where one team was down a scout and the other team was down a demoman. That particular situation occurred in 109 mid-fights. The team trading their scout for the enemy demoman ended up winning 60 of those mid-fights, and 67 of the rounds that those mid-fights started. It is interesting to observe that the trade shifts the mid-fight win probability by only 5 points (from an even 50% to 55%), while it shifts the round win probability by 11 points (to 61%). This would seem to say that the demoman is more important after the mid-fight than during it.
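The arithmetic behind those percentages is just counting. Here is a minimal Python sketch using the counts quoted above; the function name and the 50% even-fight baseline are my own illustrative choices, not the calculator's actual code.

```python
def trade_probabilities(fights, mid_wins, round_wins, baseline=0.5):
    """Turn raw counts into win probabilities and their shift from a baseline.

    `baseline` is an assumption: an even fight is treated as 50/50.
    """
    p_mid = mid_wins / fights
    p_round = round_wins / fights
    return {
        "p_mid": p_mid,
        "p_round": p_round,
        "mid_shift": p_mid - baseline,
        "round_shift": p_round - baseline,
    }


# Counts quoted in the post for the scout-for-demoman trade.
result = trade_probabilities(fights=109, mid_wins=60, round_wins=67)
print(f"caps mid: {result['p_mid']:.0%}, wins round: {result['p_round']:.0%}")
# prints "caps mid: 55%, wins round: 61%"
```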
What if Red wipes at mid? The calculator reports 100% chance of Blue capping and 100% chance of Blue winning the round. This certainly agrees with our intuition about how the game plays out. But 100%? Doesn't that say that it's impossible for Red to come back? The problem is a small dataset. In the dataset that I have, every time Red wiped, Blue won. That's why I also added a margin of error calculation.
Similar to the margin of error on a pollster's poll, the margin of error term is an indication of how confident we can be in the results, given the number of data points we had to work with. On the last example, the margin of error is 31% at the 90% confidence level. 90% confidence means that there is a 90% chance that the true odds of one team coming back from a wipe at mid are less than 31%. If a 31% margin of error seems huge, it is. Most polls of the kind you would see in a newspaper have a 2-5% margin of error. A 31% margin of error means we only had a handful of cases in the dataset where one team completely wipes at mid. Compare that to the full strength numbers. Since every round starts out as a full strength battle, we have much more data to work with, and the margin of error is just 2%. But even with a 31% margin of error, the result agrees with our experience in this case: wiping on mid is bad, and you're likely to lose the round. Coming back from a wipe at mid is a rare event, and there's not enough data right now to get a better estimate.
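To make the idea concrete, here is a rough Python sketch of one way to get a margin of error that stays nonzero even when the observed rate is 100%. It uses a Wilson score interval, which is my choice for illustration and not necessarily the formula the calculator uses; the function name is made up, and the sample counts in the usage lines are hypothetical, not taken from the dataset.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(wins, n, confidence=0.90):
    """Wilson score interval for a win rate observed as `wins` out of `n`.

    Assumption: this is one standard interval for a proportion, picked here
    because it does not collapse to zero width when wins == n.
    """
    if n <= 0:
        raise ValueError("need at least one observation")
    # Two-sided critical value, about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: a rare situation versus a very common one.
rare_low, rare_high = wilson_interval(wins=6, n=6)
common_low, common_high = wilson_interval(wins=530, n=1000)
print(f"rare:   {rare_low:.2f} to {rare_high:.2f}")
print(f"common: {common_low:.2f} to {common_high:.2f}")
```

The rare situation gets a wide interval even though every observed case went the same way, while the common one gets a narrow interval. That is the same pattern as the wipe versus the full strength fight.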
The calculator is colorblind. Every calculation looks at both sides of the map: if Red is pushing last and Blue is down a scout, the numbers automatically include the times when Blue is pushing last and Red is down a scout. This is probably the most useful behavior for comparing situations and possible outcomes.
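One way to picture the colorblind lookup is to check every past fight from both teams' points of view. The sketch below assumes a made-up record format (dicts with `blue_losses`, `red_losses`, and `winner` keys) purely for illustration; the real log format and the calculator's own code may look nothing like this.

```python
from collections import Counter


def pooled_mid_record(fights, own_losses, enemy_losses):
    """Count (wins, total) for the team that lost `own_losses`, ignoring color.

    Each fight is tested from Blue's and from Red's perspective, so a
    "Blue down a scout" query also picks up "Red down a scout" fights.
    Note that a symmetric query (same losses on both sides) matches each
    fight twice, once per perspective, and therefore comes out at 50%.
    """
    own = Counter(own_losses)
    enemy = Counter(enemy_losses)
    wins = total = 0
    for fight in fights:
        for team, other in (("blue", "red"), ("red", "blue")):
            team_losses = Counter(fight[f"{team}_losses"])
            other_losses = Counter(fight[f"{other}_losses"])
            if team_losses == own and other_losses == enemy:
                total += 1
                wins += fight["winner"] == team
    return wins, total


# Hypothetical records, not real data.
sample_fights = [
    {"blue_losses": ["scout"], "red_losses": ["demoman"], "winner": "blue"},
    {"blue_losses": ["demoman"], "red_losses": ["scout"], "winner": "blue"},
    {"blue_losses": ["demoman"], "red_losses": ["scout"], "winner": "red"},
    {"blue_losses": [], "red_losses": [], "winner": "red"},
]
wins, total = pooled_mid_record(sample_fights, ["scout"], ["demoman"])
print(f"{wins} wins out of {total} matching fights")
```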
The only issue is that it doesn't show Blue's slight edge. In the current dataset, Blue wins 53% of all mid-fights. Blue has the edge at mid on Badlands, Granary, and Freight. Only Follower bucks this trend, with Red winning 56% there. 53% is the same kind of edge we've been seeing ever since Valve published their stats. Your guess is as good as mine why Blue has an edge, but it certainly seems to be real.
At the time of this writing, the dataset is 187 matches from #tf2.pug.na. Special thanks to Cinq for sending me the log files. I'm always looking for more data. Margins of error will naturally decrease with more data.