Math and Max: A statistical analysis of "100 Matches with the Best Deck in Magic"

@womba said in Math and Max: A statistical analysis of "100 Matches with the Best Deck in Magic":

@maxtortion said in Math and Max: A statistical analysis of "100 Matches with the Best Deck in Magic":

I'm curious if anyone has been Game 1 Mulliganing against you, thinking you were on Dredge. I know I've done that against you once. I threw away a totally reasonable 7 because it couldn't beat Dredge, and you opened up with Blue spells! Learned my lesson.

@Maxtortion this is a common occurrence, more so in paper but to an extent in MTGO haha.

One day someone will get blown out against me by doing this, but if the last 9 years are any indication, it won’t happen any time soon unless the DCI has something to say about it.

@womba It was the best of times, it was the worst of times, he drinks a Whiskey drink, he drinks a Vodka drink...


Edit: I'm pretty sure the discussion of player skill deserves its own section. Basically, I think players tend to have an unrealistic expectation of a format's skill level. It's not like GPs with hundreds of players don't have bad players that make common and at times mind-boggling mistakes. Even pros can make mistakes repeatedly on camera. Part of what makes Magic so great is its complexity but the tradeoff is that perfect play is a hypothetical aim to strive for but never attain. I don't think the presence of 'bad' players is unique to Vintage or that Vintage has a disproportionate number of them.

That said, Vintage leagues are definitely a step down on the competitive ladder. More popular formats like Modern and Standard have two different league types with different entry fees and prize structures. The friendly leagues are much better as entry points to a format and the competitive leagues are much better for established pilots. Without that delineation, you get a mixture of experience levels in Vintage leagues. It doesn't invalidate data - nothing besides outright fraud and shoddy methodology invalidates data in my opinion - but it is a grain of salt. Again, I think that the best approach is a far-reaching one that takes data from as many sources as possible and breaks down their pros and cons.

last edited by ChubbyRain

Nice job! A few remarks:

  1. the second table is the same pic as the first
  2. you want to say "The constant z depends on the desired confidence LEVEL", not "interval".
  3. about sample size: the thing is, it's not exactly that small samples can't be used, it's that they can only be used for extreme stats, and as I say in my article, for such imbalanced matchups/decks you don't need stats to know about that imbalance; and if the matchup/deck winrate is tight, which are the cases you'd actually be interested in, then it's not realistic to expect to find a sample of a size and quality high enough to give us solid stats on such tight margins. I had an interesting chat with @Smmenen the other day and he pushed me to name a sample size that would be acceptable to me; I caved and gave an answer but really shouldn't have. It seems this forum has the means to assemble a sample of sufficient power to have good to great chances of detecting a 60% winrate (as with Shops), and he seemed to insist that the skill and the decklists were stable enough, so the quality of the sample would presumably be quite good. So possibly for that kind of range you could use stats. Is it worth the effort, considering that using the right statistical tools can be time-consuming and that there are still so many ways to use and interpret them incorrectly? To me it isn't: a 60/40 winrate I expect to be clear enough that if I play or observe the deck enough I'll get the idea, while learning things along the way. That seems like a better deal to me, since even a 60/40 matchup is quite skill-dependent, generally.
  4. Therefore I don't think we should ask you to do a large-scale analysis.
  5. Please consider not relying on the concept of "statistical significance".
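Point 3 above can be made concrete with a quick power calculation: under a binomial model, how many matches would it take to reliably tell a given winrate apart from the 50/50 null? A rough normal-approximation sketch (my own illustration; the alpha = 0.05 and power = 0.80 choices are conventional assumptions, not figures from this thread):

```python
from math import sqrt, ceil

def matches_needed(p_alt, alpha_z=1.96, power_z=0.8416):
    """Normal-approximation sample size to distinguish a true winrate
    p_alt from the 50% null (two-sided alpha = 0.05, power = 0.80).
    alpha_z and power_z are the corresponding normal quantiles."""
    num = alpha_z * sqrt(0.25) + power_z * sqrt(p_alt * (1 - p_alt))
    return ceil((num / (p_alt - 0.5)) ** 2)

print(matches_needed(0.60))  # 194: roughly two hundred matches for a 60% deck
print(matches_needed(0.55))  # 783: a 55% winrate needs about four times as many
```

Even at 60%, that is roughly two hundred matches with stable lists and players; at 55% the requirement roughly quadruples, which is the practical point being made about tight margins.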
last edited by Timewalking


Can you elaborate on your advice not to rely on the concept of statistical significance?

@senor_bisquick There's a chapter about it in the link.


  1. Done. I had to copy and paste from my response in the other thread to a new thread. Obviously, I screwed up. Thanks for pointing it out.
  2. Yes, thank you and changed.
  3. Agreed. Ryan and I have been trying to include standard deviations based on the binomial model in our metagame breakdowns. If you look at the end of the piece, I mentioned that in the 400-match sample, we had a confidence interval of +/- 5%. In the 277 matches with Oath, we have +/- 6%. 123 matches 'bought us' an additional ~2% of certainty. As for small sample sizes being usable, I think I made the point that "more data is better", not that small sample sizes are unusable. The null hypothesis for MTG matchups is that both decks are equally favored. It is much easier to reject the null hypothesis given a more skewed sample, and the confidence intervals support that. Shops at Champs was, to the casual observer, clearly an above-average deck given its dominance of the Top 8. The win rate and confidence interval augment that conclusion: at 54-64%, the whole interval sits outside the definition of average. The next-best deck, Oath, does not have the same statistical support. Its range of 48.5-60.5% includes 50%, so the deck might be average based on the confidence interval. Taking the entirety of Champs, you reach a very limited conclusion about the field: Shops is above average, Paradoxical and Eldrazi are below average, and every other archetype is statistically average. And of course this is without other assumptions and arguments muddying the waters.
  4. I am thinking it would be helpful. Again, I think the null hypothesis that matchup X vs Y is even is a valid and worthwhile result.
  5. I agree wholeheartedly and am familiar with the debate currently ongoing within the scientific community. The point of this piece was actually not to establish conclusions but to convey the complications of deriving conclusions from statistics.
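The +/- 5% and +/- 6% margins in point 3 come from the normal-approximation (Wald) interval for a binomial proportion. A minimal sketch, evaluated at the worst case p = 0.5 since exact win totals aren't given here:

```python
from math import sqrt

def ci_half_width(p_hat, n, z=1.96):
    """Half-width of the Wald confidence interval for an observed
    winrate p_hat over n matches (z = 1.96 for a 95% confidence level)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# The two sample sizes discussed above, at the worst case p_hat = 0.5:
print(round(ci_half_width(0.5, 400), 3))  # 0.049 -> the "+/- 5%"
print(round(ci_half_width(0.5, 277), 3))  # 0.059 -> the "+/- 6%"
```

An interval that excludes 50% entirely (like Shops' 54-64%) rejects the "average deck" null; one that straddles 50% (Oath's 48.5-60.5%) does not.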

Thank you for the feedback. I will look at your blog post later tonight and am looking forward to it πŸ™‚

last edited by ChubbyRain

@chubbyrain I agree on the focus on specific matchups rather than on "vs the field". Much cleaner data there.

@timewalking I read your article and enjoyed it - would recommend it to others as well. I also tracked down David Colquhoun's article and stashed that into my file of articles I'd like to keep readily available going forward. It reminds me of my lecturer, who quipped "I could devise a screening test that is 98% sensitive, 98% specific, and wrong 98% of the time."
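The lecturer's quip is the base-rate fallacy: sensitivity and specificity mean little without prevalence. A quick Bayes' rule sketch (the ~4-in-10,000 prevalence is my own choice to make the numbers land; it is not from the quote):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(actually positive | test positive),
    computed from the expected true- and false-positive rates."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A 98%-sensitive, 98%-specific test for a rare condition:
print(round(ppv(0.98, 0.98, 0.0004), 3))  # 0.019 -> ~98% of positives are wrong
```

The analogy to matchup data: when most matchups are near-even, a "significant" result is a false alarm more often than the p-value alone suggests, which is the thrust of Colquhoun's article.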

I am curious why you feel specific matchups would be better than against the field?

@chubbyrain The math tools we use are made for experiments conducted in the same context. Ideally they would probe the presumed universal laws of nature and nothing else. When you test deck A vs B, even if no changes to the lists are made within the sample, you'd still need to know that the players don't have different ways of playing the matchup. And there is a multitude of parameters that could change over the sample for the A and B players, like their health, fatigue, etc. But we omit the human factors and still use the statistician's tools. If you test deck A vs the field, you make the testing environment much more unstable: your field will change more than B would, might not be representative of the "real" meta (not sure there's such a thing, really πŸ™‚ ), and anyway the decks we face in a tournament are quite random.

Also, on a personal level, I find matchup winrates much more interesting (setting aside that I'm not in favor of relying on stats in mtg). If I'm confident my deck beats X by a considerable margin, and if I could have a relatively exact measure of that margin, I could use it as a tool to see how far I could go in weakening my deck against X to help other matchups, for instance. Since I don't expect to have reliable stats "ever" for tight matchups anyway, that's what I could use.

When playing against the field, I can try to use metagame shares (which are much more stable than winrates, or at least that seems clear to me) to choose decks and sideboards, but those aren't winrates. I also base my strategy on the principle that only the players of top skill matter: if I'm not one of them I'll lose, and if I am I'll win against the weaker players. So I'm interested in winrates against top players, by top players; that's what is of real strategic importance to me, and by definition winrates against the field aren't of that nature. Sure, neither are matchup rates typically, but they could be devised to be so, theoretically. (Another reason why, for me, stats are so overrated in mtg.)

@womba said in Math and Max: A statistical analysis of "100 Matches with the Best Deck in Magic":
Additionally, I had MULTIPLE OPPONENTS keep cards like Mental Misstep and Gush in against me, on the draw no less.

Good points added. It's generally not right to keep Misstep in, but I figured I would add that Oath pilots should generally keep 1 (possibly 2) in post-sideboard due to the importance of stopping Grafdigger's Cage. If they weren't on Oath, they probably have no excuse aside from "I just had so many Pyroblasts and Flusterstorms I didn't have enough cards to take out", in which case, again, no excuse. πŸ˜„

last edited by brianpk80