The biggest Vintage event of the year just took place and I have ~350 decklists sitting on my desk. I'll be going through all of them as many notable cards like Paradoxical Outcomes, Leovold ("Butters"), Emissary of Trest, and Chandra, Torch of Defiance made appearances outside of the top 8 and there are some rather sweet lists. Rather than make everyone wait, here is what I have done so far:
Top 8 Lists: http://www.cardtitan.com/coverage
I recommend checking out Reid Duke's Paradoxical Storm, David Fleischmann-Rose's Odd Oathstill, and Kurt Crane's "Red Meta", as examples of innovative lists that did well.
*Except for Mukesh Ablack who we assume is on Colorless Eldrazi based on what his opponents told us.
As always, thank you to Ryan Eberhart, my partner in crime, for his considerable help.
Ryan Eberhart (@diophan) and I will be updating this post throughout the week but we wanted to disseminate data from the event as we process it. Currently, we have the metagame broken down by archetypes and subarchetypes.
Top 16 Lists
- Andy Markiton - Ravager TKS Shops
- Tom Metelsky - Grixis Pyromancer
- Ryan Glackin - Amalgam Dredge
- Hank Zhong - Esper Mentor
- Andy Probasco - White Eldrazi
- Roland Chang - Ravager TKS Shops
- Vito Picozzo - Jeskai Mentor
- Brian Kelly - Esper Mentor
- Brian Schlossberg - Ravager TKS Shops
- Jason Jaco - Eldrazi Tribal
- Jordan Kasten - Transform Dredge
- AJ Grasso - Gush Mentor Bomberman
- Nicholas Cummings - Belcher
- Lee Hillman - White Eldrazi
- Matt Murray - Sylvan Mentor
- Nick Dijohn - Smasher TKS Shops
The rest of the lists will be made available at some point, likely on EternalCentral.com.
Major props to Nick, the judges, and the rest of the tournament staff - they did a phenomenal job running the event.
Update: Link to decklists
Save the spreadsheet as a copy and you can make use of the search Ryan and I built.
It's pretty clear that players (and Wizards of the Coast) have divergent opinions on the role of data in Magic. My opinion is that data is an imperfect representation of reality. I would much rather have as much data as possible, while understanding the limitations of such data. That said, I think we have reached a point where we as a community might be oversaturated with these weekly reports and the conversations that follow. Ryan and I are both committed to collecting Vintage data from as many sources as possible. What we remain split on is how often to disseminate that data. We could continue to publish weekly reports or we could combine them into monthly reports like this one. Please chime in in the comments below.
Top 32 Lists
Xerox is what we are using to refer to former Gush decks - most are Young Pyromancer based, either with Delver or splashing white for Mentor. Some are playing Seeker of the Way... Fish decks refer to Bug Fish. Combo is kind of a grab bag for DPS decks - most combo decks have been absorbed into Paradoxical. Tom's winning Rector Flash list is the source of the 87.5% MWP on 9/23.
Thanks to Ryan for his considerable help as usual. Jonathan Suarez also helped with data collection for the 7-23 challenge and it was much appreciated.
I took the data from the last 3 Power 9s and combined it, looking to see what we could glean about certain matchups. There are several results that contradict commonly held beliefs about the format:
People need to stop insisting Storm and Oath beat Gush.
Here is the metagame breakdown for the May edition of the MTGO Power 9 Challenge:
5-2 or better:
- Collector - Dredge
- Call1Me1Dragon - White Eldrazi
- Montolio - Eldrazi Shops
- Pubert - Grixis Pyromancer
- Mr. Random - White Eldrazi
- BlackLotusT1 - White Eldrazi
- Lexor19 - White Eldrazi
- JdPhoenix - Blue Moon
- Diophan - Jeskai Mentor
- IcyManipulat0r - UW Mentor with Delver
- Ravager101 - DPS
- Footemanchu - DPS
- Deibler - Jeskai Mentor
- Sigaisen - Dark Sylvan Mentor
Metagame Breakdown and Win Percentages:
Note: Mirror matches were not included for overall win %.
Archetype vs Archetype Win Percentages and Sample Sizes (n = number of wins - for instance, Gush decks went 5-0 against Oath)
Subarchetypes as promised:
As always, a major thank you to @diophan, Ryan Eberhart, for his help with this (and condolences on the 9th place finish).
Edit 1: Top 16 decklists linked to above.
Edit 2: Updated the original post with Subarchetypes. We considered IcyManipulat0r's deck to be a "mentor" deck instead of Delver as it had a more expansive manabase (with Sol Ring and Mana Crypt) along with Mana Drains. Obviously, this helps emphasize that classifications are gray rather than black and white and there is no perfect system.
Edit 3: There was a rather odd bug in our Google doc that affected a handful of matches in round 4 (for some inexplicable reason, the fill function skipped cell 43 and caused a misalignment of opponents to archetypes for 10 match results or so). This slightly affected the calculated win% against the field and we have updated the initial table.
Since this is still going on, I think it would be beneficial to break this down statistically. I started this as a reply but it reached sufficient length that I decided it deserved its own thread. The link to the original thread is here.
Max in the Shops Mirror
The model that best approximates a 'coin-flip' scenario in which there are two outcomes, determined by luck, with probability p, repeated n times, is a binomial distribution. Let us apply this to Max's experience with the mirror. We have the following parameters of n = 16 games and p = 50% or 0.5 (since it is a 'mirror'). Ignoring skill, Max should win
Mean (mu) = n * p = 16 * 0.5 = 8 games
Let's stop and double check that the result makes sense. You flip 16 coins and on average there will be 8 heads and 8 tails. Moving on to the variance around that mean, we use the equation
Variance (sigma^2) = n * p * (1 - p) = 16 * 0.5 * (1 - 0.5) = 4.0
The standard deviation is typically more useful than the variance and is found by taking the square root of the variance. The standard deviation is
Std Dev (sigma) = sqrt(4.0) = 2 games
The final breakdown is:
Max's 15 wins in 16 matches is much higher than the 50-50 an average Shops player would obtain. Therefore, it would be pretty reasonable for a casual observer to say Max is an above average Shops pilot based on these results.
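The arithmetic above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `binomial_summary` and `sigma_ranges` are made up for illustration, and the inputs are the n = 16 and p = 0.5 used in this section.

```python
import math

def binomial_summary(n, p):
    """Return mean, variance, and standard deviation of a Binomial(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

def sigma_ranges(n, p):
    """Win-count ranges within 1, 2, and 3 standard deviations of the mean."""
    mean, _, sd = binomial_summary(n, p)
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

mean, variance, sd = binomial_summary(16, 0.5)
ranges = sigma_ranges(16, 0.5)
print(mean, variance, sd)  # 8.0 4.0 2.0
print(ranges)              # {1: (6.0, 10.0), 2: (4.0, 12.0), 3: (2.0, 14.0)}
```

Max's 15 wins sit outside even the widest of these ranges.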
Max against the Field
There are two ways to establish the probability that can be used in these models. The first is theoretically derived, like we did in the first section. In the mirror the cards are assumed to be the same or close to the same, so if player skill is ignored, the theoretical probability of winning is equal to that of losing, or 50-50. However, if the cards are substantially different, i.e. players are playing different decks, it is much more tenuous to assume a 50-50 win-loss record. You can make an argument for it: the tournament structure is such that a loss for one player equals a win for another, so the overall record of the field must be 50%. If you do so and exclude Max's Shops matchups, you get
Max's actual number of wins, 66, and actual match win percentage, 78.6%, are much higher than what we would expect given a 50-50 win rate. It would be pretty reasonable to conclude that Max is an above average player with Shops against the field, too.
How can we 'science' up the above conclusion?
Science is conducted through the scientific method: you make a hypothesis, conduct an experiment, then reject or accept the hypothesis based on the results. "But Max didn't have a hypothesis." In many cases, data are collected before an actual hypothesis is made. The default position is that of the null hypothesis, that there exists no statistical difference between two groups of data. Put in the context of this experiment, we are essentially taking the position that there is no statistical difference between Max's results with Shops and the theoretical results of an average player with an average deck (average defined as 50% win rate). That is, Max's results happened purely through chance and neither skill nor deck selection played a role.
Rejecting the Null Hypothesis (Confidence Intervals)
Max has collected his data. Now we have to determine whether Max is good or lucky. And honestly, we cannot know for sure. If you flip a coin 10 times and it comes up heads 10 times, would you conclude that this was luck or that something nefarious was in play? The odds of landing on heads 10 times are theoretically (0.5)^10 or 1/1024. Alternatively, the coin could be weighted so that it almost always comes up heads. Both are possible, right? There is that one-in-a-thousand chance and weighted coins exist. Granted, in this example the coin is severely disfigured and it would be readily apparent that it was doctored... Still, if someone says to you "I just flipped 10 coins and had 10 heads" with no additional information, what should you believe?
This brings us to confidence intervals. We know that with any type of probabilities, there exists a range of theoretical outcomes and that certain outcomes are more likely than others. What we have to determine is our threshold for error or alternatively our confidence in the results. Luckily, we did most of the work already by calculating the standard deviations. There is a statistical rule called the 68-95-99.7 rule that states the likelihood of a certain result falling within one, two, or three standard deviations of the mean. Those ranges are given in the above charts. If the above games were played by an average Shops pilot, there is a 68% chance that the Shops pilot would win between 6-10 games, a 95% chance they would win between 4-12 games, and a 99.7% chance that a Shops pilot would win between 2-14 games. Max won 15 games, so the odds of an average Shops pilot producing a result this far from the mean are <0.3%.
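The 68-95-99.7 rule is an approximation based on the normal curve. With only 16 matches, the binomial probabilities can also be summed exactly as a sanity check. The sketch below is not part of the original calculation: `binomial_tail` is a hypothetical helper name, and it computes a one-sided probability (15 or more wins) for a 50-50 pilot.

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a 50-50 pilot wins 15 or more of 16 mirror matches.
p_15_or_more = binomial_tail(16, 15, 0.5)
print(f"{p_15_or_more:.5f}")  # 0.00026

# The 10-heads-in-10-flips example from earlier: (0.5)^10 = 1/1024.
print(binomial_tail(10, 10, 0.5) == 1 / 1024)  # True
```

The exact figure is 17/65536, which is comfortably under the 0.3% bound quoted above.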
Dividing the number of wins by the number of games played gives us a match win percentage that allows us to compare different sample sizes. Doing so shows how win rates can vary dramatically based on limited results (and why looking at small sample sizes is unreliable, like @Timewalking suggested). Over 16 matches, we would expect an average Shops pilot to win 37.5-62.5% of their matches 68% of the time, 25-75% of their matches 95% of the time, and 12.5-87.5% of their matches 99.7% of the time. Conversely, the confidence intervals for 84 matches (the number of matches Max played against the field) are much smaller: 44.5-55.5% for one standard deviation, 39.1-60.9% for two standard deviations, and 33.6-66.4% for three standard deviations. Statistically, more data is always better. Max won 78.6% of these matches, so again, it strongly suggests that Max is an above average Shops pilot and/or that Shops is an above average deck.
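To see how the ranges tighten as the sample grows, here is a short sketch that expresses the same standard deviations as match win percentages. The helper name `mwp_ranges` is an assumption for illustration; the sample sizes 16 and 84 and the 50% baseline come from the text.

```python
import math

def mwp_ranges(n, p=0.5):
    """Match win percentage ranges (in %) within 1, 2, 3 standard deviations."""
    sd_pct = 100 * math.sqrt(p * (1 - p) / n)  # standard deviation of the win rate
    center = 100 * p
    return [
        (round(center - k * sd_pct, 1), round(center + k * sd_pct, 1))
        for k in (1, 2, 3)
    ]

for n in (16, 84):
    print(n, mwp_ranges(n))
# 16 [(37.5, 62.5), (25.0, 75.0), (12.5, 87.5)]
# 84 [(44.5, 55.5), (39.1, 60.9), (33.6, 66.4)]
```

The width of each range shrinks with the square root of the number of matches, which is why 84 matches give a noticeably narrower band than 16.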
By convention, those in the medical field (and many other fields) tend to use the cutoff of 95% (2 standard deviations) as a statistically 'true' result. Max is well beyond that, so we can statistically conclude what most of us already concluded - that Max is not an average Shops pilot. We have a higher degree of certainty, of at least 99.7%, but it's much simpler mathematically to stop here for now.
What other meaning can we derive from the data?
There are really two other questions/observations that emerged from the thread concerning Max's article.
- Does Max's higher win rate in Shops mirrors (94% vs. 79%) suggest that Shops is actually a weaker deck against the field?
- Does Max's 81% win rate in total and 79% win rate against the field suggest that Shops is an above average (or good deck) in the metagame?
Let's start with the first as it is easier to address. The argument assumes that skill in one matchup is transferable to another, that the Shops mirror is inherently a 50-50 matchup, and that since Max won at a higher rate against Shops than against other decks, the skill-independent MWP of Shops is below 50% (making it a 'bad' deck). While considering assumptions is really important in interpreting data, it actually doesn't matter much statistically. The numbers are what they are: Max won 94% of matches against Shops in 16 matches and 79% of matches against non-Shops decks. The question is whether or not this discrepancy is real.
Is there a statistical difference between Max's results against Shops and Max's results against the field?
The second way of determining a probability (and by far the most common) is to do so experimentally. We don't know how many matches Max should win when we factor in his skill and his deck selection. How good is Max? How good is Shops? How good is Max with Shops? Again, we don't know for sure, but one thing we can do is have Max play a bunch of matches with Shops to give us an experimental value for his win probability. Well, Max already did that so let's use Max's win rate against other decks as a starting point. Max won 66 of 84 matches, for an experimental probability (P because I don't know how to add a circumflex to the letter p) of ~79%. What is our confidence interval for this value? Well, there are several ways to calculate confidence intervals of experimental means based on sample sizes. The easiest one to use is a normal approximation interval, or Wald method, where the range is:
P +/- z * sqrt(P * (1 - P) / n)
The constant z depends on the desired confidence level - for 95%, z is 1.96. Punching the numbers in, we get an experimental probability of 0.79 +/- 0.09 and a range of 70-88%. Max's win rate of 93.75% in the mirror is outside of this range, implying that there is statistical significance in the discrepancy between the Shops mirrors and matches against the rest of the field.
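For anyone who wants to check the numbers, the Wald interval is simple to compute. This is a minimal sketch using the standard normal approximation formula, P +/- z * sqrt(P * (1 - P) / n); the function name `wald_interval` is hypothetical, and the inputs are Max's 66 wins in 84 matches and his 15 wins in 16 mirrors.

```python
import math

def wald_interval(wins, n, z=1.96):
    """Normal approximation (Wald) confidence interval for a win proportion."""
    p_hat = wins / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, half_width, (p_hat - half_width, p_hat + half_width)

p_hat, half_width, (low, high) = wald_interval(66, 84)
mirror_rate = 15 / 16

print(f"{p_hat:.2f} +/- {half_width:.2f}")  # 0.79 +/- 0.09
print(mirror_rate > high)                   # True: the mirror rate is above the range
```

Changing `z` gives other confidence levels, and passing different win counts lets you rerun the test on any subset of matches.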
Does a statistically significant result actually tell us what we want to know?
Now it's time to look at our assumptions. We assumed that
- Mirrors are inherently 50-50.
- Skills with a deck are transferable between the mirror and other matchups.
- Skill differences affect outcomes in other matchups to the same degree.
I can poke holes in each of these arguments. The first assumption is that mirrors are inherently 50-50, but that ignores the fact that 'true' mirrors are relatively rare. Most decks are not 75 card copies of each other, and most classification schemes lump similar decks into the same archetypes. For Shops, this includes not only Ravager Shops but also Stax, Rod, and other variants. Ravager Shops tends to destroy these other versions, which is part of its dominance within the metagame. Foundry Inspector breaks the symmetry of Sphere effects and is unaffected by Null Rod, the threat base is wider and lower to the ground (i.e. many creatures that can be cast cheaply), and the mana denial is much more effective against other decks with higher mana curves. Max went at least 5-0 against these 'mirrors', which arguably should be considered different decks. If one assumes that the remaining 10-1 record was against other Ravager decks, that gives a win probability of 0.91 +/- 0.17, or a lower limit of 74%. Since Max's 79% win rate against the field falls inside that range, the discrepancy is no longer statistically significant.
For the second and third assumptions, Max and I both stated that we thought the mirror tested different skills and was very skill-intensive (i.e. that the skill discrepancy with a deck went a long way to predicting the winner). The Ravager mirror does have blowout potential but many games develop into complicated board stalls with key pieces such as Walking Ballistas, Arcbound Ravagers, Steel Overseers, and Hangarback Walkers shut down by Phyrexian Revokers. Oh, and Metamorphs, Wurmcoils, and Precursor Golems providing powerful threats to be navigated. Complex combat math is arguably the most valuable skill in the mirror, with sequencing less important. These types of scenarios are uncommon in other matchups and the combat math is much more simplistic as most opposing creatures provide few decision trees (most creatures are vanilla x/x's like tokens and creatures with abilities tend to be static like the lifelink of Griselbrand or triggered and predictable like Inferno Titan). Sequencing is more important for the Shops pilot who assumes the proactive role. Skill from the other side of the matchup is also minimally interactive - as Max said, either the opponent kills all your threats or deploys a massive trump like Blightsteel through Spheres and mana denial, or they don't and die. That is more draw and die-roll dependent than skill based.
I think that statistically significant results in this case point to a couple of possible conclusions. First, I think the most likely explanation is that the Shops mirror tends to be less variable than other Shops matchups. This doesn't require assumptions about the transferability of skills from one matchup to another. It actually assumes the opposite of assumption #3 in that it assumes matchups are influenced by skill to varying degrees. Max reached this conclusion as well. I think it is less likely that Shops is weaker than other decks in the field, because we have more premises that I find hard to logically accept to reach that conclusion.
Does this article indicate that Shops is an overpowered deck in the metagame?
Short answer is "No". That type of question is much better answered by our metagame breakdowns. Again, more data is better and you mitigate issues of player skill by having a much larger sample size. Applying the same statistical tests to this most recent Vintage Champs gives Shops a win rate of 59% (+/- 5%). In this sample size of 404 matches played by 72 players, it's pretty statistically clear that Shops is a good deck. Is it the best deck? Oath is the closest of the other archetypes with a win rate of 55% (+/- 6%). Those confidence intervals overlap, so you can't statistically claim that Shops is the best archetype. The answer of course is "more data". When you look at results from the Vintage Challenges and other tournaments (taken collectively), ideally it paints a consistent and accurate picture of reality. That's how science works...you do radiometric dating of a bunch of radioactive minerals and when many different labs reach a consensus of 4.5 billion years old, that's what they put in the textbooks. Would people be interested in a large scale analysis of all available metagame data (in essence, a meta-analysis or the strongest form of scientific evidence in medicine and other areas of science)? I am willing to do this, but I would like confirmation that players would be receptive to the data.
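The overlap argument can be made concrete with a small check. This is a sketch under stated assumptions: the helper names `half_width` and `intervals_overlap` are made up, the Shops half-width is recomputed from the 404 matches quoted above, and the Oath interval uses the reported 55% +/- 6% directly because its sample size is not given here.

```python
import math

def half_width(p, n, z=1.96):
    """Half-width of a 95% Wald interval for proportion p over n matches."""
    return z * math.sqrt(p * (1 - p) / n)

def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

shops_hw = half_width(0.59, 404)    # close to the reported +/- 5%
shops = (0.59 - 0.05, 0.59 + 0.05)  # reported: 59% +/- 5%
oath = (0.55 - 0.06, 0.55 + 0.06)   # reported: 55% +/- 6%

print(round(shops_hw, 3), intervals_overlap(shops, oath))  # 0.048 True
```

Overlapping intervals mean the data cannot separate the two archetypes at this confidence level, which is the point made above.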
Alright, back to one 100 match set played by one player. We can agree that Max's skill has skewed his results away from that of an average player. The question is what additional component arises from Max's deck selection. Again, we have to make various assumptions. We don't know Max's 'true' win probability with other decks, but he has stated that he has won roughly 70% of his matches in PTQ's. If we accept this figure as accurate and assume that this MWP is transferable to Vintage, and assume that PTQ's are comparable in level of competition to Vintage leagues, then we can use this 70% value as a theoretical probability. In this case,
The confidence interval has an upper limit of 79, which suggests with 95% certainty that Max's results are not just a product of variance. He won 82 games. If you exclude Shops decks, you are at the edge of statistical relevance (remember our confidence interval from that data set was 70-89). If you exclude true Ravager Shops mirrors and include Shops variants, you are back above statistical significance with a confidence interval of 72-88 MWP. Given the proximity to the limits and the assumptions required, I would not personally conclude from this that Shops is an above average deck in the metagame.
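As a rough check on the 70% baseline, the sketch below computes the upper end of the 95% range for 100 matches and how many standard deviations 82 wins sits above the expected 70. The function names are hypothetical, and the 70% figure carries all of the assumptions listed above.

```python
import math

def upper_limit(n, p, z=1.96):
    """Upper end of the mean + z standard deviations range for Binomial(n, p)."""
    return n * p + z * math.sqrt(n * p * (1 - p))

def z_score(wins, n, p):
    """Number of standard deviations between observed wins and the expected n * p."""
    return (wins - n * p) / math.sqrt(n * p * (1 - p))

upper = upper_limit(100, 0.70)
print(round(upper, 1))                   # 79.0
print(round(z_score(82, 100, 0.70), 2))  # 2.62
```

A result about 2.6 standard deviations above the mean clears the 95% cutoff but not the three standard deviation (99.7%) bar, which fits the cautious conclusion above.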
Hopefully this type of data analysis was informative and accurately conveys some of the challenges with regards to interpreting data. Questions and comments? Please, let me have them.
Number of Participants: 114
Top 8 lists
- Joe Brennan - Jeskai JVP Mentor
- Zohar Bhagat - Jeskai Nahiri Control
- Brian Kelly - Dromoka Gush Oath
- Brad Gutkin - Blue Moon
- Shawn French - UW Landstill
- Ross Pranjzner - Dark Jeskai Mentor
- Paulo Cesari - Jeskai Mentor
- Nick Dijohn - "Car Shops" (aka. Get outta my dreams, get into my car)
Rest of the X-2+
- Porterfield, Avery - Jeskai Delver
- Geras, Jonathan - Ravager TKS Shops
- Castrucci, Sam - Ravager TKS Shops with Cruisers
- Lynch, Paul - Salvagers Oath
- Fleischmann-Rose, David - 4C Odd Oathstill
- Sees, William - UW Emrastill
- Dayton, William - Ravager TKS Shops
- Eberhart, Ryan - Grixis Therapy
- Ata, David - Ravager TKS Shops
- Sacino, Joey - BUG Fish
- Miller, Daniel - Moat Control
- Barkon, Daniel - Ravager TKS Shops
- Dail, Ryan - White Eldrazi
- Dobbin, Zach - Jeskai Mentor
- Waldron, Scott - Salvagers Oath
- Johnson, Richard - Salvagers Oath
- Difebo, Dominic - Jeskai Mentor
Metagame and MWP against the field:
Archetype breakdowns and MWP against the field
Note: Gush Oath decks were classified under the Gush archetype. Frankly, there is no perfect way to classify lists and it is unclear to us to what purposes you the reader apply these breakdowns. If you are concerned about B&R rationale, a very relevant question is "what percentage of the metagame is running Gush" and that is the question we chose to answer by putting them into the Gush archetype. Alternatively, if you are a Shops or Eldrazi player concerned with the Oath matchup, you would want to consider those 3 lists as part of the Oath archetype. The actual effect is small: plus or minus 3 out of 114 players is a 2.6% change and the MWPs are virtually identical at 61.0% (Gush), 62.5% (Oath), and 61.1% (Gush Oath). I wanted to mention this in the interest of transparency since we only have one basket in which to drop these deplorables.
Archetype vs Archetype win rates
Mistakes? Typos? Comments? Let me know. As always, thanks to @diophan for his help on these endeavours. Additional thanks go out to @EEMagic for running an excellent event at a great venue and supplying us with the lists and WER files. Please support them by attending EE6 or watching the coverage if you can't make it.
At the request of Andy, I'm reposting here:
The restriction of Thorn of Amethyst and Monastery Mentor took effect September 1 2017. That means we are over 6 months out from that restriction and I think it is worth looking back at how effective those moves were. As many of you know, Ryan Eberhart (@diophan) and I spend quite a bit of time collecting metagame data from major paper events and the MTGO challenges. We do this for a couple of reasons. Personally, I use this data when it comes to creating new decks. The version of Snapcaster Control that I've played in the last three challenges was heavily influenced by what I saw from our challenge data. The prevalence of Shops and Planeswalkers in both Oath and Xerox (i.e. cantrip heavy blue decks) motivated me to shift the removal suite to Lightning Bolts and Fiery Confluences instead of Swords to Plowshares and Balance. This was further justified by an absence of Eldrazi and Merfolk decks in the format. Honestly, that was my primary reason and hope when we started collecting data: that what we gathered would be used to promote innovation in a small format like Vintage.
Alternatively, Ryan and I wanted to provide an accurate picture of the Vintage metagame for use in discussions involving the Restricted list. Much of what we read previously tended to be hyperbolic, opinionated, and poorly reasoned. We hoped people would use our data in forming conclusions like scientists or researchers. In both cases what we wished to happen didn't actually happen. Most responses to our posts consisted of hyperbolic, opinionated, and poorly reasoned arguments, just now with cherry-picked data. There was very little commentary on trends and how to combat them, no brewing of decks. We went from posting results weekly after each Challenge, to monthly aggregations of the previous month's events, to not posting or gathering data from February. In effect, Ryan and I burnt out, on both playing Vintage and collecting data about the format. We asked for help and nothing really materialized. The reason I'm bringing this up is that I don't know if we will continue this in the future. So if you do find this beneficial, please let us know and consider helping out if you play Vintage on MTGO. The Challenges continue to be excellent EV, with the top 32 (basically any 3-3 and several 2-4s) making their entry fee back. Power is affordable - a set of VMA Power 9 costs less than 100 dollars. Complete decks range from 120 tix for Dredge, 300 tix for DPS, 500 tix for Ravager Shops, and 700 tix for UWR Mentor or UW Landstill. Which serves as a pretty good segue into the next section...
Paper vs Online Metagames
We hear a lot of comments concerning real or perceived differences between these two metagames, often in the context of B&R discussions. While I appreciate that players may play Vintage in widely divergent paper metagames, that doesn't invalidate data collected in other metagames. At the end of the day, the DCI is going to base their decisions on the data they have available. This likely is limited to the large sanctioned events of European, North American, and Japanese Eternal Weekends, along with the results from MTGO Leagues and Vintage Challenges, so that's where we've focused our efforts. And, frankly, the MTGO metagame has several advantages compared to paper Vintage. The cost of decks is lower, even considering proxies, which allows players more freedom in deck selection. Events are more frequent and typically larger than their paper counterparts. We are looking at four 40+ person events whereas a local tournament may have one monthly event with 17-32 players. This gives us a much larger sample size from which to draw conclusions. And finally, players on MTGO tend to do very well in paper tournaments such as the 2017 North American Champs. Winner Andrew Markiton (MTGO: Montolio), finalist Rich Shay (The Atog Lord), Patrick Fehling (Clone9), Brian Kelly (brianpk80), and Eric Vergo (caggii) all are regulars on MTGO.
Before the Restriction
The Gush and Gitaxian Probe restriction took effect April 24 2017, so we used the May through July challenges to establish a baseline. Individual events can be found by searching TMD, but the compiled data is available here.
Following the Restriction
We changed our spreadsheets when we moved to monthly reporting. It allowed us to do a month-by-month breakdown of events. Note: February's metagame breakdown is drawn from the Top 32 results of that month's challenges. As mentioned previously, we didn't do our usual data collection for that month. Also, January is missing an event in which Ryan and I were unable to participate.
As can be seen by the monthly breakdown, October and November are dramatically different from the other months. The most likely explanation is that the proximity of these events to the North American Vintage Champs altered player attendance and behavior. North American Vintage Champs was October 19-21 and Ravager Shops was absolutely dominant. It met itself in the finals, won the tournament, placed 5 decks in the top 8, 11 decks in the top 32, and had a 58.9% win rate against the field. Yet on MTGO, Shops' portion of the metagame actually fell. Among many players, there was concern that a Shops restriction was imminent, so they played other decks, leaving Shops as the "best deck" primarily played by those unfamiliar with the format. Many established players flocked to the deck that supposedly "beat" Shops, Inferno Titan Oath, as Oath's percentage of the metagame tripled from 6.4% in September to 19.0% in November. And still others went next level with the various Mentor/Xerox decks that tend to beat Oath. Those decks put up an impressive 63% win rate in October and November. Now I'm not in the habit of ignoring data, but data should make sense. If it doesn't, you have to wonder what factors might be influencing or introducing bias into your study.
If you exclude October and November like we did above, there is a remarkably consistent picture of Shops' dominance. The combined results show a 59.0% win rate, virtually identical to the 58.9% win rate at Champs, slightly decreased from the 59.2% win rate in the pre-Thorn metagame. The metagame share is slightly decreased but trending upward. These trends seem to hold so far in March, as you can see below. Shops has a 31.4% metagame share and a 62.1% win rate. In my opinion, the results from October and November appear as outliers rather than a true indicator of Shops' place in the metagame. However, one of the reasons to write these in-depth reports is to solicit differing opinions, similar to peer-review. I invite whoever is so inclined to chime in below with their thoughts. If you feel this is some sort of adaptation by the rest of the metagame, I am curious to hear what you think that was and why the metagame reverted back to its previous state.
What beats Shops?
For those that do not follow other formats, Standard underwent several bannings in January. Ian Duke's explanation of those bans is well worth a read as it provides useful insight into WotC's reasoning and approach to B&R decisions. Ian spends quite a bit of time discussing the matchups of Standard's top 2 decks, Temur Energy and Ramunap Red, and how these decks have a favorable matchup profile against the field, suggesting that the metagame is unable to adjust. Let's take a look at Shops' matchup profile since September:
With October and November removed:
Ironically, Shops' only "bad" matchup (and I admit to being a bit lazy with the statistics here - if you want the raw data and the sample sizes, it's here), is the "Other" category where we throw decks that don't fit into other categories. Apparently, the Monored Hate deck with Null Rod and Ensnaring Bridge went 5-0 against Shops in October and November... Outside of what are essentially rogue decks, Shops either has a good matchup of >55% or is essentially even (between 45% and 55%). This includes Oath, which is Shops' worst matchup but only at 47.5%. Decks that were traditionally thought to be good matchups, like Dredge and Landstill (the most popular variant in the "Blue Control" category) actually end up struggling against Shops.
The goal of this post isn't to propose specific actions: it's to establish the need for such action. The previous restriction of Thorn of Amethyst has not discernibly altered the win rate or metagame share of Shops in the MTGO Vintage Challenges or in the NA Vintage Championship. If such action was indicated then, it holds that additional action is indicated now. Of course I have my own opinion on what I think should be done. However, I want to allow some time for players to read and process this. Comments, thoughts, and opinions are welcome and encouraged.
Edit: Added Archetype vs Archetype Win Rates with the Champs months removed.
Data from the month of March
Major props to Twitch user k0dydraven who has figured out a way to import round results from MTGO, saving us a lot of time.
Edit: Also props to Ryan @diophan XD
There has been quite a bit of change recently in the Vintage format. Wizards has been more active in managing the restricted list and printing powerful Eternal-relevant cards like the Delve spells, Dack Fayden, and Monastery Mentor. After the restriction of Lodestone Golem, I wanted to take this opportunity to look at how the metagame evolved following the removal of a key card. I felt that while players understandably will have different views on what the Vintage format should be like, we should also have as much information as possible available to us that we can use to construct informed opinions and arguments going forward. Ryan Eberhart (aka @diophan) and I have been collecting and disseminating data from MTGO Power 9 events, but we have also been collecting data on the Vintage Dailies and paper tournaments around the world. I would like to share with you now the data we have collected on the MTGO Daily Events since the Lodestone Golem restriction took effect on April 13th (paper results will be following shortly).
We have classified decks according to the following archetypes and broken them down further into sub-archetypes in an effort to more accurately convey the metagame.
- Gush - If Gush was a primary component of a deck's gameplan, it was put into this category. We then broke this down essentially by win condition: Delver, Mentor, Pyromancer, Combo (Doomsday and Gushbond), and Other (Thing in the Ice or Vault/Key/Tinker, mainly).
- Shops - The Shops archetype was obviously hit hard by the restriction of Lodestone Golem and went through quite a transitional period. Over the last three months, the archetype has reestablished itself by turning to Thought-Knot Seer as a replacement for Golem. The most successful build has been the Ravager TKS deck though other lists have incorporated TKS and put up results. A third category includes the non-TKS Shops lists but these have been a minority of lists and slanted towards April.
- Eldrazi - An archetype that emerged from the LSG restriction, the most popular variant of the archetype has been White Eldrazi which pairs the colorless creatures with White Hatebears like Thalia and Vryn Wingmare. A minority of decks have fully embraced the tribal element of Eldrazi, i.e. Jaco-Drazi.
- Dredge - Divided by sideboard strategies based on whether they intended to combat opposing hate head-on with Creature, Enchantment, and Artifact removal or Transform post SB. The former approach remains the most popular.
- Combo - Predominantly Dark Petition Storm but also a few Belcher decks and odd-balls (like Two-Card Monte and Rector Flash).
- Blue Control - The more controlling remnants of the Mana Drain pillar like Landstill in various colors and Blue Moon.
- Big Blue - Less controlling artifact-based combo decks like Control Slaver, Painter-Grindstone, Academy combo.
- Oath - If it contained maindeck Oaths, it found its way here. Variants include Salvagers Oath, Control Oath (Fenton Oath with Griselbrand as the primary win condition), Combo Oath (i.e. Burning Oath), Oathstill, and other Oath (odd Oath).
- Null Rod - The various Fish decks that have historically belonged to the Null Rod Pillar. These types of decks are almost nonexistent on MTGO but include BUG Fish, Hatebears (White Trash and 5c Humans), Merfolk, and Other (in this case, a monored 8 Moons deck).
We kept track of 4-0 and 3-1 finishes and used these to create a category called Total Wins ( # of 4-0 finishes * 4 + # of 3-1 finishes * 3). This more heavily weighted the 4-0 finishes, from which we calculated the % of Total Wins for that archetype/subarchetype. Comparing the totals reflects performance - a positive Delta % Total means the deck disproportionately put up 4-0 finishes. However, the sample size is not really large enough to infer much from this.
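As a rough illustration of the Total Wins weighting, here is a minimal Python sketch. The archetype names and finish counts are made-up placeholders, not our actual data, and the "delta" computed here assumes that Delta % Total means % of Total Wins minus % of total finishes, which is only one reading of the description above.

```python
def total_wins(num_4_0, num_3_1):
    """Total Wins = # of 4-0 finishes * 4 + # of 3-1 finishes * 3."""
    return num_4_0 * 4 + num_3_1 * 3


def summarize(finishes):
    """finishes maps archetype -> (number of 4-0s, number of 3-1s).

    Returns, per archetype, the Total Wins, its share of all Total Wins,
    its share of all finishes, and the difference between the two shares
    (an ASSUMED definition of Delta % Total).
    """
    wins = {a: total_wins(f, t) for a, (f, t) in finishes.items()}
    decks = {a: f + t for a, (f, t) in finishes.items()}
    all_wins = sum(wins.values())
    all_decks = sum(decks.values())
    summary = {}
    for a in finishes:
        pct_wins = 100 * wins[a] / all_wins
        pct_finishes = 100 * decks[a] / all_decks
        summary[a] = {
            "total_wins": wins[a],
            "pct_total_wins": pct_wins,
            "pct_finishes": pct_finishes,
            "delta": pct_wins - pct_finishes,
        }
    return summary


# Hypothetical counts for illustration only.
example = {"Gush": (3, 5), "Shops": (1, 7)}
summary = summarize(example)
```

In this made-up example both archetypes have eight finishes, but Gush has more 4-0s, so its delta comes out positive and Shops' comes out negative.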
There is a function in Google Sheets that allows you to count unique entries within a data set. We used this to calculate the number of unique players both overall and within archetypes/subarchetypes. Over time, you would expect the majority of MTGO Vintage players to put up a finish so this is a rough indicator of the total pool of MTGO players that participate in these events. It also helps to remove repeat performers like Rich Shay or Montolio as they can potentially skew results for certain archetypes. It should be noted that players can switch archetypes/subarchetypes so some players will be counted twice or more as you break down the data.
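For anyone who wants to reproduce the unique-player counts outside of Google Sheets, a minimal Python sketch of the same idea follows. The player names and results are hypothetical placeholders, and the function name is made up for illustration; it is not how our spreadsheet is actually built.

```python
from collections import defaultdict


def unique_players(results):
    """results is a list of (player, archetype) pairs, one per finish.

    Returns the number of unique players overall and a dict of
    unique players per archetype.
    """
    overall = {player for player, _ in results}
    by_archetype = defaultdict(set)
    for player, archetype in results:
        by_archetype[archetype].add(player)
    return len(overall), {a: len(p) for a, p in by_archetype.items()}


# Hypothetical finishes: PlayerA has three finishes across two archetypes.
results = [
    ("PlayerA", "Gush"),
    ("PlayerA", "Gush"),
    ("PlayerA", "Shops"),
    ("PlayerB", "Shops"),
]
overall, per_archetype = unique_players(results)
```

Here there are two unique players overall, but the per-archetype counts add up to three because PlayerA switched archetypes, which is the double counting mentioned above.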
That out of the way, let's get to the results.
The true value of this data in my opinion is how the different archetypes and sub-archetypes have changed over time. Ryan and I broke down these results by week and displayed them on several graphs.
As we can see, the trend of a declining metagame prevalence for Gush has not continued (did anyone aside from @Smmenen think this would be the case?). Metagames tend to be cyclical by nature - people build their decks to combat specific decks and that focus shifts with time. Gush was the clear target that emerged from the Lodestone Golem restriction and decks adapted to combat Gush, with a surge in Sudden Shocks, Sulfur Elemental, Thorns, and Defense Grids. As the field diversified, the narrower hate-cards were supplanted by more broad removal (you don't want to be holding a Sudden Shock against a resolved Thought-Knot Seer) and Gush decks themselves diversified to dodge the hate with these decks turning to Tendrils, Pyromancer, and Thing in the Ice. At its heart though, Gush is a control deck with a powerful card advantage engine - it just needs to draw into the right cards for the field. A key development was the adoption of Cabal Therapy and Baleful Strix by Grixis Pyromancer (and ultimately Esper Mentor) as a means of competing with Eldrazi, Cavern of Souls, and the broader field. This has led to a resurgence in Gush, decline in Shops and Eldrazi, and ironically the metagame percentages have returned to roughly the same percentages as the start of April. It remains to be seen how the metagame will adapt but I hope this look at it has been interesting. Keep in mind, all statistical work is subject to variance and the sample sizes are low (though we have a comparable number of lists to Paper over the same time span). Questions? Comments? Suggestions? Have at them and I hope we can get a good discussion going.
Correction 1: We noticed an error in our calculations that affected the Sum of 3-1 Finishes (it did not count Eldrazi) and as a result, the percentages were high. We've had an issue with Google Sheets where the formulas we write do not appear to "fill" properly, randomly skipping certain cells...This could be an issue with us simultaneously trying to edit a sheet. This specific instance could have been human error (aka I screwed up), but we really don't know. In any case, the best thing to do is post a correction explaining the error and fixing the data. The first table has been updated and should be correct now. Other charts were unaffected as they did not use the "Sum of 3-1 Finishes" in the calculation.
Bah, it's four in the A.M. Here's some data.
Top 8 Results
Of note, the Paradoxical lists with the asterisks contained Monastery Mentor as their primary win conditions. Neither list was the standard 4X Mentor Outcomes, but rather eccentric lists from players such as @iamfishman and @brianpk80. These lists contained 2 Mentors each, with Brian running one in the SB "to come in against other Mentor decks". If counted as part of the Mentor archetype, that brings the total up to 28.4%.
This breakdown is based on archetypes and not tags. The actual percentage of Mentor in the metagame is slightly higher. Unfortunately, data collection has been inconsistent as Ryan and I have missed events and relied on others to help us. It's a significant amount of work and we are immensely grateful to @desolutionist and others for that help. However, we don't have tag breakdowns available for every event. The percentage of Paradoxical Mentor is typically low, normally 2-4 players on a given day. A reasonable approximation would put the percentage of Mentor at about 25% of the metagame, or on par with Shops. Again, these decks aren't necessarily focused on Mentor and the number of copies is pretty variable, but since Mentor is currently on the community's radar for restriction, it's best to keep the metagame saturation in mind.
Archetype vs Archetype MWP
The color codes are as follows. For win rates, green corresponds to >50%, red to <50%, and yellow to =50%. The problem with establishing a range is that the sample sizes (and therefore uncertainties) aren't consistent. We expect much more variance from the 3 matches Eldrazi had against Blue Control than the 171 matches Mentor had against Shops. The second table shows those match breakdowns, with green corresponding to >50, yellow 25 to 50, and red <25 matches. I'll leave it to the readers to discuss the implications of this.
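To put a rough number on why the sample sizes matter, here is a small Python sketch using the standard error of a proportion, sqrt(p(1-p)/n). The 50% win rate is a placeholder, not a figure from our tables, and this approximation is crude for very small samples like 3 matches, so treat it as an illustration only. The color thresholds mirror the ones described above, with exactly 25 and exactly 50 matches assumed to fall in the yellow band.

```python
import math


def std_error(win_rate, matches):
    """Standard error of an observed win rate over a number of matches."""
    return math.sqrt(win_rate * (1 - win_rate) / matches)


def win_rate_color(win_rate):
    """Green above 50%, red below 50%, yellow at exactly 50%."""
    if win_rate > 0.5:
        return "green"
    if win_rate < 0.5:
        return "red"
    return "yellow"


def sample_size_color(matches):
    """Green for >50 matches, yellow for 25 to 50, red for <25."""
    if matches > 50:
        return "green"
    if matches >= 25:
        return "yellow"
    return "red"


# Placeholder 50% win rate; only the match counts come from the post.
small = std_error(0.5, 3)    # Eldrazi vs Blue Control: 3 matches
large = std_error(0.5, 171)  # Mentor vs Shops: 171 matches
print(f"3 matches:   +/- {100 * small:.1f} percentage points")
print(f"171 matches: +/- {100 * large:.1f} percentage points")
```

With only 3 matches the uncertainty is on the order of tens of percentage points, while at 171 matches it shrinks to a few points, which is why a single color range for win rates would be misleading.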
Trends in Major Archetypes
Just to show how variable win rates and even metagame percentages can be on a weekly basis. Also, it looks like Shops is trending upwards in metagame percentage while Paradoxical is trending down.
It's 4:00 am. I'm going to bed.
Apparently this card didn't get a spoiler post (unless I missed it). Time to rectify that.
Everyone has their own process for evaluating cards. Some people look to draw comparisons to existing cards. Personally, I hate that approach. Very few cards are functionally equivalent and therefore justify comparisons. What do I mean? Disenchant and Fragmentize have the same function (destroying artifacts and enchantments). Disenchant was a playable card in Vintage and because Fragmentize did the same thing, it makes sense to compare these cards to establish whether or not they are playable. In the end, it was clear that saving a mana was worth the timing and targeting restrictions. Jace, Vryn's Prodigy is not functionally equivalent to Snapcaster Mage or Merfolk Looter, and so evaluation by comparison missed the mark on the card. It makes much more sense to evaluate Jace in the context of how it functions first before any sort of comparisons are made.
Why bring this up? Well, it's very tempting to view The Antiquities War as a bad Tezzeret baby. After all, the card is a combination of Tezzeret's abilities in Enchantment form. You do that, and you end up quickly dismissing the card. Tezz the Seeker mostly sees play because it searches up Time Vault and wins the game that way. Tezz Agent of Bolas doesn't really see play at all. Tezz, Antiquities War doesn't serve as an obvious upgrade of either and so many players dismiss it and you end up without a Spoiler thread on TMD.
Let's step away from comparisons and focus on what the card does: In an artifact heavy deck, it impulses for 2 turns then wins the game. It's immune to Revoker effects, can't be attacked by creatures, doesn't require that much set up (it digs up 10 power of attackers). It does get hit by Pyroblast but then most of the cards you are running in Blue suffer from that (well, not Karn...more on him later). That's...not bad right? Suspend 2: win the game for 4 mana with a couple of artifact impulses? I thought not and so I built a deck around this and streamed it. If you missed that (I apologize...the videos get mangled due to twitch's copyright filter), to summarize: the card was a very solid win condition in the Thoughtcast, Mox Opal, Seat shell. It's immune to rod, you can drain into it pretty easily, and it fuels itself. It's less powerful than Paradoxical Outcome but the cards are complementary - If you lack the artifacts for PO (which happens a lot), TAW will find some for your next turn. After you've PO'd, dumped a ton of artifacts into play, then passed back with counterspells to stop your opponent for one more turn, TAW wins the game. Synergy. I frequently had issues with Tendrils being a dead card when not comboing out. Or drawing a Blightsteel with my Tinker and then hating life. Or getting my Time Vault Dacked. TAW doesn't have those issues and therefore provides several advantages to the Thoughtcast PO deck.
My next stop with the card will be that list above (I was trying to make Damping Sphere work too, but I feel those go in different decks). I hope to stream it on Friday, but until then I hope this prompts some discussion on a card that has until now flown under the radar.
Edit: here is the rough draft of the first list
Yay, more data! Thanks again to @diophan and twitch user k0dydraven for their considerable help in compiling these.
Top 32 lists are available on WotC's page: https://magic.wizards.com/en/content/deck-lists-magic-online-products-game-info
Gentle reminder though that B&R needs to go to the right thread. (Believe it or not, it's possible to speculate on the impact of tech like Shattering Spree, Damping Sphere (Brian played it as a one-of in his top 4 list in the last challenge), and the Misstep-less PO list from VSL competitors Ecobaronen and Lampalot, without bringing up possible bans or restrictions.)
@seksaybish Your main point is of questionable significance as Top 8s are essentially single elimination 8-mans in which pairings and luck play a disproportionate role.
I would also challenge you to see past the single decks to the metagame as a whole. Going through the tournaments you've cited, the events not dominated by Gush tend to be dominated by Anti-Gush Thorn decks.
EW - 5/8 Gush or Thorn decks
EE6 - 8/8
JanP9 - 4/8
FebP9 - 5/8
Mar - 7/8
Total - 29/40 = 72.5%
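For anyone who wants to check the tally, here is a quick Python sketch that sums the counts listed above. The event labels are just the abbreviations from the list, and the 50-65% range is the rough metagame share for Gush and Thorn decks that I quote in this post; nothing else is assumed.

```python
# (Gush or Thorn decks in the top 8, top 8 slots) per event, from the list above.
top8_counts = {
    "EW": (5, 8),
    "EE6": (8, 8),
    "JanP9": (4, 8),
    "FebP9": (5, 8),
    "Mar": (7, 8),
}

hits = sum(h for h, _ in top8_counts.values())
slots = sum(s for _, s in top8_counts.values())
top8_share = 100 * hits / slots

# Rough metagame share of Gush and Thorn decks quoted in this post.
metagame_low, metagame_high = 50, 65

print(f"{hits}/{slots} = {top8_share}% of top 8 slots")
print(f"Above the high end of the metagame range: {top8_share > metagame_high}")
```

This only compares aggregate shares; it says nothing about how much of the gap could be variance across five top 8s.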
Seeing as Gush and Thorn decks are roughly 50-65% of a given metagame, this indicates an overperformance of these archetypes. Why is that? I would hypothesize that because these decks cannot be attacked on the same axis, it creates a polarized two-deck format. Gush requires a slim manabase, efficient though narrow counters, and is relatively immune to spot removal. Eldrazi and Shops require a robust manabase, a different set of answers, and copious spot removal. Decks built to attack either Gush or Thorns must dodge the other, which becomes increasingly difficult to do over the course of larger events. This is an environment that is not rewarding of innovation and frankly boring to those that play it frequently. Success is largely matchup dependent once you reach a level of competency with your deck, which regrettably most Gush players have not achieved, leading to there being a large contingent of poor Gush pilots dragging down the match win % to "acceptable" levels.
This past Saturday, 115 Vintage players from the Northeast US gathered for one of the marquee events of the year: the TMD Open 18 aka Waterbury. Ryan and I were unfortunately unable to make the trip, but Ray graciously scanned the decklists and sent them to us so we could give them the usual treatment. By all accounts, @iamfishman did it again, throwing an excellent tournament that exemplified what Vintage is in the Northeast. There was trivia, there were giveaways, there was bingo, and there was beer. After all was said and done, Jarad Demick on Ravager Shops took home first place (the top 4 split the money and played for the trophy, as I understand it).
On Deck Classification
We continued to use the Archetype/Subarchetype scheme along with the breakdown by Tags. Given the popularity of Paradoxical Outcome, we created an archetype which we broke down by win conditions and structure. PO Mentor describes builds that run multiple Mentors as the primary win condition, similar to Kevin Cron's and Stephen Menendian's lists from Champs (there were none present at the event). PO Storm describes the broken lists running Draw 7's, Chrome Mox, and LEDs. These decks often used a mixture of win conditions along with either Tendrils or Brain Freeze. PO Tezz describes the classic Vault/Key and Tinker builds - less all in than Storm and with value creatures like Trinket Mage and Snapcaster Mage. Other PO was generally a combination of other archetypes. This is a work in progress...
There were several hybrids. Brian Kelly's Emrakul/PO/Gush deck was classified under Paradoxical. PO Oath was classified under Oath. Salvagers Gush Oath under Oath. The Oathstill decks were also included under Oath. If this seems arbitrary, it is. We are open to suggestions on this front but that's the nature of classifications. It's why we have the tag system, so that we can attach multiple descriptors. The lists we included in "Other" can be viewed here along with all our raw data and calculations.
Top 8 Decklists
- Jarad Demick - Ravager Shops
- Jonathan Geras - Ravager Shops
- Travis Compton - Unmask Dredge
- Craig Dupre - PO Storm
- Raf Forino - Blitzkrieg Shops
- Akash Naidu - Powered Colorless Eldrazi
- Andy Probasco - Jeskai Mentor
- Andrew Farias - Jeskai Mentor
Congrats to all members of the top 8 and thanks to Ray for running an excellent event and providing us with the raw data. As always, I am indebted to Ryan Eberhart for his considerable help with our analysis. Questions? Comments? Please don't hesitate.
Expect the commentary for these events to get more brief due to their regularity. More in-depth analysis is now possible as we are able to aggregate the results from multiple events, so we'll probably do a monthly "State of MTGO Vintage" with those results. This past event both Ryan and I were able to play, Ryan coming in second with Jeskai Mentor, and me finishing in 9th after an unfortunate misclick on Stream was followed up by mana screw. Despite that, I walked away with 50 extra play points and 10 treasure chests (which would sell for ~24 tix). This is a considerable improvement over the old Power 9s in which I would have gotten my entry fee back, but no more. The EV for these events is excellent and if anyone has online Power, I highly encourage you to participate. Full details are here.
Link to the top 32 decklists: http://magic.wizards.com/en/articles/archive/mtgo-standings/vintage-challenge-2017-05-28
- Pedroj - Foundry Shops (Tangleless)
- Diophan - Jeskai Mentor
- Maegwiny - Foundry Shops
- Anssi A - Jeskai Mentor
- Isomorphic - Academy Combo
- Mlovbo - Ravager Shops
- Mr. Random - Foundry Shops (More Vehicles - No Ravagers)
- Hermoine_Granger - Jeskai Delver
I wanted to discuss two trends that I've been noticing in the metagame. No, not whether the DCI was correct or incorrect - in my opinion, it's far too early to tell. No, not whether this event shows that Shops is the best deck - a 49 player event is too small of a sample size to draw such conclusions. These trends have to do with personal observations of the metagame.
Several Shops players have started to cut Tangle Wire from their builds. This started with Jazza, who top 8'd the past 2 events (winning one outright), while finishing in the top 16 in this event. Pedroj adopted this strategy and won the tournament. I'm not saying that this is the correct direction for Vintage Shops players to take, but it certainly has merit and should be noticed by the Paper community.
Similarly, I've noticed that I have been less than happy with Jace, the Mind Sculptor in Mentor. I've found him to be poor against Delver, Shops, and Eldrazi where he is both difficult to cast and easily pressured. I've found him to be mediocre against Paradoxical Outcome (on both sides of the matchup). He is a sorcery speed 4 drop that only Brainstorms the turn he is cast, and savvy opponents will generally let him resolve then aim to end the game on the next turn while their opponent is tapped out. I haven't been alone in this - if you look at Ryan's deck, you'll notice that he does not run Jace, running an extra Mentor and Snapcaster Mage. My approach was different, running Gifts for value where it enabled some pretty decent lines with JVP and Snapcaster Mage. It also allowed me to sit back on Mana Drain and operate more at instant speed, which improved my matchup against the two Outcomes opponents I played. Whose approach is right? Again, I don't know. This is meant as food for thought.
Major thanks to Twitch user ValanLuca who helped put in round data while I was streaming. This allowed me to suffer through Choice Chamber in between rounds, hopefully providing more entertainment for my viewers than entering numbers into a spreadsheet. Thanks and congratulations to @diophan. And lastly, a very happy birthday to Dragonlord @brianpk80. Questions? Comments? Have at it.
I apologize for the personal attacks - you are right that that rant has been building for some time. I am tired that the focus of these reports remains evaluating the banned and restricted changes or remarking on whether or not this event shows an unhealthy metagame. Did anyone congratulate Jazza or any of the other contestants outside of the initial post? @Serracollector, @desolutionist, and @MSolymossy looked like they were at the beginnings of a productive discussion on PO's poor performance and comparison to FoF and Gifts in similar shells, but that obviously never materialized.
Yes, the format's health is an important issue and I believe everyone should be entitled to their opinion. However, a single weekly event should do little to affect one's view of the metagame. I even tried to start those monthly reports to provide a medium for a more holistic metagame review - that's part of the frustration. We could go through this every week. "Shops won again; clearly is overpowered." "Mana Drain and PO won, the format is fine and everything is glorious". "Oh, Shops won again, time to restrict Shops." Magic is a game of variance and there will be swings...there will be anomalies. That's the nature of our hobby.
So, let me phrase my objection in what I hope is a non-confrontational form. The previous iteration of The Mana Drain had a policy that limited banned and restricted discussions to a specific forum. I imagine they ran into issues similar to those we've encountered here. Even if @Brass-Man does not want to take a similar step (and I'm not saying he should), I think that we should try and limit such discussions to threads specifically concerning them. While we call these "metagame reports" they are really snapshots of a specific metagame and not necessarily indicative of the overall trend.
My actual thoughts on the card (rather than my campaign to fight against bad comparisons in card evaluation) are:
Let's start with general concepts when it comes to evaluating planeswalkers. First, individual planeswalkers have significant diminishing returns with additional copies. The first JTMS might be the best possible walker you could possibly be playing at that moment. However, the second JTMS is either FoW fodder or a card to be brainstormed away. Not necessarily a dead card but certainly not as good as an additional walker you could cast and have an impact on the board, such as a Dack, Teferi, or even Arlinn. This is the first half of the ... "Superfriends theory" (I'm making up names for these concepts as we go...).
The second half is that, by the nature of having several abilities, planeswalkers are often capable of serving multiple strategic roles. They are not equally adept at performing these roles. To continue with JTMS, I think it is generally accepted that his Brainstorm ability is his most powerful mode. This gives Jace the primary role of a card advantage engine. His other roles are as a win condition (through Fateseal into the ultimate) and removal (the unsummon). However, if forced into these roles, Jace is suboptimal. The Modern format exemplifies how the roles Jace, or any walker, fill may be inadequate given a format's constraints. Consider Teferi, Hero of Dominaria now. Teferi also has a primary role as a card advantage engine, but is weaker than Jace as drawing a card is less powerful than Brainstorming. At the same time, Teferi is a more resilient card advantage engine given his starting loyalty and uptick to draw. Teferi is also the better removal spell, as his -3 can target noncreatures, effectively puts the opponent down a card, and does not allow the opponent to recast the target the next turn. Now it might be tempting to debate which is better for Vintage by focusing on these roles, but it's not of incredible importance when it comes to deck building. Just like the diminishing returns of individual planeswalkers incentivize more planeswalker diversity, the variable roles of planeswalkers also encourage more diversity. It means I can use JTMS as a pure card drawer, knowing that Teferi is in my deck to answer the random Oath of Druids or Planeswalker the opponent might be able to cast. Or that I will draw a Chandra or Nahiri or Arlinn to actually close out the game. A name for this: "role optimization".
Because of the Superfriends theory, I tend to spend much more time looking at the strategic implications of new planeswalkers and looking for synergy than I do comparing them to existing planeswalkers. So let's do that now. I think it's pretty clear that Tezz 3.0's best ability is his 0. His +1 ability combined with his starting loyalty makes him very difficult to kill the turn it comes into play in modern formats. If upticked, you are looking at 6 loyalty and a 1/1 flying blocker. It makes it likely that its controller can untap and start to take over the game. Now, this point is somewhat moot in Vintage and Legacy, where I think the majority of planeswalkers that are removed are either countered or Pyroblasted, but I try not to have tunnel vision for just Vintage, as you never know when interactions and tech from one format might find their way into Vintage. Most recent example is "Teferi + Search for Azcanta". In any case, I would classify Tezz as a strong and resilient pure card advantage engine. I think his capacity as a win condition is negligible and he really shouldn't be used for this purpose.
Now what cards work really well with Tezz? Again, I tend to consider other formats when evaluating cards and I think Tezz is going to be a format-defining staple in Standard. This is mostly due to the rather absurd synergy between Tezz and Karn, Scion of Urza. Karn's ideal role is pooping out constructs, making him a midrange threat or control finisher. It just so happens this mode generates Artifacts for Tezz's 0 and protects Tezz from being attacked, allowing Tezz to function as a pure card advantage engine without having to use his +1. Speaking of the +1, if drawing cards is not necessary at a certain stage of the game, assembling Thopters is much more impressive when those Thopters are pumping up Constructs. I'm actually considering buying into Standard on MTGO to explore this interaction, as it seems very powerful in the context of that format. In Vintage, it might not be powerful enough, but it's something to keep in mind in trying to find shells for Tezz.
The other card that works really well with Tezz (and is actually Vintage-related) is Dack Fayden. I don't think this has been discussed in this thread, but it's something that immediately jumped out at me when I looked at Tezz. Dack's -2 ability has the effect of both increasing the amount of artifacts on your side of the board, along with ramping from 3 to 5 mana to cast Tezz the next turn. Dack's +1 also is more powerful with raw card advantage and Tezz generates that more quickly than other planeswalkers. Taken with 4, I can see a powerful Vintage shell forming around Dack, Karn, and Tezz, and that is indeed my starting point for playing this Tezz in Vintage.
Note, I didn't really answer whether I think this card is playable. I just tried to arrive at the ideal starting point for a deck that would use this card. If I think a card is unplayable, it will generally be because I can't find a shell that really utilizes it after looking at its strategic roles and synergies. The ultimate determination of what makes a card playable in my opinion, I save for testing (which I plan to do on stream).
If you found this approach interesting, please let me know and I will try to talk through my evaluations of other cards.