Measuring Archetype "Inequality" or metagame balance

I was looking at the Vintage Challenge results for October, and noticed that four different archetypes each had between 10% and 20% of Top 8 appearances (with a fifth, Survival, at 9% of the metagame), but none exceeded 20% of Top 8s.

That's the first time all year those two conditions have been met. Until October, every month this year had at least one archetype above 20% of Top 8s.

I was wondering if the mathematically inclined might be able to propose a formula that could measure 'metagame inequality,' much like the Gini Coefficient measures income inequality.

It would have to be sensitive to the performance of multiple archetypes. And since we lack consistent win percentage data, it couldn't be based on that.

In other words, I'd like a formula that could detect, and scale, a metagame that is more 'equal' and balanced, and one that is more 'unequal' and imbalanced.

So, here are two possible extremes:

Balanced Metagame:

Deck 1: 17%
Deck 2: 16%
Deck 3: 15%
Deck 4: 12%
Deck 5: 11%
Deck 6: 8%
Deck 7: 7%
Decks 8-11: under 5%

Unequal Metagame:

Deck 1: 40% of Top 8s
Deck 2: 30% of Top 8s
Decks 3-11: Under 5% of Top 8s

The "Unequal" metagame is not actually a hypothetical - it was the metagame in the summer of 2017, before Mentor and Thorn were restricted, based upon Top 8 Vintage Challenge results.

Anyway, I'm wondering if there is a formula that could distinguish these two cases, and give us a scaling value.

@smmenen There are a variety of ways to use a related notion, Gini impurity (https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity). One way would be to simply compute it for the Top 8 archetype numbers.

If you have more data, say the top32 lists from every vintage challenge, you could compute it for the top 32, then top 16, and so forth, and see how much the impurity decreases as you zoom in on the winningest slice of the metagame.
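A minimal sketch of that idea, assuming the standard definition of Gini impurity (1 minus the sum of squared archetype shares). The finishing order below is made up purely for illustration and is not real Challenge data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum(p_i^2), where p_i is each archetype's share."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one deck")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# HYPOTHETICAL finishing order, best finish first (not real results).
standings = [
    "Shops", "Shops", "Mentor", "Shops", "Mentor", "Oath", "Shops", "Mentor",
    "Dredge", "Oath", "Storm", "Shops", "Survival", "Mentor", "Landstill", "Dredge",
]

# Impurity for progressively narrower slices of the standings.
for cutoff in (16, 8, 4):
    print(cutoff, round(gini_impurity(standings[:cutoff]), 3))
```

If the impurity drops sharply as the cutoff shrinks, the top of the standings is more concentrated than the field as a whole.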

last edited by diophan

Looking at imbalance, I would try hard to look at overall win % against the meta as opposed to the number of decks being played. Looking at previous meta results, there are some decks that have a very good win % against the meta but are underplayed (Big Blue). I also think that a lot of people enjoy certain play styles, and that contributes to Shops' and PO's numbers.
I don't think a lopsided meta is necessarily bad if it's because people are playing the decks they enjoy! And if there's a card pool change because of it, that would be the worst thing for the format.

I know we have win % data for at least 2 data points.

last edited by John Cox

@Smmenen For a quick overview of the problem there is https://en.wikipedia.org/wiki/Diversity_index.

This looks like it might fit the bill, and is pretty much what I was looking for. What I like about the Simpson Diversity Index is that it measures BOTH 1) Richness, and 2) Evenness.

So, applied to a Vintage metagame, it could incorporate the number of viable archetypes, and their relative proportions as % of Top 8s.

EDIT: There is apparently an online index calculator that I am going to play around with by plugging in different months of Vintage Challenge results. https://www.alyoung.com/labs/biodiversity_calculator.html

EDIT 2: This index seems to work exceptionally well. I ran some tests, and basically, a very high Simpson index number means a very low level of diversity, and vice versa. So, for example, I put in a 5 deck metagame, with one deck having 1000 top 8 appearances, and the other 4 just 1, and the value was basically .99. And then I put in a 10 deck metagame, with every deck having 2 or 3 Top 8 appearances, and the score was 0.06667.
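For anyone who wants to reproduce those two tests without the web calculator, here is a rough sketch. It assumes the finite-sample form of the Simpson index, sum of n(n-1) divided by N(N-1); whether the calculator uses exactly this form is an assumption. The split of the 10-deck metagame into five decks with 3 appearances and five with 2 is also assumed, although that split does give 0.06667 under this formula.

```python
def simpson_index(counts):
    """Simpson index, finite-sample form: sum n(n-1) / (N(N-1)).

    Higher values mean a more concentrated (less diverse) metagame.
    """
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two Top 8 appearances")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


# One deck with 1000 Top 8 appearances, four decks with 1 each.
lopsided = [1000, 1, 1, 1, 1]

# Ten decks with 2 or 3 appearances each (ASSUMED split: five of each).
even = [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]

print(round(simpson_index(lopsided), 4))
print(round(simpson_index(even), 5))
```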

I am going to calculate a Simpson Diversity Index score for every month this year, using the Vintage Challenge Top 8 data.

last edited by Smmenen

"basically, a very high Simpson index number means a very low level of diversity, and vice versa."

Maybe we want a more intuitive behavior for a "diversity index" where high index means high diversity, and low index means low diversity. The Gini–Simpson index is pretty much the same as Simpson except it has this more intuitive behavior.

As others have already said, the Gini index is what you want. There are a number of variations, but it's a pretty well understood metric among economists and others who study inequality.

@cuikui

The Simpson index ranges from 0 to 1. So 1 minus index value would do what you want.

last edited by Guest

The online calculator I linked above also publishes the 1 - Simpson Index value. So, it has the more intuitive value, and calls it the "dominance index."

@vaughnbros Gini coefficient doesn't capture both elements tho - it only captures inequality, not "richness." Ryan was suggesting Gini Impurity, which is not the same thing. It's a totally different formula.

I also set up an Excel spreadsheet to calculate the Simpson Index value automatically. So I went through and calculated the Simpson Index value for:

1. every single month where there have been 3 or more Vintage Challenges (so, June 2017 through October 2018)

2. for Quarters for the Power Nine Challenges (Q4, 2015, Q1, 2016, Q2, 2016, Q3, 2016, Q4, 2016, and Q1+April, 2017).

3. And I went back into some old metagame reports, and calculated the Simpson Index scores for those, including Q1, 2010, and March-April, 2009.

For all of those, the Simpson Diversity Index values match intuition from looking at the scores.

I will try to figure out the best way to share all of that information, but to give away the most revealing bit: across all of those calculations, which is 25 values, the highest Simpson scores (all above .30) were the two months before Mentor and Thorn were restricted. The lowest score among those 25 calculations was October 2018, at just about .10. This year had most of the lowest scores.

What it shows is that, by an objective metric, the metagame right now is the most diverse & balanced it has ever been, using data from the Vintage or Power Nine challenges.

@zias Exactly, Gini-Simpson is just another name for 1 minus Simpson index.

@smmenen

Just not sure why you want a measurement that entangles two different ideas.

Richness seems pretty easily defined simply by the number of available archetypes, no?

So you'd have a Gini index for the inequality among the top decks, and a "richness" index that is just the number of unique archetypes.
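A rough sketch of reporting the two numbers separately, assuming the usual rank-based formula for the Gini coefficient and treating richness as the count of archetypes with at least one Top 8. The function names and the sample counts are hypothetical.

```python
def gini_coefficient(counts):
    """Gini coefficient of non-negative counts (0 = perfectly even)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero count")
    # Rank-weighted sum over ascending values, using 1-based ranks.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def richness(counts):
    """Number of archetypes with at least one Top 8 appearance."""
    return sum(1 for c in counts if c > 0)


# HYPOTHETICAL Top 8 appearance counts per archetype.
top8_counts = [20, 15, 5, 4, 2, 1, 1]
print(richness(top8_counts), round(gini_coefficient(top8_counts), 3))
```

Note that a two-deck metagame split evenly and a ten-deck metagame split evenly both score a Gini coefficient of 0, so the richness figure has to be read alongside it. Whether archetypes with zero appearances are included in the list also changes the coefficient.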

Because both elements matter. If we just measured 'inequality', we wouldn't actually be measuring diversity. So, if you had only two 'species,' but they were equal, that's not actually reflective of a diverse format, even though it may be balanced. I wanted a measure that captured both diversity and balance.

As I said in post 5 of this thread, "what I like about the Simpson Diversity Index is that it measures BOTH 1) Richness, and 2) Evenness." As that biology website I linked above put it.
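To illustrate the point about two equal 'species', here is a small sketch using the share-based form of the Simpson index (the sum of squared shares). The three metagames are invented examples, not real data.

```python
def simpson_from_counts(counts):
    """Simpson index as the sum of squared shares; lower = more diverse."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one appearance")
    return sum((n / total) ** 2 for n in counts)


# HYPOTHETICAL metagames, as Top 8 appearance counts per archetype.
two_even = [10, 10]            # balanced, but only two decks
ten_even = [2] * 10            # balanced, and ten decks
ten_skewed = [11] + [1] * 9    # ten decks, one of them dominant

for name, counts in (("two_even", two_even),
                     ("ten_even", ten_even),
                     ("ten_skewed", ten_skewed)):
    print(name, round(simpson_from_counts(counts), 3))
```

The two-deck case scores 0.5 even though it is perfectly even, while ten even decks score 0.1, so the index responds to the number of archetypes as well as to how evenly they are spread.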

So, take a look at this link.

Now scroll down, and look at the value below the Simpson value, called Dominance, or (1-Simpson). The website doesn't call it Gini Impurity, although I understand that's the same calculation. Probably just semantics?

Also, look how useful this tool is, as it creates a visualization at the bottom.

last edited by Smmenen

@smmenen From what I can read, "Gini-Simpson index" and "Gini coefficient" are two different things.

"Gini-Simpson index" is 1 minus "Simpson index" and is also named "Dominance index" in your link. It's the one I was referring to.

"Gini coefficient" is something else.

But at the end it's just semantics.

That's what I said.

Gini Coefficient is a value that measures inequality. Gini Impurity is a completely different thing. The Wikipedia entry even says "NOT TO BE confused with Gini Coefficient."

But what I was saying is that the web calculator doesn't call 1-Simpson Gini Impurity. It calls it the "dominance index."

@smmenen

Again, that's why you have two measurements. One is the number of unique decks, another is a measure of evenness. By using a measurement that entangles them, how are you determining whether the problem in a particular metagame is too few decks or not enough spread among decks?

You supplement the primary measure with secondary measures. But my goal, as clearly explained in the OP, was to find "a formula that could detect, and scale, a metagame that is more 'equal' and balanced, and one that is more 'unequal' and imbalanced."

This does that perfectly.
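As a rough check against the two metagames in the opening post, the sketch below computes the share-based Simpson index, its complement (1 minus Simpson), and the inverse (1 divided by Simpson, sometimes read as an "effective number" of archetypes). The opening post only says the remaining decks were "under 5%", so splitting the leftover share evenly among them is an assumption made here for illustration.

```python
def simpson_from_shares(shares):
    """Simpson index as the sum of squared shares (shares are normalised)."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum((s / total) ** 2 for s in shares)


# Shares in percent. The "under 5%" decks are ASSUMED to split the
# remaining share evenly: 14% over 4 decks, and 30% over 9 decks.
balanced = [17, 16, 15, 12, 11, 8, 7] + [3.5] * 4
unequal = [40, 30] + [30 / 9] * 9

for name, shares in (("balanced", balanced), ("unequal", unequal)):
    d = simpson_from_shares(shares)
    # Columns: Simpson, 1 - Simpson, inverse Simpson.
    print(name, round(d, 3), round(1 - d, 3), round(1 / d, 2))
```

Under that assumption the balanced metagame scores roughly 0.12 and the unequal one 0.26, so the index does separate the two cases and puts them on a common scale.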

Honestly, I think "balance" is really a shorthand for saying "win rates". It's how I use it. It's basically how Wizards has used it in their most recent B&R reports.

Number of times used in the Temur Energy bannings:
Balance: 0
Win percentage: 14
Win rate: 0

Number of times used in the Aetherworks Marvel ban:
Balance: 0
Win percentage: 3
Win rate: 3

I think what you are seeing is a refinement of the terminology used as WotC has moved to more data analysis and away from Top 8 results. "Balance" was used as a subjective measure of a deck's win rate, often derived from Top 8 results, which is what they relied on before.

So for me, the interesting thing will be to see how closely your statistical analysis of the challenges matches the data we would have from champs (if the data can be compared).

@ChubbyRain

A few things:

1. Everyone agrees that win % and win rates are the best possible metric for assessing deck performance, but there are two issues with this:

a) We lack this on a regular basis. Your Vintage Champs analysis and my mid-October Vintage Challenge analysis are the exceptions that prove the rule, where we actually have win % by archetype.

b) Win % or win rates don't actually tell us that much about the overall shape and scope of the metagame. They don't tell us about diversity. They aren't a metagame metric, per se. You could have a large or small number of decks, and the win rate of any particular deck wouldn't tell us much about that.

2. Language used by Wizards: I agree with your point that Wizards is constantly refining their terminology. BUT, and this is a big caveat, the last time they restricted cards in Vintage, they specifically cited Vintage Challenge Top 8 data, not win rates or win percentages.

https://magic.wizards.com/en/articles/archive/august-28-2017-banned-and-restricted-announcement-2017-08-28

"Data from twelve recent Vintage Challenges reinforces this, with 40% of the Top 8 decks being Shops and 30% being Mentor. Both decks feature strategies that are powerful, stifle diversity, and can be frustrating to play against."

Before I read up on the Simpson Diversity Index, I was thinking about creating a "Menendian Index" that would be a mashup index of different indicators; possibly 1/3 the range of decks in Top 8s, 1/3 a Gini Coefficient-like variable that measures inequality, and 1/3 perhaps something else.

But when I read up on the Simpson Diversity Index, realizing that it is sensitive to BOTH the range of strategies in a metagame AND the relative proportions of those strategies in the field, I realized it was the perfect holistic measure for what I was looking for.

Balance is obviously a metaphor that we are applying to Magic metagames, but balance by itself refers primarily to inequality. The primary image associated with balance is a scale or teeter-totter. The problem with balance, by itself, is that the metaphor doesn't include the range of decks. So a 2-deck metagame could be balanced, even though such a duopoly is bad for the format. My OP had two hypotheticals that illustrate two different extremes.

The Simpson Diversity Index is perfect because it accounts for both 'inequality' and for 'diversity.' Both matter.

TLDR: terminology is tricky here. We don't just care about one thing: we care about diversity AND balance, evenness and abundance, inequality AND range. And all of these concepts and terms are conceptually related, but also different.

In any case, I will do a write-up of my findings.

last edited by Smmenen