This is an incredibly complicated question. First off, I wonder why you want to compare two plays that arise in different scenarios. In a normal game of Magic you will never be forced to choose between the two plays you presented. So I guess your question is purely theoretical (although it is still hard to say what the goal of such a comparison would be).

Basically you are asking: "Is the expected win percentage (EWP) in a match after making a certain play, minus the EWP after not making that play, a good metric for comparing Magic plays?" Whether a metric is good depends on what you want to know.

Obviously the most useful comparisons are between two plays in the same scenario. In that case we can probably agree that the proposed method is good, because it finds what we call "the optimal play" (the play after which you have the highest EWP). We still have little to no means of calculating it, though. When you have different scenarios, things get more complex. Do you want to learn one play and know which one gives you the highest overall expected match win percentage (MWP)? Then you also have to look at things like the metagame presence of decks and the likelihood of a certain scenario in a given matchup. You would even have to check your chances of playing a certain deck (because once you choose a deck, at most one of these plays remains available to you).
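To make that weighting concrete, here is a minimal sketch of how the pieces could be multiplied together. Every number is made up purely for illustration, and the names `play_value`, `overall_gain`, `play_a` and `play_b` are hypothetical; in practice we have no reliable way to estimate any of these inputs.

```python
# Hypothetical illustration only: none of these numbers come from real data.

def play_value(ewp_with_play, ewp_without_play):
    """EWP gain from making the play, within the scenario where it is possible."""
    return ewp_with_play - ewp_without_play


def overall_gain(deck_prob, matchup_share, scenario_prob,
                 ewp_with_play, ewp_without_play):
    """Rough expected change in overall MWP from knowing one play.

    deck_prob     -- chance that you are playing the deck that can make the play
    matchup_share -- metagame share of the opposing deck
    scenario_prob -- chance that the scenario comes up in a match of that matchup
    """
    gain_in_scenario = play_value(ewp_with_play, ewp_without_play)
    return deck_prob * matchup_share * scenario_prob * gain_in_scenario


# Two plays from different scenarios, with assumed inputs.
play_a = overall_gain(0.5, 0.10, 0.02, ewp_with_play=0.60, ewp_without_play=0.40)
play_b = overall_gain(0.5, 0.05, 0.01, ewp_with_play=0.70, ewp_without_play=0.30)

print(f"play A: {play_a:+.5f}")
print(f"play B: {play_b:+.5f}")
```

With these assumed inputs, play B has the larger EWP difference inside its own scenario, yet play A is worth more overall because its scenario comes up more often. That is why the raw EWP difference alone does not answer this version of the question.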

Or maybe you want to choose a deck? Say you have a spreadsheet with all of the possible scenarios that can happen in a game of Magic. You look at these two plays and you think: is the Burn play better for my Burn vs the field strategy, or the Twister play for my whateverplaystwister vs the field strategy? To answer that, you again need to know the chance of each scenario coming up. Then you can see how much your percentages vs the field changed and choose a deck based on the updated values.

Or maybe you came up with some crazy heuristic that will revolutionize Magic-playing bots. And somewhere under the hood you just NEED to know the difference in EWPs that you mentioned. Here obviously this information is very useful (the scenario itself is purely theoretical, but who knows, right?).

So I think you see the pattern. In the real world it is hard to imagine a situation in which you want to compare two plays in completely different scenarios. If you just want to be theoretical, then you need to define what you are talking about, as "mathematically identical" is not very precise (it depends on what is used to measure identity). Different goals can lead to different answers.

However, we still have to remember that even should this kind of measure ever prove useful, we don't have any way to reasonably estimate it (other than learning from data, but I don't think we will ever have enough data to analyze very rare and specific scenarios).

PS: Let's say both scenarios are equally likely (which is usually just false). Then from the perspective of overall expected MWP those two plays are the same. BUT maybe your matchup vs Storm is great and against Burn you basically just lose, and you only have time to learn one play before the tournament. You also want the best chance of a money finish. In this case, notice that if you learn the Storm play, your MWP has more variance than it would if you learned the Burn play. If your expected finish isn't too high before choosing which play to learn, you might want higher variance! (Your expected finish is the same whichever play you learn, but your range of likely finishes is wider with more variance. So provided that you only care about money finishes, and money is paid out only to the very best finishes, more variance - the Storm play - is better.)
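A small sketch of the "same mean, different variance" point, under loudly stated assumptions: the base win rates (70% vs Storm, 30% vs Burn), the identical 10-point gain from either play, and the equal weighting of the two matchups are all invented for illustration. It only measures the spread of the per-matchup win rate; turning that into a distribution of tournament finishes would need a model of pairings, which is not attempted here.

```python
from statistics import mean, pvariance

# Hypothetical numbers: the post gives none.
BASE = {"Storm": 0.70, "Burn": 0.30}  # great vs Storm, poor vs Burn (assumed)
GAIN = 0.10                            # assumed identical gain from either play


def mwp_after_learning(play):
    """Per-matchup win rates after learning the play that helps in one matchup."""
    rates = dict(BASE)
    rates[play] += GAIN
    return rates


for play in BASE:
    rates = list(mwp_after_learning(play).values())
    # Both matchups are weighted equally here, which is itself an assumption.
    print(f"learn the {play} play: "
          f"mean={mean(rates):.4f}, variance={pvariance(rates):.4f}")
```

Under these assumptions the mean is the same for both choices, while learning the Storm play pushes the two matchup win rates further apart, which is the higher-variance option described above.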

Alright, now I'm done.