So assuming we all agree that Vintage is not really a one-deck format, there are a few possible explanations for the above data:
1. The format's best players play Shops, and its worst play decks that prey on Shops, so that the win percentage of Shops against its predators is not accurately reflected in tournament results;
2. Even though Shops as a whole has a winning match percentage against all other archetypes as a whole, individual builds of Shops are weak against individual builds of, say, Big Blue, and this finer-grained relationship between the decks is not captured in the aggregate data.
I have no idea how to correct for #1 (if it's even true), but #2 could be addressed by going back through Diophan and ChubbyRain's data and building a finer-grained taxonomy, splitting any archetype that contains multiple, strategically varied builds.
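To make explanation #2 concrete, here is a toy sketch in Python of how a winning aggregate record can hide losing build-level matchups. Every number and build name below is made up purely for illustration (none of it comes from Diophan and ChubbyRain's data), and the helper names `win_pct`, `aggregate`, and `by_build` are my own.

```python
# Hypothetical match records, NOT real tournament data:
# (Shops build, opposing build, Shops wins, Shops losses)
records = [
    ("Shops build A", "Big Blue build X", 30, 10),
    ("Shops build A", "Big Blue build Y", 4, 8),
    ("Shops build B", "Big Blue build X", 5, 9),
    ("Shops build B", "Big Blue build Y", 12, 6),
]


def win_pct(wins, losses):
    """Match win percentage as a fraction between 0 and 1."""
    return wins / (wins + losses)


def aggregate(records):
    """Win percentage with every build lumped into one archetype."""
    wins = sum(w for _, _, w, _ in records)
    losses = sum(l for _, _, _, l in records)
    return win_pct(wins, losses)


def by_build(records):
    """Win percentage for each (Shops build, opposing build) pair."""
    return {(s, o): win_pct(w, l) for s, o, w, l in records}


if __name__ == "__main__":
    print(f"Shops vs Big Blue, aggregate: {aggregate(records):.1%}")
    for (shops, blue), pct in by_build(records).items():
        print(f"  {shops} vs {blue}: {pct:.1%}")
```

With these invented numbers, Shops is comfortably above 50% against Big Blue in aggregate, yet build A loses to build Y and build B loses to build X. The aggregate is dominated by the one lopsided pairing that happens to have the most matches, which is exactly the kind of effect a finer taxonomy would expose.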
I can try doing this, but as it represents a very significant time investment, I'd like to get your thoughts first about what taxonomy you consider reasonable: if you had to decompose the metagame over the past year into ~15 archetypes so that
decks with similar strategic roles in the metagame (and similar win percentages against other archetypes) are grouped into the same archetype;
decks with different strategic roles, but a similar core, are classed as different archetypes;
what groups would you pick?