I totally agree with your perspective, Andrew, on both points: (1) that variation among sites, even sites belonging to the same gene or the same protein domain, is much more important than variation among partitions; and (2) that current implementations of site-heterogeneous models do not scale well and simply cannot handle the large super-matrices currently considered by applied phylogeneticists.
I can imagine several possible strategies to deal with this problem: not just improving the code, but exploring radically different computational approaches that go beyond small finite mixtures (which are not rich enough to capture the true empirical complexity of across-site variation) and Dirichlet processes (which will never scale reasonably with the number of positions). Strategies other than standard MCMC would also be necessary if we really wanted to go in this direction.
However, I wonder whether it is worth investing time and effort in all this. It would probably require several years of work, all to scale up a super-matrix paradigm that has important limitations anyway. If 10 000 sites are not sufficient to give you interesting shared derived point substitutions for your clades of interest, then I am not sure that piling up 100 000 or 1 000 000 aligned positions will improve the situation. There will always be residual model violations, and I am afraid that the only signal that will consistently accumulate across very large alignments will be the one contributed by the systematic errors induced by those violations.
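To make the last point concrete, here is a toy sketch in Python. It is not taken from any real data set and has nothing to do with amino-acid profile models as such: it assumes a single pairwise distance, sequences evolving under Jukes-Cantor with gamma-distributed rates across sites, and an analysis that wrongly ignores the rate variation. The true distance, the shape parameter and the function names are all illustrative assumptions. The bias caused by the model violation does not depend on alignment length, whereas the sampling error shrinks as 1/sqrt(n), so the estimate becomes more and more confidently wrong.

```python
import math

# Illustrative assumptions, not estimates from real data.
TRUE_D = 1.0   # true distance, in substitutions per site
ALPHA = 0.5    # gamma shape parameter (strong rate variation among sites)


def expected_p_gamma(d, alpha):
    """Expected proportion of differing sites under JC69 + gamma rates."""
    return 0.75 * (1.0 - (1.0 + 4.0 * d / (3.0 * alpha)) ** (-alpha))


def jc69_distance(p):
    """JC69 distance estimate that (wrongly) assumes one rate for all sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_standard_error(p, n_sites):
    """Delta-method standard error of the JC69 distance for n_sites sites."""
    return math.sqrt(p * (1.0 - p) / n_sites) / (1.0 - 4.0 * p / 3.0)


# The large-sample value of the misspecified estimate, and its bias.
p = expected_p_gamma(TRUE_D, ALPHA)
bias = jc69_distance(p) - TRUE_D


def bias_to_se_ratio(n_sites):
    """How many standard errors away from the truth the estimate converges."""
    return abs(bias) / jc69_standard_error(p, n_sites)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        se = jc69_standard_error(p, n)
        print(f"n = {n:>9,d}  bias = {bias:+.3f}  SE = {se:.4f}  "
              f"|bias|/SE = {bias_to_se_ratio(n):.0f}")
```

In this toy setting the bias stays fixed while the standard error drops by a factor of about 3.2 for each tenfold increase in the number of sites, so going from 10 000 to 1 000 000 positions only makes the wrong answer look ten times better supported.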
More fundamentally, now that exome-wide data are routinely produced, perhaps the super-matrix paradigm itself should be considered obsolete, and more ambitious gene-tree/species-tree methods should be developed instead. Our field is clearly moving collectively toward this new gene/species paradigm, and we should therefore probably recast the problem of site-heterogeneous effects in this new context. This is in fact my current strategy: trying to combine interesting substitution models at the level of single-gene sequence evolution with good gene duplication-loss-transfer models, possibly using multi-step approaches.