Turning to comparability across groups, invariance results were mixed across assessments. In PISA, the bidimensional model achieved configural, metric, and scalar invariance across countries, providing strong evidence that the construct is comparable across participating education systems (supporting H2). In ICILS, the model reached configural and metric invariance but did not meet scalar invariance criteria across countries, suggesting that cross-country mean differences should be interpreted cautiously and reinforcing the importance of explicitly evaluating between-country comparability before drawing substantive conclusions (partial support for H2). These results indicate that even when the same two-dimensional structure is recoverable, the degree of cross-national comparability may depend on assessment-specific design features and item functioning. In contrast, gender invariance results were consistently supportive. Both assessments achieved scalar invariance across gender, indicating that General and Specialized DSE can be compared meaningfully between boys and girls within each dataset (supporting H3). This finding strengthens the validity of gender comparisons in DSE and suggests that observed gender gaps in DSE are unlikely to be driven by measurement non-equivalence, at least within the tested frameworks.
I think a reference to Hristov et al. and Campos & Scherer could add depth here. Hristov et al. conducted an invariance analysis across gender and migration-status groups, and Campos & Scherer applied the alignment method to the DSE construct.