eLife Assessment
This manuscript introduces a potentially valuable large-scale fMRI dataset pairing vision and language, and employs rigorous decoding analyses to investigate how the brain represents visual, linguistic, and imagined content. However, the manuscript currently blurs the line between a resource paper and a theoretical contribution, and the evidence for truly modality-agnostic representations remains incomplete at this stage. Clarifying the conceptual aims and strengthening both the technical documentation of the dataset and the quantitative analyses would improve the manuscript's significance for the fields of cognitive neuroscience and multimodal AI.