eLife assessment
This important study presents a machine learning-based classifier that can accurately determine the geographic origin of a Salmonella enterica sample from its whole-genome sequencing data in under five minutes, leading to actionable public health insights. Applying the method to 2,313 whole-genome sequences collected in the United Kingdom and to several external validation datasets, the authors provide convincing evidence that Salmonella genomic data can be used to identify the likely geographic source of a food-borne outbreak and, in most cases, to correctly identify the country of origin of an infection acquired overseas. The work presents an excellent case for the potential utility of routine genomics coupled with machine learning for public health microbiology, and the methods are likely to be applicable to other pathogens besides Salmonella enterica.