irrelevant
I like "off-target" here
irrelevant
I like "off-target" here
CrAssphage viruses
Maybe use "Carjivirus (CrAssphage)" as in the figure
Figure 9:
Missing description of data source
virus’
Noting inconsistency in using "virus'" and "virus's"
virus’s
Noting inconsistency in using "virus'" and "virus's"
The AP Stylebook says "virus's" is correct: https://x.com/APStylebook/status/1371535719437590533
we and others have voiced concerns that passengers’ behavioral tendency to limit defecation on flights greatly limit pathogen shedding
Maybe cite this paper, which quantifies defecation rates on short- and long-haul flights:
Suitability of aircraft wastewater for pathogen detection and public health surveillance https://doi.org/10.1016/j.scitotenv.2022.159162
"Overall, our data revealed that a maximum of 13 % of individuals are likely to defecate on a short-haul flight (< 6 h in duration; Fig. 1A). Further, this response shows a small but significant gender effect (P = 0.005) and positive relationship with age (P = 0.012) although there was no separation based on social grade (P > 0.05) or family status (P > 0.05). In contrast to short-haul flights, the likelihood of defecating on a long-haul flight (> 6 h in duration) was much greater (ca. 36 % of the total) with more respondents suggesting that this might occur sometimes (Fig. 1B). Further, this showed a strong gender bias with males more likely to defecate than females (P < 0.001) and a negative relationship with age class (P < 0.001). However, no clear relationship was apparent with social grade or family status (P > 0.05)."
thought to often be the first to bring in a new pathogen or variant
I had to dig up citations for this recently as part of our air sampling manuscript (which recommends air sampling at airports for the same reason).
Human Mobility and the Global Spread of Infectious Diseases: A Focus on Air Travel https://doi.org/10.1016/j.pt.2018.07.004
Frequent Travelers and Rate of Spread of Epidemics https://doi.org/10.3201/eid1309.070081
Dispersal patterns and influence of air travel during the global expansion of SARS-CoV-2 variants of concern https://doi.org/10.1016/j.cell.2023.06.001
This improved depletion efficiency may be due to the harsh chemical environment of airplane waste (high pH and detergents) causing degradation of cell-free ribosomal RNA while having less effect on viral particles.
Here and in other places where the effect of the chemical environment is discussed, it might be good to cite some prior work on this (some of which is linked from Janika's memo: https://docs.google.com/document/d/15LN6ijozpnUqPd-xD-9YNAGTpryUhStTFl-fAQQmEEs/edit?tab=t.0)
For instance, Ahmed et al. 2020 write:
Effects of aircraft wastewater tank disinfectant
The effect of aircraft toilet deodorant and viricidal/bactericidal (Novirusac Gel Bulk, Aero Defence Pty. Ltd, Southport, Queensland, Australia), which is typically dosed into the tank of an aircraft before departure, on coronavirus (i.e. murine hepatitis virus) stability was assessed. Novirusac Gel Bulk comprised of hexylene glycol, benzalkonium chloride, didecylmethylammonium propionate ethhoxylated, N-(3-aminopropyl)-N-dodecyl-1,3-propanediamine, ethanolamine and water. Briefly, 100 μl of Novirusac Gel Bulk was mixed with 100 μl of untreated wastewater. Murine hepatitis virus (10 μl) was seeded into the mixture in triplicates. Before, seeding the Cq value of the MHV RNA was determined using RT-qPCR. Two sets of samples were incubated at 15°C (typical temperature of wastewater in an aircraft) for 48 h. RNA was extracted from the incubated samples after 24 (set 1) and 48 h (set 2). The Cq values obtained for 24-hour and 48-hour incubated samples were compared with the Cq value obtained for the seeded murine hepatitis virus stock to determine the shift in Cq values over the incubation period. RNA extraction and RT-qPCR of MHV was performed according to a recent study.20
...
"It was also postulated that the disinfectants used in the aircraft may accelerate the decay of SARS-CoV-2 RNA. However, only 1.6–2 Cq increase was observed after 48 h for murine hepatitis virus, suggesting that Novirusac Gel Bulk has little impacts on the decay of SARS-CoV-2 for the flight duration 8–13 h. It has to be also noted that for the MHV RNA decay experiment, we used a high concentration of Novirusac Gel (i.e. 1:1 ratio Novirusac Gel:wastewater), however, the ratio of Novirusac Gel to wastewater is typically 100–1000 times lower in the aircraft and will have little impact on the decay of SARS-CoV-2 RNA."
Table 4:
Mengyi et al.
The results for Mengyi in Fig. 1 and 3 suggest the composition looks more like the other blood samples than the plasma. I know there was confusion about their sample prep. Can you add discussion of this to the dataset description? Do you think it's possible that these are actually whole blood samples?
Compare the mean relative abundance of human infecting viruses between plasma and whole blood
Where do you compare the means? Try to be more precise and quantitative here.
8.87e-08
Are you sure this calculation is correct? When controlling for human RA, overall virus abundance goes from 0.00011 % to 0.07906 % (a roughly 700-fold difference, nearly 3 OOM), yet these numbers don't seem that different.
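For reference, the size of that jump can be checked with a quick sketch. It uses only the two percentages quoted in this comment; the function name `fold_change` is made up for illustration and is not part of the analysis code.

```python
import math

def fold_change(before_pct, after_pct):
    """Return (fold change, orders of magnitude) between two percentages."""
    fold = after_pct / before_pct
    return fold, math.log10(fold)

# Overall virus abundance before and after controlling for human reads,
# as quoted in the comment (percent of reads).
fold, oom = fold_change(0.00011, 0.07906)
print(f"{fold:.0f}-fold, {oom:.2f} orders of magnitude")
```

If the statistic in question barely moves while its inputs shift by this much, that is a good reason to recheck how it was computed.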
plasma has a higher viral read fraction by a magnitude of 10
Again, this is not a very quantitatively precise statement. The difference in viral fraction between Cebria Mendoza and O'Connel is far greater than 10X. Maybe give a geometric average difference (and range) for both the human-read-removed and normal datasets.
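One way to report this, sketched below under stated assumptions: take every plasma/blood dataset pair, compute the fold difference in viral read fraction, and summarize with the geometric mean plus the min and max. The fractions here are hypothetical placeholders, not values from the report, and `fold_summary` is an illustrative name.

```python
from itertools import product
from statistics import geometric_mean

def fold_summary(plasma_fractions, blood_fractions):
    """Geometric mean, min, and max of plasma/blood fold differences
    over all pairs of datasets."""
    folds = [p / b for p, b in product(plasma_fractions, blood_fractions)]
    return geometric_mean(folds), min(folds), max(folds)

# Hypothetical viral read fractions, one value per dataset (not real results).
plasma = [1e-3, 1e-5]
blood = [1e-6, 1e-7]

gm, lo, hi = fold_summary(plasma, blood)
print(f"geometric mean {gm:.0f}x (range {lo:.0f}x to {hi:.0f}x)")
```

Running the same summary twice, once on the raw fractions and once on the human-read-removed fractions, would give the two numbers asked for above.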
we can see that they have similar read composition across the board
I don't think this is accurate, or at least it's highly subjective. Across the blood data without human reads, viral read fraction still varies across an OOM.
Another way we can look at this is by controlling for human reads across all datasets
Explain the motivation here:
Clearly sample prep can lead to large variations in how many human reads end up in your sample. Cebria Mendoza is an example of highly efficient human read removal in plasma. Probably this is more challenging in blood, but it may still be possible, particularly if looking at the extracellular content. Perhaps some of the blood studies were focused on human genome sequencing? If so, I'd also mention that in the dataset descriptions, since this means they're really not interested in viral enrichment. Possibly human read removal techniques like Jumpcode Genomics' kit could be applied effectively here.
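To make the motivation concrete, here is a minimal sketch of what "controlling for human reads" means numerically, assuming it amounts to dropping human reads from the denominator. The read counts are hypothetical and the function name is illustrative only.

```python
def viral_fraction_excluding_human(viral_reads, human_reads, total_reads):
    """Viral relative abundance after removing human reads from the denominator."""
    non_human = total_reads - human_reads
    if non_human <= 0:
        raise ValueError("no non-human reads left to normalize against")
    return viral_reads / non_human

# Hypothetical counts for a single sample dominated by human reads.
total, human, viral = 1_000_000, 990_000, 10

raw = viral / total
adjusted = viral_fraction_excluding_human(viral, human, total)
print(f"raw {raw:.1e}, human-removed {adjusted:.1e}")
```

With 99% human reads, the same ten viral reads look 100 times more abundant once human reads are excluded, which is why datasets with very different human read removal efficiency are hard to compare on raw fractions.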
plasma dataset
I would say specifically "if we instead only look at the Cebria-Mendoza 2021 plasma dataset..."
4.2.3 Species composition
This is a great plot.
Figure 1: Read composition
Again, I would keep this separated by study.
Viral Enrichment
Is it safe to say that viral enrichment here refers to sample prep human read removal?
160 samples
If we don’t know whether 1 sample = 1 donation, I would call that out explicitly.
SARS
Looks like we only got SARS reads in two samples. This is a good candidate for validating with BLAST if there's time.
species
Also noteworthy that we saw a lot of Anelloviridae in the Cebria Mendoza RNA MGS paper.
Maybe both RNA and DNA stages of Anelloviridae get picked up, or maybe the RNA > cDNA sample prep also carried over a lot of Anelloviridae DNA.
e preparation/library preparation is not taken
I would highlight/link the Cebria Mendoza paper here.
Popular/Well-known Name
This is a great addition. I think it would actually be better to use the popular names in the above RA plot as well.
We may also have some data source internally at SB that has a mapping from NCBI taxonomic virus name to common name.
exclude Anelloviridae
Why exclude Anelloviridae?
I think it would make sense to have a version of this plot with human reads removed as we discussed in the context of Cebria Mendoza paper.
2.2 Quality control metrics
Can the two QC sections be combined or are they stage specific?
3.2 Total viral content
I would make this a histogram. You can make two histograms - one for classified reads and one for all reads.
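A rough sketch of what I mean, assuming per-sample viral fractions are available for both denominators. Because the values span several orders of magnitude, the bins are on a log10 scale. The input lists are hypothetical, and `log10_histogram` is an illustrative helper rather than existing code.

```python
import math
from collections import Counter

def log10_histogram(fractions, bin_width=1.0):
    """Count values per log10 bin; each key is the lower edge of a bin."""
    counts = Counter()
    for f in fractions:
        if f <= 0:
            continue  # zero has no log10; report zero-abundance samples separately
        lower_edge = math.floor(math.log10(f) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical per-sample viral fractions (not real results).
viral_of_all_reads = [2e-7, 5e-7, 3e-6, 4e-5, 0.0]
viral_of_classified_reads = [2e-4, 6e-4, 3e-3, 2e-2, 0.0]

hist_all = log10_histogram(viral_of_all_reads)
hist_classified = log10_histogram(viral_of_classified_reads)
print(hist_all)
print(hist_classified)
```

The two dictionaries correspond to the two histograms suggested above and can be passed to whatever plotting tool the report already uses.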
4.1 Overall relative abundance
I still think making a histogram is the way to go for this plot
p_reads_max
What is p_reads_max?