77 Matching Annotations
  1. Dec 2022
    1. All of the custom codes for FIJI and MATLAB used in this study have been deposited at GitHub (https://github.com/PaulYJ/Axon-spheroid).

      GitHub is not an archival repository. Figshare, Zenodo, Dryad, and other similar archival sites are appropriate code repositories.

    2. In all statistical comparisons, non-parametric tests were used unless otherwise justified. Specific tests used for each graph can be found in the corresponding figure legend.

      So, one issue I see is that there are several statistical tests done in each figure:

      • Fig 1: 8
      • Fig 2: 7
      • Fig 3: 5
      • Fig 4: 9
      • Fig 5: 5
      • Ext Fig 3: 1
      • Ext Fig 4: 3
      • Ext Fig 6: 7
      • Ext Fig 8: 7
      • Ext Fig 9: 2
      • Ext Fig 10: 5
      • Ext Fig 11: 6

      Total: 65 tests.

      That's a lot of tests. More than enough to be concerned about the reported p-values.
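
      Just to put a number on it: with 65 independent tests at α = 0.05, the chance of at least one false positive is about 96% even if every null were true. A quick check in R (the 65-test count is from my tally above; independence is an assumption):

      ```r
      # Family-wise error rate for 65 independent tests at alpha = 0.05
      1 - (1 - 0.05)^65   # ~0.964
      # A Bonferroni-adjusted per-test threshold would be:
      0.05 / 65           # ~0.00077
      ```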

    3. When more than two groups were considered and compared, corrections for multiple comparisons were performed as part of the post hoc analysis. All statistical analysis was performed using GraphPad Prism.

      This is good.

    4. we found a similar correlation between premortem cognition, the abundance of large ELPVs and low levels of cathepsin D within PAASs (Fig. 2o and Extended Data Fig. 6c,d).

      In figure 2o, the p-values between PLD3 - AD and AD - MCI are the same, 0.0095. What are the odds that we get the same p-values in both of these comparisons?
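
      One non-sinister explanation: exact rank-based tests have a discrete set of possible p-values, so repeats are not that surprising with small groups. In fact 0.0095 is exactly what a two-sided exact Mann-Whitney test gives for complete separation with group sizes of 4 and 6 (hypothetical sizes; I don't know the actual n per group in Fig. 2o):

      ```r
      # Complete separation, n = 4 vs n = 6: the smallest possible two-sided
      # exact Mann-Whitney p-value is 2 / choose(10, 4)
      x <- c(1, 2, 3, 4)           # group 1 entirely below group 2
      y <- c(5, 6, 7, 8, 9, 10)    # group 2
      wilcox.test(x, y)$p.value    # 0.00952...
      2 / choose(10, 4)            # the same value
      ```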

    5. Furthermore, we also found that small PAASs were predominantly filled with vesicles that contained higher levels of the protease cathepsin D and were acidic—characteristic of more mature lysosomes (Fig. 2e–h and Extended Data Fig. 6f,g). By contrast, as PAASs increased in size, their overall acidification and cathepsin D levels declined (Fig. 2e–h and Extended Data Fig. 6f,g), consistent with the accumulation of ELPVs, which have not acquired sufficient lysosomal proteases and acidic pH27. Overall, this suggests that spheroid enlargement could be mechanistically linked to the accumulation of ELPVs.

      Again, these might all have been better evaluated by plotting the potentially causal factor against the outcome, instead of doing a statistical test between high/low or presence/absence groups.

      I think the conclusions stand, but the language implies correlation while the graphs only show statistical differences between groups.

    6. There was a notable correlation between the presence of ELPVs and the overall size of individual spheroids (Fig. 2d).

      The authors use "correlation" here, but did not actually plot percent ELPVs against the PAAS size. Instead they have bar graphs of PAAS size by ELPVs presence (+) and absence (-), and what looks like a p-value.

      I agree it looks like there is an association here, but a correlation implies that one plotted two things against each other and evaluated using an appropriate correlation metric.
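
      If the per-spheroid measurements are available, the correlation claim is directly checkable. A minimal sketch, with made-up data and hypothetical variable names (pct_elpv, paas_size):

      ```r
      # Simulated stand-in data; replace with the per-spheroid measurements
      set.seed(1)
      pct_elpv  <- runif(100, 0, 100)                 # % ELPV content per spheroid
      paas_size <- 2 + 0.05 * pct_elpv + rnorm(100)   # spheroid size
      plot(pct_elpv, paas_size)
      cor.test(pct_elpv, paas_size, method = "spearman")  # an actual correlation metric
      ```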

    7. However, in 5xFAD mice, we found that Ca2+ rise times in projecting axons were markedly delayed (Fig. 1i–j). This suggests that local AP conduction abnormalities caused by PAASs lead to a disruption in long-range axonal conduction.

      More evidence of issues from PAASs.

    8. These experimental observations were consistent with our computational modelling demonstrating that the size of individual PAASs is a critical determinant of the degree of axonal conduction defects

      So the larger the PAAS, the more likely there are axonal conduction issues, which of course impact the neural network.

    9. This supports the idea that PAASs are not a feature of degenerating axons but, rather, are stable structures that may affect neuronal circuits for extended intervals, while at the same time having the potential for reversibility.

      So PAASs are not directly indicative of degenerating axons, but they affect neuronal circuits for extended intervals and can potentially be reversed.

    10. plaque-associated axonal spheroids

      PAAS - plaque-associated axonal spheroids

      spheroids of material around axons that seem to be associated with Aβ deposits.

    11. Thus, targeting PAAS formation could be a strategy for ameliorating neural network abnormalities in AD.

      To sum up the intro:

      amyloid deposits seem to cause PAASs, which are persistent and undergo changes in size. PAASs disrupt action potentials / connectivity. PAAS size is influenced by endolysosomal biogenesis, which in turn is influenced by PLD3 protein levels.

      Furthermore, controlling PLD3 can reverse the accumulation of endolysosomal bodies and PAAS formation.

    1. repo you want to

      I'm trying to decide between hypothes.is and utterances. Utterances means I get a GH notification. Is there any way to be notified that someone annotated a part of your quarto document?
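
      For hypothes.is specifically, I don't know of a built-in notification for page owners, but the public search API can be polled. A sketch (the page URL is a placeholder, and httr/jsonlite are just what I'd reach for):

      ```r
      library(httr)
      library(jsonlite)

      page_url <- "https://example.com/my-quarto-doc.html"   # hypothetical URL
      resp <- GET("https://api.hypothes.is/api/search",
                  query = list(uri = page_url, sort = "created", order = "desc"))
      anns <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
      anns$total                 # how many public annotations the page has
      head(anns$rows$created)    # timestamps of the most recent ones
      ```

      Run that on a schedule (cron, GitHub Actions) and diff anns$total to get something notification-like.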

  2. Sep 2022
    1. Supplementary Figure 1.

      Why is there no Supp Materials here in the preprint? I would suggest adding them to the manuscript and resubmitting.

    2. differentially expression analysis conducted using LIMMA (68).

      Scripts used to manipulate the data should be provided, and the various inputs should be in a repository somewhere.

      I would include the mzML files, the numeric outputs from the proteomics workflow, and the inputs to limma.

    3. custom-built peptide database.

      What is in the custom built database, and why is it necessary?

      Should you include peptide sequences for HSV-1 and SARS-CoV-2 to verify that there were virus-specific proteins in the amyloid plaques and none in the control CSF samples?

    4. identify the proteins that were enriched in the amyloid fraction triggered by the two viruses compared to the normal CSF proteome.

      Right, this implies that those proteins were more abundant in virus compared to control CSF. However, for two proteins that I can map from here to Figure 1, namely APOE and APLP1, they show depleted values in virus compared to CSF, the complete opposite of what is being claimed here.

      PGK1 is shown to be enriched in virus compared to controls in Figure 1. CP is depleted in virus / CSF.

      It would really help if the authors either used the gene symbol in the text, or put the gene symbols matching Figure 1 in brackets following their full names in the text.

    5. HSV-1 and SARS-CoV-2 induce amyloid aggregation of proteins in human CSF.

      Several questions about Figure 1 A & B.

      RFU = relative fluorescence unit. What is this relative to? An internal standard specific to the run of measurements that is done for these types of ThT measurements?

      HSV-1 = 50 uL CSF + 150 uL ThT + 100 uL HSV = 300 uL; 48 hours incubation

      SARS = 25 uL CSF + 150 uL ThT + 25 uL SARS = 200 uL; 24 hours incubation

      Why the differing volumes and length of time of incubation? Text just states 48 hours.

      And why are the control curves of just CSF or just untreated medium so different between the two?

      It is unclear from either the figure or the methods how the two different CSF collections are used and results combined.

      It also feels like in both cases, values should be reported relative to one of the controls. I think this would make it easier to compare the two to each other.

      There is no discussion of why the control curves for HSV and SARS are so different from each other, or why, in the SARS case, the shapes of the virus-only and non-infected medium + CSF curves match the virus + CSF curve, differing only in overall intensity.

      Finally, why are the scales on the figures so different? Is this due to doing measurements in completely different runs?

      I would encourage the authors to also deposit all of the ThT measurements with the supplemental materials, or in a data platform such as Zenodo or Figshare.

    6. we incubated live HSV-1 virus and UV-inactivated SARS-CoV-2 virus with CSF harvested from healthy individuals.

      Why not use UV inactivated viruses for both HSV-1 and SARS-CoV-2??

      If the model is that it's due to templating on the virus shell, then it shouldn't matter.

      Also, if the methods state 48 hours, why don't the graphs show out to 48 hours for both viruses?

    7. In the proteomic analysis, we found that a large set of proteins was enriched in the virus-induced amyloid fractions compared to those present in untreated CSF (Figure 2. & Supplementary Table 1.)

      What exactly is the definition of enriched here? When I think of "enriched", I think of a statistical definition similar to hypergeometric enrichment, where things are present in this fraction more than expected by chance.

      It is also not clear, either here or from the methods, how exactly one gets a comparison of proteins for differential analysis. The methods say that amyloid plaques are isolated using the SDS protocol and then digested for proteomics. Then CSF only is prepared for proteomics with no amyloid isolation protocol?

      This is the only thing that makes sense to do, because based on the ThT assay in Figure 1, there were essentially no amyloid plaques observed in the CSF only samples after 24 - 48 hours.

      But the proteins associated with amyloid tangles that are previously shown to be important like APLP1, ApoE, etc all show statistically significant differential decreases in the virus samples compared to CSF?? If you are comparing amyloid plaque associated proteins to control CSF, they should be increased instead of decreased, correct? The only way I can see the results of Figure 1 making sense is if you are comparing the supernatant after removing the amyloid plaques, or the design matrix in limma was inverted. This would be possible to evaluate if the data was deposited somewhere appropriately.
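
      On the inverted-design-matrix possibility: in limma, flipping the contrast leaves every p-value identical and just flips the sign of every logFC, which would produce exactly this kind of systematic reversal. A minimal sketch (the expression matrix and group layout are invented):

      ```r
      library(limma)
      # Hypothetical log-intensity matrix: 100 proteins x 6 samples
      expr <- matrix(rnorm(600), nrow = 100,
                     dimnames = list(paste0("prot", 1:100), paste0("s", 1:6)))
      group <- factor(c("CSF", "CSF", "CSF", "virus", "virus", "virus"))
      design <- model.matrix(~ 0 + group)
      colnames(design) <- levels(group)

      fit <- lmFit(expr, design)
      # virus - CSF: positive logFC = enriched in the virus-induced fraction
      fit1 <- eBayes(contrasts.fit(fit, makeContrasts(virus - CSF, levels = design)))
      # CSF - virus: identical p-values, every logFC sign flipped
      fit2 <- eBayes(contrasts.fit(fit, makeContrasts(CSF - virus, levels = design)))
      all.equal(topTable(fit1, number = Inf, sort.by = "none")$logFC,
                -topTable(fit2, number = Inf, sort.by = "none")$logFC)
      ```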

      I realize this might be out of scope, but it would be nice to have proteomics quantification of:

      • virus-induced amyloid tangles
      • CSF after removal of virus-induced amyloid tangles
      • control CSF after doing the amyloid tangle selection protocol
      • control CSF

      I think these 4 sets would make it easier to know where specific proteins are actually appearing and cross reference that the overall differential design results are correct and consistent.

      Finally, an actual enrichment test would be good to do. For example, using the set of proteins previously detected in amyloid plaques as the "annotated set", you could test using hypergeometric test that the differential proteins are enriched in this set more than expected by chance, based on the total set of proteins detected in both the amyloid plaques and the control CSF.
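
      Concretely, the hypergeometric version of that test is a one-liner; every count below is invented purely for illustration:

      ```r
      N <- 1200  # proteins detected across the amyloid fractions and control CSF
      K <- 150   # of those, previously reported as amyloid-plaque-associated
      n <- 113   # differential proteins being tested (e.g., the shared set)
      k <- 40    # overlap between the differential set and the annotated set
      # P(overlap >= k) by chance, sampling without replacement:
      phyper(k - 1, K, N - K, n, lower.tail = FALSE)
      ```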

    8. More than 40% (n= 113) of the enriched proteins were shared in the amyloid fractions induced by both viruses, while about 37% (n= 102) were unique for the HSV-1-induced amyloid fraction, and 23% (n= 64) unique for the SARS-CoV-2-induced amyloid fraction.

      I think an UpSet plot (see https://en.wikipedia.org/wiki/UpSet_Plot) would really help to quantify the various overlaps here.
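
      With only two virus sets the intersections are simple, but the same approach scales if prior-knowledge sets are added. A sketch using the UpSetR package, with placeholder protein IDs chosen only so the set sizes match the quoted counts:

      ```r
      library(UpSetR)
      hsv  <- paste0("p", 1:215)               # 113 shared + 102 HSV-1-unique
      sars <- paste0("p", c(1:113, 216:279))   # 113 shared + 64 SARS-CoV-2-unique
      upset(fromList(list(`HSV-1` = hsv, `SARS-CoV-2` = sars)))
      ```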

    9. Non-infected cell medium was prepared with the same procedure without viral infection.

      So this is the non-infected medium in Figure 1?

    10. UV-inactivated SARS-CoV-2 was used instead of live SARS-CoV-2 for biosafety reasons since live SARS-CoV-2 was not safe to incubate in the spectrophotometer outside the BSL-3 laboratory.

      So UV-inactivated SARS-CoV-2 was used only because live SARS-CoV-2 is listed as BSL-3, whereas HSV-1 is listed as BSL-2.

      I still think the case would be stronger if both HSV and CoV were UV inactivated. Having that difference is another variable between them that is unnecessary.

    11. The virus was harvested at day three, four and five post-infection and centrifuged at 1000 g for 6 min to remove cells debris. The clarified supernatant was further centrifuged at 45000 g for 4h.

      Which of these were used for UV inactivation and induction of plaques assay? Were all the collections combined together?

    12. In the current study, we show that the incubation of HSV-1 and SARS-CoV-2 with human cerebrospinal fluid (CSF) leads to the amyloid aggregation of several proteins known to be involved in neurodegenerative diseases, such as: APLP1 (amyloid beta precursor like protein 1), ApoE, clusterin, α2-macroglobulin, PGK-1 (phosphoglycerate kinase 1), ceruloplasmin, nucleolin, 14-3-3, transthyretin and vitronectin. Importantly, UV-inactivation of SARS-CoV-2 does not affect its ability to induce amyloid aggregation, as amyloid formation is dependent on viral surface catalysis via HEN and not its ability to replicate.

      This, this is really freaky if the results support this conclusion.

  3. Oct 2021
    1. These advantages of a new language need to be balanced against the convenience of programmers who are able to tap into the collective knowledge of vast user communities.

      I think the authors have done a great job summarizing why Julia could definitely be a wonderful language for scientific computing, and appreciate that they included the numba JIT compiler for python comparisons.

      However, there is also cython, which purports to quickly convert python code to compiled C code, and Rcpp, which provides a large amount of syntactic sugar that makes working with vectorized data types in C++ very easy.

      I'm curious whether the effort required to convert python -> cython and R -> Rcpp is greater than implementing these same algorithms in Julia in the first place, as well as the relative performance of each.

      This may be beyond the scope of the current manuscript, however.

    2. This and similar performance features are now leading package authors of statistical and data science libraries to recommend calling into Julia for such operations, such as the recommendation by the principal author of the R lme4 linear mixed effects library to use JuliaCall to access MixedModels.jl in Julia (both written by the same author) for an approximately 200x acceleration [30]

      I do remember hearing about this, that lme4 couldn't be implemented in R b/c the speed of computation was too prohibitive, and therefore they were moving to Julia

    3. Julia uses the “.” operator to signify element-wise action of a function, and therefore the equation can be written as D .= A .* B .+ C

      Wow. OK, this starts to explain more and more the appeal of Julia.

    4. Computers are tools. Like pipettes or centrifuges, they allow us to perform tasks more quickly or efficiently; and like microscopes, NMR or mass spectrometers, they allow us to gain new, more detailed insights into biological systems and data.

      Yes!! And we need to be doing more to introduce undergraduate students to the proper inclusion of these tools and how to use them in reproducible ways!

    1. An integer rank from 1 to k for each protein set in a library size of k indicates sets with the lowest and highest P-values accordingly. A scaled rank is computed by dividing each integer rank by k. Thus, for a single query, there is one kinase rank list for each protein set library in KEA3. False discovery rates (FDRs) are computed via the Benjamini-Hochberg correction for each library separately. Out of the 24 candidate libraries, rank lists for the 11 final KEA3 libraries which met the benchmarking threshold are integrated via the MeanRank and TopRank methods (23).

      Alternatively, in plain language (with a minimal sketch after this list):

      • For each source of kinase-substrate information, do hypergeometric enrichment to generate a p-value for each kinase's substrate list.
      • Rank all of the enrichment p-values within that source.
      • Create a combined ranking across sources using the mean rank, or by taking the top rank (MeanRank and TopRank, respectively).
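
      A minimal sketch of that integration step, with a toy table of per-library p-values (all names and numbers invented):

      ```r
      plists <- list(
        libA = c(KIN1 = 0.001, KIN2 = 0.20, KIN3 = 0.05),
        libB = c(KIN1 = 0.03,  KIN2 = 0.01, KIN3 = 0.50)
      )
      ranks <- sapply(plists, rank)    # rank 1 = smallest p within each library
      rowMeans(ranks)                  # MeanRank integration
      apply(ranks, 1, min)             # TopRank: best rank across libraries
      ```
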
    2. One aspect that seems particularly odd in this manuscript is the focus on top XXX genes. Every time that one would have to choose something based on a cutoff, I see cutoffs based on the top XXX genes, not a numeric cutoff for the value being computed.

    3. Generating the benchmarking datasets

      For the benchmarking datasets, there is nothing in the methods that mention making sure that the benchmark data is not part of the data used for querying.

      We would expect that if the benchmark data is part of the kinase --> protein annotation set, then the process should be able to recover the benchmark information.

    4. To create the KEA3 gene co-expression libraries, all GTEx RNA-seq

      So GTEx and ARCHS4 are actually co-expression sets of the kinase with other genes, where the co-expression is derived as being from the top 300 genes by Pearson correlation.

      I can't tell from this text if values were log-transformed first or not.
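
      For reference, here is how I read the construction, as a sketch; rnaseq is a hypothetical genes x samples matrix, and the log2 step is exactly the part I can't confirm from the text:

      ```r
      set.seed(1)
      rnaseq <- matrix(rexp(2000 * 50, rate = 0.1), nrow = 2000,
                       dimnames = list(paste0("gene", 1:2000), NULL))
      mat <- log2(rnaseq + 1)          # unclear whether KEA3 did this first
      # Pearson correlation of every gene with one kinase (gene1 as a stand-in)
      r <- cor(t(mat), mat["gene1", ])
      coexpr_set <- names(sort(r[, 1], decreasing = TRUE))[2:301]  # drop self, keep top 300
      ```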

    5. We also used the entire STRING database to form the STRING library, including physical interaction, co-expression, co-occurrence in the literature, and evolutionary co-occurrence, among other association types.

      Nothing noted here about how high the score in STRING had to be to keep the interaction. This bothers me a bit, as the scores span a wide range.
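
      For context, STRING combined scores run from 0 to 1000, with 400 the conventional "medium" and 700 the "high confidence" cutoff; the text doesn't say whether any such filter was applied. A sketch of the filter I'd want to see, with toy rows mimicking the protein.links format:

      ```r
      string_links <- data.frame(
        protein1       = c("9606.ENSP0001", "9606.ENSP0001"),
        protein2       = c("9606.ENSP0002", "9606.ENSP0003"),
        combined_score = c(250, 810)
      )
      subset(string_links, combined_score >= 700)  # keep high-confidence edges only
      ```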

    6. Within each dataset, all kinases are human kinases that have at least five distinct putative human protein substrates.

      Makes sense, if a kinase doesn't map to at least 5 substrates, then it may be a false positive.

    7. The enrichment of known kinase substrates in a set of differentially phosphorylated proteins can serve as a potential marker of the upstream kinases' state and provide insights into physiological and pathophysiological mechanisms (14).

      So this is the goal, figure out what is potentially enriched to provide a set of targets.

    8. Since kinases serve a critical and central role in regulating essentially all cellular processes (5), and their aberrant constitutive activation is recognized as a cause of many human cancers (6–10), identifying alterations in kinase state given results from phosphoproteomics experiments is critical.

      Right, we want to know what could be going on. There are experiments that can infer alterations in kinase state, but sometimes we just have a list of the inputs.

  4. Jun 2021
    1. Proposed solutions in conservation biology and other crisis disciplines, no matter how elegant, are often stymied by inability to convert workable solutions into large-scale behavioral change. Clever solutions aimed at social system stewardship will face similar challenges. In this regard, social media’s influence provides a unique source of both risk and opportunity. Changes to a few lines of code can impact global behavioral processes.

      Yes! There is potential that changes to just a few lines of code can change lots of behavior. Although it would be awesome to see the records from Twitter, Facebook et al to see which changes to interfaces have actually changed user behavior, and which ones have not.

    2. This raises the possibility that some business models may be fundamentally incompatible with a healthy society (154).
    3. As most communication technology is privately owned, the ability to study its impact, much less enact evidence-based policy, is constrained by the willingness of companies to cooperate.

      Reminds me of the issues around Facebook NOT making public the information about which ads are served to which people, and then banning a plugin that provided this information to researchers who were interested in this data.

      The information about who sees which ads is extremely useful in our society where so many are willing to spend hard cash (or at least donated cash) to spread disinformation, but you can't study how the algorithms impact this without access to the data.

    4. For instance, online vaccination or electoral registration programs risk relative disenfranchisement of groups that cannot take advantage of them.
    5. We suggest that there is an urgent need for an equivalent of the Hippocratic oath for anyone studying or intervening into collective behavior, whether from within academia or from within social media companies and other tech firms.

      Yes! And mechanisms to hold them accountable!

      This makes me think of Google's recent(ish) firing of a person heavily involved in equity in AI, and the many issues around Facebook and Twitter and transparency.

    6. In this sense, online communication technology increases the urgency of stewardship while providing opportunities to enact evidence-based policies at scale.
    7. In sum, we are offloading our evolved information-foraging processes onto algorithms. But these algorithms are typically designed to maximize profitability, with often insufficient incentive to promote an informed, just, healthy, and sustainable society.

      Right! It's all about the money online. We let these networks form because of companies, in contrast to the early blog rings and page rings.

  5. Mar 2021
    1. (https://cimcb.github.io/MetabProjectionViz/).

      I think more journals need to have a policy of putting things in a more permanent spot, as GitHub repos can simply be deleted by the user.

    2. We recently showed that ANNs have similar predictive ability to PLS across multiple diverse metabolomics data sets (Mendez et al. 2019c).

      OK, this sounds cool and will have to check it out.

    3. PLS-DA

      Wait, wait, wait. I thought we were doing PLS, not PLS-DA. PLS w/out knowing classes would be fine for trying to interpret things, but PLS-DA is extremely biased and will find differences for anything and everything, including random data if not done the "right" way.

      This feels like a bait and switch ....
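
      It is easy to demonstrate the danger: fit PLS-DA to pure noise and the training scores still "separate" the classes. A sketch using the pls package (any PLS implementation would show the same thing):

      ```r
      library(pls)
      set.seed(1)
      X <- matrix(rnorm(40 * 500), nrow = 40)  # 40 samples, 500 random "metabolites"
      y <- rep(c(0, 1), each = 20)             # arbitrary class labels
      fit <- plsr(y ~ X, ncomp = 2)            # PLS-DA = PLS on a class indicator
      s <- scores(fit)
      boxplot(s[, 1] ~ y, ylab = "PLS component 1")  # clean separation from noise
      ```

      Proper cross-validation or permutation testing is the "right" way alluded to above.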

    4. none of these methods have gained the popularity of PLS.

      I wonder why PCA is not mentioned here at all? Like seriously, does no one use PCA for metabolomics studies, or did PLS really win over the metabolomics people that much?

  6. Jan 2020
  7. Nov 2019
    1. postulate that even a coarse, incomplete, or partially incorrect biological reference is suitable for this approach, as long as a sufficient amount of correct biological knowledge is covered

      This is an interesting idea. So, even in relatively unknown systems, use how some networks correspond to ground truth to tune cutoffs to be used even for unknown networks.

  8. Mar 2019
    1. Code and code comment

      Although I agree these are going to be the two most commonly used elements, how necessary is it that a data scientist knows how to "code"? Could one be a data scientist using Excel, for example?

  9. Nov 2018
    1. but also unfair criticism that was put forth to support a particular view.

      This is just it. A critique was published on our work, and we were not notified, or given the chance to respond. And the authors lied about us making our data available.

    1. Our strong vision to provide a premier preprint service tailored to chemists has resulted in this already robust support. We chose Figshare as its service provider to deliver a modern interface and ability to both host and interactively display data natively within the browser. Our authors and readers have made good use of these features by uploading crystal structures, computational files, videos, and more that can be processed and manipulated without the need for specialized software. ChemRxiv accepts all data types from authors—removing the limitations imposed by PDF and Word— providing a richer, more valuable reading experience for users. Since launch, we have added a number of new features, including a “Follow” feature, which allows readers to create notifications and RSS feeds based on precise search criteria, and an interactive citation-formatting tool. Our automated scans for plagiarism, viruses and malware, and improvements to the curation tools allow triage before posting to be quick, in fewer than two business days, and often in less than one day! Several new features will be available with the next release, including an interactive search widget to the “Funding” field. All of this, plus positive user feedback and the establishment of our global three-society governance, means that we are moving ChemRxiv from the beta stage to a full-service resource

      This is pretty neat!

  10. Oct 2018
    1. conservative threshold

      Missing e-value. I know it is in the Results, but I think it should be included in the Methods.

    2. range of different peptide

      Should the range be stated here? It is in the results, but seems relevant to the Methods.

    3. later verified

      missing "be", should read "later be verified"

    4. difficult characterize

      missing "to", should be "difficult to characterize"

    5. controlling

      The use of "controlling" here seems redundant.

    6. Ab

      This did not render over from whatever source was used to generate the PDF.

    7. RAA1(Hydrophobicity)

      In the paper, the acronym RAA is used for reduced amino acid alphabet; in the software, [RED is used](https://github.com/biodataganache/SIEVE-Ub/blob/master/KmerFeatures.py#L173). I can't see a good reason for having two different naming systems when they are both 3-letter acronyms. In addition, the numbering here is 1-5, whereas the software uses 0-4. Given the tight link between this paper and the software, one naming / numbering system should be used in both.

    8. FW places all the samples belonging to a particular cluster in a single test set, while the classifier is trained using the remaining data.

      This is talking about a "cluster of related proteins", where the related proteins were decided by the BLASTP similarity cutoff. I think changing it to read "particular cluster of related proteins" would help the understanding here a lot. I didn't get it until I looked at the code with the data in it, and running it.
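
      In code, the scheme reads something like the following leave-one-cluster-out loop; feats, labels, and cluster are hypothetical stand-ins for the kmer feature matrix, E3-ligase labels, and BLASTP-derived cluster assignments:

      ```r
      library(e1071)
      set.seed(1)
      feats   <- matrix(rnorm(60 * 20), nrow = 60)
      labels  <- factor(rep(c("E3", "other"), 30))
      cluster <- rep(1:6, each = 10)           # one id per cluster of related proteins

      preds <- lapply(unique(cluster), function(cl) {
        test  <- cluster == cl                 # the whole cluster is held out together
        model <- svm(feats[!test, ], labels[!test])
        data.frame(truth = labels[test], pred = predict(model, feats[test, ]))
      })
      res <- do.call(rbind, preds)
      mean(res$truth == res$pred)              # cluster-wise CV accuracy
      ```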

    9. WP_012732629.1 | 0.50 | 0.82 | Corynebacterium kroppenstedtii | 360 | hypothetical protein

      As noted here, I can't recreate this result using the code and the model provided.

      There are two possible issues with my trying to recreate this result:

      1. I used the wrong genome / proteome. This is possible b/c the full list of genomes downloaded from PATRIC is not provided.
      2. I used the wrong model. The model discussed in the publication has 100 features, and the one in the data has 6493, and as I indicated elsewhere, trying to do the RFE procedure I get something that has no resemblance to Figure 2. So I used the model with 6493 features for my test.
    10. PATRIC database

      Should the complete list of proteins and their SIEVE and SIEVEUb scores be provided as supplemental data?

    11. We found that top-scoring peptides from our model matched the second zinc finger sequences for several RING/U-box E3 ligases including the LubX protein and the herpesvirus ICP0 protein

      The data to support that top-scoring peptides match second zinc-finger sequences isn't provided. Unless this would be part of the data for the retained features and their locations in the peptides (see this other comment). But having the locations of the zinc fingers in conjunction with the feature locations would really help support this result.

    12. locations

      I could not find the list of retained features anywhere, along with their locations. It would be very helpful if this were in the GitHub repo (or another data repo) and the full path to the file identified.

    13. 100

      I tried to recreate the RFE procedure and Figure 2, as noted here.

      Running the code listed there, I got a figure that looks nothing like Figure 2 (the image did not carry over into this export).

    14. different peptide lengths and peptides

      Should the actual range of kmers used be indicated here? Code and plots show 3-20

    15. 171 genomes that are listed as human pathogens and are representative reference genomes from PATRIC [45]. This set comprises 480,562 protein sequences excluding all of the proteins used in the training set above.

      Is there a list of which genomes were downloaded? PATRIC has different versions for many organisms, so having the list of genomes seems pretty important.

      I'm also curious just how big a file would be with all of the 480K protein sequences, as a way to provide the set of data that the manuscript actually used.

    16. We identified a set of 168 confirmed bacterial or viral E3 ubiquitin ligase effectors from the UniProt database [30, 31]. Negative examples were 235 other bacterial effectors identified from literature [8, 20, 24, 27, 30-44]. We include details on the dataset as Supplemental Data

      What exactly were the search criteria used to identify these from UniProt and the literature? The results seem to imply Pfam models, but it is not explicitly stated here or anywhere else that I can see.

      Also, which file contains the list is not explicitly identified here. I think it is data/FamiliesConservative.txt, is that correct?

    17. To identify a minimal set of features that are important for classification of E3 ubiquitin ligases from other effectors we used recursive feature elimination, a standard machine learning approach [8]. Briefly, a model is trained on all features, then weights for each feature are used to discard 50% of the features with the lowest impact on model performance. The remaining features are then used in another model training round in which this process is repeated until all the features have been eliminated. The training performance results from the RFE on the RAA1-K14 model are shown in Figure 2. We chose to keep 100 features in our final analysis given that this provided good training performance (AUC >0.9), but retained a small portion of the initial features (3%). These features are provided as Supplemental Data along with their locations in each of the positive and negative examples in our analysis set.

      If one loads the provided file in data/SIEVEUbModel.Rdata, it seems to have 6394 features in the model, not 100.
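
      For what it's worth, the halving loop described in the text is only a few lines with e1071's linear SVM; toy data below, where the real run would start from the full kmer feature set:

      ```r
      library(e1071)
      set.seed(1)
      X <- matrix(rnorm(100 * 400), nrow = 100)     # 100 proteins x 400 toy features
      y <- factor(rep(c("E3", "other"), 50))

      keep <- seq_len(ncol(X))
      while (length(keep) > 100) {                  # paper stops at 100 features
        fit  <- svm(X[, keep], y, kernel = "linear")
        w    <- t(fit$coefs) %*% fit$SV             # one weight per remaining feature
        keep <- keep[order(abs(w), decreasing = TRUE)[seq_len(length(keep) %/% 2)]]
      }
      length(keep)                                  # 100
      ```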

    18. https://github.com/biodataganache/SIEVE-Ub.

      This is great that the code is provided, but this isn't really a proper data repository, as it could be deleted at the whim of the owner.

      Assuming the code is put in a proper data repository, the reference to that should also be provided.

    19. Python script

      Which version of Python will these scripts work with, and which version of Python was used for this work?

      The scripts depend on Biopython, it should be cited, and the version used for this work indicated.

    20. e1071 R library in our implementation. The area under the curve (AUC) and receiver-operator characteristic curve (ROC) calculation was performed using the R library pROC.

      Which versions of R, e1071, and pROC were used? Full references to the software packages should also be provided.
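
      For the record, capturing this is one line in R:

      ```r
      sessionInfo()                                    # R version + all loaded packages
      packageVersion("e1071"); packageVersion("pROC")  # or per-package queries
      ```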

  11. Sep 2018
    1. mean of the log-counts is not generally the same as the log-mean count [1]

      Is this something different from the fact that taking the mean one way gives the geometric mean? Will have to look at the reference.
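
      A concrete check of the basic fact (Jensen's inequality at work; exp(mean(log(x))) is the geometric mean):

      ```r
      x <- c(1, 10, 100)
      exp(mean(log(x)))   # 10: the geometric mean
      mean(x)             # 37: the arithmetic mean, always >= the geometric mean
      ```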

    2. suboptimal variance stabilization and arbitrariness in the choice of pseudo-count.

      Would like to see a reference for this statement.

    3. non-zero differences in expression and artificial population structure in simulations.

      huh, didn't know that.

  12. Sep 2017
    1. A survey shows that metal ions are modeled in ∼40% of all macromolecular structures deposited in the Protein Data Bank (PDB), yet the identification and accurate modeling of metals still pose significant challenges (Zheng et al., 2014). The development of any tools for systematic analysis based on the protein structures in the PDB should take into account that these structural data are not error-free. Failure to consider this may result in inaccurate conclusions, as happened in a recent study of zinc coordination patterns (Yao et al., 2015) that were shown to violate/ignore chemical and crystallographic knowledge (Raczynska et al., 2016).

      Here, Yao et al are called idiots, essentially.

    2. Both structural and catalytic zinc sites are usually tetrahedral, although trigonal bipyramid cases exist, especially at catalytic sites in a less stable transition state (Yao et al., 2015).

      And here, in contrast, our paper cited as evidence that a particular phenomenon actually occurs.