- Sep 2019
Note: The peer reviews in Peerage of Science are judged and scored for accuracy and fairness by other reviewers. The Weight value indicates the score of each review relative to the best review (Weight = 1.00).
Review by Peer 4429 (Weight = 1.00)
Introduction: The manuscript evaluates the use of genomic prediction in rice to prevent the accumulation of arsenic in rice grains. This is a food safety issue. Genomic prediction could be an appealing strategy for breeding of rice varieties less prone to accumulate arsenic in grains. Genomic prediction could bridge between current strategies based on land management (genetic improvement is cumulative and permanent) and recently proposed genome editing (for which target causal mutations need to be identified first).
Merits: The study seems original in its proposal of genomic prediction for this particular problem. The authors contextualize in the Introduction the potential interest of genomic prediction against other strategies, including management and genome editing.
The manuscript is quite broad in scope, as it tackles (1) genetic variation of the traits, (2) a genome-wide association study (GWAS), and (3) genomic prediction.
Despite the low number of significant associations in the GWAS, some of the ones that are detected have annotation terms that could make them interesting candidates for further study.
References are appropriate for the study.
Critique: Because it covers so much ground, the manuscript is quite long and dense. I think it could be softened a little in some sections. By contrast, it feels a little rushed when it comes to genomic prediction, considering that several prediction methods and strategies are used.
While genomic prediction is contextualized against other strategies in the Introduction, some of the results are not discussed as compared with other strategies. For example, there could be a greater effort to discuss the results of GWAS in light of the identification of targets required for genome editing (building on L327-336). There should also be a greater effort in discussing the several methods used for genomic prediction and potentially how genetic architecture from GWAS may help explain the differences between methods; for instance, if genomic prediction is concluded to be the best strategy, which method of all tested is recommended?
I am not totally comfortable with the interpretation that the authors make of the comparison between phenotypic and genomic selection (L346-362). Phenotypic selection is producing 5 to 10% more genetic gain than genomic selection (L344-345). This is a large difference that cannot be disregarded. The authors also claim that at equal cost of phenotyping and genotyping, genomic prediction would be preferred. While I agree with the logic that genomic data has the additional benefit that it can be applied to any trait, each of these potential traits would also need to be phenotyped on a regular basis to re-train the predictive equation. The authors acknowledge these points to some extent but, because overall phenotypic selection seems to be a better strategy for the specific case of arsenic tolerance and because the suitability of genomic prediction is established as dependent on genotyping costs, the title and conclusions seem a little bit misleading.
It is clear that the paper was written with the Materials and Methods after the Introduction and it was later moved to the end of the manuscript. As a consequence, abbreviations are not properly defined when first read.
Discussion: The manuscript offers a broad perspective on a topic of interest, affecting food safety, and proposes a sensible approach to mitigate it. The study is very detailed about the genetic variation of the traits and GWAS results and overall tackles all important points of discussion. However, it is slightly more vague on the genomic prediction section: several methods and strategies are tested but not described in the Methods section with enough detail and not thoroughly discussed. The authors conclude that genomic prediction would be a more suitable strategy to breed for arsenic-tolerant rice compared to other marker-assisted breeding strategies. However, it seems from the results that genomic prediction still underperforms compared to phenotypic selection and this should be put into context too. This manuscript contains some interesting research and it could be suitable for publication, but some revision is recommended as indicated.
Additional Comments for Authors
L38: Be explicit. Mitigation of what?
L59: Please define "Aus genetic group".
L96: Be explicit. Which three traits?
Also L96: The distributions in Fig 1 seem to depart from a normal distribution.
Genomic prediction results: There is a p>n problem here (many more markers than accessions), considering that 100 to 300 accessions but ~20,000 markers were used. Bayes A (one of the methods highlighted as most promising) fits all the markers in every iteration; Bayes B and C fit a pre-defined proportion of markers "pi" (could the authors specify to what value that parameter "pi" was set?); etc.
Revise English. Several typos and minor grammar errors.
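The dimensionality point raised under "Genomic prediction results" can be illustrated with a minimal sketch. It assumes simulated genotypes (100 accessions and 2,000 markers, fewer than in the manuscript so that it runs quickly) and uses ridge regression as a stand-in shrinkage method. It is not Bayes A, B or C, and every name and value in it is hypothetical.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge estimates of marker effects via the n x n dual system.

    Uses the identity (X'X + lam*I)^-1 X'y = X'(XX' + lam*I)^-1 y,
    which avoids forming a p x p matrix when p is much larger than n.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

# Hypothetical data: n accessions, p markers coded 0/1/2 (illustration only).
rng = np.random.default_rng(1)
n, p = 100, 2000
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))
X -= X.mean(axis=0)                      # centre each marker
true_effects = np.zeros(p)
true_effects[:20] = rng.normal(size=20)  # assumption: 20 causal markers
y = X @ true_effects + rng.normal(size=n)
y -= y.mean()

# With p > n the marker matrix has rank below p, so ordinary least squares
# has no unique solution; shrinkage (or a prior) makes the fit well defined.
rank = np.linalg.matrix_rank(X)
effects_hat = ridge_marker_effects(X, y, lam=50.0)
```

The penalty `lam` plays a role loosely comparable to the prior variance in the Bayesian methods, which is one reason the value of "pi" and the other prior settings matter for interpreting the comparison between methods.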
Review by Peer 1755 (Weight = 1.00)
Introduction: This paper presents a Bayesian model of mating in a fish that combines behavioural data on encounters and matings with genetic parentage data. It contrasts this model with classical analyses that use only particular facets of these data.
Merits: In my opinion, this paper's most important merits are:
That the model makes conceptual sense, and is presented in a way that is fairly easy to follow.
That the authors share the model code and data. This will make the model a lot more useful for other researchers.
That the paper is well written.
Critique: Despite this, I think there are things that could be clarified or improved:
There seems to be a considerable skew in the reproduction data. This is expected, but it comes with a risk of violating the assumptions of common statistical models. Do the models used adequately capture this? In particular, the correlation coefficients (Figure 1) must be largely driven by single influential data points.
Given the above skew and structure of the data, and that the model results extrapolate quite a bit from what was observed, it would be nice to see more thorough checks and discussion about the validity of the model. How well can the model reproduce features of the data? The posterior predictions in Figure 4 seem to indicate that the model fits the data rather poorly. But I may be mistaken, and the manuscript does not interpret these results much.
I got the JAGS model to run with only minor editing (that is, moving the data generating code to its own file). However, I can't, using the data in the script, recover the scatterplots and Pearson correlations displayed in Figure 1. I assume my analysis (see attached Sweave pdf output) is wrong somehow, suggesting a need for better documentation so that readers such as myself can understand the data. It may help to clarify what variables are what, which samples have been omitted (from what analyses and for what reasons), and store the data in tabular format in addition to the JAGS input format. It would also be a nice addition to have the code used for running the model and summarising the results -- it would save a user quite a bit of effort without much work on behalf of the authors.
The sample sizes for data on releasing of gametes are particularly small. One wonders how much information they contribute? Similarly, both observations (line 248) and modelling (lines 305-307) suggest that many encounters were not observed. How does this affect conclusions? This ability to deal with incomplete data is highlighted as a feature of the model. Are there arguments or data that show that it is successful?
In the Introduction and Abstract, one of the motivations for this approach is to capture effects of interactions of the phenotypes within a pair. But then, "Unfortunately our dataset is too small to properly infer the effect of interaction" (line 428-429). First, the earlier focus on this unused feature of the model seems misplaced. Second, it is not clear when a dataset is too small and how you know that (presumably by trying a model not shown?).
I think this paper would benefit from more illustration. Figures 1 and 3 are hard to read with small differently shaped symbols, line patterns, and overplotting. I would suggest making separate plots for males and females to alleviate some of the clutter. Figure 1 b is particularly unreadable. The plots of posteriors are fine, and probably should be in the paper, but I think they should be supplemented with some descriptive graphics that give a feel for the structure of the data and the behaviour of the fish. I would even love to see some visuals of fish mating, maybe stills from the video recordings (or even a supplementary video). Of course, this may be limited by space requirements of the target journal, or not be to the authors' taste. But I think you underestimate how cool some of these things are, especially if you aim for a wide audience not well versed in fish mating research.
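To make the first point of this critique concrete (correlations driven by single influential data points), a leave-one-out check is one simple option. The sketch below uses made-up body sizes and offspring counts, not the values from the manuscript, and the function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def leave_one_out_r(x, y):
    """Pearson correlation recomputed with each observation dropped in turn."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(x))
    return np.array([pearson_r(x[idx != i], y[idx != i]) for i in idx])

# Made-up data: eight unremarkable individuals plus one extreme individual.
size = np.array([1, 2, 3, 4, 5, 6, 7, 8, 30], dtype=float)
offspring = np.array([2, 1, 2, 1, 2, 1, 2, 1, 30], dtype=float)

r_all = pearson_r(size, offspring)
r_loo = leave_one_out_r(size, offspring)

# Index of the observation whose removal changes the correlation the most.
most_influential = int(np.argmax(np.abs(r_loo - r_all)))
```

In this toy case the full-sample correlation is strongly positive, while dropping the last individual leaves a weak negative correlation. Reporting such a check, or a rank-based coefficient alongside Pearson's, would show how much Figure 1 depends on individual fish.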
Discussion: This is likely beyond the scope of this paper, but I feel that a lot of the questions about the model -- does it work on small datasets; does it successfully account for unobserved encounters; how do its parameters relate to the "classical" measures of sexual selection -- could better be answered with simulated than with real data. I sympathise with the use of real data: a good biological example is a lot more convincing to biologists than simulations. However, I feel that there are often too many uncertainties in comparing methods on real data. Results of different methods differ, like the "classical" and the new analyses in this study. But which are right?
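A minimal sketch of what such a simulation study could look like: data are generated with a known effect of body size on offspring number, the effect is re-estimated, and the error is compared across sample sizes. The log-linear Poisson model, the parameter values and the function names are all assumptions made for illustration; this is not the authors' JAGS model.

```python
import numpy as np

def fit_poisson_loglinear(x, y, iters=50):
    """Maximum-likelihood fit of log E[y] = b0 + b1 * x by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean()), 0.0])  # start at the intercept-only fit
    for _ in range(iters):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                 # score vector
        hess = X.T @ (X * mu[:, None])        # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def slope_rmse(n, true_slope=0.3, n_rep=200, seed=0):
    """Root-mean-square error of the recovered slope over simulated datasets."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_rep):
        size = rng.normal(size=n)             # standardised body size
        offspring = rng.poisson(np.exp(1.0 + true_slope * size))
        slope_hat = fit_poisson_loglinear(size, offspring)[1]
        errors.append(slope_hat - true_slope)
    return float(np.sqrt(np.mean(np.square(errors))))

rmse_small = slope_rmse(n=30)    # a dataset of roughly experimental size
rmse_large = slope_rmse(n=500)   # a much larger, hypothetical dataset
```

Because the true slope is known, the same loop could be run with the "classical" regression and with the new model, which would answer directly which of the two is closer to the truth at a given sample size.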
Additional Comments for Authors
The paper would benefit from a two sentence explanation of opportunity for selection, what it measures, and the distinction between opportunity for selection and opportunity for sexual selection.
L8-10: The opening of the abstract sets up the paper to be rather technical, jumping directly into marginal sums of matrices. I think you may want to rethink that approach if the goal is to reach, as the author message said, "a wide audience of ecologists and evolutionary biologists".
For the same reason, I'd advise against the introduction of a 3-dimensional array on line 34. Even if that is mathematically correct, it is immediately going to be summed to a parental table. Therefore, the 3-dimensional structure doesn't really contribute much, except to act as an obstacle to mathematically less savvy readers.
L48-49: "strong link" could be made more precise.
Lines 123-124: "The experimental setup is the one used in the "constant environment" treatment in Gauthey et al. (2016)." What is the relationship between this work and Gauthey et al. (2016)? Can this be made clearer?
Line 226: "po" is not defined in this section. I think the manuscript would benefit from being checked an extra time for mathematical symbols, when they are defined, how they are referred to, and if they can be spelled out in text to help the reader.
Line 270: "Model output" is not a very informative subtitle. I'd suggest dividing the Results into one subsection on the data set, one on the "classical" analyses of sexual selection, and one on the model.
Some of the choices about model structure (specifically, use of informative priors) are discussed in comments in the model code, but not in the Methods. They should be in the Methods too.
Review by Peer 1765 (Weight = 0.88)
Introduction: This paper aims to solve a long-standing issue in sexual selection studies in natural populations: that genetic and behavioural data tell us different things about separate stages of sexual selection and, therefore, often focus on different processes in sexual selection. While behavioural data tend to focus on mate sampling and mate choice, genetic data provide evidence on the resulting mating/reproductive success. This paper makes an important step in trying to combine both types of data in order to analyse the complete process of sexual selection. Such a tool could substantially advance the field of sexual selection in natural populations. I was very enthusiastic about this approach, until I arrived at Figure 4, which shows that the predictions from the model the authors suggest do not correlate at all with the observed data from their case study, suggesting the model is possibly very well thought through, but does not represent the data well. Without empirical evidence, I do not see any reason to put the results of the model above those of the classical methods.
Merits: The paper describes the model used in a way that is mostly very clearly understandable for non-modelers, which is important for the general use of the proposed method. Moreover they include a case-study which very nicely links the theory to experimental data.
Critique: The suggested model provides different results from more classical methods of analysing the data. The authors then go on to defend the model as a better way to analyse the data, because they find different results. However, they do not provide evidence that the results from the model fit the data better than the results from the classical analyses. In fact, Figure 4 shows that the model is actually rather bad at predicting observed encounter rates, gamete releases and offspring numbers, because there seems to be no correlation whatsoever between observed and predicted data. For example, many females that did sire large numbers of offspring were not predicted to have any offspring according to the model (Fig. 4c). This is not discussed in the paper. I do commend the authors for testing their model on a case study, and for combining a theoretical approach with an experimental one, but the difference between predicted and observed data should be discussed. The authors could compare the model predictions to the predictions from the classical analyses and see which analyses fit best with the observed data.
Terminology: Encounter rate is a term that is generally reserved for random events depending on population density and sex ratio. However, the way it is used in the case study (which is certainly the most practical for field observations) includes a certain effect of attraction. In most species, males and females do not generally end up close to a spawning ground/nest without being attracted by some aspect of the individual or this particular nest. The authors are likely aware of this, because they test for an effect of female size on encounter rate. The fact that they do not find such an effect does not exclude that there may have been attraction to other characteristics of the female or the nest site. Therefore, I would suggest using another word for encounter (for example inspection or visit) to avoid confusion between an event where individuals have likely already been attracted to each other (as used in the case study) and a random "encounter". The latter is, however, impossible to quantify in the field, because it is generally impossible to spot whether two individuals have noticed each other, and I see no reason to include it in the model.
Discussion: The paper addresses a very important issue in the study of sexual selection: how to combine behavioural and genetic data to study the strength of sexual selection. As the authors rightly argue, both types of data omit important processes in sexual selection and very few studies manage to get both types of data for all (or even most) mating events. The model they suggest would make use of incomplete behavioural and genetic data to explain the underlying processes. Such a model could provide an important tool for sexual selection studies. However, the case study the authors provide suggests that the model is not very good at predicting real case scenarios. Therefore, the authors should investigate how the model could be changed to reflect their experimental data. Doing so would provide an important paper that would be very valuable to the field.
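One way to examine the mismatch between observed and predicted data is a posterior predictive check: replicate datasets are simulated from the posterior and a summary statistic is compared with its observed value. The sketch below assumes made-up, strongly skewed offspring counts and a deliberately simple stand-in model (one shared Poisson rate with a Gamma(1, 1) prior). It is not the authors' model, and all names are hypothetical.

```python
import numpy as np

def posterior_predictive_pvalue(y_obs, rate_draws, statistic, rng):
    """Share of replicated datasets whose statistic is at least the observed one."""
    # One replicated dataset (row) per posterior draw of the Poisson rate.
    y_rep = rng.poisson(rate_draws[:, None], size=(len(rate_draws), len(y_obs)))
    stat_rep = statistic(y_rep, axis=1)
    return float(np.mean(stat_rep >= statistic(y_obs)))

rng = np.random.default_rng(42)

# Made-up offspring counts with one very successful individual.
offspring = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 40])

# Conjugate posterior for a single Poisson rate under a Gamma(1, 1) prior.
rate_draws = rng.gamma(shape=1.0 + offspring.sum(),
                       scale=1.0 / (1.0 + len(offspring)),
                       size=2000)

p_mean = posterior_predictive_pvalue(offspring, rate_draws, np.mean, rng)
p_var = posterior_predictive_pvalue(offspring, rate_draws, np.var, rng)
```

In this toy example the model reproduces the mean but essentially never reproduces the observed variance, so the check flags the skew that the model misses. The same comparison could be applied to the classical analyses, which would allow the fit of the two approaches to be compared on equal terms.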
Review by Peer 1758 (Weight = 0.85)
Introduction: This manuscript offers a statistical alternative to classical sexual selection gradient analysis by using Bayesian inference that allows accounting for male and female effects simultaneously. Furthermore, the authors highlight that mating success is generally underestimated because it is based on the genetic assignment of offspring. The authors use their own data on the mating behaviour and reproductive output of brown trout to compare the results from classical selection analysis with their Bayesian model and find differences between the two.
Merits: This manuscript is relevant because it highlights limitations of classical sexual selection gradient analysis, and offers a statistical alternative to empiricists with suitable data. I have the following suggestions, which I hope will be useful in revising the authors' original contribution. Also, I welcome that the authors made their research transparent by adding their data and code. However, I want to make clear that I could not review their code because of incompatibilities with JAGS and my software.
Critique: The authors' statistical alternative is motivated by two aims that address shortcomings of the classical approach: (a) to account for the interdependence of females and males in sexually reproducing species and (b) to get a grip on the copulatory behaviour instead of inferring it from offspring data. Whilst I agree that (b) is pressing, (a) depends on the mating system, e.g. in strictly monogamous species, male and female identity overlap and fitting both would not be informative or appropriate for the analysis of sexually selected individual phenotypic traits. Hence, the applicability of the authors' model would profit from information on its suitability for different mating systems, i.e. expand on "a variety of biological systems", l24, in the discussion. Also, the authors' approach relies on empirical data. In other words, even the best model does not change the fact that if mating success lacks behavioural observations, and it usually does, we can only make incomplete inferences. In my view, the main contribution of this manuscript is thus to serve as an important reminder of the complexities at play and the importance of comprehensive data collection, rather than a new tool for measuring sexual selection. Also, the pitfalls and shortcomings (e.g. bias in stochasticity, what is the null model, operational sex ratio) when measuring sexual selection have been comprehensively illustrated by Klug, Heuschele, Jennions, & Kokko (2010) and by Jennions, Kokko, & Klug (2012). So, I recommend a more inclusive portrait of the matter and attuning with published jargon (e.g. Table 1 in Klug, Heuschele, Jennions, & Kokko, 2010).
I advocate that the full results of the linear regression analyses as well as the alternative JAGS model are presented in table format in the main text. Results in the supporting information get missed easily, and plots cannot substitute full estimates.
The authors could expand more on discussing their most interesting finding, which is the discrepancy between their results using classical regression analyses and Bayesian analysis.
Discussion: This manuscript is motivated by two shortcomings of the classical sexual selection gradient analysis. I agree with the relevance of one of them (i.e. measuring mating success) and yet argue that the relevance of accounting for the additive effects of the sexes for reproductive success is highly dependent on the species' mating system, which the authors should address. I also think that the authors should make clearer that their analysis still depends on empiricists collecting data on mating success. I welcome the authors' approach to use their own data to compare whether body size of male and female brown trout might be sexually selected. If the authors revise the current version, their manuscript will serve as an important reminder of what to look out for when analysing potentially sexually selected traits.
References
Jennions, M. D., Kokko, H., & Klug, H. (2012). The opportunity to be misled in studies of sexual selection. Journal of Evolutionary Biology. http://doi.org/10.1111/j.1420-9101.2011.02451.x
Klug, H., Heuschele, J., Jennions, M. D., & Kokko, H. (2010). The mismeasurement of sexual selection. Journal of Evolutionary Biology. http://doi.org/10.1111/j.1420-9101.2009.01921.x
Schlicht, E., & Kempenaers, B. (2013). Effects of social and extra-pair mating on sexual selection in blue tits (Cyanistes caeruleus). Evolution, 67(5), 1420-1434. http://doi.org/10.1111/evo.12073
Additional Comments for Authors
l14: be clearer on "costly" or delete because costs were not measured
l27: add or consider selection gradient, see Table 1 in Klug et al 2010
l44: ambiguous "to do so". Which of the indices exactly?
l52 infertile not unfertile
l53 reference "cost of reproduction"
l64 reference costs
l65 back up the claim of "are essential to understand..."
l68 better name the "fourth definition"
l93 define "a pair", e.g. socially monogamous? This could be an opportunity to introduce the mating system you want to target
l116 in brown trout? Please add citation
l120 "a" semi-natural...
l120-123 split into two sentences to improve readability, e.g. This period represents the trout...
l124: chemically communicated?
l129: highly female biased, which might be biologically meaningful or a catching bias, please explain. Plus this skew in adult sex ratio will affect the variance in mating success, i.e. "chance variation in mating success is higher when there are fewer potential mates per individual of the focal sex" (Jennions et al 2012); this affects both your statistical approaches but it is nowhere mentioned
l132 how did you sex? Molecularly?
l145: one or multiple observers? also "taken" not "took"
l148 any proof? repeatability tests? references for the claim?
l149 say how you dealt with the 30% for analyses
l150 rephrase "the zone", e.g. female nesting/egg release site, etc.
l156 consider "spawning" or gamete release instead of copulating
l159 "degree day" reads misplaced, only use estimate of time after spawning
l186 consider making clearer that zeros were included
l247 depending on where you want to submit avoid fish jargon: "redd"
l249 give output of all linear regression analyses in table
l271 I suggest moving these to the main text
l278 why not report Credible Intervals instead of SDs? Also, SDs show high uncertainty in estimates, which should be addressed in the discussion
l336 rephrase "to account for..."
l335 give time unit, e.g. over the course of the experiment
l336 Comment: I disagree because sexual selection is commonly referred to as the opportunity for evolutionary change, which is the variance in relative fitness and should consider all reproductively mature adults, hence should be measured among individuals that do and do not interact/mate. Especially the latter is usually omitted, but ignoring unmated individuals in a population will automatically inflate the variance of the successful subset (see also (Schlicht & Kempenaers, 2013)).
l418-19 rephrase, unclear
Plots: General comment: It might be the pdfs, but the quality of the plots is low. In general, offsetting the raw data a bit (e.g. jittering) would help in viewing individual data points
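The comment on l278 above (credible intervals instead of SDs) can be illustrated with a short sketch. It assumes posterior draws are available as a plain array; here they are simulated from a lognormal distribution as a stand-in for MCMC output of a positive parameter, and the function name is hypothetical.

```python
import numpy as np

def summarise_posterior(draws, level=0.95):
    """Mean, SD and equal-tailed credible interval of posterior draws."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(draws, [tail, 100.0 - tail])
    return {
        "mean": float(np.mean(draws)),
        "sd": float(np.std(draws, ddof=1)),
        "lower": float(lower),
        "upper": float(upper),
    }

rng = np.random.default_rng(7)
# Stand-in for MCMC output: a skewed posterior that cannot be negative.
draws = rng.lognormal(mean=0.0, sigma=1.0, size=20000)

summary = summarise_posterior(draws)

# A "mean minus two SDs" bound ignores the skew and can fall below zero,
# whereas the percentile-based interval stays within the parameter's range.
naive_lower = summary["mean"] - 2.0 * summary["sd"]
```

For skewed posteriors the interval and the SD can tell different stories, which is why reporting the interval (and discussing its width) is more informative than the SD alone.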
Review by Peer 1761 (Weight = 0.67)
Introduction: The authors point out how the study of mating systems only using behavioural observations or genetic data usually fails to explain accurately the breeding processes and reproductive outcomes, as well as their relationship with sexual selection features.
They propose a model that combines both behavioural and genetic data, and a phenotypic trait linked to sexual selection, using brown trout as model species.
Their model includes several breeding variables, both behavioural and genetic, and it is very adaptable, as it is able to incorporate other environmental or biological variables if needed.
They show how genetic and behavioural results analyzed separately may differ. They also show that the results from their model and from the classic regression analyses of these data differ, and they aim to explain why.
Merits: The model they have built seems flexible enough to be adapted to multiple taxa and systems.
Critique: There is no reference at all to ethics permissions to perform the described experiment. I am quite shocked about this since high numbers of individuals from a wild population were killed.
There is no mention of the conservation status of the species, the permits obtained to carry out the capture and experiment, the effect of the capture system on the ecosystem, or the explanation/justification for the use of lethal methods.
For example, I find electrofishing highly non-targeted and I wonder what its impact was on other non-target fish (and non-fish) species. I believe that assembling a team of fishermen to get the same number of adult specimens would be easy enough to arrange.
My point is not whether the methods were ethically acceptable or not (that is for the journals' ethics committees to decide) but that the authors should, at least, justify and explain their use.
Model testing: I understand that in ecology studies researchers usually don't get all behavioural or all genetic data, and that is what the models try to compensate for. However, when testing models in a biological system the ideal situation is to work in a system where almost all information can be collected (usually under lab conditions), build a model with all that information, and then subsample the data (so as to simulate a real ecological study) to test the model performance.
In this study, however, the initial sampling for the data is quite small, especially for behavioural observations (30min/day). Then, the results from the model are quite different from the results obtained from more classic approaches. The authors offer some hypotheses to explain these differences, but they can't be really tested to see whether the authors' model results are better in explaining the system or not.
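The subsampling idea described above can be sketched as follows. A hypothetical "complete" record of encounters is thinned to mimic partial observation, and a simple summary (the correlation between body size and encounter count) is recomputed many times. The data, the observation fractions and the function name are assumptions for illustration only.

```python
import numpy as np

def thinned_correlations(size, encounters, fraction, n_rep, rng):
    """Size-encounter correlation after keeping each recorded encounter
    independently with probability `fraction`, repeated `n_rep` times."""
    out = np.empty(n_rep)
    for i in range(n_rep):
        seen = rng.binomial(encounters, fraction)  # encounters actually observed
        out[i] = np.corrcoef(size, seen)[0, 1]
    return out

rng = np.random.default_rng(3)

# Hypothetical complete record for 40 individuals.
size = rng.normal(size=40)
encounters = rng.poisson(np.exp(2.0 + 0.3 * size))

r_full = float(np.corrcoef(size, encounters)[0, 1])
r_half = thinned_correlations(size, encounters, 0.5, 500, rng)
r_sparse = thinned_correlations(size, encounters, 0.1, 500, rng)

# Spread of the estimate across replicates at each observation fraction.
spread_half, spread_sparse = float(r_half.std()), float(r_sparse.std())
```

Comparing the thinned estimates with `r_full` shows how much is lost at a given observation effort. With a full model, the same design would indicate whether it recovers the complete-data result better than the classical analyses do.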
All that said, I have to admit that I lack the mathematical background to fully understand and evaluate the model design and performance, and a more qualified researcher should do that.
Discussion: Although the experimental approach to test the validity of the model predictions could have been better, their attempt to combine behavioural and genetic data in mating system studies and relate it to sexual selection is an important step forward in the behavioural field.
Hopefully, more efforts like this will be made to reconcile both aspects of the study of mating systems that rapidly changed from behavioural observations only to genetic analyses only.
Review by Peer 1773 (Weight = 0.51)
Introduction: In the traditional approach to estimating the effect of sexual selection on a phenotypic trait, the number of mates is regressed on the target phenotypic trait in a separate model for each sex. Such an analysis ignores the joint investment of the sexes in mating success. The authors propose a new approach, which allows combining behavioral and genetic data, thereby making it possible to gather information through the successive processes of encounter, gamete release and offspring production.
Merits: The new approach accounted for the three-dimensional structure of the data: males, females and mating occasions. This allowed a qualified definition of mating success and disentangling the joint effects of male and female phenotypes on the different components of reproductive success. Three important features that are lacking in the traditional approach characterize the authors' model:
1) conditioning of each process (encounter, gamete release and offspring production) on the preceding one,
2) simultaneous estimation of the effect of male and female phenotype,
3) random individual effects.
The authors tested their model on brown trout and obtained quite different results for the two approaches.
The model can be used for a variety of biological systems where behavioral and genetic data are available.
Critique: The model should be tested on a larger sample.
The title of the manuscript is not very successful.
There are a couple of misprints: p. 7 l. 139 and p. 8 l. 159.
Discussion: It is very important when new algorithms make it possible to obtain more information from the same set of data. It would be of great importance if the model could be developed to account for real behavior traits in species presenting complex courtship behavior, like Drosophila for instance.