Reviewer #2 (Public Review)
I have separated my issues with the manuscript into three sub-headings (Conceptual Clarity, Observational Detail and Analysis) below.
1) Conceptual clarity
There are a number of areas where it would greatly benefit the manuscript if the authors were to revisit the text and be more specific in their intentions. At present, the research questions are not always well-defined, making it difficult to determine what the data is intended to communicate. I am confident all of these issues could be fixed with relatively minor changes to the manuscript.
For example, Line 104: Question 1 is not really a question, the authors only state that they will "investigate innovation and extraction of eating the food", which could mean almost anything.
Question 2a (line 98) is also very vague in its wording, and I'm left unclear as to what the authors were really interested in or why. This is not helped by line 104, which declines to make predictions about this research question because it is "exploratory". Empirical predictions are not simply placing a bet on what we think the results of the study will be, but rather laying out how the results could look for the benefit of the reader. For instance, if testing the effects of 10 different teaching methods on language-acquisition rate: even if we have no a priori idea of which method will be most effective, we can nevertheless generate competing hypotheses and describe their corresponding predictions. This is a helpful way to justify and set expectations for the specific parameters that will be examined by the methods of the study. Indeed, in the current paper, the authors had some very clear a priori expectations going into this study that immigrant males would be vectors of behavioural transmission (clear, that is, from the rest of the introduction, and from the parameters used in their analysis, which were not chosen at random).
The multiple references to 'long-lived' species in the abstract (line 16) and introduction (lines 39, 56) are a bit confusing given the focus of this study. Although such categorisations are arbitrary by nature (a vervet is certainly long-lived compared to a dragonfly), I would not typically put vervet monkeys (or marmosets, line 62) in the same category as apes (references 8 and 9) or humans (line 62) in this regard. This contributes a little towards the lack of overall conceptual focus for the manuscript: beginning in this fashion suggests the authors are building a "comparative evolutionary origins" story, hinting perhaps at the phylogenetic relevance of the work to understanding human behaviour, but the final paragraph of the study contextualises the findings only in terms of their relevance to feeding ecology and conservation efforts. I would recommend that the authors think carefully about their intended audience and tailor the text accordingly. This is not to say that readers interested in human evolution will not be interested in conservation efforts, but rather that each of these aspects should be represented at each stage of the manuscript (otherwise, conservationists may not read far into the Introduction, and cultural evolution fans will be left adrift in the Conclusion).
2) Observational detail
There are a number of areas of the manuscript which I found lacking in sufficient detail to accurately determine what occurred in these experimental sessions, making the data difficult to interpret overall. All of this additional information ought to be readily available from the methods used (the experiments were observed by 3-5 researchers with video cameras; line 341) and is of direct relevance to the research questions set out by the authors.
While I appreciate that it will take quite a bit of work to extract this information, I am certain that it would greatly improve the robustness and explanatory power of this study to do so.
The data on who was first to innovate/demonstrate successful extraction of the food in each group (Question 1), on subsequent uptake (Question 2), and on the actual mechanism by which that uptake occurred (the authors strongly imply social learning in their Discussion, but this is never directly examined) are difficult to interpret based on the information presented. Some key gaps in the story were:
- Which/how many individuals encountered the food, and in what order? I.e., were migrants/innovators simply the first to notice the food?
- Did any individuals try and fail to extract the food before an "innovator" successfully demonstrated?
- How many tried and failed to extract the nuts before and after observing effective demonstrators?
- Were individuals who observed others interact with the food more likely to approach and/or extract it themselves?
- Did group-members use the same methods of extraction as their 'innovators'?
- How many tried and succeeded without having directly observed another individual do so (i.e. 'reinvention' as per Tennie et al.)?
The connective tissue between the research questions set out by the authors is clearly social learning. In short: the thesis is that Migrants/Innovators bring a novel behaviour to the group, then there is 'uptake' (social learning), which may be influenced by demographic factors and muzzle-contact (biases + mechanisms). Given this focus (e.g. lines 224-264 of the Discussion), I would expect at least some of the details above to be addressed in order to provide robust support for these claims.
Question 2a (Lines 136-146): This data is hard to interpret without knowing how much of the group was present and visible during these exposures.
For example: 9% uptake in NH group does not sound impressive, but if only 10% of the total group were present while the rest were elsewhere, then this is 90% of all present individuals. Meanwhile, if 100% of BD group were present and only experienced 31% uptake, then this is quite a striking difference between groups.
Of course, there is also an issue of how many individuals can physically engage with the novel food even if they want to - the presence of dominant individuals, steepness of hierarchy within that group, etc, will significantly influence this (and is all of interest with regards to the authors' research questions).
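To make the arithmetic concrete, here is a minimal sketch (with invented group sizes; the real attendance figures would come from the video records) of recomputing uptake relative to the individuals actually present rather than the total group:

```python
# Illustration only: 'uptake' as a proportion of PRESENT individuals.
# All numbers below are hypothetical, echoing the NH example above.

def uptake_among_present(n_extracted, n_present):
    """Proportion of present individuals that extracted the food."""
    if n_present == 0:
        raise ValueError("no individuals present at this exposure")
    return n_extracted / n_present

# Hypothetical NH group of 40: 9% total uptake, but only 10% present.
nh_total = 40
nh_extracted = round(0.09 * nh_total)   # 4 individuals
nh_present = round(0.10 * nh_total)     # 4 individuals
print(uptake_among_present(nh_extracted, nh_present))  # 1.0: everyone present extracted
```

The same denominator change applied per exposure would make the between-group comparison (NH vs. BD) far more interpretable.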
Muzzle-contact behaviour: The authors use their data to implicate muzzle-contact in social learning, but this seems a leap from the data presented (some more on this in the Analysis section).
For example:
- What is the role of kinship in these events?
- Did they occur when the juvenile had free access to the food (i.e. not likely to be chased off by a feeding adult)?
- Did they primarily occur when adults had a mouthful of food (i.e. could it simply be attempted pilfering/begging)?
- What proportion of PRESENT (not total) individuals were naïve and knowledgeable in each group for each trial (if 90% present were knowledgeable, then it is not surprising that they would be targeted more often)?
- Did these events ever lead to food-sharing (in other words, how likely are they to simply be begging events)?
- Did muzzle-contact quantifiably LEAD to successful extraction of the food? If the authors wish to implicate muzzle-contact in social learning, it is not sufficient to show that naïve individuals were more likely to make muzzle-contact, they must also show that naïve individuals who made more muzzle-contact were more likely to learn the target behaviour.
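The test suggested in the last point could be run directly on the event data. A minimal sketch (the records and field names below are invented, not taken from the manuscript) comparing later success between naïve individuals who did and did not make muzzle-contacts:

```python
# Hypothetical per-individual records for NAIVE animals only:
# number of muzzle-contacts made, and whether they later extracted the food.
naive = [
    {"id": "a", "muzzle_contacts": 5, "learned": True},
    {"id": "b", "muzzle_contacts": 0, "learned": False},
    {"id": "c", "muzzle_contacts": 3, "learned": True},
    {"id": "d", "muzzle_contacts": 1, "learned": False},
    {"id": "e", "muzzle_contacts": 4, "learned": True},
    {"id": "f", "muzzle_contacts": 0, "learned": True},
]

def learning_rate(rows):
    """Proportion of individuals who later extracted the food."""
    return sum(r["learned"] for r in rows) / len(rows)

high = [r for r in naive if r["muzzle_contacts"] >= 2]
low = [r for r in naive if r["muzzle_contacts"] < 2]
print(learning_rate(high), learning_rate(low))
```

In practice this would be a (mixed-effects) logistic regression of learning on muzzle-contact count, but even this simple split is the shape of evidence needed to implicate muzzle-contact in social learning.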
3) Analysis
There are a number of issues with the current analysis which I strongly recommend be addressed before publication. Some of these are likely to simply require additional details inserted to the manuscript, whereas others would require more substantial changes. I begin with two general points (A & B), before addressing specific sections of the manuscript.
A) My primary issue with each of the analyses in this manuscript is that the authors have fit complex statistical models for each of their analyses with no steps to ascertain whether these models are a good fit for the data. With a relatively small dataset and a very large number of fixed effects and interactions, there is a considerable risk of overfitting. This is likely to be especially problematic when predictor variables are likely to be intercorrelated (age, sex and rank in the case of this analysis).
The most straightforward way to resolve this issue is to take a model-comparison approach: fitting either a) a full suite of models (including a 'null' model) with each possible permutation of fixed effects and interactions (since the authors argue their analysis is exploratory), or b) a smaller set of models which the authors find plausible based on their a priori understanding of the study system. These models can then be compared using information criteria to determine which structure provides the best out-of-sample predictive fit for the data, and the outputs of that model interpreted. Alternatively, a model-averaging approach can be taken, where the effects of each individual predictor are averaged and weighted across all models in the set. Both of these approaches can be performed easily using the R package 'MuMIn', and a number of tutorials for understanding and carrying them out can be found online.
B) It does not seem that interobserver reliability testing was carried out on any of the data used in these analyses. This is a major oversight which should be addressed before publication (or indeed any re-analysis of the data).
Line 444: Much more detail is needed here. What, precisely, was the outcome measure? Was collinearity of predictors assessed? (I would expect Age + Rank to be correlated, as well as Sex + Rank).
Line 452. A few comments on this muzzle-contact analysis:
"We investigated muzzle contact behaviour in groups where large proportions of the groups started to extract and eat peanuts over the first four exposures"
What was the criterion for "a large proportion"?
The text for this muzzle-contact analysis would indicate that this model was not fit with any random effects, which would be extremely concerning. However, having checked the R code which the authors provided, I see that Individual has been fit as a random effect. This should be mentioned in the manuscript. I would also strongly recommend fitting Group (it was an RE in the previous models, oddly) and potentially exposure number as well.
Following on from this, if the model was fit with individual as a random effect, it becomes confusing that Figure 3, which represents these data, seemingly does not control for repeated measures (it contains many more datapoints than the study's actual sample size of 164 individuals). This needs to be corrected for the figure to be meaningfully interpretable.
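One simple remedy is to collapse to one datapoint per individual before plotting. A sketch (with invented event records) of what this aggregation looks like:

```python
from collections import defaultdict

# Hypothetical event-level data: (individual id, 1 if target / 0 if initiator).
events = [
    ("id1", 1), ("id1", 0), ("id1", 1),
    ("id2", 0), ("id2", 0),
    ("id3", 1),
]

per_individual = defaultdict(list)
for ind, was_target in events:
    per_individual[ind].append(was_target)

# One datapoint per individual: their proportion of events as the target.
props = {ind: sum(v) / len(v) for ind, v in per_individual.items()}
print(props)  # 3 points plotted, not 6
```

A figure built from `props` (or from model-predicted probabilities per individual) would then match the analysis, which treats individual as a random effect.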
Finally, would it make sense to somehow incorporate the number of individuals present for this analysis? Much like any other social or communicative behaviour, I would predict the frequency of occurrence to depend on how many opportunities (i.e. social partners) there are to engage in it.
Line 460: "For BD and LT we excluded exposures 4 and 3, respectively, due to circumstances resulting in very small proportions of these groups present at these exposures"
What was the criterion for a satisfactory proportion? Why was this chosen?
Line 461: "We ran the same model including these outlier exposures and present these results in the supplementary material (SM3)."
The results of this supplemental analysis should be briefly stated. Do they support the original analysis or not?
Line 465: "Due to very low numbers of infants ever being targets of muzzle contacts, we merged the infant and juvenile age categories for this analysis."
This strikes me as a rather large mistake. The research question being asked by the authors here is "How does age influence muzzle-contact behaviour?" Then, when one age group (infants) turned out to be very unlikely to be a target of muzzle-contact, the authors erased this finding by merging them with another age category (juveniles). This really does not make sense, and seriously confounds any interpretation of either age category.
Lines 466-474: Why was rank removed for the second and third models? Why is Group no longer a random effect (as in the previous analysis)? The authors need to justify such steps to give the reader confidence in their approach.
Furthermore, because of the way this model is designed, I do not think it can actually be used to infer that these groups are preferentially targeted, merely that adult females and adult males are LESS likely to target others than to be targeted themselves, which is a very different assertion.
Because the specific outcome measure was not described here, this only became apparent to me after inspecting Figure 3, where outcome measure is described as "Probability of (an individual) being a target rather than initiator" - so, it can tell us that adults are more often targeted rather than initiating, but does not tell us if they are targeted more frequently than juveniles (who may get targeted very often, but initiate so often that this ratio is offset).
Lines 467-473: "Our first simple model included individuals' knowledge of the novel food at the time of each muzzle contact (knowledgeable = previously succeeded to extract and eat peanuts; naïve = never previously succeeded to extract and eat peanuts) and age, sex and rank as fixed effects. Individual was included as a random effect. The second model was the same, but we removed rank and added interactions between: knowledge and age; and knowledge and sex. The third model was the same as the second, but we also added a three-way interaction between knowledge, age and sex."
This is a good example of some of the issues I describe above. What is the justification for each of these model-structures? The addition and subtraction of variables and interactions seems arbitrary to the reader.