- Jul 2018
europepmc.org
On 2017 Apr 28, Hilda Bastian commented:
This is an interesting methodological approach to a thorny issue. But the abstract and the coverage (such as in Nature) gloss over the fact that the results measure the study method's biases more than they measure scientists on Twitter. I think the method identifies a subset of people working in a limited range of science-based professions.
The list of professions sought is severely biased. It includes 161 professional categories and their plural forms, in English only. It was based on a U.S. list of occupations (SOC) and an ad hoc Wikipedia list. A brief assessment of the 161 titles against an authoritative international list, the United Nations Educational, Scientific and Cultural Organization (UNESCO)'s nomenclature for fields of science and technology (SKOS), shows a strong skew towards social scientists and practitioners of some science-based occupations, and away from medical science, engineering, and more.
Of the 161 titles, 17% are varieties of psychologist, for example, but psychiatry isn't there. Genealogists and linguists are there, but geometers, biometricians, and surgeons are not. The U.S. English-language bias is a major problem for a global assessment of a platform where people are communicating with the general public.
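To make that mechanism concrete, here is a minimal sketch (not the authors' code) of what English-only, singular/plural title matching against Twitter list names implies. The seed titles, function names, and list names below are illustrative assumptions rather than data from the paper; the point is that anything outside the listed English forms, such as a psychiatrist, a surgeon, or any non-English label, never enters the sample at all.

```python
import re

# Illustrative subset of a 161-title seed list (entries are my own examples).
SEED_TITLES = {"psychologist", "historian", "linguist", "genealogist"}

def title_patterns(titles):
    """Compile a regex for each title that also matches its naive English plural."""
    return {t: re.compile(rf"\b{t}s?\b", re.IGNORECASE) for t in titles}

PATTERNS = title_patterns(SEED_TITLES)

def classify(list_name):
    """Return the seed titles found in a Twitter list name, if any."""
    return [t for t, pat in PATTERNS.items() if pat.search(list_name)]

print(classify("Favourite psychologists"))     # ['psychologist']
print(classify("Psychiatres et chirurgiens"))  # [] -- non-English titles fall through
```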
Influence is measured in three ways, but I couldn't find a detailed explanation of the calculations, or a reference to one, in the paper. It would be great if the authors could point to that here. More detail on the "Who is who" service used, in particular how up-to-date it is, would also be useful.
I have written more about this paper at PLOS Blogs, and point there to key numbers that aren't reported, such as how many people were excluded at each stage. The paper says that data sharing is limited by Twitter's terms of service, but it doesn't specify what that covers. Providing the full breakdown of proportions across the 161 titles, and descriptions of more than 15 of the communities they found (none of which appear to be medical science circles), seems unlikely to be affected by that restriction. More data would be helpful to anyone trying to make sense of these results, or to extend the work in ways that minimize the biases in this first study.
There is no research cited that establishes the representativeness of data from a method that can classify fewer than 2% of the people who are on multiple lists. The original application of the method (Sharma, 2011) was aimed at a very different purpose, so representativeness was not such a big issue there. The article also doesn't reference any data on list-creating behavior. There could be a reason historians came out on top in this group: list-curating is probably not a randomly distributed proclivity.
It might be possible with this method to better identify Twitter users who work in STEM fields. Identifying "scientists" as such, though, still seems to me unfeasible at scale. Methods the authors describe as product-centric (e.g. looking at who is sharing links to scientific articles and/or discussing them, or discussing blogs where those articles are cited), along with key nodes such as science journals and organizations, seem essential.
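As a rough illustration of what such a product-centric filter could look like, here is a minimal sketch that flags tweets linking to scientific articles by their expanded URLs. The host list, function name, and example URLs are my own assumptions, not anything described in the paper.

```python
from urllib.parse import urlparse

# Illustrative article hosts (my own assumption, not a list from the paper).
ARTICLE_HOSTS = {"doi.org", "dx.doi.org", "pubmed.ncbi.nlm.nih.gov", "nature.com"}

def shares_article(expanded_urls):
    """True if any expanded URL in a tweet points at a known article host."""
    for url in expanded_urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host in ARTICLE_HOSTS:
            return True
    return False

print(shares_article(["https://doi.org/10.1234/placeholder"]))  # True
print(shares_article(["https://example.com/some-page"]))        # False
```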
I would also be interested to know the authors' rationale for trying to exclude pseudonyms - as well as the data on how many were excluded. I can see why methods gathering citations for Twitter users exclude pseudonyms, but am not sure why else they should be excluded. A key reason for undertaking this kind of analysis is to understand to what extent Twitter expands the impact of scientific knowledge and research. That inherently means looking to wider groups, and the audiences for their conversations. Thank you to the authors, though, for a very interesting contribution to this complex issue.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.