103 Matching Annotations
  1. Dec 2020
  2. Feb 2020
    1. Three process improvements to help the engineering team support the business better.

      The "How"

    2. Seven objectives that steer larger technical decisions and direction over the next 12 to 18 months. Seven is not a magic number here, but it helps to keep this list below ten.

      The "What"

    3. The engineering team’s mission and how it relates to the business’ overall mission.

      The "Why"

  3. Dec 2019
    1. we should be clear about who’s in the driver’s seat as they do

      this is a fair point.

    2. It will appear on the largest publishers’ websites

      surely the point is to make these links appear elsewhere?

    3. Now, they want to maintain that status quo for as long as possible, translate their legacy role into APCs,

      I'm not sure how this translates. GetFTR seems to be about access to non OA content mostly.

    1. Today, a group of the largest scholarly publishers is announcing a new effort to improve discovery and access, fight piracy, compete with ResearchGate, and position their platforms for an open access ecosystem. Their new “Get Full Text Research”

      in brief, unless one thinks that other players in the scholarly ecosystem (institutions / funders) will implement this, this is a bad idea.

  4. Oct 2019
    1. is largely to generate profit

      What is the right balance between surplus and profit?

    2. commercial enterprises in order to survive

      I feel there is an implicit context within which this vision statement exists, and that is about what counts as a valid system of economic support. Much needs to be made about setting up a stall in contrast to commercial/capitalist models. What the new models and the existing models will probably retain in common is that they will both be judged on perceived value.

    3. costly

      A lot of the costs - to scholars - come from having a "value-based" pricing system rather than a cost-based pricing system. There needs to be something here that shows how open infrastructure can effect change in the reward mechanisms of scholarship, otherwise changes on the cost base are going to remain modest.

    4. non-profit

      ACS is not for profit!

    5. It will support global connections and consistency where it is appropriate, and local and contextual requirements where that is needed

      This seems very hard to balance.

    6. strategic

      Again, a key term here. This indicates that the goals of this project should be directed and specific.

    7. take ownership

      Sometimes this happens in a contingent, rather than a deliberate, way.

    8. fully

      This is a key term here. It supposes that we might know what this looks like, in contrast to saying, for example, "better enabled"

  5. Jul 2019
    1. For example, in our migration study, we migrated 29 API calls that access REST endpoints (distributed over seven clients) to 24 GraphQL queries, which therefore does not represent a major reduction; (2) we reveal that client-specific queries can lead to a drastic reduction in the size of JSON responses returned by API providers. On the median, in our study, JSON responses have 93.5 fields, against only 5.5 fields after migration to GraphQL, which represents a reduction of 94%. In terms of bytes, we also measure an impressive reduction: from 9.8 MB (REST) to 86 KB (GraphQL). Altogether, our findings suggest that API providers should seriously consider the adoption of GraphQL. We also see space for tool builders and researchers, with interest on providing support and improving the state-of-the-practice on GraphQL-based API development.

      Nice overview of how GraphQL helps reduce response load. The paper didn't talk about the developer experience or cognitive overhead of using this tool.
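
      The reduction figures quoted in the abstract can be checked with a little arithmetic. This is only a sketch: the helper name `percent_reduction` is my own, and I am assuming decimal units (1 MB = 1000 KB), which the abstract does not specify.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Median number of JSON fields per response, as quoted in the abstract.
fields_reduction = percent_reduction(93.5, 5.5)

# Response size in KB. Assumption: 1 MB = 1000 KB (not stated in the abstract).
bytes_reduction = percent_reduction(9.8 * 1000, 86)

print(f"fields: {fields_reduction:.0f}%")
print(f"bytes:  {bytes_reduction:.1f}%")
```

      The field count reproduces the 94% quoted in the abstract, and the byte figures work out to a reduction of roughly 99%.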

  6. Nov 2018
    1. All these studies show the consequences of applying a narrow understanding of excellence to evaluate research regardless of the context, for example in terms of country, discipline, and language.

      this is the key point of this blog post.

    2. In terms of research subjects or topics, WoS and Scopus focus on molecular biology, traditional genetics, and industry-related consumption, whereas CABI focuses more on productivity, plant nutrition, plant characteristics, and plant protection. The foci of WoS and Scopus seem to be related to the research interests of seed companies and food industry, while the foci of CAB are more related to potential interests of local farmers and communities. In this case, research in journals indexed by WoS and Scopus seems to better cover the interests of industrial stakeholders than the interests of small, poorer farmers.

      specific example of domain bias in WoS

    1. These are the people we hold responsible for staffing the product teams.  This means sourcing, recruiting, interviewing, on-boarding, evaluating, and when necessary, replacing, the members of the teams.  If you have an HR function at your company, they are there to support the managers with these activities, but they are in no way a substitute for the manager on these responsibilities.
    2. The issue is that they are often not allowed to work as they need to.  Specifically, in so many companies, they are not truly empowered to work as they need to.
    1. The Chinese place a higher value on community good versus individual rights, so most feel that, if social credit will bring a safer, more secure, more stable society, then bring it on
    1. A traditional business intelligence department consists of three roles: ETL engineers, report developers, and DBAs. ETL engineers move the data into the data warehouse. They are obsessed with Kimball and his guide to dimensional modeling. Report Developers, on the other hand, are folks who have made a career around designing reports in a specific tool (e.g. Microstrategy, et al). They are specialists. DBAs (and a team of other tool administrators) do their best to just keep things running.

      So true.

    2. Unless you need to push the boundaries of what these technologies are capable of, you probably don’t need a highly specialized team of dedicated engineers to build solutions on top of them. If you manage to hire them, they will be bored. If they are bored, they will leave you for Google, Facebook, LinkedIn, Twitter, … – places where their expertise is actually needed. If they are not bored, chances are they are pretty mediocre. Mediocre engineers really excel at building enormously over complicated, awful-to-work-with messes they call “solutions”. Messes tend to necessitate specialization.
    1. One way to identify cycles is to build a dependency graph representing all services in the system and all RPCs exchanged among them. Begin building the graph by putting each service on a node of the graph and drawing directed edges to represent the outgoing RPCs. Once all services are placed in the graph, the existing dependency cycles can be identified using common algorithms such as finding a topological sorting via a depth-first search. If no cycles are found, that means the services' dependencies can be represented by a DAG (directed acyclic graph).
    2. Dependency cycles are most dangerous when they involve the mechanisms used to access and modify a service. The operator knows what steps to take to repair the broken service, but it's impossible to take those steps without the service.
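
      The cycle check described in the dependency-graph highlight above can be sketched as a depth-first search that looks for a back edge. This is a minimal illustration, not the article's own code: the function name `find_cycle` and the service names are hypothetical, and the recursive form is only suitable for small graphs.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of services, or None if the graph is a DAG.

    `graph` maps each service to the services it sends RPCs to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(graph) | {callee for callees in graph.values() for callee in callees}
    colour = {node: WHITE for node in nodes}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for callee in graph.get(node, []):
            if colour[callee] == GREY:
                # Back edge: the callee is already on the current path, so we have a cycle.
                return path[path.index(callee):] + [callee]
            if colour[callee] == WHITE:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(nodes):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical services: the deploy tool needs config, config needs storage,
# and storage can only be changed through the deploy tool.
cyclic = {"deploy": ["config"], "config": ["storage"], "storage": ["deploy"]}
acyclic = {"frontend": ["auth", "storage"], "auth": ["config"], "config": ["storage"]}

print(find_cycle(cyclic))
print(find_cycle(acyclic))
```

      The second example has no cycle, so its dependencies form a DAG and the function returns None.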
  7. Oct 2018
    1. Great doc about assessment in the Netherlands



    1. OA can provide benefits, but we need to broaden the debate and articulate principles of collegiality and professional organisation that are integral to public value.

      Agree on this point.

    2. One response has been to suggest “submission charges” in order that successful publications do not subsidise unsuccessful ones, a proposal which would further disadvantage early-career colleagues and those from less wealthy countries.

      I agree with this.

    3. The underlying constraint of print sets a limit on numbers of articles that can be published and, in consequence, there is a diversity of journal titles reflecting the diversity of disciplines and interdisciplinary areas. This diversity is served by editors and peer review processes associated with the different epistemological standards, producing an effective evaluation and validation of knowledge claims despite diversity. In OA, there is no constraint on publication, which takes the form of a stream, where each platform maximises its revenue by maximising the number of papers it publishes.

      Non print journals can also be non-OA. The argument here seems to be that constraints are good, print provides a constraint => OA is bad, but I think the argument here would be better as one between online only and print, with OA being a different aspect to the argument.

    4. Moreover, the funding will increasingly be directed toward university research strategies (more properly: revenue strategies), in which topics of research and vehicles for disseminating it are managerially directed. This replaces academic autonomy – an integral part of academic freedom – with managerial autonomy.

      I think OA is somewhat irrelevant to this argument, as this direction is kind of happening anyway.

    5. First, there is the matter of APCs. How will they be funded? OA material will have global reach, but producers outside the currently-dominant centres of academic production are unlikely to have access to publish their research.

      in the idea of a proper flip we assume the costs flow equitably in some way, and that the existing money is in the system already. There are also other models like low/no cost APC models and the OLH model.

    6. (have they learned nothing from the Cambridge Analytica scandal?).

      I think this is misdirection, given that the way CA operated really was not in conjunction with any academic projects.

    7. Notwithstanding, it is important to recognise that OA is itself being promoted in the name of commercial interests.

      There is one example given in this para, but I think that the claim here is weak, as the sentence kind of implies specific commercial entities. Who are they? How are they different in operation from scholarly societies that generate large profits?

    8. OA may have benefits from the point of view of the consumption of knowledge, but it is less clear that it has equivalent benefits from the perspective of the production of knowledge

      this is probably the "grand challenge" of OA, and a reason behind awards and competitions that demonstrate the impact of the reuse of material

    9. What of publications in which no commercial use resides, such as those in the humanities and social sciences, where there is usually no patentable research involved? And what of the legitimate business interest of existing publishers, many not-for-profit, where revenues have helped construct academic infrastructure (for example, activities associated with university presses and learned societies)? What is at issue is the relation between two academic roles, that of consumer and producer of research, with the former now coming to dominate over the latter.

      I think this is a strong point.

    1. Inputs: the investment dollars and employee time devoted to innovation, along with the number of ideas that are gener­ated internally each month or sourced from customers, suppliers, and other out­siders. Throughputs: the number and quality of ideas that enter the pipeline after initial screening, the time it takes for those ideas to move from concept to proto­type to reality, and the notional value of the innovation pipe­line. Outputs: the number of innovations that reach the market in a given period, the percentage of revenue derived from new products and services, and the margin gains that are attributable to innovation. Leadership: the percentage of executive time that gets devoted to mentor­ing innovation projects, and 360-degree survey results that reveal the extent to which execu­tives are exhibiting pro-innovation behaviors. Competence: the percentage of employees who have been trained as business innovators, the percentage of employees who have qualified as innova­tion “black belts,” and changes in the quality of ideas that are being generated across the firm. Climate: the extent to which the firm’s management processes facilitate or frustrate innovation, and the progress that is being made in remov­ing innova­tion blockages. Efficiency: changes over time in the ratio of innovation outputs to inputs. Balance: the mix of different types of innova­tion (product, service, pricing, distribution, operations, etc.); differ­ent risk cate­go­ries (incremental improvements versus speculative ventures); and differ­ent time horizons.

      Some nice metrics for innovation in enterprise.

  8. Jun 2018
    1. Messaging architectures are becoming increasingly common in medium to large scale data infrastructures (the whole of scholarly publishing could be considered to be a poorly implemented messaging platform with a pub-sub model ;). Kafka is rapidly becoming a go to technology for implementing distributed, scalable and fault tolerant messaging queues.

      Kafka in a nutshell provides a really nice, short and clear overview of the system.
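
      To make the pub-sub idea concrete for myself, here is a toy in-memory model of an append-only topic log with per-group read offsets. It is emphatically not Kafka's API: `ToyBroker`, `publish` and `poll` are names I made up, and partitions, persistence and replication are all left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyBroker:
    """A toy message broker: one append-only log per topic, one offset per (group, topic)."""

    def __init__(self) -> None:
        self._logs: Dict[str, List[str]] = defaultdict(list)
        self._offsets: Dict[Tuple[str, str], int] = defaultdict(int)

    def publish(self, topic: str, message: str) -> int:
        """Append a message to the topic log and return its offset."""
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1

    def poll(self, group: str, topic: str) -> List[str]:
        """Return the messages this consumer group has not yet seen, then advance its offset."""
        start = self._offsets[(group, topic)]
        messages = self._logs[topic][start:]
        self._offsets[(group, topic)] = start + len(messages)
        return messages


broker = ToyBroker()
broker.publish("articles", "accepted: ms-1")
broker.publish("articles", "published: ms-1")

# Each consumer group reads the whole log independently of the others.
indexer_first = broker.poll("indexer", "articles")
indexer_second = broker.poll("indexer", "articles")
archiver_first = broker.poll("archiver", "articles")
print(indexer_first, indexer_second, archiver_first)
```

      The point of the sketch is that messages are not removed when they are read; each group simply keeps track of how far it has got.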

  9. Oct 2017
    1. The following contains some interesting data on the state of data science.

      I found interesting:

      • Most are still working on relational data
      • top three tools come from the school of statistics, with neural networks coming fourth
      • dirty data is the biggest challenge, 24% say their work is not used by decision makers!
      • the list of learning resources is a bit skewed (unsurprisingly, as this is a Kaggle survey); online courses feature well
    1. How library collections budgets work By Library Loon 27 October 2017 Library as organization, Scholarly communication 3 Comments “Why can’t open-access initiatives get some of that sweet, sweet library budget money?” the Loon was asked (well, entitledly whinged at, but it comes to much the same thing). Short answer: The librarians in charge of allocating collections money have no incentive to support open access, and the librarians (supposedly) in charge of changing scholarly communication have either zero budget or strictly-earmarked budgets that do not permit this use. QED.

      This is a great article on the structure of library budgets. I think one of the most interesting reflections is that the creation of buying consortia is a response to the structure of scholarly publishing, so the two kind of fit hand in glove. Moving away from that structure is going to be very challenging.

    1. What is this again? What Google Drive should be. What Dropbox should be. What file systems can be. The way we unify our data access across companies, services, programs, and people. The way I want to live and work.

      I think that this is interesting, but idealistic. The code repo on GitHub is quite active, but how does a technology like this gain traction?

    1. People with scientific training are adopting these practices as well, either by offering services on sites such as Upwork or finding projects through their previous academic networks.
    1. I can’t go on

      but I must go on! Is this the future we are heading towards?

    1. Organize your research. Find experts to perfect your projects.

      another full service scholarly writing service. Interesting that they integrate copy editing services.

    1. The only correlating factor has to be the effectiveness of your actions.

      I like what this article says, but this sentence nails the weakness in the article. There is no strong evidence for the assertion.

    1. DVC makes your data science projects reproducible by automatically building data dependency graph (DAG). Your code and the dependencies could be easily shared by Git, and data - through cloud storage (AWS S3, GCP) in a single DVC environment.

      software and data dependency graphs, nice!
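
      The dependency-graph idea can be illustrated with Python's standard-library `graphlib` (Python 3.9+). This is not how DVC itself is implemented or invoked; the stage and file names below are hypothetical, and the sketch only shows how a DAG of data and code dependencies yields a safe execution order.

```python
from graphlib import TopologicalSorter

# Each key depends on the items in its set (hypothetical pipeline stages and data file).
dependencies = {
    "clean": {"raw.csv"},
    "features": {"clean"},
    "train": {"features"},
    "evaluate": {"train"},
}

# static_order() yields every node after all of its dependencies,
# and raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

      Because this example is a simple chain, there is only one valid order, starting from the raw data and ending at the evaluation stage.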

  10. Sep 2017
    1. Through Open Humans, you can gather valuable data about yourself and find cool projects to share it with.
    1. We found that data citations appear in the references section of an article less frequently than in the main text, making it difficult to identify the reward and credit for data authors (i.e. data sharers). Consistent data citation formats could not be found. Current data citation practices do not (yet) benefit data sharers.
    2. Genetics and Heredity is the top field with almost 2.3 million records (representing public datasets, software, data studies, and data repositories).


    3. Given that researchers benefit from publicly shared data through data reuse in their own research, researchers who provide access to data should be acknowledged for their contributions, much in the same way that authors are recognized for their research publications through citation. Researchers who use shared data or other shared research products (e.g. open access software, tissue cultures) should also acknowledge the providers of these resources through formal citation. At present, data citation is not widely practiced in most disciplines and as an object of study remains largely overlooked.
    1. We’re delighted to announce that the California Digital Library has been awarded a 2-year NSF EAGER grant to support active, machine-actionable data management plans (DMPs).
    1. Wikimania in Africa - an opportunity for engaging communities of researchers, practitioners and Wikimedians (Daniel Mietchen)

      wikipedia is the "front matter" for all of research

    2. A policy level helping hand to deal with research software (Stephan Janosch, Jürgen Fuhrmann, and Björn Brembs)

      Which policy authors are looking at software right now?

    1. FORCE2017-5: Metadata 2020: Advancing the Maturity Model (Ginny Hendricks, Cameron Neylon, and John Chodacki)

      This could be a thing that SAGE does on behalf of societies/libraries?

    2. Changing scholarly communication through a greater understanding of academic career incentives (Juan Pablo Alperin)

      Juan is always worth listening to.

    1. Making Open Citations work (Stephanie Dawson)

      Ask about ScienceOpen

    2. Are we ready for a Scholarly Commons? (Maryann Martone, Fiona Murphy, Bianca Kramer, Jeroen Bosman, Daniel O'Donnell, Ian Bruno, Chris Chapman, Bastian Greshake, Robin Champieux, and Nate Jacobs)

      Really interesting. What of the tragedy of this commons?

    3. FORCE2017-11: Unlocking references from the literature: The Initiative for Open Citations (Dario Taraborelli)

      The Open Citation project is amazing

    4. FORCE2017-35: (GO)FAIR--Annotations as Research Objects: Ensuring Findable, Indexable, Accessible and Reusable (Heather Staines, Francesca Di Donato, Jennifer Lin, and Maryann Martone)

      Really interesting sounding project

    5. FORCE2017-105: Ubiquitous Open Access: Changing culture by integrating OA into user workflows (Jason Priem, Heather Piwowar, and Don Sechler)

      how is it going?

    1. John Chodacki

      Worth checking in with.

    2. Sünje Dallmeier-Thiessen

      is she still working at CERN?

    3. Lyubomir Penev

      Will this be an expanded version of what he talks about at #futurepub11?

    4. FORCE2017-55: A Science-Based Writing Across the Curriculum Program (Eric Jandciu)

      I wonder what Eric is up to these days?

    5. FAIR principles in practice at the ENCODE data portal

      have they moved on from using Virtual Machines?

    6. Martin Fenner

      worth connecting with.

    7. Open Knowledge Maps - A Visual Interface to the World's Scientific Knowledge (Peter Kraker)

      I wonder if this is built on Mendeley data?

    8. Discover Social Science Data (Brigitte Hausstein)

      sounds interesting

    1. I guess the big take away for me is that what I have heard from our community compels me to try to shift my focus from satisfying immediate user needs by continually improving the tools at hand to making progress and supporting progress towards a discovery environment we can’t yet imagine (because most of us are not Muriel Cooper) but which provides fun, intuitive, maybe immersive opportunities for discovery.

      I am interested in this sentiment, and how I might support it through our work at SAGE.

    2. connections

      Connections is the key thing, in a world of information abundance. How do we trust those connections, understand them, and act on them?

    1. A data lake management service with an Apache licence. I am particularly interested in how well the monitoring features of this platform work.

    1. We are advancing,

      is this the only note of encouragement?

    2. If we didn't do this, and allowed the uncertainty to continue, and pushed these difficult subjects to the end of the negotiations, then we would run the risk of failure in the absence of trust between us

      I feel that trust is moving downwards at the moment.

    3. There is therefore only one year left

      this is deeply worrying

    1. Do you have questions about how best to moderate your online community? CivilServant, software created at the MIT Center for Civic Media, helps online communities do your own A/B tests of moderation practices.

      This is an interesting SaaS system for exploring how to create good moderation systems.

    1. NixOS is a Linux distribution with a unique approach to package and configuration management.

      This is another approach to systems management and software as a service. I don't really understand in detail the difference between NixOS and Docker, but googling "NixOS vs Docker" shows that it's a topic that is ripe for a bunfight.

    1. Singularity containers can be used to package entire scientific workflows, software and libraries, and even data.

      Very interesting. Basically, Singularity allows containers to run in HPC environments, so that code running in the container can take advantage of HPC features, like massive scale and message passing, while at the same time keeping the stuff in the container safer.

    1. Signposting is an approach to make the scholarly web more friendly to machines. It uses Typed Links as a means to clarify patterns that occur repeatedly in scholarly portals. For resources of any media type, these typed links are provided in HTTP Link headers. For HTML resources, they are additionally provided in HTML link elements. Throughout this site, examples use the former approach.

      A kind of light-weight linked data approach to connecting web pages?
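
      To see what a machine would actually do with these typed links, here is a small sketch that parses an HTTP Link header into targets and relation types. The header value, URLs and DOI are made up for illustration, `parse_link_header` is my own helper, and the parser is deliberately simplified (it ignores edge cases such as escaped quotes and multiple relation types in one `rel` value).

```python
import re
from typing import Dict, List

# One link: <target> followed by zero or more ;name=value parameters.
LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_PATTERN = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(header: str) -> List[Dict[str, str]]:
    """Split a Link header value into dicts with a 'target' plus its parameters."""
    links = []
    for target, params in LINK_PATTERN.findall(header):
        link = {"target": target}
        for name, quoted, bare in PARAM_PATTERN.findall(params):
            link[name.lower()] = quoted or bare
        links.append(link)
    return links


# Hypothetical header for a landing page (all URLs are placeholders).
header = (
    '<https://doi.org/10.1234/example>; rel="cite-as", '
    '<https://example.org/paper.pdf>; rel="item"; type="application/pdf", '
    '<https://example.org/metadata.json>; rel="describedby"; type="application/json"'
)

links = parse_link_header(header)
by_rel = {link["rel"]: link["target"] for link in links}
print(by_rel["cite-as"])
print(by_rel["item"])
```

      With the links keyed by relation type, a crawler can go straight from a landing page to the persistent identifier, the content file, or the metadata record without scraping the HTML.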

  11. www.softwareheritage.org
    1. This is interesting. Could it become something like LOCKSS / CLOCKSS for software? I like that you can check if your own code is already in their archive.

      It's a French initiative, founded by Inria (https://en.wikipedia.org/wiki/French_Institute_for_Research_in_Computer_Science_and_Automation). I don't know what their long term sustainability model is going to be.

    1. Spectral Python (SPy) is a pure Python module for processing hyperspectral image data. It has functions for reading, displaying, manipulating, and classifying hyperspectral imagery. It can be used interactively from the Python command prompt or via Python scripts
    1. ScienceDirect Topic Pages

      This is the new product from Elsevier. I wonder how much of this is curated, vs machine generated.

    1. A system-wide Share extension

      This is something I would like for my iPhone as I use the share sheets a lot, but I'm not sure how widely they are used.

    1. Larry analyzes your historical and real-time data to create an entire social media strategy for you.

      The company is providing services for a large number of publishers worldwide. They basically write and send your content-based tweets for you using deep learning.

  12. Jan 2016
    1. Lambda

      In a system where the perfect components have been created and are available for atomic composition to create a publishing workflow, Lambda is probably a great target architecture. However, if you are in the process of building an infrastructure, you will find that you want to run integration and test servers. For this reason at eLife we are mainly running with EC2 reserved instances at the moment.

    2. Typesetting

      Another CPU intensive aspect of the process that has not been modelled in this calculation is the process of creating zip archives, unzipping files, and FTPing zipped archives. A critical component of the publishing process is downstream delivery, to sources such as PMC and Crossref. Computationally we have found zipping and unzipping to be an area that is moderately heavy.

    3. 50 emails using AWS Simple Email Service (SES), and 100 requests to put and get all components of the submission from S3 three times

      I feel that this estimation is highly misleading. The submission and composition stage involves, in reality, a large degree of communication with authors, tied to an online version of their manuscript. So far we have found that simply hosting the manuscript in an online bucket is not sufficient to do this effectively, and the software used to maintain the state of the manuscript and its artefacts is not something that we can see running effectively in Lambda at this point in time. In addition, if we are talking about marginal costs of production, that cost usually has to include the cost of operating these systems, and the act of communicating with the author. The vendor figures quoted, and compared against here, usually include the costs of the operator, as well as the hosting and the software, and in this way I feel that the underlying approach to the comparison is flawed.

    4. image transcoding

      We have recently been doing some work in this area, and image transcoding is currently one of the bottlenecks of the process. From our own experience with the number of images that we have per article, I would rate 60CPS as very optimistic.

    5. 4 images

      I think that this number is artificially low, as it is probably mediated by print restrictions on articles. When you move to become an online only publisher these restrictions become increasingly arbitrary.

    6. provided

      I'm a bit confused by this flowchart. Why email 12 vendors when you already have pricing of 7/18?

  13. Jun 2014
    1. profile


    2. Our measurements show that the interplay of fluid shear

      I am interested in fluid shear, can you provide an overview?