34 Matching Annotations
  1. Aug 2016
    1. Orphans make sense

      It makes sense to me on a technical level, but it isn't something I would provide in an interface aimed at the general public. My inclination would be to have the default interface not show orphan annotations at all. The New York Times front page is a great example: if annotation took off there, you would rapidly have far more orphans than anything else. Without the context of the annotation, it's hard to imagine these would provide much value. Perhaps this is something that could be added to a different, non-default view that provides some sort of historical perspective on the URL under consideration.

      This argues for capturing a bit more of the context around any given annotation and preserving it as part of the annotation, e.g. 150 characters upstream and downstream.
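
      A minimal sketch of the context-capture idea, in Python. The function names and the dictionary layout are hypothetical; the `prefix`/`exact`/`suffix` field names are borrowed from the W3C Web Annotation `TextQuoteSelector`, and the 150-character window is the figure suggested above. This is not how any particular annotation client is implemented.

```python
def capture_context(text, start, end, window=150):
    """Return the highlighted span plus up to `window` characters on each side."""
    if not (0 <= start <= end <= len(text)):
        raise ValueError("span out of range")
    return {
        "prefix": text[max(0, start - window):start],
        "exact": text[start:end],
        "suffix": text[end:end + window],
    }


def reanchor(text, selector):
    """Locate the span in (possibly changed) text.

    Tries prefix + exact + suffix first, then the exact quote alone.
    Returns (start, end), or None if the annotation is orphaned.
    """
    needle = selector["prefix"] + selector["exact"] + selector["suffix"]
    pos = text.find(needle)
    if pos != -1:
        start = pos + len(selector["prefix"])
        return (start, start + len(selector["exact"]))
    pos = text.find(selector["exact"])
    if pos != -1:
        return (pos, pos + len(selector["exact"]))
    return None  # orphan: the stored prefix/suffix is all the context left


page = "The quick brown fox jumps over the lazy dog."
selector = capture_context(page, 16, 19, window=5)
```

      Even when `reanchor` returns `None`, the stored prefix and suffix let an orphan be displayed with some of its original surroundings.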

    2. I don't have the orphans tab enabled in my account yet.

  2. Apr 2016
    1. Discovering Biomedical Semantic Relations in PubMed Queries for Database Curation and Literature Retrieval

      PubMed search engine logs are a gold mine of data for mining relations. Could they be made public?

    2. Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes.

      This is going to be a good talk. Get your coffee, open your eyes, and open your mind! A pattern that could actually scale up - worth a try! Disagree? Reply here.

    3. We demonstrate that annotation removal or reassignment is rarely linked to incorrect annotation by the curator, but are necessitated by both changes in the underlying data and improved published scientific knowledge

      This might be relevant to Toby regarding his GO analysis of aging processes. Also relevant to CAMDA assessments.

    4. TextpressoCentral

      Could this be used as a front end for adding content to Wikidata?

    5. Described DARPA-funded NLP research ('Big Mechanism'). The crowd was annoyed that they used untrained humans in the study, thus setting up the machines to look better.

    6. glycans

      are the entities in this database

    1. Below you will find a pretty thorough listing of helpful links for understanding Wikidata. Please extend it; it's openly editable.

    1. The leaderboard thingy here doesn't seem to be updating anymore.

    1. Contest is a fun idea, but it would work better if you gave the audience a more specific objective, e.g. build a collection of all Zika-related papers, or identify and tag all of the papers that cite your database. Just saying 'go annotate the web' is too vague.

    1. is moving towards a more expressive way of describing the function of gene products

      Does this mean that the annotations that are currently in the GO database will be converted to use the new, more thorough structure? What will the process be for doing that? Can any aspect of it be automated?

  3. Jul 2015
    1. hniques. In creating PolySearch2's list of system filter words, we tagged the occurrence of all b

      notes

    2. convert PolySearch2 from a simple association discovery tool to a more general knowledge extraction tool.

      http://knowledge.bio provides a prototype of an application that brings these two features together.

    3. users may indicate certain associations to be false positives and in subsequent runs PolySearch2 should ideally learn from these negative examples and adapt itself to match a user's specific search needs and thereby achieve higher accuracy

      Yup. We would like to be doing this as well.

    4. The improvement in association accuracy can be attributed to the tightness measure we introduced to further discriminate matched association patterns, the assignment of weight boosting to database records in contrast to text articles and the imposition of more stringent cut-offs to boost precision at the expense of recall (precision-recall trade-off).

      Would be great to see these claims quantified.

    5. PolySearch2 achieved a 3–12% improvement in its association accuracy.

      Why? How much of this is due to adding more databases and how much is due to better text mining or just more text?

    6. PolySearch2's f-measures in these tasks were 88.95, 89.75, 93.79 and 90.74,

      What Z-score or relevancy threshold was used? Why not show an ROC curve where this is varied? Can they get to perfect recall again?
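
      To illustrate why a single f-measure says little without the threshold behind it, here is a small Python sketch that sweeps a relevancy cut-off over scored associations. The scores and labels are made-up toy data, not PolySearch2 output, and `precision_recall_f` is a hypothetical helper.

```python
def precision_recall_f(scores, labels, threshold):
    """Precision, recall and F1 when every score >= threshold is called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    # Convention (an assumption): precision is 1.0 when nothing is predicted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: relevancy scores and whether each association is genuine.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]

# One row per threshold; plotting these rows gives a precision-recall curve.
sweep = {t: precision_recall_f(scores, labels, t) for t in (0.3, 0.5, 0.75)}
```

      In this toy set, lowering the cut-off to 0.5 reaches full recall at the cost of one false positive, while raising it to 0.75 gives perfect precision but misses one genuine association.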

    7. All of these thesauri may be searched via PolySearch2's Thesaurus page, and all may be downloaded via PolySearch2's Download page

      This could be very useful. http://polysearch.cs.ualberta.ca/downloads

    8. PolySearch2, we have significantly expanded the number of thesauri from 9 to 20 categories, and from just 57 706 terms to over 1.13 million term entries with more than 2.84 million synonyms.

      One would expect an increase in recall and decrease in precision from this step. Would also like to know how they did it.

    9. Figure 1

      Basically useless unless highly zoomed in...

    10. raw relevancy score (see (7))

      If someone had time to look this up and explain it, that would be nice.

    11. empirically determined ‘weight boost’

      okay, how was the weighting determined?

    12. improves the scoring of co-occurrences and enhances PolySearch2's ability to distinguish genuine associations from incidental co-occurrences that arise by chance

      Okay, what is the evidence?

    13. key limitation with PolySearch has been the long search times (2–3 min), its limited synonym set (thesauri) and its relatively small number of searchable databases

      These are the reasons it's being updated.

    14. disease–gene discovery (13,14), protein–protein interaction studies (15,16), microarray data analysis (17), metabolome annotation (9,11,12,18), biomarker discovery (19), as well as in building and assessing other biomedical text-mining tools (20,21).

      Applications of concept association mining.

    15. free-text collections

      but I only want to do public highlighting...

    16. Given X, find all associated Ys

      purpose

    1. Our results showed that the f-measure for PolySearch alone was 70.2, the f-measure for PolySearch with its GAD and OMIM options turned on was 78.5, the f-measure for EBIMed was 66.0, the f-measure for LitMiner was 5.8 and for GAD it was 27.5

      An 8.3-point improvement in f-measure (70.2 to 78.5, about 11.8% relative) from adding the database lookup - lower than expected.

    2. A text mining tool is only useful if it gives accurate results and extensive coverage in less time than what could be performed using alternative (i.e. non-computational) or competing computational methods.

      Good to remember.