  1. Mar 2017
    1. 178 deg C (sublimes)

      Is it feasible to calculate a data quality score by applying an algorithm across all sources that contribute verified property information? That score could then be displayed as a relative confidence indicator within aggregators like PubChem. Michael

    1. Sublimation is one way to purify the sample, because caffeine has the ability to pass directly from the solid to vapor and reverse to form a solid all without undergoing the liquid phase. Caffeine has the ability to undergo sublimation under different conditions than the impurities, and can thus be isolated

      This may be one reason PubChem reports a sublimation point under boiling point (BP) for caffeine. Michael

    1. PubChem implements both manual and automated processes

      If a Legacy tag doesn't necessarily mean the content is outdated, no longer relevant, or erroneous, how can its application to compounds be reliably automated? How often are contributions that are perfectly valid and current marked Legacy because their vendors have not updated PubChem? Michael

    2. If a data contributor is designated as “legacy”, all records deposited by the contributor are also designated as “legacy”. 

      Are there cost implications for vendors regarding the legacy tag on their submissions? Michael

    1. For example, to access the old version of the Compound Summary page for Aspirin:

      The old summary page and the new don't appear appreciably different. Is this link redirecting to the new compound summary format? Michael

    1. A substance is a contributed chemical substance sample description from a particular PubChem data provider

      To maintain the quality and integrity of the substance content, what is the validation process PubChem applies to providers and their substance submissions? Michael

  2. Feb 2017
    1. Although JCAMP-DX has not formally been standardized, it is currently the de facto standard for sharing spectral data and all the major databases store their data in the format.

      How and why did the JCAMP-DX format become the de facto standard even though it was never formally standardized?

    1. While there are a number of styles of database the most common currently are relational.

      Graph databases and other NoSQL (non-relational) information stores seem to be gaining traction in filling gaps where traditional relational databases struggle. MongoDB may be a good tool to include as part of future material for this course. (Edit: I just noticed you mention NoSQL later in the text... apologies!)

    2. In the context of discussing the common ‘data types’ we are going to reference those that are used in the relational database software ‘MySQL’.

      How important are the details of MySQL datatypes? Are these datatypes consistent across other relational databases?

    3. STIX fonts

      How far back in history does the STIX fonts project go in mapping languages? For example, if one wanted Unicode characters for scientific texts from the ancient Egyptian era, would STIX provide a glyph?

    1. One of the most important applications of graph theory to chemoinformatics is that of graph-matching problems. It is often desirable in chemoinformatics to determine differing types of structural similarity between two molecules, or a larger set of molecules.

      I have been looking into Chematica, which appears to be graph theory (networks) applied to the goal of automating chemical synthesis. How ready for prime time is this technology? Do we envision that it is likely to be militarized/weaponized, or will it remain in the corporate domain? What are your thoughts on how these developments will impact chemists in the near term (5 to 10 years)? Interesting article from last year on the topic:


    1. The addition of bonds with an order of 0 would alleviate this problem significantly, but this value is disallowed, and there is no way to work around it, which means that most inorganic, organometallic and non-Lewis conformant molecules cannot be represented in a meaningful way.

      Noted as one potential scenario where modern formats may be better suited for data exchange. The question is how any of the modern formats can reach critical mass in usage to become a new de facto standard. Is there enough pain with the existing MDL/SDF formats to even require broad acceptance of new standards?

    1. Efforts have been made to establish an agreed standard format but they have not been generally unsuccessful.

      I was attempting to find a CML file for aspirin and convert it to different formats. I found a link on the CACTUS toolkit site: https://cactus.nci.nih.gov/blog/?p=68

      The number of different structure formats seems almost overwhelming!

    2. Hydrogen atoms are not necessarily included explicitly in a connection table: they may be implicit.

      If connection tables are concerned with canonicalization, why wouldn't H atoms always be explicitly specified?
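
On the implicit-hydrogen point above, one way to see why hydrogens can be left out is that their counts can usually be recomputed from the heavy-atom bonds. The following is only a minimal sketch under a strong assumption: neutral atoms with a single default valence each. The `DEFAULT_VALENCE` table and the `implicit_hydrogens` function are hypothetical names for illustration, not any toolkit's API, and real software uses richer valence models (charges, aromaticity, multiple valence states).

```python
# Assumed default valences for neutral atoms (a simplification for illustration).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}


def implicit_hydrogens(atoms, bonds):
    """Return the implied hydrogen count for each heavy atom.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j, order) tuples using 0-based atom indices
    """
    # Sum the bond orders already used by each atom.
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    # Whatever valence remains is assumed to be filled by hydrogens.
    return [max(DEFAULT_VALENCE[symbol] - u, 0) for symbol, u in zip(atoms, used)]


# Ethanol as a hydrogen-suppressed connection table: C-C-O
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]
print(implicit_hydrogens(ethanol_atoms, ethanol_bonds))  # [3, 2, 1]

# Acetic acid: C-C(=O)-O
acetic_atoms = ["C", "C", "O", "O"]
acetic_bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
print(implicit_hydrogens(acetic_atoms, acetic_bonds))  # [3, 0, 0, 1]
```

Under these assumptions the three-atom table for ethanol carries the same information as the full nine-atom one, which suggests why hydrogen-suppressed tables are attractive: they are smaller and give canonicalization fewer atoms to order. The sketch breaks down exactly where the default-valence assumption does (charged species, radicals, unusual valences), and those are the cases where explicit hydrogens or extra atom properties are needed.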