  1. Mar 2017
    1. PubChem Substructure Fingerprint V1.3, http://pubchem.ncbi.nlm.nih.gov, 5/1/2009. PubChem Substructure Fingerprint Description (cont.)

      When studying the PubChem Substructure Fingerprint, do we need to know every bit position and its substructure for later use? (Daniel)

    1. Tanimoto coefficient [6-8]

      I'm trying to understand how to use the Tanimoto coefficient. I don't see any example to refer to. (Daniel)
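
A minimal sketch of the calculation may help here. It assumes each fingerprint is given as the set of bit positions that are switched on; the bit positions below are made up for illustration and are not real PubChem fingerprint bits. The coefficient is the number of shared bits divided by the number of bits set in either fingerprint.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient for two fingerprints given as sets of 'on' bit positions."""
    shared = len(bits_a & bits_b)   # bits set in both fingerprints
    total = len(bits_a | bits_b)    # bits set in at least one fingerprint
    if total == 0:
        return 0.0                  # convention assumed here for two empty fingerprints
    return shared / total


# Hypothetical fingerprints (illustrative bit positions only).
fp_a = {1, 4, 9, 12, 20}
fp_b = {1, 4, 12, 33}

print(tanimoto(fp_a, fp_b))  # 3 shared bits / 6 distinct bits = 0.5
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no bits.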

      I ran a SMILES query for [CH2][CH2][OH] under the substructure and superstructure search and got these results. From what I read, they are not what I was expecting. (Daniel)


      I have been using the beta testing database to try to search for some compounds. On the PubChem homepage the search comes up fine, but not here. Is that because it is still in the testing phase? (Daniel)

    1. Partition coefficient log P in −0.4 to +5.6 range

      Since the range was extended to improve druglikeness predictions, do we stick with a partition coefficient of not greater than 5, or use 5.6? Daniel
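
The two cut-offs in this question can be compared side by side. The sketch below assumes the "not greater than 5" limit is the Lipinski rule-of-five criterion and the −0.4 to +5.6 range is the Ghose filter criterion; the function and constant names are invented for this illustration.

```python
LIPINSKI_MAX_LOGP = 5.0          # rule of five: log P not greater than 5
GHOSE_LOGP_RANGE = (-0.4, 5.6)   # extended range quoted in the highlight


def logp_checks(logp: float) -> dict:
    """Report whether a log P value passes each of the two criteria."""
    low, high = GHOSE_LOGP_RANGE
    return {
        "lipinski": logp <= LIPINSKI_MAX_LOGP,
        "ghose": low <= logp <= high,
    }


# A value between 5 and 5.6 fails the stricter limit but passes the extended range.
print(logp_checks(5.3))
```

Which limit to apply depends on which filter a given analysis is following, so it is worth stating the choice explicitly.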

    1. MeSH (Medical Subject Headings)

      Is there a difference between the National Library of Medicine MeSH and this MeSH? I was confused when reading because it is discussed in two sections: under Synonyms and under Classification. Daniel

    1. DrugBank; CAMEO Chemicals; The National Institute for Occupational Safety and Health - NIOSH; HSDB; ILO-ICSC; Human Metabolome Database; OSHA Chemical Sampling Information; OSHA Occupational Chemical DB; NIOSH-PocketGuide

      Is this all the information we need when we make a PUG-REST request? Daniel
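
For context, a PUG-REST request is a URL made of three parts: the input (domain, namespace, identifier), the operation, and the output format. The sketch below only builds such a URL and does not send it; the helper name `pug_rest_url` is made up for this illustration, and the compound name and properties are arbitrary examples.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifier, operation, output="JSON"):
    """Assemble a PUG-REST URL from its input, operation, and output parts."""
    parts = [
        PUG_REST_BASE,
        domain,                          # e.g. "compound" or "substance"
        namespace,                       # how the record is identified, e.g. "name" or "cid"
        quote(str(identifier), safe=""), # percent-encode the identifier itself
        operation,                       # what to retrieve
        output,                          # response format
    ]
    return "/".join(parts)


url = pug_rest_url("compound", "name", "aspirin", "property/MolecularFormula,XLogP")
print(url)
```

The depositor names listed in the highlight are data sources shown on a summary page; they are not themselves parts of the request URL.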

    1. The E-utilities are therefore the structured interface to the Entrez system

      In other words, the Entrez database system relies on the E-utilities. Daniel
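
To make "structured interface" concrete, here is a small sketch that builds an ESearch URL against an Entrez database. It only constructs the URL and makes no network call; the helper name `esearch_url` and the search term are illustrative choices, and `pccompound` is used as the Entrez name for PubChem Compound.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=5):
    """Build an ESearch URL: which Entrez database, what to search for, how many IDs."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


print(esearch_url("pccompound", "aspirin"))
```

Each E-utility is addressed the same way: a fixed base URL, a program name such as `esearch.fcgi`, and a set of query parameters.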

    1. For example, a well-annotated SWISS-PROT record for a particular protein may have fields that describe other protein or GenBank records from which it was derived.

      Based on this article on Entrez, are SWISS-PROT protein records derived from GenBank? Daniel

    1. The table of contents on a PubChem summary page lists the categories of information that are available for the particular substance or compound. A PubChem Substance summary page is based on the data submitted by an individual depositor. A PubChem Compound summary page, on the other hand, displays data organized by NCBI automated data processing, serving as a hub of information for each unique chemical structure. PubChem Compound summary pages therefore tend to list more topics in their table of contents.

      Look at the table of contents to distinguish a PubChem Compound page from a PubChem Substance page. Daniel

    1. PubChem is organized as three interconnected databases: Substance, Compound, and BioAssay

      Why is the focus on Substance and Compound but not BioAssay? Daniel

    1. Therefore, 3-D neighboring may offer complementary views on structural similarity between molecules with similar biological activities.

      In researching 2-D and 3-D neighbors, I couldn't find which is better because both have their advantages and disadvantages. Also, since they rely on different methods, which one is used more often? Daniel

    2. For example, ChEMBL [49] manually extracts bioactivity data from peer-reviewed papers published in journals in the medicinal chemistry and natural product domains.

      Based on this passage in Section 2.2 on Bioactivity, it would be preferable to use ChEMBL rather than PubChem for high-quality data extraction. Daniel

    1. Therefore, databases need to document the provenance of the data and devise a way to notify users of that information.  In turn, users should always pay attention to the data provenance issue when using a database.  

      Since documenting the provenance of data is still a work in progress, can we say public databases are not completely reliable? Daniel

  2. Feb 2017
    1. Peak Search

      Depending on the data, don't new software or programs affect how we can transfer information into MassBank for peak search?

    1. Although it grew out of the relational database model

      Does this imply that older relational software is no longer in use?

    2. This is for mass spectrometry and chromatography data

      For representing and managing digital spectra, is ANDI only used for mass spectrometry and chromatography data?

    1. Ring aromaticity is handled in SMILES at the atomic level, not at the bond level (i.e. an atom is considered aromatic rather than a

      In SMILES, why is aromaticity not considered at the bond level, given that the bonds in a ring structure account for its aromaticity?

    2. All InChIs currently are prefixed with “InChI=”. Following this, a designator of “1/” or “1S/” indicates whether the InChI is non-standard or standard (i.e. with fixed standardized options in the software)

      When using InChI, when exactly does it matter whether the standard or the non-standard option is used?
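
The designator described in the highlight can be read directly from the string. The sketch below is a simple string check, not a validator; the function name `inchi_flavor` is made up for this illustration, and the example is the standard InChI for ethanol.

```python
def inchi_flavor(inchi: str) -> str:
    """Classify an InChI as standard or non-standard from its version designator."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    # The designator is everything between the prefix and the first "/".
    version = inchi[len(prefix):].split("/", 1)[0]
    if version == "1S":
        return "standard"
    if version == "1":
        return "non-standard"
    return "unknown"


print(inchi_flavor("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))  # ethanol, standard InChI
```

Because a standard InChI is generated with fixed options, two databases that both use "1S" strings for the same structure should be directly comparable, which is usually the reason to prefer it for linking records.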

    1. Another language, based on conventions in SMILES, has also been developed for rapid substructure searching, called SMILES Arbitrary Target Specification (SMARTS). Similarly, SMIRKS has also been defined as a subset of SMILES that encodes reaction transforms. SMIRKS does not have a definition, but plays on the SMILES acronym. SMARTS and SMIRKS will be considered in more detail in later chapters.

      SMARTS, used for rapid substructure searching, is described as another language based on SMILES conventions. What does SMIRKS mean, and is it used in a similar way to SMARTS?

    1. Rings are represented by breaking one single or aromatic bond in each ring, and designating this ring-closure point with a digit immediately following the atoms connected through the broken bond. Atoms in aromatic rings are specified by lowercase letters. Therefore, cyclohexane and benzene can be represented by the following SMILES

      Given the complexity of aromatic compounds and ring structures, is it better to use Kekulé structures or lowercase letters in SMILES?
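
The two ways of writing benzene can be compared with a few lines of code. This is a rough sketch that only counts lowercase atom symbols outside brackets; it is not a SMILES parser, and the function name `count_aromatic_atoms` is made up for this illustration. The three SMILES strings are the usual forms for cyclohexane and benzene.

```python
# Lowercase symbols that mark aromatic atoms in bracket-free SMILES.
AROMATIC_SYMBOLS = set("bcnops")


def count_aromatic_atoms(smiles: str) -> int:
    """Count atoms written with lowercase (aromatic) symbols; bracket atoms are not handled."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    return sum(1 for ch in smiles if ch in AROMATIC_SYMBOLS)


examples = {
    "cyclohexane": "C1CCCCC1",
    "benzene (aromatic form)": "c1ccccc1",
    "benzene (Kekule form)": "C1=CC=CC=C1",
}

for name, smiles in examples.items():
    print(name, smiles, count_aromatic_atoms(smiles))
```

Both benzene strings describe the same molecule. The Kekulé form carries no aromatic flags in the text itself, so it depends on the software reading it to perceive aromaticity, which cheminformatics toolkits typically do.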



    1. As a starting point, this section will introduce a simplified form of connection table, which we’ll call an “SCT”. This SCT does not correspond directly to any existing file format (at least as far as we know!). Rather, it is a convenient model that we will use just for the purpose of this demonstration.

      Is an SCT similar to, or the same as, an MDL table?

    1. Another notation, called representation of structure diagram arranged linearly (ROSDAL) [52,53], was written to transfer structures quickly in a compact form over a network to enable searching of the Beilstein database online [54]. ROSDAL is still supported by InfoChem, and by Elsevier (Amsterdam, The Netherlands) in Reaxys (vide infra) and the Beilstein CrossFire structure editor.

      I noticed that the ROSDAL notation is still supported by Elsevier. Doesn't Elsevier use SMILES, as most other information systems do today?