  1. Mar 2017
    1. Although each compound has up to 500 conformers (depending on the molecular size and flexibility),

      Is this true? (nwume)

    2. PubChem homepage (http://pubchem.ncbi.nlm.nih.gov); PubChem Chemical Structure Search (https://pubchem.ncbi.nlm.nih.gov/search/search.cgi); PubChem Search (https://pubchem.ncbi.nlm.nih.gov/search/).

      I noticed that a search through any of these three search interfaces gives the same result. (nwume)

    3. Shape-Tanimoto (ST): quantifies steric shape similarity between two conformers. Color-Tanimoto (CT): quantifies the overlap of functional groups between two conformers, such as hydrogen bond donors and acceptors, cations, anions, rings, and hydrophobes. Combo-Tanimoto (ComboT): the sum of ST and CT scores between two conformers.  It takes into account the shape similarity (ST) and functional group similarity (CT) simultaneously. 

      Which one is the best among these three metrics? (nwume)
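
The quoted passage defines ComboT as the sum of ST and CT, so the three scores measure different things rather than ranking against each other. A minimal sketch of that relationship follows, assuming (as is usual for Tanimoto scores) that ST and CT each lie between 0 and 1, which puts ComboT between 0 and 2. The function name and the example scores are hypothetical and do not come from PubChem.

```python
def combo_tanimoto(st: float, ct: float) -> float:
    """Return ComboT, defined in the quoted passage as ST + CT.

    Assumption: ST and CT are Tanimoto scores in the range [0, 1].
    """
    for name, value in (("ST", st), ("CT", ct)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return st + ct


# Hypothetical scores for one conformer pair (illustrative values only).
example_st = 0.75   # steric shape similarity
example_ct = 0.5    # functional-group (feature) overlap
example_combo = combo_tanimoto(example_st, example_ct)
```

A pair can score high on ST and low on CT (similar shape, different functional groups) or the other way round, which is why ComboT is reported alongside the two separate scores.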

    4. The Structure Clustering tool

      Please, I don't understand how to use the Structure Clustering tool. (nwume)

    1. Document Version History: V1.3 – 2009May01 – Updated introduction to describe how to identify the PubChem Substructure Fingerprint property in a PubChem Compound record. V1.2 – 2007Aug30 – Added section on decoding PubChem fingerprints. V1.1 – 2007Aug06 – Corrected and expanded documentation of bits with SMARTS patterns used. V1.0 – 2005Dec02 – Initial release.

      Looking at the document version history here, it was last updated on May 1, 2009. Is that still the current version, or has there been a newer update? (nwume)

    1. why the “legacy” designation was introduced in PubChem

      PubChem does not allow anyone other than the data contributor to modify the provided information, so some records in PubChem persist with outdated or incorrect data. The “legacy” designation was introduced to help correct this. (nwume)

    2. ChEMBL:

      How does ChEMBL create the chemical structure for each compound? (nwume)

    3. 1.2. Primary databases vs. secondary databases

      Please, can someone classify the public databases mentioned in Module 4 as primary or secondary databases?

    4. Table 1. Three inter-linked databases in PubChem (database, URL, identifier): Substance, https://www.ncbi.nlm.nih.gov/pcsubstance, SID; Compound, https://www.ncbi.nlm.nih.gov/pccompound, CID; BioAssay, https://www.ncbi.nlm.nih.gov/pcassay, AID.

      Is it possible for someone to update substance data in PubChem? I hope that updating it does not affect the SID. (nwume)

    5. ChemIDPlus [26,27] is a dictionary of over 400,000 chemical records (names, synonyms, and structures) and provides access to the structure and nomenclature files used for the identification of chemical substances in the TOXNET system and other NLM databases. The Hazardous Substances Data Bank (HSDB) [28,29] focuses on the toxicology of potentially hazardous chemicals, providing information on human exposure, industrial hygiene, emergency handling procedures, environmental fate, regulatory requirements, nanomaterials, and related areas. All HSDB data are referenced and derived from a core set of books, government documents, technical reports and selected primary journal literature. Importantly, HSDB is peer-reviewed by the Scientific Review Panel (SRP), a committee of experts in the major subject areas within the data bank's scope. The Comparative Toxicogenomics Database (CTD) [30,31] contains manually curated data describing interactions of chemicals with genes/proteins and diseases. This database provides insight into the molecular mechanisms underlying variable susceptibility for environmentally influenced diseases.

      Currently 16 databases are integrated into the TOXNET system, but only 3 are shown here. How can I find the remaining 13 databases? (nwume)

    6. All information in the Substance database is submitted by individual data depositors. However, the Compound database does contain information that is not submitted by data depositors,

      Are there sample files for submitting substances to PubChem, since Substance records are submitted by individual data depositors? (nwume)

    7. The Protein Data Bank (PDB) is an archive of the experimentally determined 3-D structures of large biological molecules such as proteins and nucleic acids.

      So, aside from proteins and nucleic acids, can the Protein Data Bank not be used to find the 3-D structures of other biological molecules? (nwume)

    1. Synonyms

      Please, can someone give me a clear difference between a MeSH synonym and a depositor-supplied synonym? (nwume)

    1. BioAssay classification tool

      The BioAssay classification tools are available at https://goo.gl/3jYFXA and https://goo.gl/JcGyLf, so you can check the tools and find out how to use them. (nwume)

    1. In practice, the identifier exchange service may be used as a quick approach to search the PubChem Compound database using multiple queries, although this type of task may be performed programmatically (for example, using PUG-REST [10], which will be discussed in Module 7).

      Since PUG-REST can be used as a quick approach to search PubChem compounds, can I also use it to get a list of all PubChem IDs? If yes, how? (nwume)
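
As a rough illustration of the programmatic route mentioned in the quoted passage, the sketch below builds a PUG-REST request that maps a chemical name to PubChem CIDs. It answers one query per request and does not list every identifier in PubChem. The helper names `name_to_cids_url` and `fetch_cids` are hypothetical, and `fetch_cids` needs network access, so it is defined here but not called.

```python
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(name: str, output: str = "TXT") -> str:
    """Build a PUG-REST URL that looks up CIDs for a chemical name."""
    # Percent-encode the name so spaces and slashes are safe in the URL path.
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/cids/{output}"


def fetch_cids(name: str) -> list[int]:
    """Fetch the CIDs for a name (requires network access)."""
    with urlopen(name_to_cids_url(name), timeout=30) as response:
        text = response.read().decode("utf-8")
    # The TXT output format returns one CID per line.
    return [int(line) for line in text.split()]


aspirin_url = name_to_cids_url("aspirin")
```

Calling `fetch_cids("aspirin")` would send the request. For several names, the same helper can be called in a loop, ideally with a short pause between requests.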

    2. “[synonym]” index returns additional 97 compounds. 

      I noticed that when I search Aspirin[synonym] I get 97 hits, but when I search Aspirin[synonyms] I get 102 hits, which is the same number that "aspirin" gave me. Can someone explain why? I thought it would give the same number of hits as Aspirin[synonym]. (nwume)

  2. Feb 2017
    1. instrument data and metadata

      Please, can someone tell me the difference between instrument data and metadata?

    2. Although we still ‘use’ ASCII today, in reality we use something called UTF-8.  This is easier to say than how it is derived - Universal Coded Character Set + Transformation Format - 8-bit.  Unicode (see http://unicode.org) started in 1987 as an effort to create a universal character set that would encompass characters from all languages and defined 16-bits, two bytes -> 2^16 -> 256 x 256 = 65536 possible characters – or code points. Today, the first 65536 characters are considered the “Basic Multilingual Plane”, and in addition there are sixteen other planes for representing characters giving a total of 1,114,112 code points.  Thankfully, we don’t need to worry because if something is UTF-8 encoded it is backward compatible with the first 128 ASCII characters.

      Please, I want to know whether all computers can make use of UTF-8 and Unicode.
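
The numbers in the quoted passage can be checked with a few lines of Python, whose built-in `str` type is Unicode. This is only a small sketch: it shows that ASCII text encodes to the same bytes under UTF-8, that characters outside ASCII take more than one byte, and that 17 planes of 65536 code points give the quoted total. The variable names are illustrative.

```python
# ASCII text produces identical bytes under ASCII and UTF-8 (backward compatibility).
ascii_text = "PubChem"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")


def utf8_length(character: str) -> int:
    """Number of bytes UTF-8 uses to encode one character."""
    return len(character.encode("utf-8"))


# One sample character for each UTF-8 length from 1 to 4 bytes.
byte_lengths = {
    "A": utf8_length("A"),                    # ASCII letter
    "e-acute": utf8_length("\u00e9"),         # Latin small letter e with acute
    "euro": utf8_length("\u20ac"),            # euro sign
    "emoji": utf8_length("\U0001F600"),       # outside the Basic Multilingual Plane
}

# Basic Multilingual Plane plus sixteen further planes.
PLANE_SIZE = 2 ** 16
TOTAL_CODE_POINTS = 17 * PLANE_SIZE
```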

    1. Many databases such as PubChem [17], ChemSpider [18], ChEBI [19], and NIST Chemistry Webbook [20] accept InChI and InChIKey strings as queries to search for chemical structures.  InChIs and InChIKeys can also be used as queries in UniChem [21] to produce cross-references between chemical structure identifiers from different databases.

      When these databases (PubChem, ChemSpider, ChEBI, and NIST) are searched with the InChI or InChIKey of a particular structure, will they all give the same result?

    2. the standard InChI

      Before OpenSMILES and the standard InChI existed, what did scientists do to avoid getting different results in their projects when they used different software?

    3. They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN) [1], Sybyl Line Notation (SLN) [2,3] and Representation of structure diagram arranged linearly (ROSDAL) [4,5].  Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES) [6-9] and the IUPAC Chemical Identifier (InChI) [10-13], which are described below.

      Since SMILES and InChI are the linear notations currently in use, does that mean WLN, SLN, and ROSDAL can yield wrong results if used now, given that they are no longer in use?

    1. A connection table can represent multiple distinct compounds.


    1. MDL Molfile (extension *.mol) or structure-data file (SDF, extension *.sdf)

      What is the main difference between an MDL Molfile and an SDF?

    2. 3.3

      Is the adjacency matrix still in use, or is it outdated? Since it has two advantages over the connection table, why is the connection table preferred?
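
To make the comparison concrete, here is a minimal sketch that turns a connection-table style bond list into an adjacency matrix. The heavy-atom skeleton of ethanol (C-C-O) is used as the example; the function name and the `(atom_i, atom_j, bond_order)` tuple layout are assumptions made for illustration, not a standard file format.

```python
def adjacency_matrix(n_atoms: int, bonds: list[tuple[int, int, int]]) -> list[list[int]]:
    """Build a symmetric matrix whose entry [i][j] is the bond order between atoms i and j."""
    matrix = [[0] * n_atoms for _ in range(n_atoms)]
    for atom_i, atom_j, order in bonds:
        # Bonds are undirected, so both halves of the matrix are filled.
        matrix[atom_i][atom_j] = order
        matrix[atom_j][atom_i] = order
    return matrix


# Ethanol heavy atoms: index 0 = C, 1 = C, 2 = O; two single bonds.
ethanol_bonds = [(0, 1, 1), (1, 2, 1)]
ethanol_matrix = adjacency_matrix(3, ethanol_bonds)

# The matrix stores n * n entries, while the bond list stores one row per bond.
matrix_entries = sum(len(row) for row in ethanol_matrix)
bond_rows = len(ethanol_bonds)
```

For three atoms the matrix already holds nine entries against two bond rows, and the gap widens with molecule size because most atom pairs are not bonded.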

    3. ChEMBL and SureChEMBL

      Please, I want to understand whether ChEMBL and SureChEMBL do the same work or work differently.

    1. (InChI) and InChIKey

      Please, I want to know if there is any difference between InChI and InChIKey.

    2. Hydrogen atoms are not necessarily included explicitly in a connection table: they may be implicit.

      What happens if hydrogen atoms are included explicitly in a connection table? Please, I need a clear explanation.
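
One way to see the difference is to write the same molecule both ways. The sketch below uses ethanol and a deliberately simplified valence model (neutral atoms, no aromaticity, fixed valences of 4, 3, 2 and 1 for C, N, O and H), which is an assumption for illustration only. The names `DEFAULT_VALENCE` and `implicit_hydrogens` are hypothetical.

```python
# Simplified default valences for neutral atoms (an assumption of this sketch).
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}


def implicit_hydrogens(atoms: list[str], bonds: list[tuple[int, int, int]]) -> list[int]:
    """For each atom, count the hydrogens implied by its unused valence."""
    used_valence = [0] * len(atoms)
    for atom_i, atom_j, order in bonds:
        used_valence[atom_i] += order
        used_valence[atom_j] += order
    return [DEFAULT_VALENCE[element] - used
            for element, used in zip(atoms, used_valence)]


# Implicit-hydrogen table: only the heavy atoms C-C-O are listed.
heavy_atoms = ["C", "C", "O"]
heavy_bonds = [(0, 1, 1), (1, 2, 1)]
implied = implicit_hydrogens(heavy_atoms, heavy_bonds)

# Explicit-hydrogen table: the six hydrogens become atoms 3..8 with their own bonds.
explicit_atoms = heavy_atoms + ["H"] * 6
explicit_bonds = heavy_bonds + [
    (0, 3, 1), (0, 4, 1), (0, 5, 1),   # CH3
    (1, 6, 1), (1, 7, 1),              # CH2
    (2, 8, 1),                         # OH
]
left_over = implicit_hydrogens(explicit_atoms, explicit_bonds)
```

Both tables describe the same molecule. The explicit version grows from 3 atoms and 2 bonds to 9 atoms and 8 bonds, and no valence is left over for further implied hydrogens.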