31 Matching Annotations
  1. Apr 2017
    1. Getting a list of CIDs for compounds with a given substructure

      Could SMILES be used here instead of the CIDs? Emily

    2. The input identifiers can also be specified by SMILES or InChI strings, although special care needs to be taken because these identifiers contain special characters (such as “/”) that cause conflicts with the URL syntax.4 

      Why use these identifiers if they can cause conflicts? Emily
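
One way around the conflict is to percent-encode the SMILES string before it goes into the URL. The sketch below only builds the URL and makes no network call. The helper name `smiles_query_url` is hypothetical, and passing the structure as a `smiles` URL argument is an assumption about PUG-REST that should be checked against the PubChem documentation.

```python
from urllib.parse import quote

# Base address of the PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_query_url(smiles: str, output: str = "TXT") -> str:
    """Build a CID-lookup URL for a SMILES string (hypothetical helper).

    safe="" makes quote() escape "/" as well, which it leaves alone by default.
    """
    encoded = quote(smiles, safe="")
    # Assumption: the SMILES is passed as a URL argument rather than in the path.
    return f"{PUG_REST_BASE}/compound/smiles/cids/{output}?smiles={encoded}"


# trans-2-butene: the "/" and "=" characters are escaped as %2F and %3D.
print(smiles_query_url("C/C=C/C"))
```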

    3. Getting a list of CIDs for compounds identical to a query compound

      This only returns structures identical to the compound for the given CID. How would one find those that are merely similar? Emily


      What is the difference between these two, and in what situations would one use PUG-SOAP instead of PUG-REST? Emily

    5. In PUG-REST, these three pieces of information are encoded into an URL in the following format:

      Is there a way to check that the queries work, other than just trying them and getting nothing back? Emily
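
The three pieces (input, operation, output) can be assembled by a small function, which at least rules out typos in the URL layout before a request is sent. This is a minimal sketch: `build_pug_rest_url` is a hypothetical helper, and the example values (CID 2244, the `MolecularFormula` property) are illustrative choices, not taken from the quoted text.

```python
# Base address of the PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_pug_rest_url(input_spec, operation, output):
    """Join the input, operation, and output parts into one URL.

    input_spec and operation are lists of path segments; output is a
    format name such as "TXT" or "JSON". (Hypothetical helper.)
    """
    if not input_spec or not output:
        raise ValueError("input specification and output format are required")
    return "/".join([PUG_REST_BASE, *input_spec, *operation, output])


url = build_pug_rest_url(
    ["compound", "cid", "2244"],        # input: which record(s)
    ["property", "MolecularFormula"],   # operation: what to do with them
    "TXT",                              # output: the return format
)
print(url)
```

When the request is actually sent, checking the HTTP status code of the response is one way to tell a malformed request from a valid query that simply has no hits.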

    1. PubChem’s PUG (Power User Gateway), documented elsewhere, is an XML-based interface suitable for low-level programmatic access to PubChem services, wherein data is exchanged through a relatively complex XML schema that is powerful but requires some expertise to use. PUG SOAP contains much of the same functionality, but broken down into simpler functions defined in a WSDL (http://www.w3.org/TR/wsdl), using the SOAP protocol (http://www.w3.org/TR/soap) for information exchange. This WSDL/SOAP layer is most suitable for SOAP-aware GUI workflow applications (Taverna, Pipeline Pilot) and programming languages (C#/.NET, Perl, Python, Java, etc.). See the Tips & Tricks section at the end of this document for more information on specific clients.

      Does this mean that PUG-SOAP is harder to use for someone with no programming background, and is meant more for professionals who have one? Emily

  2. Mar 2017
    1. heat map-style layout,

      Is there an article that explains how to read the data shown below? Emily

    2. the Structure Clustering tool and Structure-Activity Relationship (SAR) Analysis tool.

      Where would these tools be used the most? That is, in what kind of research would they be most helpful? Emily

    3. 1.3. Substructure and superstructure search

      Could this search be used to find reactions to build superstructures from substructures? Emily

    1. Lead-like

      This is also called Congreve’s rule of 3. For those having trouble finding it, see also the rule-of-three link at the top of the page. Emily

    2. Components of the rule

      This gives the actual guidelines; however, this webpage offers an easier way to read them: https://goo.gl/Wjl7Sm Emily

    1. In analogy to the rule of five, it has been proposed that ideal fragments should follow the 'rule of three' (molecular weight < 300, ClogP < 3, the number of hydrogen bond donors and acceptors each should be < 3 and the number of rotatable bonds should be < 3).[1] Since the fragments have relatively low affinity for their targets, they must have high water solubility so that they can be screened at higher concentrations.

      This is a great description of what Congreve’s rule of 3 actually consists of. Emily
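
The thresholds in the quoted passage translate directly into a check. The sketch below assumes the property values have already been computed elsewhere. The `Fragment` class, its field names, and the example numbers are hypothetical, and the strict `<` comparisons follow the wording of the quote.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Precomputed properties of a candidate fragment (hypothetical container)."""
    mol_weight: float
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int


def satisfies_rule_of_three(frag: Fragment) -> bool:
    """Return True if every property is below its rule-of-three threshold."""
    return (
        frag.mol_weight < 300
        and frag.clogp < 3
        and frag.h_bond_donors < 3
        and frag.h_bond_acceptors < 3
        and frag.rotatable_bonds < 3
    )


# Illustrative values only, not measured data for any real compound.
small = Fragment(mol_weight=150.0, clogp=1.2, h_bond_donors=1,
                 h_bond_acceptors=2, rotatable_bonds=1)
heavy = Fragment(mol_weight=350.0, clogp=1.2, h_bond_donors=1,
                 h_bond_acceptors=2, rotatable_bonds=1)
print(satisfies_rule_of_three(small), satisfies_rule_of_three(heavy))
```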

    1. Entrez Nodes Are Intended for Linking

      Can this also be used to link chemicals to reactions from the research projects that use them? Emily

    1. The term “data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered in a database.1 

      Is this the same as a substance record on PubChem? Emily

    2. (d) Explain the reason why the “legacy” designation was introduced in PubChem in two or three sentences.

      The best explanation for this is in section 2.4 of the article at the bottom of the page. Emily

    3. Some records in PubChem are “non-live”, meaning that they are “not searchable”, although they do exist in the database.  This exercise is designed to help students better understand what non-live records are.

      Here is a great explanation of what "non-live" records are in PubChem. Emily

    1. PubChem standardization process in which unique chemical structures are extracted from the Substance database and stored in the Compound database.

      This is a great idea that should be used more in other databases, which often just present the data without actually trying to verify it. Emily

    1. Lipinski’s rule of 5 [50]. Among them, 10.3 million (12% of the total) are fragment-like ones, which satisfy Congreve’s rule of 3 [51].

      What are the Lipinski and Congreve rules? Emily

    1. The PubChem Substance database contains substance information (often including the chemical structure) that was submitted to NCBI by individual submitters (depositors). The PubChem Compound records comprise a non-redundant set of standardized and validated chemical structures. A PubChem Compound record may link to more than one PubChem Substance record if different depositors supplied the same structure. Chemical names shown in PubChem Compound records are a composite derived from all linked substances, with default ranking of names by weighted frequency of use.

      This is the best explanation I have found of the difference between the PubChem Substance and PubChem Compound databases. Emily

  3. Feb 2017
    1. JCAMP-DX format

      If this format is out of date, why is it still used as often as it is?

    1. It was recognized in 2004 that there needed to be a successor to JCAMP-DX because of i) advances in technology, ii) a recognized need to represent data from all analytical techniques, and iii) issues with variants of JCAMP-DX that made interoperability of the files difficult.  AnIML files consist of up to four data sections; SampleSet, ExperimentStepSet, AuditTrailEntrySet, and SignatureSet.  By design very little data/metadata is required so that legacy data, which may not have much or any metadata to describe it, can be stored in the AnIML format. An example ‘minimum’ AnIML file is shown below:

      What are the major differences between JCAMP-DX and its successor AnIML, and how have the problems with JCAMP-DX been corrected with AnIML?

    2. widgets

      Why haven’t the major chemical databases created widgets that can be downloaded and used as plug-ins in browsers like Google Chrome?

    1. There are currently six InChI layer types, each different class of structural information: the main layer, a charge layer, a stereochemical layer, an isotopic layer, a fixed-H layer and a reconnected layer. 

      I understand the first four layers of InChI, but what do the last two, the fixed-H layer and the reconnected layer, mean?
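
To see which layers a given InChI actually contains, the string can be split on "/" and each segment labelled by its prefix letter. This is a rough sketch rather than a full parser: `label_inchi_layers` is a hypothetical helper, the ethanol InChI is an illustrative input, and the prefix table ignores the fact that sublayers can repeat inside the isotopic, fixed-H, and reconnected layers.

```python
# Prefix letter of each "/"-separated segment mapped to its layer type.
LAYER_BY_PREFIX = {
    "c": "main (connections)",
    "h": "main (hydrogens)",
    "q": "charge",
    "p": "charge (protons)",
    "b": "stereochemical",
    "t": "stereochemical",
    "m": "stereochemical",
    "s": "stereochemical",
    "i": "isotopic",
    "f": "fixed-H",
    "r": "reconnected",
}


def label_inchi_layers(inchi: str):
    """Return (label, segment) pairs for an InChI string (simplified)."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    segments = inchi[len("InChI="):].split("/")
    labelled = [("version", segments[0])]
    for position, seg in enumerate(segments[1:]):
        if position == 0 and seg and not seg[0].islower():
            # The chemical formula has no prefix letter and comes first.
            labelled.append(("main (formula)", seg))
        else:
            labelled.append((LAYER_BY_PREFIX.get(seg[:1], "unknown"), seg))
    return labelled


# Ethanol has only a main layer: formula, connections, hydrogens.
for label, segment in label_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
    print(f"{label:20s} {segment}")
```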

    2. SMARTS is useful for substructure searching, which finds a particular pattern (subgraph) in a molecule.

      Can you give some examples of the substructures that are searched for?

    1. aromatic bonds are implied between aromatic atoms, but may be explicitly defined using the ‘:’ symbol.

      When would you use the colon instead of lowercase letters for aromatic bonds?

    1. To make things even more complicated, software may account for the chirality of a stereocenter atom when generating a MOL file but ignore it when rendering a MOL file!

      Is there a way to catch this if it happens?

    2. Each of the two Kekulé structures for the benzene ring shows up as a different set of single and double bonds (MOL I, MOL IV). The Bond Tables are different: The MOL file format uses the number 4 to indicate bonds that are explicitly labeled as aromatic (MOL V). This has the advantage of differentiating aromatic bonds from single and double bonds without requiring the chemist to write a script to identify and label the alternating single and double bonds of a Kekulé structure.

      When would you use the aromatic structure instead of the Kekulé structures?

    3. The MOL file format uses the number 4 to indicate bonds that are explicitly labeled as aromatic (MOL V). This has the advantage of differentiating aromatic bonds from single and double bonds without requiring the chemist to write a script to identify and label the alternating single and double bonds of a Kekulé structure. However, some software may not be built to handle this convention. (You might even run into cases in which it’s interpreted as a quadruple bond!)

      The connection tables for the Kekulé A and Kekulé B bond types were very confusing when I tried to switch from one to the other. The introduction of the third type, which uses just the number 4, is great.
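
Reading the bond type out of a bond block makes the convention concrete, and also gives a way to flag type 4 bonds before a file is passed to software that might misread them. The sketch assumes the fixed-width V2000 layout (three characters each for first atom, second atom, and bond type). The bond lines are hand-written for a six-membered aromatic ring, not copied from the MOL V example, and `parse_bond_line` is a hypothetical helper.

```python
# Bond type codes used in the bond block of a MOL file.
BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


def parse_bond_line(line: str):
    """Parse one V2000 bond line into (atom1, atom2, bond type name)."""
    atom1 = int(line[0:3])
    atom2 = int(line[3:6])
    code = int(line[6:9])
    return atom1, atom2, BOND_TYPES.get(code, f"other ({code})")


# Hand-written bond lines for a six-membered ring with every bond typed as 4.
ring_bonds = [
    "  1  2  4  0",
    "  2  3  4  0",
    "  3  4  4  0",
    "  4  5  4  0",
    "  5  6  4  0",
    "  1  6  4  0",
]

parsed = [parse_bond_line(line) for line in ring_bonds]
aromatic_count = sum(1 for _, _, kind in parsed if kind == "aromatic")
print(f"{aromatic_count} of {len(parsed)} bonds are labeled aromatic")
```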

    1. Of course, chemical structures and topological graphs are not entirely equivalent: a connection table is akin to a description of a single valence bond structure and does not take account, for example, of delocalized bonds. Alternative approaches have been suggested.

      Why hasn’t a bigger deal been made of the fact that connection tables do not take delocalized bonds into account?

    1. An InChI Key can also be generated for a compound. This is completely separate from the InChI linear notation, and is used to provide an identifier for a compound that is particularly suitable for use in Web search engines. It is an ASCII character string based on a hashing of the InChI linear notation, but is of fixed length and uses only characters not normally conside

      I was wondering when you would use the InChI Key instead of regular InChI?
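
Because the key has a fixed length and a restricted character set, its shape can be checked with a single pattern, which is part of what makes it convenient for search and database lookups. The sketch below assumes the standard 27-character layout (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, one uppercase letter). `looks_like_inchikey` is a hypothetical helper, and the ethanol key is an illustrative input. It only checks the format and cannot tell whether a key matches a real compound.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter: 27 characters in total.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def looks_like_inchikey(text: str) -> bool:
    """Return True if text has the fixed-length InChIKey layout."""
    return INCHIKEY_PATTERN.match(text) is not None


print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))         # ethanol's key
print(looks_like_inchikey("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))   # a full InChI
```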