22 Matching Annotations
  1. Mar 2017
    1. Options: Identical Structures; Similar Compounds, score >= 95%; Similar Compounds, score >= 90%; Similar Compounds, score >= 80%. Identical Structures / Similar Structures with: same connectivity, any tautomer, same stereoisomer, same isotopical labels, same stereochemistry and isotopes, non-conflicting stereochemistry, same isotopes and non-conflicting stereoisomers. Threshold >=: 99%, 98%, 97%, 96%, 95%, 93%, 90%, 85%, 80%, 70%, 60%. Substructure / Superstructure: match stereochemistry (Ignore, Exact, Relative, Nonconflicting) and Match isotopes, Match charges, Match tautomers, Ringsystems not embedded, Single/double bonds match aromatic bonds, Chain bonds match ring bonds, Strip hydrogen. Molecular Formula with: exact stoichiometry, allow other element. Sort results by: Shape-then-feature, Feature-then-shape, Shape-and-feature, Conformer Id. Output to: PubChem 3D Alignment Viewer, NCBI Entrez, Table Summary. Time Limit (seconds): unlimited, 30, 60, 90, 300, 600, 3,600. Result Limit: 10, 50, 100, 500, 1,000, 10,000, 100,000, 2,000,000. Filters

      I don't know what I am doing wrong; I really don't understand how to work on question 1.

      Esther

    1. As an alternative to 2-D similarity search, 3-D similarity search can also be performed using the “3D conformer” tab in PubChem Chemical Structure Search.  3-D similarity methods use the 3-D structures (that is, conformations) of molecules.  PubChem’s 3-D similarity method is based on the atom-centered Gaussian-shape comparison method by Grant and coworkers,9-12 implemented in the Rapid Overlay of Chemical Structures (ROCS).13,14  While the underlying mathematics of this approach is beyond the scope of this module, what this method essentially does is to find the “best” alignment of the 3-D structures of two molecules, which gives the maximized overlap between them.  The 3-D similarity method quantifies the 3-D molecular similarity using three metrics

      What are the differences between 2D and 3D structures?

      Esther

    2. Two-dimensional (2-D) similarity methods

      I don't really understand this 2D similarity method.

      Esther

    1. wwPDB data centers serve as deposition, annotation, and distribution sites of the PDB archive. Each site offers tools for searching, visualizing, and analyzing PDB data. PDBe Protein Data Bank in Europe Rich information about all PDB entries, multiple search and browse facilities, advanced services including PDBePISA, PDBeFold and PDBeMotif, advanced visualisation and validation of NMR and EM structures, tools for bioinformaticians.

      How can I search for PDB entries of a protein with a specific Swiss-Prot ID?

      Esther

    2. Since 1971, the Protein Data Bank archive (PDB) has served as the single repository of information about the 3D structures of proteins, nucleic acids, and complex assemblies. The Worldwide PDB (wwPDB) organization manages the PDB archive and ensures that the PDB is freely and publicly available to the global community. Learn more about PDB HISTORY and FUTURE.

      Are there any copyright restrictions on using images or data from the PDB website?

      Esther

    1. "Rule of five" redirects here. For the rule of thumb as it applies to the C++11 programming language, see Rule of three (C++ programming). Lipinski's rule of five, also known as Pfizer's rule of five or simply the rule of five (RO5), is a rule of thumb to evaluate druglikeness or determine if a chemical compound with a certain pharmacological or biological activity has properties that would make it a likely orally active drug in humans. The rule was formulated by Christopher A. Lipinski in 1997, based on the observation that most orally administered drugs are relatively small and moderately lipophilic molecules.[1][2] The rule describes molecular properties important for a drug's pharmacokinetics in the human body, including its absorption, distribution, metabolism, and excretion ("ADME"). However, the rule does not predict if a compound is pharmacologically active.

      What is the significance of Lipinski's rule of five in the pharmaceutical field?

      Esther
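
      As a rough illustration of how the rule is applied in practice, the sketch below counts violations of the four commonly cited criteria (molecular weight under 500 Da, logP not above 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). These thresholds are not listed in the quoted passage, and the function names and input values are hypothetical. As the passage notes, passing the check says nothing about whether a compound is pharmacologically active.

```python
def ro5_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Return the names of the rule-of-five criteria a compound violates."""
    violations = []
    if mol_weight >= 500:          # molecular weight should be under 500 Da
        violations.append("molecular weight")
    if log_p > 5:                  # octanol-water partition coefficient
        violations.append("logP")
    if h_bond_donors > 5:          # hydrogen-bond donors
        violations.append("H-bond donors")
    if h_bond_acceptors > 10:      # hydrogen-bond acceptors
        violations.append("H-bond acceptors")
    return violations


def passes_ro5(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Common reading of the rule: no more than one violation is tolerated."""
    found = ro5_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors)
    return len(found) <= 1


# Hypothetical property values, for illustration only.
print(ro5_violations(350.0, 2.1, 2, 5))
print(passes_ro5(720.0, 6.3, 2, 5))
```

      In a real workflow the four properties would come from a cheminformatics toolkit or a database record rather than being typed in by hand.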

    1. Entrez links are cross links or associations between records in different Entrez databases, or within the same database.  These links may be applied to an entire search result list (via the “find related data” section at the right column of a DocSum page) or to an individual record (via links at the bottom of each record presented on the DocSum page).  The Entrez links provide a way to discover relevant information in other Entrez databases based on a user’s specific interests.  Equivalently, one may think of this as a way to transform an identifier list from one database to another based on a particular criterion. Note that there are limits to how many records may be used as input in a link operation.  To process a large amount of input records and/or to expect a large amount of output records associated with the input records, one should use the FLink tool (https://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi).         A complete list of the Entrez links available for the three PubChem databases can be retrieved in the XML format through these links http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pccompound (for Compound) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcsubstance (for Substance) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcassay (for BioAssay).  

      What is the XML format, and how can it be applied?

      Esther
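
      XML is a plain-text format in which each piece of data is wrapped in named, nested tags, so a program can pull out exactly the fields it needs. The sketch below parses a short, hand-written XML fragment and lists the link names it contains. The fragment is an assumption meant to resemble the einfo output for pccompound; the element names (`Link`, `Name`, `DbTo`) should be checked against the real response before relying on them.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample. It is NOT a real einfo response.
SAMPLE_XML = """<eInfoResult>
  <DbInfo>
    <DbName>pccompound</DbName>
    <LinkList>
      <Link>
        <Name>pccompound_pcassay</Name>
        <DbTo>pcassay</DbTo>
      </Link>
      <Link>
        <Name>pccompound_pcsubstance</Name>
        <DbTo>pcsubstance</DbTo>
      </Link>
    </LinkList>
  </DbInfo>
</eInfoResult>"""


def list_links(xml_text):
    """Return (link name, target database) pairs found in the XML text."""
    root = ET.fromstring(xml_text)
    return [(link.findtext("Name"), link.findtext("DbTo"))
            for link in root.iter("Link")]


for name, target in list_links(SAMPLE_XML):
    print(name, "->", target)
```

      The same function could be pointed at the text downloaded from one of the einfo URLs quoted above, assuming the response uses the element names shown here.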

    1. PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (see Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases. Table 1. Three inter-linked databases in PubChem (database, URL, identifier): Substance, https://www.ncbi.nlm.nih.gov/pcsubstance, SID; Compound, https://www.ncbi.nlm.nih.gov/pccompound, CID; BioAssay, https://www.ncbi.nlm.nih.gov/pcassay, AID. Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.

      How is the process called standardization used to normalize the Substance database?

      Esther

    2. 1.2. Primary databases vs. secondary databases. Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebases.

      Can we classify PubChem as a secondary database, since it collects data from other sources?

      Esther

    3. PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (see Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases. Table 1. Three inter-linked databases in PubChem (database, URL, identifier): Substance, https://www.ncbi.nlm.nih.gov/pcsubstance, SID; Compound, https://www.ncbi.nlm.nih.gov/pccompound, CID; BioAssay, https://www.ncbi.nlm.nih.gov/pcassay, AID. Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.

      What is the difference between Upload ID, Registry ID, SID, CID and AID in PubChem?

      Esther

    4. 1.2. Primary databases vs. secondary databases. Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebases.

      Can I allow my associates or reviewers to access my on-hold assay data?

      Esther

    5. ChEMBL: literature-extracted biological activity information. ChEMBL (https://www.ebi.ac.uk/chembl/)8,9 is a large bioactivity database, developed and maintained by the European Bioinformatics Institute (EBI), which is part of the European Molecular Biology Laboratory (EMBL). The core activity data in ChEMBL are “manually” extracted from the full text of peer-reviewed scientific publications in select chemistry journals, such as Journal of Medicinal Chemistry, Bioorganic & Medicinal Chemistry Letters, and Journal of Natural Products. From each publication, details of the compounds tested, the assays performed and any target information for these assays are abstracted. ChEMBL also integrates screening results and bioactivity data from other public databases (such as PubChem BioAssay) and information on approved drugs from the U.S. FDA Orange Book10 and the NLM’s DailyMed

      Can ChEMBL be used to search for chemical structures?

      Esther

    6. Public Chemical Databases         These days many public online databases provide chemical information free of charge and the databases mentioned in this module are only a few examples of them.  Note that these databases vary in size and scope.   2.1. PubChem: chemical information repository at the U.S. NIH         PubChem (https://pubchem.ncbi.nlm.nih.gov)2-4 is a public repository of information on small molecules and their biological activities, developed and maintained by the National Library of Medicine (NLM), an institute within the U.S. National Institutes of Health (NIH).  Since its launch in 2004 as a component of the NIH’s Molecular Libraries Roadmap Initiatives, it has been rapidly growing, and now serves as a key chemical information resource for researchers in many biomedical science areas, including cheminformatics, chemical biology, and medicinal chemistry.  Detailed information on PubChem can be found in these three papers:

      If I want to submit a manuscript to a journal, which PubChem identifier can I use?

      Esther

  2. Feb 2017
    1. ##YFACTOR=  9.5367E-7 … ##XYDATA= (X++(Y..Y))
         4400   68068800 68092800 68145600 68100800 68140800 68232000
         4394   68304000 68316800 68195200 68152000 68182400 68176000
         4388   68240000 68252800 68156800 68156800 68236800 68292800
         4382   68302400 68265600 68233600 68214400 68224000 68284800
         4376   68353600 68334400 68219200 68230400 68315200 68276800
         4370   68259200 68264000 68257600 68316800 68292800 68339200

      I don't understand the actual meaning of this data table or its format.
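
      The `(X++(Y..Y))` form means that each row starts with one X value, followed by several Y values for evenly spaced X positions, and each stored Y must be multiplied by YFACTOR to recover the real value. The sketch below decodes the first three rows of the quoted table. It infers the X spacing from the difference between consecutive row-start values, which is an assumption made for illustration; a full reader would take the spacing from the file's header records instead.

```python
Y_FACTOR = 9.5367e-7  # from the ##YFACTOR= record in the quoted excerpt

XYDATA_LINES = [
    "4400   68068800 68092800 68145600 68100800 68140800 68232000",
    "4394   68304000 68316800 68195200 68152000 68182400 68176000",
    "4388   68240000 68252800 68156800 68156800 68236800 68292800",
]


def decode_xydata(lines, y_factor):
    """Expand X++(Y..Y) rows into a flat list of (x, y) points."""
    rows = []
    for line in lines:
        fields = line.split()
        rows.append((float(fields[0]), [int(v) for v in fields[1:]]))

    points = []
    step = 0.0  # X spacing; stays 0.0 if only a single row is given
    for i, (x_start, raw_ys) in enumerate(rows):
        if i + 1 < len(rows):
            # Assumption: spacing = gap to the next row start / values per row.
            step = (rows[i + 1][0] - x_start) / len(raw_ys)
        # The last row reuses the spacing inferred from the previous row.
        for j, raw in enumerate(raw_ys):
            points.append((x_start + j * step, raw * y_factor))
    return points


points = decode_xydata(XYDATA_LINES, Y_FACTOR)
print(points[0])
print(points[1])
```

      With the quoted numbers, the rows start at 4400, 4394 and 4388 and hold six values each, so the inferred spacing is -1 per point, and the first stored Y value scales to roughly 64.9.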

    1. Then there is the much more sophisticated Visual Basic for Applications (VBA), which sits behind Excel and is used to record the macros run in Excel. VBA is much more a true programming language, allowing for declaration of variables, loop structures, if-then-else conditionals and user defi

      Please, I do not understand the practical way of using Excel to check strings and to tell whether a string is a valid InChI string or not.
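
      The logic of such a check can be sketched independently of Excel. The Python example below only tests whether a string starts with the `InChI=1S/` or `InChI=1/` prefix followed by at least one more character; this is a quick plausibility test, not a full validation of the layers. The function name is hypothetical, and the same prefix comparison could be written as a spreadsheet formula or a VBA function.

```python
import re

# Prefix of standard ("1S") and non-standard ("1") InChI strings,
# followed by the first character of the formula layer.
INCHI_PREFIX = re.compile(r"^InChI=1S?/[^\s/]")


def looks_like_inchi(text):
    """Return True if the text begins like an InChI string (prefix check only)."""
    return bool(INCHI_PREFIX.match(text.strip()))


candidates = [
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",  # ethanol
    "CCO",                                 # a SMILES string, not an InChI
    "InChI=",                              # prefix only, nothing after it
]
for text in candidates:
    print(text, looks_like_inchi(text))
```

      A string that passes this test can still be malformed further along, so a dedicated InChI toolkit is the safer choice when correctness matters.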

    1. In SMILES, atoms are represented by their atomic symbols.  The second letter of two-character atomic symbols must be entered in lower case.  Each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ] (for example, [Au] or [Fe]).  Square brackets may be omitted for elements in the “organic subset” (B, C, N, O, P, S, F, Cl, Br, and I) if the proper number of “implicit” hydrogen atoms is assumed.  “Explicitly” attached hydrogens and formal charges are always specified inside brackets. A formal charge is represented by one of the symbols + or -.  Single, double, triple, and aromatic bonds are represented by the symbols, -, =, #, and :, respectively.  Single and aromatic bonds may be, and usually are, omitted.  Here are some examples of SMILES strings

      According to the SMILES specification rules, atoms with two-character symbols are enclosed in square brackets. Why are Cl and Br not included?
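
      Cl and Br belong to the organic subset listed in the passage, so their brackets may be omitted; only elements outside that subset (for example Au or Fe) always need them. The sketch below is a minimal atom tokenizer, not a complete SMILES parser: it tries bracketed atoms first, then the two-letter organic-subset symbols, then single letters, which is why `Cl` is read as chlorine rather than as carbon followed by something else. The pattern and function name are illustrative assumptions.

```python
import re

# Order matters: bracket atoms, then two-letter organic-subset symbols,
# then one-letter organic-subset symbols, then lower-case aromatic atoms.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def smiles_atoms(smiles):
    """Return the atom tokens of a SMILES string (bonds and digits are skipped)."""
    return ATOM_PATTERN.findall(smiles)


print(smiles_atoms("ClCCBr"))        # organic subset: no brackets needed
print(smiles_atoms("[Au]"))          # outside the subset: brackets required
print(smiles_atoms("C[N+](C)(C)C"))  # charged atoms are always bracketed
```

      Note that the second letter must be lower case, as the passage states, so this tokenizer does not treat "CL" or "BR" as chlorine or bromine.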

    2. Line notations represent structures as a linear string of characters.  They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN)1, Sybyl Line Notation (SLN)2,3 and Representation of structure diagram arranged linearly (ROSDAL)4,5.  Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES)6-9 and the IUPAC Chemical Identifier (InChI)10-13, which are described below.

      In this context, does it mean that the WLN, SLN and ROSDAL line notations are no longer in use, since SMILES and InChI are widely used now?

    3. The Simplified Molecular-Input Line-Entry System (SMILES)6-9 is a line notation for describing chemical structures using short ASCII strings. 

      What is the meaning of ASCII strings?
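
      An ASCII string is text made only of the 128 basic characters (unaccented Latin letters, digits and common punctuation), each stored as a number from 0 to 127. The short sketch below shows the numeric codes behind a SMILES string and a simple check that a string stays within that range; the helper name is hypothetical.

```python
def is_ascii(text):
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)


smiles = "CC(=O)O"                     # acetic acid
codes = list(smiles.encode("ascii"))   # one byte per character

print(codes)
print(is_ascii(smiles))
print(is_ascii("caféine"))             # "é" lies outside the ASCII range
```

      Because every SMILES character falls in this range, the strings can be stored and exchanged by practically any software without encoding problems.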

    1. The Molfile contains the atoms and the bonding patterns between those atoms, but also includes xyz co-ordinate information so the 3D structure can be explicitly encoded and stored for subsequent use. The file format was originally developed by MDL Information Systems, which through a number of acquisitions and mergers, Symyx Technologies and Accelrys, respectively, is now subsumed with Biovia, a subsidiary of Dassault Systems. The Molfile is split into distinct lines of information, referred to as blocks. The first three lines of any Molfile contain header information: molecule name or identifier; information regarding its generation, such as software, user, etc.; and the comments line for additional information, but in practice this is often blank. The next line always encodes the metadata regarding the connection table and must be parsed to identify the numbers of atoms and bonds, respectively. The first two digits of this line encode the numbers of atoms and bonds, respectively

      I need a clearer explanation of how the MDL format is used.
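
      The layout described in the passage can be followed step by step in code. The sketch below reads a small, hand-written V2000 Molfile: three header lines, then the counts line, then one line per atom and one per bond. In V2000 files the atom and bond counts sit in fixed-width fields of three characters each. The molecule name, the coordinates and the function name are illustrative assumptions, not data from a real record.

```python
# Hand-written example: ethanol heavy atoms with made-up coordinates.
MOLFILE = """ethanol (heavy atoms only)
  hand-written example

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
"""


def parse_molfile(text):
    """Return (name, atoms, bonds) from a V2000 Molfile string."""
    lines = text.splitlines()
    name = lines[0].strip()          # header line 1: name or identifier
    counts = lines[3]                # line 4: counts line
    n_atoms = int(counts[0:3])       # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])       # columns 4-6: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom index, second atom index, bond order
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return name, atoms, bonds


name, atoms, bonds = parse_molfile(MOLFILE)
print(name)
print(atoms)
print(bonds)
```

      Each bond tuple refers to atoms by their 1-based position in the atom block, so `(1, 2, 1)` is a single bond between the first and second atoms.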

    2. systems can encode and decode systematic names such that they follow the naming conventions, but this representation is not commonly used in practice for computational work. Chemical structures are the explicit representation and the chemist’s lingua franca. Chemical structure drawings appear in many scientific publications,

      What is the difference between SMILES and aromatic SMILES?

    1. NOTATIONS Line notations represent structures as a linear string of alphanumeric symbols. Their compactness was an advantage in the early days of cheminformatics when storage space was at a premium, and even nowadays, it can be faster to enter a structure as a notation instead of using a chemical structure drawing program. Several notations20–22 were proposed in the 1950s and 1960s, but only one, the Wiswesser Line-Formula Notation (WLN; see Figure 1)23–28 became widely used,29–47 despite the fact that the Dyson notation was formally adopted by IUPAC.20,21 WLN started to fall out of use in the early 1980s.

      What are some examples of line notations, and can I get a clearer explanation of them?

    2. To the practicing chemist, the language of chemistry is the two-dimensional (2D) structure diagram and most chemical information systems feature graphical input and output of chemical structures; the machine-held representation need not be meaningful to the synthetic chemist. In the ideal (unique) representation there is only one ‘code’ for a given structure and any one code can be interpreted to give only one structure. A unique representation is essential for chemical registration systems in which the novelty of a structure is determined before it is recorded in a database. Some representations, for example, molecular formulas, are not unique; one molecular formula will generate more than one full structure. Some nonunique representations (e.g., molecular formulas and fragment codes) do, however, play a part in certain chemical information systems, even though they do not represent the full topology of a structure

      In this context, I don't understand what they mean by "any one code can be interpreted to give only one structure". What does it really mean?
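
      The passage's own counter-example, the molecular formula, can be made concrete. In the sketch below, structures (written as SMILES) are grouped by formula; the formulas are assigned by hand for illustration. Ethanol and dimethyl ether share the formula C2H6O, so that one "code" cannot be interpreted as a single structure, which is exactly what a unique representation must avoid.

```python
from collections import defaultdict

# SMILES -> molecular formula, assigned by hand for this illustration.
STRUCTURES = {
    "CCO": "C2H6O",  # ethanol
    "COC": "C2H6O",  # dimethyl ether
    "C": "CH4",      # methane
}


def group_by_formula(structures):
    """Map each molecular formula to the list of structures that share it."""
    groups = defaultdict(list)
    for smiles, formula in structures.items():
        groups[formula].append(smiles)
    return dict(groups)


for formula, members in group_by_formula(STRUCTURES).items():
    status = "ambiguous" if len(members) > 1 else "one structure"
    print(formula, members, status)
```

      A registration system needs the reverse to hold as well: each structure must map to exactly one code, so that a new submission can be compared against what is already stored.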