- Feb 2017
-
olcc.ccce.divched.org
-
GENSAL (GENeric Structure LAnguage)
Why GENSAL and not GENSLA?
-
Note that aromaticity is not a measurable physical quantity, but a concept without a unanimous mathematical definition. As a result, different aromaticity detection algorithms often disagree with each other on whether a given molecule is aromatic or not, making it difficult to interchange information between databases that use different aromaticity detection algorithms for SMILES generation.
Do any of the aromaticity detection algorithms employ Hückel's rule (the 4n+2 π-electron rule) for predicting aromaticity?
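The 4n+2 test itself is simple arithmetic, as this minimal Python sketch shows. It assumes you already know that the ring is cyclic, planar and fully conjugated and how many π electrons it holds, which is exactly where detection algorithms make different choices (for example, how many electrons a ring nitrogen or an exocyclic double bond contributes). The function name `satisfies_huckel` is made up for this example.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """True if a pi-electron count has the form 4n + 2 (n = 0, 1, 2, ...)."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Pi-electron counts for a few cyclic conjugated systems; only the count is tested,
# planarity and conjugation are assumed rather than checked.
examples = {
    "benzene": 6,
    "cyclopentadienyl anion": 6,
    "cyclobutadiene": 4,
    "cyclooctatetraene": 8,
}

for name, count in examples.items():
    print(f"{name}: {count} pi electrons -> fits 4n+2? {satisfies_huckel(count)}")
```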
-
Hashing is a one-way mathematical transformation typically used to calculate a compact fixed length digital representation of a much longer string of arbitrary length.
This is very similar to how I used to protect passwords on my server before SSH keys became the standard. We used similar protocols under the MD5 standard, so it is very interesting to see the same technique used to make something easier to find with search engines, when as a security measure it was used to keep something from being found.
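To see the "fixed length from arbitrary length" property, here is a short Python sketch using the standard library's SHA-256. The InChIKey is itself derived from a SHA-256 hash of the InChI, but it truncates and re-encodes the digest, so the hex strings printed here are not InChIKeys. One difference from password storage: passwords are normally hashed with a random salt so that identical passwords give different hashes, while the InChIKey uses no salt, so the same InChI always gives the same key and the key can be searched for.

```python
import hashlib

short_text = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"  # ethanol's standard InChI
long_text = short_text * 1000                    # a much longer input

for text in (short_text, long_text):
    digest = hashlib.sha256(text.encode("ascii")).hexdigest()
    # SHA-256 always yields 256 bits, i.e. 64 hexadecimal characters.
    print(f"{len(text):6d} characters in -> {len(digest)} hex characters out: {digest[:16]}...")

# The same input always hashes to the same value, which is what makes lookup possible;
# a one-character edit of the string gives an unrelated value.
tweaked = short_text.replace("C2H6O", "C2H6S")
print(hashlib.sha256(tweaked.encode("ascii")).hexdigest()[:16] + "...")
```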
-
Another extension of SMILES is SMIRKS28,29, which is a line notation for generic reactions.
Can you provide more details on SMIRKS and SMARTS, with examples? How can we generate any reaction using this extension?
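As a rough illustration of how a SMIRKS string is laid out, the Python sketch below splits a reaction string into its `reactants>agents>products` parts and collects the atom-map numbers (`:1`, `:2`, ...) that tie each reactant atom to its product atom. The example transform (a C=C double bond becoming a single bond) and the helper names are illustrative assumptions; the code only reads the notation and does not apply the reaction. Applying a transform to real molecules needs a cheminformatics toolkit, for example RDKit's `AllChem.ReactionFromSmarts` followed by `RunReactants`.

```python
import re

# Atom-map numbers sit at the end of a bracket atom, e.g. [C:1] or [O-:3].
ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


def split_reaction(reaction: str) -> tuple[str, str, str]:
    """Split 'reactants>agents>products' into its three parts (agents may be empty)."""
    reactants, agents, products = reaction.split(">")
    return reactants, agents, products


def atom_maps(part: str) -> set[int]:
    """Collect the atom-map numbers used in one side of a reaction."""
    return {int(n) for n in ATOM_MAP.findall(part)}


reactants, agents, products = split_reaction("[C:1]=[C:2]>>[C:1][C:2]")
print("reactants:", reactants, "maps:", atom_maps(reactants))
print("products: ", products, "maps:", atom_maps(products))
```

Every mapped atom appearing on both sides is what lets a toolkit carry unchanged atoms through and rewrite only the bonds the transform specifies.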
-
Actually, it is very common that there are a lot of SMILES strings that represent the same structure, whether it has a ring or not, because one can start with any atom in a molecule to derive a SMILES string. Therefore, it is necessary to select a “unique SMILES” for a molecule among many possibilities. Because this is done through a process called “canonicalization”, this unique SMILES string is also called the “canonical SMILES”.
How can we do canonicalization to get unique SMILES?
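Canonicalization usually starts by ranking atoms with graph invariants that do not depend on how the input was written. Below is a minimal Python sketch of the classic Morgan extended-connectivity step on a hydrogen-suppressed graph given as an adjacency list (an assumed input format). It is only the first stage: a full canonical SMILES generator still has to break the remaining ties (using element, charge, isotope and so on) and then write the string from a deterministically chosen start atom, and Daylight's and RDKit's algorithms each do this in their own way.

```python
def morgan_invariants(adjacency: list[list[int]]) -> list[int]:
    """Morgan-style extended connectivity values for a hydrogen-suppressed graph.

    Start from each atom's degree, then repeatedly replace every value with the
    sum of its neighbours' values. Stop as soon as an iteration no longer
    increases the number of distinct values, and keep the previous values.
    """
    values = [len(neighbours) for neighbours in adjacency]
    n_classes = len(set(values))
    while True:
        new_values = [sum(values[j] for j in neighbours) for neighbours in adjacency]
        new_classes = len(set(new_values))
        if new_classes <= n_classes:
            return values
        values, n_classes = new_values, new_classes


def relabel(adjacency: list[list[int]], order: list[int]) -> list[list[int]]:
    """Renumber atoms so that new atom i is old atom order[i]."""
    position = {old: new for new, old in enumerate(order)}
    return [sorted(position[j] for j in adjacency[old]) for old in order]


# 2-methylpentane carbons: chain 0-1-2-3-4 with a methyl (atom 5) on atom 1.
mol = [[1], [0, 2, 5], [1, 3], [2, 4], [3], [1]]
shuffled = relabel(mol, [4, 2, 0, 5, 1, 3])  # same molecule, different numbering
print(morgan_invariants(mol))
print(morgan_invariants(shuffled))
```

Renumbering the same molecule gives the same multiset of values, each value following its atom to its new position; that independence from input order is the property a canonical ordering is built on.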
-
In SMILES, atoms are represented by their atomic symbols. The second letter of two-character atomic symbols must be entered in lower case. Each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ] (for example, [Au] or [Fe]). Square brackets may be omitted for elements in the “organic subset” (B, C, N, O, P, S, F, Cl, Br, and I) if the proper number of “implicit” hydrogen atoms is assumed. “Explicitly” attached hydrogens and formal charges are always specified inside brackets. A formal charge is represented by one of the symbols + or -. Single, double, triple, and aromatic bonds are represented by the symbols, -, =, #, and :, respectively. Single and aromatic bonds may be, and usually are, omitted. Here are some examples of SMILES strings
According to the SMILES specification rules, atoms with two-character symbols are enclosed in square brackets. Why are Cl and Br not included?
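Cl and Br are in the organic subset listed above, so they are written without brackets whenever they have their normal valence and no charge, explicit hydrogen or isotope; brackets are needed for elements outside that subset (such as Au or Fe) or for atoms that carry those extra properties. The practical consequence for software is that a parser must read "Cl" and "Br" as single tokens. A simplified Python sketch of that atom-tokenizing step (it ignores bonds, branches and ring digits, and is not a full SMILES parser):

```python
import re

# Bracket atoms first, then the two-letter organic-subset symbols, then one-letter ones.
# Listing "Cl" and "Br" before "C" and "B" is what keeps "Cl" from being read as C + l.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_tokens(smiles: str) -> list[str]:
    """Return the atom symbols of a SMILES string, skipping bonds, branches and ring digits."""
    return ATOM_TOKEN.findall(smiles)


print(atom_tokens("ClCCBr"))             # 1-bromo-2-chloroethane
print(atom_tokens("OC(=O)c1ccccc1Cl"))   # 2-chlorobenzoic acid
print(atom_tokens("[Na+].[Cl-]"))        # charged ions need brackets
```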
-
Line notations represent structures as a linear string of characters. They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN)1, Sybyl Line Notation (SLN)2,3 and Representation of structure diagram arranged linearly (ROSDAL)4,5. Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES)6-9 and the IUPAC Chemical Identifier (InChI)10-13, which are described below.
In this context, does it mean that the WLN, SLN and ROSDAL line notations no longer exist, since SMILES and InChI are widely used now?
-
The Simplified Molecular-Input Line-Entry System (SMILES)6-9 is a line notation for describing chemical structures using short ASCII strings.
What is the meaning of ASCII strings?
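ASCII (American Standard Code for Information Interchange) is a 7-bit character encoding with 128 codes covering the English letters, digits, punctuation and a few control characters. An "ASCII string" is text made only of those characters, so every SMILES symbol can be typed on a standard keyboard and stored in one byte. A short Python check using built-ins only:

```python
smiles = "OC(=O)c1ccccc1"  # benzoic acid

# Every character is a plain ASCII code point (0-127), so the string is one byte per character.
codes = [ord(ch) for ch in smiles]
print(codes)
print(smiles.isascii(), len(smiles.encode("ascii")) == len(smiles))

# A non-ASCII symbol (here an arrow) cannot be encoded as ASCII.
try:
    "C→C".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```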
-
Many databases such as PubChem17, ChemSpider18, ChEBI19, and NIST Chemistry Webbook20 accept InChI and InChIKey strings as queries to search for chemical structures. InChIs and InChIKeys can also be used as queries in UniChem21 to produce cross-references between chemical structure identifiers from different databases.
When all these databases (PubChem, ChemSpider, ChEBI and NIST) are used to search for the InChI or InChIKey of a particular structure, are they going to give the same result?
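Searching by InChIKey should point every database at the same structure, but what comes back (record identifiers, names, how many records match) reflects each database's own content and curation, so the results are not identical. For PubChem, one way to script such a lookup is its PUG REST web service; the sketch below only builds the URL for an InChIKey-to-CID query (benzene's key is the example) and leaves the network call out. The other databases have their own interfaces, which are not shown here.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cids_url(inchikey: str) -> str:
    """URL asking PubChem's PUG REST service for the compound IDs (CIDs) with this InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey)}/cids/JSON"


url = pubchem_cids_url("UHOVQNZJYSORNB-UHFFFAOYSA-N")  # benzene
print(url)
# Fetching it, e.g. with urllib.request.urlopen(url), needs network access and
# returns JSON listing the matching CIDs.
```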
-
the standard InChI
When OpenSMILES and the standard InChI did not yet exist, what did scientists do to avoid getting different results in their projects when they used different software?
-
They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN)1, Sybyl Line Notation (SLN)2,3 and Representation of structure diagram arranged linearly (ROSDAL)4,5. Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES)6-9 and the IUPAC Chemical Identifier (InChI)10-13, which are described below.
Since SMILES and InChI are the line notations currently in use, does it mean that WLN, SLN and ROSDAL can yield wrong results when used now, since they are no longer in existence?
-
Generic structures are commonly used in chemistry texts as well as in chemical patents in which the inventor claims a whole class of related compounds. Generic structures are more often called “Markush” structures after Dr. Eugene A. Markush, who was involved in a legal case which set a precedent in the USA for generic chemical structure patent filing.
This might sound a bit dumb! Can an InChI be generated from a Markush structure? I know generic structures and InChI are quite at odds with each other: Markush structures can be quite proprietary, while InChI is open science.
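A Markush structure stands for a set of compounds, while an InChI identifies one fully specified structure. The Python sketch below makes that difference concrete by expanding a small, hypothetical generic structure into the individual SMILES it covers; each member could then get its own InChI from a toolkit. The `{R1}`/`{R2}` placeholder syntax is an assumption of this example, not part of SMILES, and real patent claims often use open-ended groups (e.g. "alkyl") that cannot be enumerated this way.

```python
from itertools import product

# Hypothetical generic structure: a phenol with a variable para group R1 and an
# ortho group R2. The curly-brace placeholders are this example's own convention.
TEMPLATE = "Oc1ccc({R1})cc1{R2}"
R_GROUPS = {
    "R1": ["C", "Cl", "Br", "OC"],
    "R2": ["C", "CC", "CCC"],
}


def enumerate_markush(template: str, r_groups: dict[str, list[str]]):
    """Yield every specific SMILES covered by the template (one per R-group combination)."""
    names = list(r_groups)
    for choice in product(*(r_groups[name] for name in names)):
        yield template.format(**dict(zip(names, choice)))


members = list(enumerate_markush(TEMPLATE, R_GROUPS))
print(len(members), "specific compounds, e.g.", members[:3])
```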
-
c1ccccc1 Benzene (C6H6)
If there are different substituent groups on a benzene ring, will SMILES indicate their positions, such as ortho, meta or para?
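SMILES has no ortho/meta/para keywords; the positions follow from which ring atom each substituent is bonded to. The Python sketch below writes a disubstituted benzene with the first group on ring atom 0 and the second on ring atom 1 (ortho), 2 (meta) or 3 (para). The function name and the fixed writing order are choices made for this illustration; a toolkit's canonical SMILES for the same molecules may start from a different atom.

```python
# Ring position of the second substituent relative to the first (ring atom 0).
OFFSETS = {"ortho": 1, "meta": 2, "para": 3}


def disubstituted_benzene(x: str, y: str, relation: str) -> str:
    """SMILES with group x on ring atom 0 and group y on ring atom 1, 2 or 3."""
    k = OFFSETS[relation]
    parts = [x, "c1"]            # substituent x, then ring atom 0 opening ring bond 1
    for i in range(1, 6):
        atom = "c"
        if i == k:
            atom += f"({y})"     # branch: y is bonded to ring atom i
        if i == 5:
            atom += "1"          # ring atom 5 closes the ring back to ring atom 0
        parts.append(atom)
    return "".join(parts)


for relation in OFFSETS:
    print(relation, disubstituted_benzene("C", "C", relation))  # the three xylenes
```

For example, `disubstituted_benzene("C", "C", "para")` returns `Cc1ccc(C)cc1`, p-xylene.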
-
As a result, different aromaticity detection algorithms often disagree with each other on whether a given molecule is aromatic or not, making it difficult to interchange information between databases that use different aromaticity detection algorithms for SMILES generation.
What process or services would be needed for the databases to perform this interchange?
-
There are currently six InChI layer types, each representing a different class of structural information: the main layer, a charge layer, a stereochemical layer, an isotopic layer, a fixed-H layer and a reconnected layer.
I understand the first four layers of InChI, but what do the last two, the fixed-H layer and the reconnected layer, mean?
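In brief, the main layer treats mobile hydrogens (for example, in tautomers) as not fixed to one atom, and the fixed-H layer records where those hydrogens actually are; the main layer also disconnects bonds to metal atoms, and the reconnected layer describes the structure again with those bonds restored. Both appear only in non-standard InChIs, since the standard InChI leaves them out. The Python sketch below splits an InChI on `/` and names each part from its leading letter, using the prefixes from the layer list in this section plus `r` for the reconnected layer. It is a simplification: prefixes such as `h`, `t` and `m` reappear inside the isotopic and fixed-H sections, and this flat lookup does not track that.

```python
# Leading letter of each '/'-separated part -> layer it belongs to (simplified).
PREFIXES = {
    "c": "connections", "h": "hydrogens", "q": "charge", "p": "protons",
    "b": "double-bond stereo", "t": "tetrahedral stereo", "m": "tetrahedral stereo",
    "s": "stereo type", "i": "isotopes", "f": "fixed-H", "r": "reconnected",
}


def inchi_layers(inchi: str) -> list[tuple[str, str]]:
    """Split an InChI into (layer name, text) pairs using its '/'-separated prefixes."""
    if not inchi.startswith("InChI="):
        raise ValueError("InChI strings start with 'InChI='")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    layers = [("version", version), ("formula", formula)]  # the formula has no prefix
    for part in rest:
        layers.append((PREFIXES.get(part[:1], "unknown"), part))
    return layers


for name, text in inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):  # ethanol
    print(f"{name:12} {text}")
```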
-
SMARTS is useful for substructure searching, which finds a particular pattern (subgraph) in a molecule.
Can you give some examples of the substructures that are searched for?
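Typical SMARTS queries look for functional groups, for example `[CX3]=[OX1]` for a carbonyl group or `[OX2H]` for a hydroxyl group. Underneath, a substructure search is a subgraph match. The Python sketch below is a plain backtracking matcher on element labels and bond orders only; it has none of the SMARTS primitives (connectivity `X`, hydrogen counts, logical operators, recursion), and the molecule format (atom list plus a bond dictionary) is an assumption for this example. It also reports symmetric matches separately, which real toolkits usually collapse.

```python
def bond_order(bonds: dict[tuple[int, int], int], a: int, b: int):
    """Look up a bond regardless of the order its two atoms were listed in (None if absent)."""
    return bonds.get((a, b), bonds.get((b, a)))


def find_substructures(mol_atoms, mol_bonds, pat_atoms, pat_bonds):
    """All ways to map pattern atoms onto distinct molecule atoms with matching elements and bonds."""
    matches = []

    def extend(mapping):
        k = len(mapping)  # index of the next pattern atom to place
        if k == len(pat_atoms):
            matches.append(tuple(mapping))
            return
        for cand in range(len(mol_atoms)):
            if cand in mapping or mol_atoms[cand] != pat_atoms[k]:
                continue
            # Every pattern bond from atom k to an already-placed atom must exist in the molecule.
            ok = True
            for j, placed in enumerate(mapping):
                wanted = bond_order(pat_bonds, k, j)
                if wanted is not None and bond_order(mol_bonds, cand, placed) != wanted:
                    ok = False
                    break
            if ok:
                extend(mapping + [cand])

    extend([])
    return matches


# Acetic acid, CC(=O)O, hydrogens suppressed: element list and {(i, j): bond order}.
acetic_atoms = ["C", "C", "O", "O"]
acetic_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1}

carbonyl = (["C", "O"], {(0, 1): 2})                   # plain C=O, no X constraints
carboxyl = (["C", "O", "O"], {(0, 1): 2, (0, 2): 1})   # C(=O)O

print(find_substructures(acetic_atoms, acetic_bonds, *carbonyl))
print(find_substructures(acetic_atoms, acetic_bonds, *carboxyl))
```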
-
The first block of 14 (out of a total of 27) characters of an InChIKey encodes core molecular constitution, as described by formula, connectivity, hydrogen positions and charge sublayers of the InChI main layer
Can you search the web with the first block of the InChIKey and find all isomers of the compound?
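Searching for the first block alone will tend to find stereoisomers and isotopically labelled forms of the compound, because they share the same constitution; constitutional isomers such as ethanol and dimethyl ether have different first blocks, so they will not show up. Coverage also depends on which pages a search engine has indexed. The Python sketch below splits a key into its three blocks and compares first blocks; the helper names are made up for this example.

```python
def inchikey_blocks(key: str) -> dict[str, str]:
    """Split a 27-character InChIKey into its three hyphen-separated blocks."""
    first, second, protonation = key.split("-")
    if (len(key), len(first), len(second), len(protonation)) != (27, 14, 10, 1):
        raise ValueError(f"not an InChIKey: {key!r}")
    return {"skeleton": first, "remaining_layers": second, "protonation": protonation}


def same_skeleton(key_a: str, key_b: str) -> bool:
    """True if two keys share the first (constitution) block."""
    return inchikey_blocks(key_a)["skeleton"] == inchikey_blocks(key_b)["skeleton"]


benzene = "UHOVQNZJYSORNB-UHFFFAOYSA-N"
ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
print(inchikey_blocks(benzene))
print(same_skeleton(benzene, ethanol))
```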
-
Line notations are not the only way of communicating structure: also popular are file-based formats such as MDL's MOL File19 (and its variant, the SD File), and Chemical Markup Language20 (CML, a variant of XML)
One of the big drawbacks I often hear about when using XML for databases is that it quickly produces very large files. Would the same hold true for using CML with large chemical databases? If so, what size reduction could be expected for the same collection of chemicals stored using connection tables?
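To get a feel for the size difference, the Python sketch below hand-builds a minimal CML record for ethanol's three heavy atoms with the standard library's ElementTree and compares its byte count with the SMILES `CCO`. It omits hydrogens, coordinates and everything else a real CML file usually carries, so treat the numbers it prints as an illustration, not a benchmark. Text-based connection tables such as MOL files are also larger than a SMILES string, and general-purpose compression (here `zlib`) recovers much of XML's repetitive markup; the 1000 identical records below exaggerate that effect.

```python
import xml.etree.ElementTree as ET
import zlib

CML_NS = "http://www.xml-cml.org/schema"  # CML namespace URI


def ethanol_cml() -> bytes:
    """Hand-build a minimal CML record for ethanol's heavy atoms (C, C, O)."""
    molecule = ET.Element("molecule", {"xmlns": CML_NS, "id": "ethanol"})
    atom_array = ET.SubElement(molecule, "atomArray")
    for atom_id, element in [("a1", "C"), ("a2", "C"), ("a3", "O")]:
        ET.SubElement(atom_array, "atom", {"id": atom_id, "elementType": element})
    bond_array = ET.SubElement(molecule, "bondArray")
    for pair in ["a1 a2", "a2 a3"]:
        ET.SubElement(bond_array, "bond", {"atomRefs2": pair, "order": "1"})
    return ET.tostring(molecule)  # serialized as bytes


cml = ethanol_cml()
smiles = b"CCO"
many = cml * 1000  # 1000 identical records: an unrealistically repetitive "database"

print("CML record:", len(cml), "bytes")
print("SMILES:", len(smiles), "bytes")
print("1000 CML records:", len(many), "bytes raw,", len(zlib.compress(many)), "bytes compressed")
```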
-
All InChIs currently are prefixed with “InChI=”. Following this, a designator of “1/” or “1S/” indicates whether the InChI is non-standard or standard (i.e. with fixed standardized options in the software).
Can one compound or molecule have both standard and non-standard InChIs?
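Yes: a compound has one standard InChI, and it can have different non-standard InChIs depending on the options used (for example, adding the fixed-H or reconnected layers). The version designator is enough to tell the two kinds apart, as in this small Python sketch; the `1/` string in the example is only illustrative, since a real non-standard InChI's content depends on the options chosen. The function name is hypothetical.

```python
def inchi_kind(inchi: str) -> str:
    """Classify an InChI by its version designator: '1S/' = standard, '1/' = non-standard."""
    if inchi.startswith("InChI=1S/"):
        return "standard"
    if inchi.startswith("InChI=1/"):
        return "non-standard"
    raise ValueError("unrecognized InChI prefix")


print(inchi_kind("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
print(inchi_kind("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))
```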
-
Ring aromaticity is handled in SMILES at the atomic level, not at the bond level (i.e. an atom is considered aromatic rather than a bond)
In SMILES, why is aromaticity not considered at the bond level, given that the bonds in the ring structure account for its aromaticity?
-
●Main layer
  ○Chemical formula, no prefix
  ○Atom connections, prefix ‘c’
  ○Hydrogen atoms, ‘h’
●Charge layer
  ○Proton sublayer, ‘p’
  ○Charge sublayer, ‘q’
●Stereochemical layer
  ○Double bonds and cumulenes, ‘b’
  ○Tetrahedral stereochemistry of atoms and allenes, ‘t’ or ‘m’
  ○Stereochemistry information type, ‘s’
●Isotope layer, ‘i’, ‘h’, and ‘b’, ‘t’ and ‘m’ for stereochemistry of isotopes
●Fixed-H layer, ‘f’
Will a future release of InChI include a layer that allows inorganic molecules to be standardized? I have come across sources saying that it really only works efficiently for organic molecules. My understanding is that some inorganic structures have been included, but not the more complex ones.
-
While the InChI representation is normally too complex for a human to decode, it is impossible for even a computer to extract the chemical structure from the InChIKey. Therefore, it is important that the InChI representation is also included in any database.
In instances where chemical structures can't be determined from the provided InChI or InChIKey, are there any tips for searching with an InChI?
-
WLN: Wiswesser Line Notation
I haven't encountered this line notation at all. I wonder whether any database systems still use it, even as a historical showcase (I know it is already out of favor).
-
aromatic bonds are implied between aromatic atoms, but may be explicitly defined using the ‘:’ symbol.
When would you use the colon instead of lowercase letters for aromatic bonds?
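One common case for an explicit `:` is SMARTS, where a bond left unwritten matches either a single or an aromatic bond, so `:` is how a query demands an aromatic one. In plain SMILES the rule quoted above makes `:` optional, while an explicit `-` can mark a single bond between two aromatic atoms, such as the bond joining the rings in biphenyl (toolkits differ in how they read that bond when it is left implicit). A small Python sketch of the default rule; the function names are hypothetical and the code only looks at the two atom tokens:

```python
EXPLICIT_BONDS = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}


def is_aromatic_atom(atom: str) -> bool:
    """Lowercase element symbol means aromatic; skip a leading '[' and isotope digits."""
    return atom.lstrip("[0123456789")[:1].islower()


def implied_bond(atom_a: str, atom_b: str, symbol: str = "") -> str:
    """Bond between two adjacent SMILES atoms under the simple default rule."""
    if symbol:
        return EXPLICIT_BONDS[symbol]
    # Nothing written: aromatic between two aromatic atoms, otherwise single.
    if is_aromatic_atom(atom_a) and is_aromatic_atom(atom_b):
        return "aromatic"
    return "single"


print(implied_bond("c", "c"))        # ring bond in benzene
print(implied_bond("C", "c"))        # methyl to ring in toluene
print(implied_bond("c", "c", "-"))   # explicit single bond between aromatic atoms
```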
-
Another language, based on conventions in SMILES, has also been developed for rapid substructure searching, called SMILES ARbitrary Target Specification (SMARTS). Similarly, SMIRKS has also been defined as a subset of SMILES that encodes reaction transforms. SMIRKS does not have a definition, but plays on the SMILES acronym. SMARTS and SMIRKS will be considered in more detail in later chapters.
SMARTS, used for rapid substructure searching, is described as another language based on SMILES conventions. What is the meaning of SMIRKS, and is it used in a similar way to SMARTS?
-
Local file
-
Rings are represented by breaking one single or aromatic bond in each ring, and designating this ring-closure point with a digit immediately following the atoms connected through the broken bond. Atoms in aromatic rings are specified by lowercase letters. Therefore, cyclohexane and benzene can be represented by the following SMILES
Due to the complexity of aromatic compounds and ring structures, is it better to use Kekulé structures or lowercase letters in SMILES?
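The ring-closure rule quoted above can be followed mechanically: when a digit appears for the first time it opens a ring at the preceding atom, and its next appearance closes the ring with a bond back to that atom. The Python sketch below finds those ring-closure bonds; it is a simplification that handles only organic-subset and bracket atoms and skips bond symbols. Because it gives the same answer for `c1ccccc1` and the Kekulé form `C1=CC=CC=C1`, it also shows that the choice between the two is about aromaticity, not ring notation: writing the Kekulé form avoids depending on how the reading program perceives aromaticity, at the cost of committing to one resonance form.

```python
import re

# Bracket atoms, two-letter organic-subset symbols, one-letter organic-subset
# symbols, two-digit ring labels (%nn), one-digit ring labels, then any other character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d\d|\d|.")
ORGANIC = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I", "b", "c", "n", "o", "p", "s"}


def ring_closure_bonds(smiles: str) -> list[tuple[int, int]]:
    """Return (opening atom, closing atom) index pairs for each ring-closure label."""
    open_rings: dict[str, int] = {}
    bonds = []
    atom_index = -1  # index of the most recently read atom
    for token in TOKEN.findall(smiles):
        if token.startswith("[") or token in ORGANIC:
            atom_index += 1
        elif token.startswith("%") or token.isdigit():
            if token in open_rings:
                # Second occurrence: bond back to the atom that opened this label.
                bonds.append((open_rings.pop(token), atom_index))
            else:
                # First occurrence: remember which atom opened the ring.
                open_rings[token] = atom_index
        # Anything else (bond symbols, parentheses, '.') is ignored here.
    return bonds


print(ring_closure_bonds("C1CCCCC1"))        # cyclohexane
print(ring_closure_bonds("c1ccccc1"))        # benzene, aromatic form
print(ring_closure_bonds("c1ccc2ccccc2c1"))  # naphthalene: two ring closures
```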
-