Hypothesis

238 Matching Annotations

Jan 2021
pubs.acs.org

Teaching Cheminformatics through a Collaborative Intercollegiate Online Chemistry Course (OLCC)

1
1. rebelford 11 Jan 2021
  
  in Public
  
  Discussions during the 2019 OLCC were entirely on the hypothes.is web annotation service, which had been integrated into LibreTexts, and was set up to send an email to students and faculty whenever an annotation was created. In future OLCCs, this notification needs to be extended to other types of social media because “email” is often not students’ preferred means for online communication, and these communications need to be across web platforms.
  
  We are essentially going to use the model that evolved out of the cheminformatics OLCC, using hypothes.is with LibreTexts. But I expect much more use of external links as students figure out how to write code and use code libraries while they try to connect sensors to Raspberry Pis.
  
  IOCT

Tags

IOCT

Annotators

rebelford

URL

pubs.acs.org/doi/10.1021/acs.jchemed.0c01035
Sep 2020
pubs.acs.org

Big Data and Chemical Education

1
1. rebelford 30 Sep 2020
  
  in Public
  
  by means of an online cheminformatics learning course (OLCC)
  
  mentions OLCC

Annotators

rebelford

URL

pubs.acs.org/doi/10.1021/acs.jchemed.5b00524
Feb 2020
gallery.mailchimp.com

Untitled document

1
1. rebelford 09 Feb 2020
  
  in Public
  
  Fall 2019 Cheminformatics OLCC
  
  This is a poster worth seeing!!!

Annotators

rebelford

URL

gallery.mailchimp.com/8d634c27f70db71b7ca955002/files/f1870264-97d4-4993-a41f-b4587aa33a71/AR_BIC_2020_Program_DIGITAL_2.4.2020.pdf
Apr 2019
olcc.ccce.divched.org

Module 6: How to Search PubChem for Chemical Information (Part 2) | DivCHED CCCE: Cheminformatics OLCC

2
1. rebelford 29 Apr 2019
  
  in Public
  
  Because both the ST and CT scores range from 0 (for no similarity) to 1 (for identical molecules), the ComboT score may have a value from 0 to 2 (without normalization to unity)
  
  the Combo T has a range of 0 to 2
2. rebelford 29 Apr 2019
  
  in Public
  
  The clustering threshold may be adjusted by clicking an appropriate position on the similarity score axis (the horizontal line above/below the dendrogram).
  
  This section and the next have dendrograms that range from 1 (similar) to 0 (dissimilar).
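
The 0-to-2 range noted in the first annotation follows from the quoted definition: ComboT is the sum of ST and CT, each of which lies between 0 and 1. A minimal Python sketch of that arithmetic is given here; the function name `combo_tanimoto` and the optional division by 2 to "normalize to unity" are illustrative assumptions, not PubChem's implementation.

```python
def combo_tanimoto(st: float, ct: float, normalize: bool = False) -> float:
    """Return ST + CT; optionally halve it (a hypothetical normalization to [0, 1])."""
    for name, score in (("ST", st), ("CT", ct)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {score}")
    combo = st + ct
    return combo / 2.0 if normalize else combo


print(combo_tanimoto(1.0, 1.0))  # identical conformers: upper bound of the range
print(combo_tanimoto(0.0, 0.0))  # no similarity: lower bound of the range
```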

Annotators

rebelford

URL

olcc.ccce.divched.org/Spring2017OLCCModule6
Feb 2019
docs.wixstatic.com

118443_564e7405be4642f8ba908acc9e222a26.pdf

1
1. rebelford 23 Feb 2019
  
  in Public
  
  Symp. 8.2: Digital Chemistry and the Lab of the Future
  
  This is an obvious place for an OER talk, but it would not really be outreach, as these would be people who know about it anyways, and it would be more of a "report", which is not our goal.
  
  This may also be a place for a talk on the Cheminformatics OLCC, especially if we wish to run another this Fall.
  
  Belford2019IUPAC INCHIOER2019IUPAC

Tags

Belford2019IUPAC

INCHIOER2019IUPAC

Annotators

rebelford

URL

docs.wixstatic.com/ugd/118443_564e7405be4642f8ba908acc9e222a26.pdf
May 2017
olcc.ccce.divched.org

Module 6: How to Search PubChem for Chemical Information (Part 2) | DivCHED CCCE: Cheminformatics OLCC

1
1. olccs16 20 May 2017
  
  in Public
  
  Two-dimensional (2-D) similarity methods
  
  In the 2D structure similarity below, I wonder how the fingerprint indicates the number of bonds. For instance, for a C-C bond, how do you know which one it is, since both of them have a C-C bond? - Phuc
  
  Spring2017OLCCModule6 devlon1

Tags

devlon1

Spring2017OLCCModule6

Annotators

olccs16

URL

olcc.ccce.divched.org/Spring2017OLCCModule6
Apr 2017
olcc.ccce.divched.org

Using Tags to Associate Hypothes.is Annotations to a Specific OLCC Page | DivCHED CCCE: Cheminformatics OLCC

1
1. judell 24 Apr 2017
  
  in Public
  
  Spring2017OLCCWebTutorialsTLO5
  
  For example, this annotation carries the specified tag and should appear in the viewer on this page.
  
  Spring2017OLCCWebTutorialsTLO5

Tags

Spring2017OLCCWebTutorialsTLO5

Annotators

judell

URL

olcc.ccce.divched.org/Spring2017OLCCWebTutorialsTLO5
ioct.tech

Reaxys_SubstanceSearch_270117.pdf

2
1. rebelford 18 Apr 2017
  
  in Public
  
  *fulleren*
  
  Note, you can't use stars, but need to use the "contains" option
2. rebelford 18 Apr 2017
  
  in Public
  
  EMTREE drug terms listed in the Index Terms
  
  EMTREE is like MeSH, but how do you get to the index terms? Unfortunately, hypothes.is will not link inside Reaxys, so I can't give a link to where I am.

Annotators

rebelford

URL

ioct.tech/dev/sites/default/files/2017-06/File_3_Reaxys_SubstanceSearch_270117.pdf
olcc.ccce.divched.org

Template for Electronic Submission to ACS Journals

1
1. rebelford 18 Apr 2017
  
  in Public
  
  Get the full record (in XML) of CIDs that have the substructure “C3=NC1=C(C=NC2=C1C=NC=C2)[N]3”
  
  Once again, I am not clear on how to read this file: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastsubstructure/smiles/C3=NC1=C(C=NC2=C1C=NC=C2)[N]3/record/xml
  
  Spring2017OLCCModule7
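
One way to approach "how to read this file" is to let a script list which element names an XML document contains before picking out particular fields. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `summarize_tags` and the `SAMPLE` document are made up for illustration; the sample is a stand-in and does not reproduce the layout of a real PubChem record.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_tags(xml_text: str) -> Counter:
    """Count element names in an XML document, ignoring any namespace prefix."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():  # walks the root and every descendant
        # ElementTree spells namespaced tags as "{namespace}name".
        local_name = element.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
    return counts


# Made-up stand-in document; NOT the real PubChem record layout.
SAMPLE = """<Records xmlns="http://example.org/demo">
  <Record><Id>1</Id><Name>first</Name></Record>
  <Record><Id>2</Id><Name>second</Name></Record>
</Records>"""

for tag, n in summarize_tags(SAMPLE).most_common():
    print(tag, n)
```

Running the same function on the text of a downloaded record would show which element names occur and how often, which is usually enough to decide what to extract next.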

Tags

Spring2017OLCCModule7

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/2017OLCCModule7AssignmentWork.pdf
olcc.ccce.divched.org

7. Programmatic Access to Public Chemical Databases | DivCHED CCCE: Cheminformatics OLCC

13
1. rebelford 18 Apr 2017
  
  in Public
  
  identity search.
  
  For a similarity search, use https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastsimilarity_2d/cid/334/cids/txt?Threshold=95
  
  Spring2017OLCCModule7
2. rebelford 18 Apr 2017
  
  in Public
  
  Getting molecular properties of a set of compounds
  
  It seems most of the things we can get are from the advanced search (Entrez?) options. Can we get other items, like boiling points?
  
  Spring2017OLCCModule7
3. rebelford 18 Apr 2017
  
  in Public
  
  https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/CSCPPACGZOOCGX-UHFFFAOYSA-N/record/XML?record_type=3d
  
  How do you read this file? Can you pull a mol file from it? https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/CSCPPACGZOOCGX-UHFFFAOYSA-N/record/XML?record_type=3d
  
  Spring2017OLCCModule7
4. apcornell 11 Apr 2017
  
  in Public
  
  One can perform partial synonym matching, by setting this option to “word”.
  
  When using the word option, does PubChem limit the number of results to a specific number, like 10,000 for example? I have seen some other systems limit results unless they are accessed by a verified user with a token or some key to verify that they are not a robot.
  
  Spring2017OLCCModule7
5. apcornell 10 Apr 2017
  
  in Public
  
  When a list of compounds are specified as the input, the image of only the first compound on the list will be returned.
  
  Is there a way to get images for a list of chemicals submitted or would they need to be submitted as separate individual searches? I notice a quick way to do this would be as shown below in the spreadsheet, but that looks like it is submitted as separate searches.
  
  Spring2017OLCCModule7
6. OLCCS10 09 Apr 2017
  
  in Public
  
  Getting a list of CIDs for compounds with a given substructure
  
  Could SMILES be used here instead of the CIDs? Emily
  
  Spring2017OLCCModule7
7. OLCCS10 09 Apr 2017
  
  in Public
  
  The input identifiers can also be specified by SMILES or InChI strings, although special care needs to be taken because these identifiers contain special characters (such as “/”) that cause conflicts with the URL syntax.4
  
  Why use these identifiers if they can cause conflicts? Emily
  
  Spring2017OLCCModule7
8. OLCCS10 09 Apr 2017
  
  in Public
  
  Getting a list of CIDs for compounds identical to a query compound
  
  This only shows the structures with identical things to the CID provided, how would one only find those that are similar? Emily
  
  Spring2017OLCCModule7
9. OLCCS10 09 Apr 2017
  
  in Public
  
  PUG-SOAP PUG-REST
  
  What is the difference between these two, and in what situations would one use PUG-SOAP instead of PUG-REST? Emily
  
  Spring2017OLCCModule7
10. OLCCS10 09 Apr 2017
  
  in Public
  
  In PUG-REST, these three pieces of information are encoded into an URL in the following format:
  
  Is there a way to make sure the inquiries work, other than just trying and not getting anything? Emily
  
  Spring2017OLCCModule7
11. olccs16 03 Apr 2017
  
  in Public
  
  In PUG-REST, these three pieces of information are encoded into an URL in the following format:
  
  I'm guessing these basic pieces of information would be similar to the CACTUS chemical resolver URL. -Phuc
  
  Spring2017OLCCModule7
12. olccs16 03 Apr 2017
  
  in Public
  
  https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/180/record/PNG
  
  I know it is through PUG-REST, but how are these generated? The same goes for the links afterward. -Phuc
  
  Spring2017OLCCModule7
13. olccs16 03 Apr 2017
  
  in Public
  
  Entrez Utilities (also called E-Utilities or E-Utils) Power User Gateway (PUG) PUG-SOAP PUG-REST
  
  Are these compatible with any kind of programming platform? -Phuc
  
  Spring2017OLCCModule7
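
Several of the questions above concern how the three pieces of a PUG-REST request (input, operation, output) end up in one URL, and why SMILES or InChI strings need special care. The following Python sketch assembles URLs of the shape quoted in these annotations and percent-encodes each path segment. The helper name `pug_rest_url` and its signature are hypothetical. Whether the service accepts a percent-encoded "/" inside a SMILES or InChI string is not established here, so the PUG-REST documentation should be checked for the supported way to pass such strings.

```python
from urllib.parse import quote, urlencode

BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(input_parts, operation, output, **options):
    """Join input, operation and output segments into one request URL."""
    segments = [*input_parts, operation, output]
    # safe="" makes quote() escape "/" and other reserved characters too.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    query = f"?{urlencode(options)}" if options else ""
    return f"{BASE}/{path}{query}"


# The two URL shapes quoted in the annotations above.
print(pug_rest_url(["compound", "cid", 180], "record", "PNG"))
print(pug_rest_url(["compound", "fastsimilarity_2d", "cid", 334], "cids", "txt", Threshold=95))

# A SMILES string containing "/" would otherwise be read as extra path segments.
print(pug_rest_url(["compound", "smiles", "C/C=C/C"], "cids", "txt"))
```

Printing the URL before requesting it is also a cheap way to check a query, which speaks to the question about verifying that a request is well formed.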

Tags

Spring2017OLCCModule7

Annotators

apcornell

rebelford

olccs16

OLCCS10

URL

olcc.ccce.divched.org/Spring2017OLCCModule7
olcc.ccce.divched.org

Module 6: How to Search PubChem for Chemical Information (Part 2) | DivCHED CCCE: Cheminformatics OLCC

1
1. rebelford 15 Apr 2017
  
  in Public
  
  PubChem subgraph fingerprints,
  
  Why are these called "subgraph" fingerprints? I see this on Wikipedia. Is it that each bit of the fingerprint represents a subgraph of the molecule, which in itself is the graph? That is, does each bit record the existence or nonexistence of that subgraph within the whole molecule?
  
  Spring2017OLCCModule6
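
The reading suggested in the question, one bit per fragment with 1 meaning "present", can be illustrated with a toy structural key. This is only a sketch: the five-entry `FRAGMENT_KEYS` list is hypothetical and far smaller than any real key, and the fragment sets are written by hand rather than detected from a structure.

```python
# Hypothetical fragment list; the real PubChem key defines its own bits.
FRAGMENT_KEYS = ["C-C", "C=O", "O-H", "C-N", "aromatic ring"]


def structural_key(fragments_present: set[str]) -> str:
    """Return a bit string: '1' where the keyed fragment occurs, else '0'."""
    return "".join("1" if key in fragments_present else "0" for key in FRAGMENT_KEYS)


# Fragment sets are written by hand here; a cheminformatics toolkit would detect them.
ethanol = {"C-C", "O-H"}
acetic_acid = {"C-C", "C=O", "O-H"}

print(structural_key(ethanol))      # 10100
print(structural_key(acetic_acid))  # 11100
```

A presence/absence bit of this kind records that a fragment occurs, not how many times it occurs.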

Tags

Spring2017OLCCModule6

Annotators

rebelford

URL

olcc.ccce.divched.org/Spring2017OLCCModule6
Mar 2017
olcc.ccce.divched.org

Module 6: How to Search PubChem for Chemical Information (Part 2) | DivCHED CCCE: Cheminformatics OLCC

25
1. OLCCS198 28 Mar 2017
  
  in Public
  
  In 2-D similarity methods, structural similarity between two molecules is estimated by comparing their molecular fingerprints.
  
  So, will each structure on PubChem have a fingerprint and a molfile? lyndsie
  
  Spring2017OLCCModule6
2. OLCCS198 28 Mar 2017
  
  in Public
  
  targets
  
  what is a target? lyndsie
  
  Spring2017OLCCModule6
3. OLCCS198 28 Mar 2017
  
  in Public
  
  In substructure search, one provides an input substructure as a query to find molecules that contain the query substructure
  
  My question is, if I were trying to find a large superstructure with more than one substructure, can I input more than one substructure to search? For example, if I want to find compounds that contain an aldehyde and an alcohol. Lyndsie
  
  Spring2017OLCCModule6
4. olcc197 28 Mar 2017
  
  in Public
  
  Although each compound has up to 500 conformers (depending on the molecular size and flexibility),
  
  is this true? (nwume)
  
  spring2017OLCCModule6
5. olcc197 28 Mar 2017
  
  in Public
  
  PubChem homepage (http://pubchem.ncbi.nlm.nih.gov) PubChem Chemical Structure Search (https://pubchem.ncbi.nlm.nih.gov/search/search.cgi) PubChem Search (https://pubchem.ncbi.nlm.nih.gov/search/).
  
  I noticed that anything you search using these three different search interfaces gives you the same result. (nwume)
  
  spring2017OLCCModule6
6. olcc197 28 Mar 2017
  
  in Public
  
  Shape-Tanimoto (ST): quantifies steric shape similarity between two conformers. Color-Tanimoto (CT): quantifies the overlap of functional groups between two conformers, such as hydrogen bond donors and acceptors, cations, anions, rings, and hydrophobes. Combo-Tanimoto (ComboT): the sum of ST and CT scores between two conformers. It takes into account the shape similarity (ST) and functional group similarity (CT) simultaneously.
  
  Which one is the best among these three metrics? (nwume)
  
  spring2017OLCCModule6
7. olcc197 28 Mar 2017
  
  in Public
  
  The Structure Clustering tool
  
  Please, I don't understand how to make use of this structure clustering tool. (nwume)
  
  spring2017OLCCModule6
8. rebelford 27 Mar 2017
  
  in Public
  
  without minor isotopes by combining the retrieved search results [from (a)] with the following Entrez indices:
  
  Once you have done this, click on the number of hits, go to the DocSum page, go to advanced, and use the advanced query builder to perform the filters.
9. rebelford 27 Mar 2017
  
  in Public
  
  ? [Note that a compound may have multiple EC50 values determined from different experiments (likely under different experimental conditions)
  
  There are two ways to get rid of duplicates. You can simply paste the CIDs into the PubChem search and search; I think this requires Chrome.
  
  or
  
  You can go to Data/data tools in excel, and choose the "Remove duplicates" tool
  
  Spring2017OLCCModule6
10. rebelford 27 Mar 2017
  
  in Public
  
  To perform an identity search for Cymbalta (CID 60835), go to the Chemical Structure Search page
  
  So if many CIDs relate to specific isotopes, do the normal molar masses use average values, like the periodic table?
  
  Spring2017OLCCModule6
11. OLCCS10 27 Mar 2017
  
  in Public
  
  heat map-style layout,
  
  Is there an article that explains how to read the data shown below? Emily
  
  Spring2017OLCCModule6
12. rebelford 27 Mar 2017
  
  in Public
  
  PubChem Search
  
  the beta search link of the homepage
13. rebelford 27 Mar 2017
  
  in Public
  
  PubChem Chemical Structure Search
  
  Direct Link is Under Services on the homepage
14. OLCCS10 27 Mar 2017
  
  in Public
  
  the Structure Clustering tool and Structure-Activity Relationship (SAR) Analysis tool.
  
  Where would these tools be used the most? Meaning, in what kind of research would these tools be the most helpful? Emily
  
  Spring2017OLCCModule6
15. OLCCS10 27 Mar 2017
  
  in Public
  
  1.3. Substructure and superstructure search
  
  Could this search be used to find reactions to build superstructures from substructures? Emily
  
  Spring2017OLCCModule6
16. OLCCS15 27 Mar 2017
  
  in Public
  
  Tanimoto coefficient6-8
  
  I'm trying to understand how to use the Tanimoto coefficient. I don't see any example to refer to. (Daniel)
  
  2017OLCCModule6
17. OLCCS17 27 Mar 2017
  
  in Public
  
  As an alternative to 2-D similarity search, 3-D similarity search can also be performed using the “3D conformer” tab in PubChem Chemical Structure Search. 3-D similarity methods use the 3-D structures (that is, conformations) of molecules. PubChem’s 3-D similarity method is based on the atom-centered Gaussian-shape comparison method by Grant and coworkers,9-12 implemented in the Rapid Overlay of Chemical Structures (ROCS).13,14 While the underlying mathematics of this approach is beyond the scope of this module, what this method essentially does is to find the “best” alignment of the 3-D structures of two molecules, which gives the maximized overlap between them. The 3-D similarity method quantifies the 3-D molecular similarity using three metrics
  
  What are the differences between 2D and 3D structures?
  
  Esther
  
  Spring2017OLCCModule6
18. OLCCS17 27 Mar 2017
  
  in Public
  
  Two-dimensional (2-D) similarity methods
  
  I don't really understand this 2D similarity method.
  
  Esther
  
  Spring2017OLCCModule6
19. axnakarmi 21 Mar 2017
  
  in Public
  
  However, molecular formula search implemented in some databases, including PubChem Chemical Structure Search, has an option to allow other elements in returned hits (e.g., C6H6O or C6H6N2O for the “C6H6” query).
  
  Why is there an option to allow other elements in returned hits when we type C6H6 or any other related molecule? Amita
  
  2017OLCCModule6 Spring2017OLCCModule6
20. axnakarmi 21 Mar 2017
  
  in Public
  
  The most common types of molecular fingerprints are structural keys, which encode structural information of a molecule into a binary string (that is, a string of 0’s and 1’s). The position of each number in this string corresponds to a particular fragment. If the molecule has a particular fragment, the corresponding bit position is set to 1, and otherwise to 0. Note that there are many different ways to design molecular fingerprints, depending on what fragments are included in the fingerprint definition. PubChem uses its own fingerprint called PubChem subgraph fingerprints.
  
  I am confused with binary string and fingerprint. How does it work to recognize molecules? Amita
  
  2017OLCCModule6 Spring2017OLCCModule6
21. axnakarmi 21 Mar 2017
  
  in Public
  
  PubChem provides two web-based tools that allow users to perform a cluster analysis of PubChem data: the Structure Clustering tool and Structure-Activity Relationship (SAR) Analysis tool.
  
  I am confused about using the Structure Clustering tool. How can we use it? I practised using this tool but I did not get any results. Amita
  
  2017OLCCModule6 Spring2017OLCCModule6
22. axnakarmi 21 Mar 2017
  
  in Public
  
  The Structure-Activity Relationship (SAR) Analysis tool
  
  Confusion on using this tool. Amita
  
  2017OLCCModule6 Spring2017OLCCModule6
23. axnakarmi 21 Mar 2017
  
  in Public
  
  On the contrary, superstructure search returns molecules that comprise or make up the provided chemical structure query (that is, substructures that is contained in the query superstructure). It should be noted that substructure search does not give you substructures of the query and that superstructure search does not return superstructures of the query.
  
  How do we know which one is substructure and superstructure? Amita
  
  2017OLCCModule6
24. olccs16 20 Mar 2017
  
  in Public
  
  MMFF94s
  
  In case you are curious about MMFF94s. This might also answer my previous question about the limitation to certain elements. -Phuc
  
  Spring2017OLCCModule6
25. olccs16 20 Mar 2017
  
  in Public
  
  Consist of only supported elements (H, C, N, O, F, Si, P, S, Cl, Br, and I).
  
  So organometallic compounds such as cisplatin don't have a 3D structure? And why do we have to set this limit? -Phuc
  
  Spring2017OLCCModule6
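
Several annotations above ask how binary fingerprints are compared and note the lack of a worked Tanimoto example. As a minimal sketch, the Python function below applies the usual definition (bits set in both fingerprints, divided by bits set in either) to two bit strings. The 8-bit fingerprints are invented for illustration and are far shorter than real ones; returning 0.0 for two all-zero fingerprints is a convention chosen here.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    n_a = fp_a.count("1")  # bits set in A
    n_b = fp_b.count("1")  # bits set in B
    n_ab = sum(1 for a, b in zip(fp_a, fp_b) if a == b == "1")  # bits set in both
    union = n_a + n_b - n_ab  # bits set in at least one fingerprint
    return n_ab / union if union else 0.0  # 0.0 for two empty fingerprints (a choice made here)


# Invented 8-bit fingerprints: 4 bits set in each, 3 of them shared.
fp_a = "11010010"
fp_b = "11000011"
print(tanimoto(fp_a, fp_b))  # 3 / (4 + 4 - 3)
```

Identical fingerprints give 1 and fingerprints with no shared bits give 0, matching the 1 (similar) to 0 (dissimilar) scale used elsewhere in the module.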

Tags

2017OLCCModule6

spring2017OLCCModule6

Spring2017OLCCModule6

Annotators

rebelford

OLCCS15

OLCCS10

OLCCS17

olccs16

OLCCS198

axnakarmi

olcc197

URL

olcc.ccce.divched.org/Spring2017OLCCModule6
olcc.ccce.divched.org

Template for Electronic Submission to ACS Journals

8
1. rebelford 27 Mar 2017
  
  in Public
  
  EC50”
  
  Half maximal effective concentration (EC50) refers to the concentration of a drug, antibody or toxicant which induces a response halfway between the baseline and maximum after a specified exposure time
2. rebelford 27 Mar 2017
  
  in Public
  
  s. Click the “Pharmacological Actions” link under “BioMedical Annotation” to choose the c
  
  Note you get 2 hits, the search 5090 and 5472495. This tells you that there is a molecule that is 80% similar and has been tested as active.
3. rebelford 15 Mar 2017
  
  in Public
  
  n, use the smallest EC50 value (for simplicity) if there are multiple values
  
  Is there a quick way to recognize duplicates in Excel? In the case of CID5152, the compound CID9892481 was repeated 3 times
4. rebelford 15 Mar 2017
  
  in Public
  
  mentioned at the top of t
  
  444036 is now getting 5 targets
5. rebelford 15 Mar 2017
  
  in Public
  
  What is the difference between the non-common components of the compounds in (a)?
  
  Is there a quick way to compare smiles to do this?
6. rebelford 15 Mar 2017
  
  in Public
  
  Provide the query CID in the search box and run the search. Repeat the search with the “same isoto
  
  note in the query, this gives #24 and #23, which in the advanced search,
  
  23 1:1 CovalentUnitCount
  
  24 56 documents
  
  A quick scan showed only 1 covalent unit in -24, so was -23 done just because it is good practice?
  
  Also, the entrez history shows here too.
  
  Also, in the first search with 56 results, the first three did not have isotopes specified, while in the second search, those were the only ones to come out. Does it always give priority in the search to ones without isotopic labels?
  
  This leads to a bigger question: are the molar masses from the periodic table, or are they related to specific isotopes?
7. rebelford 15 Mar 2017
  
  in Public
  
  2.1. The Structure Clustering tool
  
  Private note - useful for GHS?
8. rebelford 15 Mar 2017
  
  in Public
  
  this URL
  
  Available through the services menu of PubChem homepage
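
For the question above about spotting duplicates when a compound has several EC50 values, a script can do the same job as a spreadsheet's "Remove duplicates" tool while also applying the assignment's "use the smallest EC50 value" rule. The sketch below is a minimal Python illustration; the function name is hypothetical and the EC50 numbers are invented, not taken from PubChem.

```python
def smallest_ec50_per_cid(rows):
    """Collapse repeated CIDs, keeping the smallest EC50 seen for each."""
    best = {}
    for cid, ec50 in rows:
        if cid not in best or ec50 < best[cid]:
            best[cid] = ec50
    return best


# Invented EC50 values for illustration only; not taken from PubChem.
rows = [(9892481, 0.8), (9892481, 0.3), (9892481, 1.2), (5090, 2.5)]
print(smallest_ec50_per_cid(rows))  # one entry per CID
```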

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/2017OLCCModule6.pdf
olcc.ccce.divched.org

In Silico Medicinal Chemistry_1.pdf

1
1. rebelford 27 Mar 2017
  
  in Public
  
  1 SimilarityCoefficients
  
  Good description of similarity methods
  
  Spring2017OLCCModule6

Tags

Spring2017OLCCModule6

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/In Silico Medicinal Chemistry_1.pdf
onlinelibrary.wiley.com

A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape

1
1. apcornell 27 Mar 2017
  
  in Public
  
  The method is also used to quantify the degree of chirality of asymmetric molecules and to investigate the chirality of biphenyl and the amino acids.
  
  This article was cited in the OLCC Cheminformatics Class Module 6 for the atom-centered Gaussian-shape comparison method. I will see if I can access the full article from campus tomorrow, but it's very difficult for me to understand how applying a Gaussian adaptation to a molecule would help to provide information about the degree of chirality of certain molecules. For example, it would be difficult to understand the shape of a miniature model sailboat inside a glass bottle if the only shape you see is the outer bottle and not what's contained inside.
  
  Andrew
  
  2017OLCCModule6 Spring2017OLCCModule6

Tags

2017OLCCModule6

Spring2017OLCCModule6

Annotators

apcornell

URL

onlinelibrary.wiley.com/doi/10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K/abstract
www.eyesopen.com

ROCS - Shape Similarity for Virtual Screening & Lead Hopping

1
1. apcornell 27 Mar 2017
  
  in Public
  
  ROCS is a fast shape comparison application, based on the idea that molecules have similar shape if their volumes overlay well and any volume mismatch is a measure of dissimilarity. It uses a smooth Gaussian function to represent the molecular volume [5], so it is possible to routinely minimize to the best global match.
  
  I assume that the ROCS description here, which talks about volume overlays, is what is mentioned on the OLCC Module 6 page about finding the best alignment for a 3D structure overlay. It seems that, because of the Gaussian blur around the chemical being matched against others, many of the individual bonds would not be compared in the matching process.
  
  Andrew
  
  2017OLCCModule6 Spring2017OLCCModule6

Tags

2017OLCCModule6

Spring2017OLCCModule6

Annotators

apcornell

URL

eyesopen.com/rocs
olcc.ccce.divched.org

2017OLCCModule5Assignment.pdf

10
1. rebelford 16 Mar 2017
  
  in Public
  
  PDB and CSD
  
  PDB covers biological compounds, while CSD is oriented toward materials science.
2. rebelford 16 Mar 2017
  
  in Public
  
  How many compounds
  
  Use the Boolean OR in advanced search
3. rebelford 16 Mar 2017
  
  in Public
  
  How many compounds in
  
  Use "AND" of advanced search, the above classification browser results will be in search history
4. rebelford 16 Mar 2017
  
  in Public
  
  For example, the protein-bound 3-D structure of penicillin V can be accessed via the following URL
  
  This can be done from the PubChem TOC of the Classification Browser: Biomolecular Interactions and Pathways, then Protein Bound 3-D Structures. The next link is the same, but under Chemical and Physical Properties/Crystal Structures.
5. rebelford 16 Mar 2017
  
  in Public
  
  Summarize what Lipinski’s rule of 5 is
  
  ≤ 5 H-bond donors; ≤ 10 H-bond acceptors; ≤ 500 Daltons; LogP ≤ 5
6. rebelford 16 Mar 2017
  
  in Public
  
  the “MeSH Synonyms” se
  
  This chemical has the depositor-supplied synonym P071 from AK Scientific, P071
  
  While cetirizine has the MeSH synonym P071
7. rebelford 16 Mar 2017
  
  in Public
  
  zyrtec[completesynonym]
  
  exact name match (zyrtec-D will not work, it is a partial match)
8. rebelford 16 Mar 2017
  
  in Public
  
  t is the CID of that covalent-bonded unit? [A covalently-bound unit (or simply called covalent unit) consists of a group of
  
  note, two of these are mixtures, which could be filtered out by setting covalent units to 1:1
9. rebelford 16 Mar 2017
  
  in Public
  
  The use of covalently bonded units (instead of components) removes this ambiguity because it is well accepted that NaCl is bonded through an ionic bond.)
  
  sodium chloride comes up with 1:2 not 1:1 covalent units.
10. rebelford 16 Mar 2017
  
  in Public
  
  zyrtec
  
  gets MeSH and depositor-supplied synonyms
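
The four rule-of-5 thresholds summarized in the notes above translate directly into a small check. The Python sketch below lists which thresholds a compound exceeds; the function name and the property values passed to it are illustrative assumptions, not data from PubChem.

```python
def rule_of_five_violations(h_donors, h_acceptors, mol_weight, log_p):
    """Return the names of the rule-of-5 thresholds that a compound exceeds."""
    checks = {
        "H-bond donors <= 5": h_donors <= 5,
        "H-bond acceptors <= 10": h_acceptors <= 10,
        "molecular weight <= 500 Da": mol_weight <= 500,
        "LogP <= 5": log_p <= 5,
    }
    return [name for name, satisfied in checks.items() if not satisfied]


# Invented property values for two hypothetical compounds.
print(rule_of_five_violations(1, 4, 180.2, 1.2))   # no thresholds exceeded
print(rule_of_five_violations(6, 12, 650.0, 6.1))  # all four exceeded
```

Returning the list of violations rather than a single yes/no leaves it to the caller to decide how many violations to tolerate.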

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/2017OLCCModule5Assignment.pdf
olcc.ccce.divched.org

Template for Electronic Submission to ACS Journals

9
1. rebelford 15 Mar 2017
  
  in Public
  
  grouping
  
  group
2. rebelford 15 Mar 2017
  
  in Public
  
  These conformer models aim to generate bioactive conformers, which would be found in protein-ligand complexes. For this reason, these conformers are often very different from their experimental structures determined in the gas phase
  
  How do you choose which are best, both from the perspective of creating them for the databank and from that of using them as a client (in your research)?
3. rebelford 15 Mar 2017
  
  in Public
  
  following condition
  
  https://www.ncbi.nlm.nih.gov/pubmed/21933373 https://www.ncbi.nlm.nih.gov/pubmed/21272340 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702940/
4. rebelford 15 Mar 2017
  
  in Public
  
  In the ST-optimization approach, the shape overlap between the molecules (that is, the ST score) are maximized and the single-point CT score is evaluated at that superposition. On the contrary, the CT-optimization considers both ST and CT scores to find the best superposition between molecules, and the single-point ST score is computed at that superposition
  
  Could there be a superposition algorithm that sits somewhere in the middle and maximizes both ST and CT?
5. rebelford 15 Mar 2017
  
  in Public
  
  here NA and NB are the number of bits set in the fingerprints for molecules A and B, respectively, and NAB is the number of bits set in both fingerprints.
  
  Is there a service that generates fingerprints from smiles strings?
6. rebelford 15 Mar 2017
  
  in Public
  
  PubChem uses its own fingerprint called PubChem subgraph fingerprints.
  
  If you follow the link you see the fingerprint key, and this is confusing. How would you represent a molecule with one carbon, one oxygen and two hydrogens (formaldehyde) when the lowest C count is 2 and the lowest H count is 4?
7. rebelford 15 Mar 2017
  
  in Public
  
  choose a desired degree of “sameness” from several predefined options. To see these options, one need to expand the options section by clicking the “plus” button next to the “option” section heading
  
  I get confused between indices and filters. If you open the filters, you see what I thought were indices (HBD, HBA, ...).
  
  Also, what are the criteria for similar structures? I don't see a 90%, 80%, ...
8. rebelford 15 Mar 2017
  
  in Public
  
  PubChem Search
  
  "Try PubChem Search Beta" from homepage
9. rebelford 15 Mar 2017
  
  in Public
  
  PubChem Chemical
  
  Under Services of homepage
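
For reference, the quantities N_A, N_B and N_AB quoted in annotation 5 above are the ones that enter the standard Tanimoto coefficient. The expression below is the usual textbook form, written out here as an aid; the numbers in the worked line are invented for illustration.

```latex
T(A,B) = \frac{N_{AB}}{N_A + N_B - N_{AB}}, \qquad 0 \le T(A,B) \le 1

% Limits: identical fingerprints give N_{AB} = N_A = N_B, so T = 1;
% fingerprints with no common bits give N_{AB} = 0, so T = 0.
% Worked example (invented numbers): N_A = 6,\ N_B = 5,\ N_{AB} = 4
% \Rightarrow T = 4 / (6 + 5 - 4) = 4/7 \approx 0.57
```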

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/Module06_Structure_Search_v2017_0310 (1).pdf
olcc.ccce.divched.org

5. How to Search PubChem for Chemical Information (Part 1) | DivCHED CCCE: Cheminformatics OLCC

15
1. rebelford 14 Mar 2017
  
  in Public
  
  If the query is a phrase or a name with non-alphanumeric characters, double quotes should be used around the query.
  
  Text searches with numbers should use double quotes.
  
  2017OLCCModule5
2. rebelford 14 Mar 2017
  
  in Public
  
  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pccompound (for Compound) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcsubstance (for Substance) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcassay (for BioAssay).
  
  These links are the same as the above section on indices
  
  2017OLCCModule5
3. rebelford 14 Mar 2017
  
  in Public
  
  retrieved in the XML format, using the eInfo functionality in E-Utilities (which will be covered in Module 7):
  
  To open these XML files in Google Sheets, use =importxml(A1,"//Field/FullName"), where A1 holds the link below.
  
  2017OLCCModule5
4. rebelford 14 Mar 2017
  
  in Public
  
  For numeric indices, a search for a range of values can be done by using minimum and maximum values separated by a colon and followed by the bracketed index name (e.g., “100:105[MolecularWeight]”).
  
  use of numeric indices X:Y
  
  2017OLCCModule5
5. axnakarmi 14 Mar 2017
  
  in Public
  
  FLink tool
  
  The flink tool is also useful for 3D structure, macromolecular structure search. https://goo.gl/tgS4QB Amita
  
  2017OLCCModule5
6. axnakarmi 14 Mar 2017
  
  in Public
  
  However, some special filters, such as the "lipinski rule of 5" filter, or the “all” filter, are not link-based.
  
  What is the "Lipinski rule of 5" filter and why is it not link-based? Amita
  
  2017OLCCModule5
7. axnakarmi 14 Mar 2017
  
  in Public
  
  The Entrez Search and Retrieval System
  
  The link provided here gives details about Entrez and how it works. http://www.ncbi.nlm.nih.gov/books/NBK184582/ Amita
  
  2017OLCCModule5
8. olcc197 14 Mar 2017
  
  in Public
  
  In practice, the identifier exchange service may be used as a quick approach to search the PubChem Compound database using multiple queries, although this type of task may be performed programmatically (for example, using PUG-REST,10 which will be discussed in Module 7).
  
  Since a PUG-REST query can be used as a quick approach to search the PubChem compounds, can I also use it to get the list of all PubChem IDs? If yes, how? (nwume)
  
  2017OLCCModule5
9. apcornell 14 Mar 2017
  
  in Public
  
  3.2. Identifier Exchange Service
  
  Can this service be accessed by other methods instead of using the form?
  
  Andrew
  
  2017OLCCModule5
10. olcc197 14 Mar 2017
  
  in Public
  
  “[synonym]” index returns additional 97 compounds.
  
  I noticed that when I search Aspirin[synonym] I get 97 hits, but if I search Aspirin[synonyms] I get 102 hits, which is the same number "aspirin" gave me. Can someone explain why that is? I thought it would give me the same number of hits as aspirin[synonym]. (Nwume)
  
  2017OLCCModule5
11. OLCCS10 14 Mar 2017
  
  in Public
  
  MeSH Synonyms
  
  Why are some of the MeSH synonyms not the same as the depositor-provided synonyms? Emily
  
  2017OLCCModule5
12. olccs16 13 Mar 2017
  
  in Public
  
  1.2.6. Entrez History
  
  Is there a way for Entrez to save my search history permanently? -Phuc
  
  2017OLCCModule5
13. OLCCS17 13 Mar 2017
  
  in Public
  
  Entrez links are cross links or associations between records in different Entrez databases, or within the same database. These links may be applied to an entire search result list (via the “find related data” section at the right column of a DocSum page) or to an individual record (via links at the bottom of each record presented on the DocSum page). The Entrez links provide a way to discover relevant information in other Entrez databases based on a user’s specific interests. Equivalently, one may think of this as a way to transform an identifier list from one database to another based on a particular criterion. Note that there are limits to how many records may be used as input in a link operation. To process a large amount of input records and/or to expect a large amount of output records associated with the input records, one should use the FLink tool (https://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi). A complete list of the Entrez links available for the three PubChem databases can be retrieved in the XML format through these links http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pccompound (for Compound) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcsubstance (for Substance) http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pcassay (for BioAssay).
  
  What is the XML format, and how can it be applied? Esther
  
  2017OLCCModule5
14. axnakarmi 13 Mar 2017
  
  in Public
  
  aspirin[completesynonym] (1 hit, as of Feb. 26, 2017) https://www.ncbi.nlm.nih.gov/pccompound/?term=aspirin%5Bcompletesynonym%5D
  aspirin[synonym] (98 hits) https://www.ncbi.nlm.nih.gov/pccompound/?term=aspirin%5Bsynonym%5D
  aspirin (103 hits)
  
  I wonder why we get different numbers of compounds for different types of queries. Amita
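The URLs in the quoted passage differ only in the field index placed in square brackets after the search term. As a small sketch of how those URLs are put together (the function name `entrez_url` is hypothetical; the base address is the one that appears in the quoted links):

```python
from urllib.parse import quote

# Base of the PubChem Compound search URLs shown in the quoted passage.
BASE = "https://www.ncbi.nlm.nih.gov/pccompound/?term="


def entrez_url(term, field=None):
    """Return a search URL, optionally restricted to one Entrez field index."""
    query = f"{term}[{field}]" if field else term
    # "[" and "]" are percent-encoded as %5B and %5D, as in the quoted links.
    return BASE + quote(query, safe="")


urls = {f: entrez_url("aspirin", f) for f in (None, "synonym", "completesynonym")}
for field, url in urls.items():
    print(field, url)
```

The narrower the index (all fields, then `[synonym]`, then `[completesynonym]`), the fewer records can match, which is consistent with the hit counts listed in the passage.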
15. OLCCS15 12 Mar 2017
  
  in Public
  
  MeSH (Medical Subject Headings)
  
  Is there a difference between the National Library of Medicine MeSH and this MeSH? I was confused when reading and found that it is discussed in two sections: under Synonym and under Classification. Daniel
  
  2017OLCCModule5

Tags

2017OLCCModule5

Annotators

apcornell

rebelford

OLCCS10

olcc197

OLCCS15

OLCCS17

olccs16

axnakarmi

URL

olcc.ccce.divched.org/2017OLCCModule5
olcc.ccce.divched.org olcc.ccce.divched.org

4. Understanding Public Chemical Databases | DivCHED CCCE: Cheminformatics OLCC

30
1. olcc197 14 Mar 2017
  
  in Public
  
  why the “legacy” designation was introduced in PubChem
  
  … PubChem does not allow anyone other than the data contributor to modify the provided information, so some records in PubChem persist with outdated or incorrect data. To help correct this, the “legacy” designation was introduced. (nwume)
  
  2017OLCCModule5
2. apcornell 07 Mar 2017
  
  in Public
  
  (22) ToxNet (http://toxnet.nlm.nih.gov/) (Accessed on 2/19/2017).
  
  I am really glad that this source was used in the write-up. I'm not sure why I was so unfamiliar with some of the things this resource provides, given all of the articles I have read from TOXNET, but I will be using it more in the future.
  
  Andrew
  
  2017OLCCModule4
3. OLCCS198 07 Mar 2017
  
  in Public
  
  Therefore, it is very common that database groups exchange their information with each other.
  
  Does this mean PubChem is sharing information between the BioAssay and Compound databases, or that PubChem is sharing information with ChemSpider, etc.? Lyndsie
  
  2017OLCCModule4
4. OLCCS198 07 Mar 2017
  
  in Public
  
  Although the data provenance information is critical in the reliability of a data source (and its data), this information is not easy to manage.
  
  How do we determine the reliability of a source? Say I've gone through and looked at the data provenance: how do I decide, based on provenance, that this piece of information is reliable but the conflicting information in a different database is not?
  
  Lyndsie
  
  2017OLCCModule4
5. OLCCS198 07 Mar 2017
  
  in Public
  
  In turn, users should always pay attention to the data provenance issue when using a database.
  
  Since we are focusing on PubChem and this sentence tells us, as users, to pay attention to the sources, how do we find the source on PubChem? Furthermore, if that source leads to another source, how far back do we go?
  
  Lyndsie
  
  2017OLCCModule4
6. apcornell 07 Mar 2017
  
  in Public
  
  TOXNET (http://toxnet.nlm.nih.gov/)22-25, maintained by the National Library of Medicine (NLM) at NIH, is a group of databases covering toxicology, hazardous chemicals, toxic releases, environmental and occupational health, risk assessment. Currently, 16 databases are integrated into the TOXNET system, and users can search all these databases either at once or individually. While all the 16 databases provide valuable information, three of them may be worth mentioning in the context of this course.
  
  If I were a chemical supply vendor and wanted to update SDS sheets with important new information, would this be the recommended first place to search? It seems like a perfect source, but SDS sheets contain many entries that may state, for example, that something is a possible carcinogen. It's not really clear to me whether TOXNET includes possible hazards or only those that have been confirmed through scientific means.
  
  Andrew
  
  2017OLCCModule4
7. apcornell 07 Mar 2017
  
  in Public
  
  The error propagation issue is a serious, but very common, problem.39,40 Therefore, when using information in these databases, one should keep in mind various data accuracy and quality issues prevalent in these databases.
  
  Over the years, we have seen some of the errors mentioned when pulling data sets and using APIs, especially when we used database entries as the basis for conversions. The most difficult cases involved either identifiers that had multiple possibilities or a chemical's common name, which varied a lot between databases.
  
  Andrew
  
  2017OLCCModule4
8. apcornell 07 Mar 2017
  
  in Public
  
  As of February 2017, PubChem contains more than 235 million depositor-provided substances, 94 million unique chemical structures, and one million biological assays, which cover about 10 thousand protein target sequences.
  
  I know that PubChem houses a lot of data and also pulls data from many other sources. Due to this, would the 235 million deposited chemical structures mean that it holds that many in its own database or is that number a sum of entries held in many separate databases?
  
  Andrew
  
  2017OLCCModule4
9. apcornell 07 Mar 2017
  
  in Public
  
  2.5. NIST Webbook: thermodynamic and spectroscopic data of chemicals
  
  I find it really fascinating that the NIST Webbook has versions that go back to 1996 for their database. Do they offer direct access to this database or would bulk data only be acquired through their product services or web scraping?
  
  Andrew
  
  2017OLCCModule4
10. apcornell 07 Mar 2017
  
  in Public
  
  It is very common that a primary database curates its data with information drawn from secondary databases.
  
  I was unaware that this was very common. It's easy to wrap my head around secondary databases pulling from primary or even from other secondary sources, but I wonder if pulling secondary data into a primary database would then make it primary data, based on the way it may be used in the new database. Maybe the data exchange and integration mentioned in the following sentence creates a new way to use the data directly in a primary way.
  
  Andrew
  
  2017OLCCModule4
11. axnakarmi 07 Mar 2017
  
  in Public
  
  TOXNET (http://toxnet.nlm.nih.gov/)22-25, maintained by the National Library of Medicine (NLM) at NIH, is a group of databases covering toxicology, hazardous chemicals, toxic releases, environmental and occupational health, risk assessment.
  
  Does TOXNET also deal with nanomaterials and environmental pollution? Amita
  
  2017OLCCModule4
12. axnakarmi 07 Mar 2017
  
  in Public
  
  2.6. DrugBank: comprehensive information on drug molecules
  
  More details about DrugBank are available at this link (http://www.drugbank.ca/about). Amita
13. olccs16 07 Mar 2017
  
  in Public
  
  As of February 2017, PubChem’s data are from more than 500 organizations, including government agencies, university labs, pharmaceutical companies, substance vendors, and other databases
  
  How does PubChem curate and check the validity of data coming from so many different sources? Phuc
  
  2017OLCCModule4
14. olccs16 07 Mar 2017
  
  in Public
  
  Getting the most out of PubChem for virtual screening
  
  What other tools for virtual screening of drugs are available in PubChem besides structure similarity? Phuc
  
  2017OLCCModule4
15. OLCCS10 06 Mar 2017
  
  in Public
  
  The term “data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered in a database.1
  
  Is this the same as a substance record on PubChem? Emily
  
  2017OLCCModule4
16. OLCCS10 06 Mar 2017
  
  in Public
  
  (d) Explain the reason why the “legacy” designation was introduced in PubChem in two or three sentences.
  
  The best explanation for this is in 2.4 of the article at the bottom of the page. Emily
  
  2017OLCCModule4
17. OLCCS10 06 Mar 2017
  
  in Public
  
  Some records in PubChem are “non-live”, meaning that they are “not searchable”, although they do exist in the database. This exercise is designed to help students better understand what non-live records are.
  
  Here is a great explanation on what "non-live" records are in PubChem. Emily
  
  2017OLCCModule4
18. OLCCS15 06 Mar 2017
  
  in Public
  
  Therefore, databases need to document the provenance of the data and devise a way to notify users of that information. In turn, users should always pay attention to the data provenance issue when using a database.
  
  Since documenting the provenance of data is still a work in progress, can we say public databases are not completely reliable? Daniel
  
  2017OLCCModule4
19. olcc197 05 Mar 2017
  
  in Public
  
  ChEMBL:
  
  How does ChEMBL create the chemical structure for each compound? (NWUME)
  
  2017OLCCModule4
20. OLCCS17 04 Mar 2017
  
  in Public
  
  PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (See Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases. Table 1. Three inter-linked databases in PubChem. Database URL Identifier Substance https://www.ncbi.nlm.nih.gov/pcsubstance SID Compound https://www.ncbi.nlm.nih.gov/pccompound CID BioAssay https://www.ncbi.nlm.nih.gov/pcassay AID Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.
  
  How is the process called standardization used to normalize the Substance database?
  
  Esther
  
  2017OLCCModule4
21. olcc197 04 Mar 2017
  
  in Public
  
  1.2. Primary databases vs. secondary databases
  
  Can someone please classify the public databases mentioned in Module 4 as primary or secondary databases?
  
  2017OLCCModule4
22. olcc197 04 Mar 2017
  
  in Public
  
  Table 1. Three inter-linked databases in PubChem. Database URL Identifier Substance https://www.ncbi.nlm.nih.gov/pcsubstance SID Compound https://www.ncbi.nlm.nih.gov/pccompound CID BioAssay https://www.ncbi.nlm.nih.gov/pcassay AID
  
  Is it possible for someone to update substance data in PubChem? I hope that updating it will not affect the SID. (NWUME)
  
  2017OLCCModule4
23. olcc197 04 Mar 2017
  
  in Public
  
  ChemIDPlus26,27 is a dictionary of over 400,000 chemical records (names, synonyms, and structures) and provides access to the structure and nomenclature files used for the identification of chemical substances in the TOXNET system and other NLM databases. The Hazardous Substances Data Bank (HSDB)28,29 focuses on the toxicology of potentially hazardous chemicals, providing information on human exposure, industrial hygiene, emergency handling procedures, environmental fate, regulatory requirements, nanomaterials, and related areas. All HSDB data are referenced and derived from a core set of books, government documents, technical reports and selected primary journal literature. Importantly, HSDB is peer-reviewed by the Scientific Review Panel (SRP), a committee of experts in the major subject areas within the data bank's scope. The Comparative Toxicogenomics Database (CTD)30,31 contains manually curated data describing interactions of chemicals with genes/proteins and diseases. This database provides insight into the molecular mechanisms underlying variable susceptibility for environmentally influenced diseases
  
  Since 16 databases are currently integrated into the TOXNET system but only 3 are shown here, how can I find the remaining 13 databases? (NWUME)
  
  2017OLCCModule4
24. olcc197 04 Mar 2017
  
  in Public
  
  All information in the Substance database is submitted by individual data depositors. However, the Compound database does contain information that are not submitted by data depositors,
  
  Are there sample files for submitting substances to PubChem, since data in the Substance database are submitted by individual data depositors? (NWUME)
  
  2017OLCCModule4
25. OLCCS17 04 Mar 2017
  
  in Public
  
  1.2. Primary databases vs. secondary databases Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebase.
  
  Can we classify PubChem as a secondary database since it collects data from other sources?
  
  Esther
  
  2017OLCCModule4
26. OLCCS17 04 Mar 2017
  
  in Public
  
  PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (See Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases. Table 1. Three inter-linked databases in PubChem. Database URL Identifier Substance https://www.ncbi.nlm.nih.gov/pcsubstance SID Compound https://www.ncbi.nlm.nih.gov/pccompound CID BioAssay https://www.ncbi.nlm.nih.gov/pcassay AID Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.
  
  What is the difference between Upload ID, Registry ID, SID, CID, and AID in PubChem? Esther
  
  2017OLCCModule4
27. OLCCS17 04 Mar 2017
  
  in Public
  
  1.2. Primary databases vs. secondary databases Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebase.
  
  Can I allow my associates or reviewers to access my on-hold assay data? Esther
  
  2017OLCCModule4
28. OLCCS17 04 Mar 2017
  
  in Public
  
  ChEMBL: literature-extracted biological activity information ChEMBL (https://www.ebi.ac.uk/chembl/)8,9 is a large bioactivity database, developed and maintained by the European Bioinformatics Institute (EBI), which is part of the European Molecular Biology Laboratory (EMBL). The core activity data in ChEMBL are “manually” extracted from the full text of peer-reviewed scientific publications in select chemistry journals, such as Journal of Medicinal Chemistry, Bioorganic Medicinal Chemistry Letters, and Journal of Natural products. From each publication, details of the compounds tested, the assays performed and any target information for these assays are abstracted. ChEMBL also integrates screening results and bioactivity data from other public databases (such as PubChem BioAssay) and information on approved drugs from the U.S. FDA Orange Book10 and the NLM’s DailyMed
  
  Is it possible for ChEMBL to work when used to search for chemical structures? Esther
  
  2017OLCCModule4
29. OLCCS17 04 Mar 2017
  
  in Public
  
  Public Chemical Databases These days many public online databases provide chemical information free of charge and the databases mentioned in this module are only a few examples of them. Note that these databases vary in size and scope. 2.1. PubChem: chemical information repository at the U.S. NIH PubChem (https://pubchem.ncbi.nlm.nih.gov)2-4 is a public repository of information on small molecules and their biological activities, developed and maintained by the National Library of Medicine (NLM), an institute within the U.S. National Institutes of Health (NIH). Since its launch in 2004 as a component of the NIH’s Molecular Libraries Roadmap Initiatives, it has been rapidly growing, and now serves as a key chemical information resource for researchers in many biomedical science areas, including cheminformatics, chemical biology, and medicinal chemistry. Detailed information on PubChem can be found in these three papers:
  
  If I want to submit a manuscript to a journal, which PubChem identifier can I use? Esther
  
  2017OLCCModule4
30. olcc197 04 Mar 2017
  
  in Public
  
  The Protein Data Bank (PDB) is an archive of the experimentally determined 3-D structures of large biological molecules such as proteins and nucleic acids.
  
  So, aside from proteins and nucleic acids, can the Protein Data Bank not be used to find the 3-D structure of other biological molecules? (NWUME)
  
  2017OLCCModule4

Tags

2017OLCCModule5

2017OLCCModule4

Annotators

apcornell

OLCCS10

olcc197

OLCCS15

OLCCS17

olccs16

OLCCS198

axnakarmi

URL

olcc.ccce.divched.org/2017OLCCModule4
olcc.ccce.divched.org olcc.ccce.divched.org

The elements of experimental chemistry [electronic resource]

1
1. rebelford 08 Mar 2017
  
  in Public
  
  It has so long been a custom to preface a course of lectures with the history of the science which is their subject, that it may be necessary to state, briefly, the reasons that have induced me to depart from this established usage.
  
  This is a test of a Gutenberg Archive book uploaded to the OLCC

Annotators

rebelford

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/b21299444.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

Getting the most out of PubChem for virtual screening

3
1. OLCCS10 06 Mar 2017
  
  in Public
  
  Lipinski’s rule of 5 [50]. Among them, 10.3 million (12% of the total) are fragment-like ones, which satisfy Congreve’s rule of 3 [51].
  
  What are the Lipinski and Congreve rules? Emily
  
  2017OLCCModule4
2. OLCCS15 06 Mar 2017
  
  in Public
  
  Therefore, 3-D neighboring may offer complementary views on structural similarity between molecules with similar biological activities.
  
  In researching the 2-D and 3-D neighbors, I couldn't find which is better, because they both have their advantages and disadvantages. Since they rely on different methods, which one is used more often? Daniel
  
  2017OLCCModule4
3. OLCCS15 06 Mar 2017
  
  in Public
  
  For example, ChEMBL [49] manually extracts bioactivity data from peer-reviewed papers published in journals in the medicinal chemistry and natural product domains.
  
  Based on this passage in Section 2.2 on bioactivity, it would be preferable to use ChEMBL rather than PubChem for high-quality data extraction. Daniel
  
  2017OLCCModule4

Tags

2017OLCCModule4

Annotators

OLCCS15

OLCCS10

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/Getting the most out of PubChem for virtual screening.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

No Job Name

1
1. rebelford 01 Mar 2017
  
  in Public
  
  t the approach above is easily implemented and can run in a wide range of environments. It has the benefit of synergy with the c
  
  test for public file on cheminformatics OLCC
  
  tlo/test-0

Tags

tlo/test-0

Annotators

rebelford

URL

olcc.ccce.divched.org/system/files/Guha-2-Blue ObelisksInteroperability.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

Untitled document

1
1. rebelford 01 Mar 2017
  
  in Public
  
  es. PubChem archives the molecular structure and bioassay data from the MLSCN and other contributors. PubChem provides search, retrieval, and data analysis tools to optimize the utility of these results. PubChem further enhances the research utility of the
  
  Restricted Access Article - I am testing to see if this link works when someone is logged in, and is not logged in, to the OLCC course.
  
  tlo/test-0

Tags

tlo/test-0

Annotators

rebelford

URL

olcc.ccce.divched.org/system/files/Bolton-2008-PubChem.pdf
Feb 2017
olcc.ccce.divched.org olcc.ccce.divched.org

3.2 Representing and Managing Digital Spectra | DivCHED CCCE: Cheminformatics OLCC

5
1. apcornell 27 Feb 2017
  
  in Public
  
  NIST Atomic Spectra Database - http://www.nist.gov/pml/data/asd.cfm
  NIST Molecular Spectra Databases - http://www.nist.gov/pml/data/molspec.cfm
  NMR Shift DB - http://nmrshiftdb.nmr.uni-koeln.de/
  Human Metabolome Database - http://www.hmdb.ca
  EPA Emissions Measurement Center Spectral Database - http://www3.epa.gov/ttn/emc/ftir/index.html
  MassBank - http://www.massbank.jp/
  Romanian Database of Raman Spectroscopy - http://rdrs.uaic.ro/index.html
  
  It seems just a matter of time before all of the online sources for spectra offer APIs, possibly including predicted spectra. I wonder how much load these kinds of requests put on a server.
  
  2017OLCCModule3TLO2
2. apcornell 27 Feb 2017
  
  in Public
  
  This capability was added in order that spectral files were not too large for the storage media available (see above).
  
  Although compression is nice for saving space, would it limit the ability to search spectral databases if reading compressed files makes things more complicated? When a lot of files need to be accessed in a single search, such as through an API or a user-interface search, I would think this may take longer.
  
  2017OLCCModule3TLO2
3. OLCCS18 21 Feb 2017
  
  in Public
  
  Although JCAMP-DX has not formally been standardized, it is currently the de facto standard for sharing spectral data and all the major databases store their data in the format.
  
  How and why did the JCAMP-DX format become the de facto standard even though it was not formally standardized?
  
  2017OLCCModule3TLO2
4. axnakarmi 21 Feb 2017
  
  in Public
  
  websites where you can obtain reliable spectral data, and software for viewing/simulating spectra.
  
  Is there any free software or website in which we can analyze SEM, TEM images and TGA spectra?
  
  2017OLCCModule3TLO2
5. OLCCS17 21 Feb 2017
  
  in Public
  
  ##YFACTOR= 9.5367E-7
  …
  ##XYDATA= (X++(Y..Y))
  4400 68068800 68092800 68145600 68100800 68140800 68232000
  4394 68304000 68316800 68195200 68152000 68182400 68176000
  4388 68240000 68252800 68156800 68156800 68236800 68292800
  4382 68302400 68265600 68233600 68214400 68224000 68284800
  4376 68353600 68334400 68219200 68230400 68315200 68276800
  4370 68259200 68264000 68257600 68316800 68292800 68339200
  
  I don't understand the actual meaning of this data table and its format.
  
  2017OLCCModule3TLO2
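One way to read the XYDATA block quoted above: in the `(X++(Y..Y))` layout each row starts with one X value followed by several Y values at evenly spaced X positions, and each stored Y integer is multiplied by YFACTOR to recover the measured value. The sketch below is a simplified reader under those assumptions. The function name `parse_xydata` is hypothetical, the X spacing is inferred from the start of the next row rather than read from header fields, and real JCAMP-DX files may also use compressed number forms that this does not handle.

```python
def parse_xydata(lines, yfactor, xstep=None):
    """Expand (X++(Y..Y)) rows into (x, y) pairs, scaling Y by yfactor."""
    rows = [[float(tok) for tok in ln.split()] for ln in lines if ln.strip()]
    points = []
    for i, (x0, *ys) in enumerate(rows):
        if i + 1 < len(rows):
            # Infer the X spacing from where the next row starts.
            xstep = (rows[i + 1][0] - x0) / len(ys)
        if xstep is None:
            raise ValueError("cannot infer X spacing from a single row")
        # The last row reuses the spacing inferred from the previous row.
        points.extend((x0 + k * xstep, y * yfactor) for k, y in enumerate(ys))
    return points


# First two rows of the quoted data, with the quoted YFACTOR.
rows = [
    "4400 68068800 68092800 68145600 68100800 68140800 68232000",
    "4394 68304000 68316800 68195200 68152000 68182400 68176000",
]
points = parse_xydata(rows, yfactor=9.5367e-7)
for x, y in points[:3]:
    print(x, round(y, 4))
```

With these two rows the X values step down by 1 per point (4400, 4399, 4398, ...), and the first stored Y value, 68068800, scales to roughly 64.9.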

Tags

2017OLCCModule3TLO2

Annotators

apcornell

axnakarmi

OLCCS17

OLCCS18

URL

olcc.ccce.divched.org/2017OLCCModule3TLO2
olcc.ccce.divched.org olcc.ccce.divched.org

3.1 Chemical Property Data | DivCHED CCCE: Cheminformatics OLCC

8
1. apcornell 27 Feb 2017
  
  in Public
  
  It is fun to point out though that Binary Large Objects (BLOBs) can be used to store binary files like images, audio, etc.
  
  Would storing images in a database as a BLOB cause poor performance? This seems like a lot for a single entry. I would think storing a link to the image in the database would reduce the entry size.
  
  2017OLCCModule3TLO1
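To make the two options in this annotation concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table name, column names, and the stand-in image bytes are all invented for illustration; the sketch shows only how each approach is stored and makes no claim about which performs better.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE structure_images ("
    "id INTEGER PRIMARY KEY, image BLOB, image_path TEXT)"
)

# Stand-in for real image data: an 8-byte PNG signature plus 1000 zero bytes.
fake_png = b"\x89PNG\r\n\x1a\n" + bytes(1000)

# Option 1: store the binary data itself as a BLOB.
conn.execute("INSERT INTO structure_images (image) VALUES (?)", (fake_png,))
# Option 2: store only a path (or URL) pointing to the image file.
conn.execute(
    "INSERT INTO structure_images (image_path) VALUES (?)",
    ("images/benzene.png",),
)

rows = conn.execute(
    "SELECT id, length(image), image_path FROM structure_images ORDER BY id"
).fetchall()
print(rows)
```

The first row carries all 1008 bytes inside the database, while the second carries only a short string and relies on the file being available elsewhere.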
2. axnakarmi 21 Feb 2017
  
  in Public
  
  all data on a computer must be represented in binary notation.
  
  I am quite confused by binary notation. How is it related to chemistry?
  
  2017OLCCModule3TLO1
3. OLCCS18 21 Feb 2017
  
  in Public
  
  While there are a number of styles of database the most common currently are relational.
  
  Graph databases and other NoSQL (non-relational) information stores seem to be gaining traction in filling gaps where traditional relational databases struggle. MongoDB may be a good tool to include as part of future material for this course. (Edit: I just noticed you mention NoSQL later in the text... apologies!)
  
  2017OLCCModule3TLO1
4. axnakarmi 21 Feb 2017
  
  in Public
  
  SPARQL which is the interestingly recursive acronym that stands for ‘SPARQL Protocol and RDF Query Language’.
  
  What is the significance of this?
  
  2017OLCCModule3TLO1
5. OLCCS18 21 Feb 2017
  
  in Public
  
  In the context of discussing the common ‘data types’ we are going to reference those that are used in the relational database software ‘MySQL’.
  
  How important are the details of MySQL datatypes? Are these datatypes consistent across other relational databases?
  
  2017OLCCModule3TLO1
6. OLCCS18 21 Feb 2017
  
  in Public
  
  STIX fonts
  
  How far back in history does the STIX fonts project go in mapping languages? For example, if one wanted Unicode code points for scientific texts from the ancient Egyptian era, would STIX provide a glyph?
  
  2017OLCCModule3TLO1
7. OLCCS17 21 Feb 2017
  
  in Public
  
  Then there is the much more sophisticated Visual Basic for Applications, (VBA), which sits behind Excel and is used to record the Macro’s run in Excel. VBA is much more a true programming language allowing for declaration of variables, loop structures, if-then-else conditionals and user defi
  
  I do not understand the practical way of using Excel to check strings and tell whether a string is a valid InChI string or not.
  
  2017OLCCModule3TLO1
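The module passage describes doing this in Excel/VBA. As a language-neutral illustration of the idea, here is a rough Python sketch of the simplest possible check, which tests only whether a string begins with an InChI prefix (`InChI=1S/` for standard InChI, `InChI=1/` for non-standard). The function name `looks_like_inchi` is invented here, and passing this check does not prove the rest of the string is a well-formed InChI.

```python
def looks_like_inchi(text: str) -> bool:
    """Cheap syntactic test: does the string start like an InChI?"""
    text = text.strip()
    for prefix in ("InChI=1S/", "InChI=1/"):
        # Require at least one character after the prefix.
        if text.startswith(prefix) and len(text) > len(prefix):
            return True
    return False


candidates = ["InChI=1S/CH4/h1H4", "CCO", "InChI=1S/", "  InChI=1/CH4/h1H4  "]
results = [looks_like_inchi(c) for c in candidates]
print(results)
```

The same prefix test could be written in a spreadsheet with a text function that compares the first few characters of a cell.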
8. rebelford 20 Feb 2017
  
  in Public
  
  Amino Acid Properties Thermodynamic Properties of Pure Substances
  
  bad links

Tags

2017OLCCModule3TLO1

Annotators

apcornell

rebelford

OLCCS18

OLCCS17

axnakarmi

URL

olcc.ccce.divched.org/2017OLCCModule3TLO1
olcc.ccce.divched.org olcc.ccce.divched.org

3. Data Representation on Computer for Chemists | DivCHED CCCE: Cheminformatics OLCC

9
1. olccs16 21 Feb 2017
  
  in Public
  
  ANalytical Data Interchange (ANDI)
  
  Was this file type developed recently? I have never seen it as a save option in our UALR Agilent GC-MS.
  
  2017OLCCMOdule3TLO2 2017OLCCModule3
2. olccs16 21 Feb 2017
  
  in Public
  
  <request string="arsinic acid" representation="names">
    <data id="1" resolver="name_by_opsin" notation="arsinic acid">
      <item id="1" classification="pubchem_iupac_name">arsinic acid</item>
      <item id="2" classification="pubchem_substance_synonym">CHEBI:29840</item>
      <item id="3" classification="pubchem_substance_synonym">HAsH2O2</item>
      <item id="4" classification="pubchem_substance_synonym">[AsH2O(OH)]</item>
      <item id="5" classification="pubchem_substance_synonym">arsinic acid</item>
      <item id="6" classification="pubchem_substance_synonym">dihydridohydroxidooxidoarsenic</item>
    </data>
  </request>
  
  How can you pull this string of data? Is it from the inspect option in the browser? What type of file is this?
  
  2017OLCCMOdule3TLO2 2017OLCCModule3
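The quoted text is XML, and once it has been saved or downloaded as text it can be read by a program. A minimal sketch using Python's standard `xml.etree.ElementTree` module on the response quoted above (the variable names are arbitrary, and how the response was obtained is not covered here):

```python
import xml.etree.ElementTree as ET

# The XML response quoted in the annotation above, as a Python string.
RESPONSE = """<request string="arsinic acid" representation="names">
  <data id="1" resolver="name_by_opsin" notation="arsinic acid">
    <item id="1" classification="pubchem_iupac_name">arsinic acid</item>
    <item id="2" classification="pubchem_substance_synonym">CHEBI:29840</item>
    <item id="3" classification="pubchem_substance_synonym">HAsH2O2</item>
    <item id="4" classification="pubchem_substance_synonym">[AsH2O(OH)]</item>
    <item id="5" classification="pubchem_substance_synonym">arsinic acid</item>
    <item id="6" classification="pubchem_substance_synonym">dihydridohydroxidooxidoarsenic</item>
  </data>
</request>"""

root = ET.fromstring(RESPONSE)
# Each <item> holds one name; its "classification" attribute says what kind.
names = [(item.get("classification"), item.text) for item in root.iter("item")]
for kind, name in names:
    print(f"{kind}: {name}")
```

The element names (`request`, `data`, `item`) and attributes come straight from the quoted response; the parser just walks that tree.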
3. olccs16 21 Feb 2017
  
  in Public
  
  chemical metadata (information about chemicals – not chemical data like melting points etc.)
  
  I need more clarification about chemical metadata. What does it represent if it is not something like a melting point?
  
  2017OLCCMOdule3TLO2
4. olcc197 21 Feb 2017
  
  in Public
  
  instrument data and metadata
  
  Can someone please tell me the difference between instrument data and metadata?
  
  2017OLCCModule3TLO2
5. olcc197 21 Feb 2017
  
  in Public
  
  Although we still ‘use’ ASCII today, in reality we use something called UTF-8. This is easier to say than how it is derived - Universal Coded Character Set + Transformation Format - 8-bit. Unicode (see http://unicode.org) started in 1987 as an effort to create a universal character set that would encompass characters from all languages and defined 16-bits, two bytes -> 2^16 -> 256 x 256 = 65536 possible characters – or code points. Today, the first 65536 characters are considered the “Basic Multilingual Plane”, and in addition there are sixteen other planes for representing characters giving a total of 1,114,112 code points. Thankfully, we don’t need to worry because if something is UTF-8 encoded it is backward compatible with the first 128 ASCII characters.
  
  I want to know whether all computers can make use of UTF-8 and Unicode.
  
  2017OLCCModule3TLO1
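The numbers in the quoted passage can be checked directly, and the ASCII compatibility it mentions is easy to see in a few lines of Python. This is only an illustrative sketch; the example strings are arbitrary.

```python
# Code points in one 16-bit plane, and in all 17 planes (1 basic + 16 others).
BMP_SIZE = 2 ** 16
TOTAL_CODE_POINTS = 17 * BMP_SIZE

# Plain ASCII text produces identical bytes in ASCII and in UTF-8.
formula = "NaCl"
same_bytes = formula.encode("utf-8") == formula.encode("ascii")

# A character outside ASCII (U+00C5, the angstrom-like letter) needs two bytes in UTF-8.
angstrom = "\u00c5".encode("utf-8")

print(BMP_SIZE, TOTAL_CODE_POINTS, same_bytes, len(angstrom))
```

This is the sense in which UTF-8 is backward compatible: text that uses only the first 128 characters is byte-for-byte the same in both encodings.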
6. OLCCS10 20 Feb 2017
  
  in Public
  
  It was recognized in 2004 that there needed to be a successor to JCAMP-DX because of i) advances in technology, ii) a recognized need to represent data from all analytical techniques, and iii) issues with variants of JCAMP-DX that made interoperability of the files difficult. AnIML files consist of up to four data sections; SampleSet, ExperimentStepSet, AuditTrailEntrySet, and SignatureSet. By design very little data/metadata is required so that legacy data, which may not have much or any metadata to describe it, can be stored in the AnIML format. An example ‘minimum’ AnIML file is shown below:
  
  What are the major differences between JCAMP-DX and its successor AnIML, and how have the problems with JCAMP-DX been corrected with AnIML?
  
  2017OLCCModule3TLO2
7. OLCCS10 20 Feb 2017
  
  in Public
  
  widgets
  
  Why haven’t the major chemical databases created widgets that can be downloaded and used as plug-ins in browsers like Google Chrome?
  
  2017OLCCModule3TLO1
8. OLCCS15 20 Feb 2017
  
  in Public
  
  Although it grew out of the relational database model
  
  Does this imply that older relational database software is no longer in use?
  
  2017OLCCModule3TLO1
9. OLCCS15 20 Feb 2017
  
  in Public
  
  This is for mass spectrometry and chromatography data
  
  For representing and managing digital spectra, is ANDI used only for mass spectrometry and chromatography data?
  
  2017OLCCModule3TLO2
Visit annotations in context

Tags

2017OLCCModule3

2017OLCCMOdule3TLO2

2017OLCCModule3TLO1

2017OLCCModule3TLO2

Annotators

OLCCS15

olccs16

olcc197

OLCCS10

URL

olcc.ccce.divched.org/2017OLCCModule3
olcc.ccce.divched.org olcc.ccce.divched.org

2.3 Chemical Representations on Computer: Part III | DivCHED CCCE: Cheminformatics OLCC

17
1. olccs11 18 Feb 2017
  
  in Public
  
  GENSAL (GENeric Structure LAnguage
  
  Why GENSAL and not GENSLA?
  
  2017OLCCModule2P3
2. olccs11 18 Feb 2017
  
  in Public
  
  Note that aromaticity is not a measurable physical quantity, but a concept without a unanimous mathematical definition. As a result, different aromaticity detection algorithms often disagree with each other on whether a given molecule is aromatic or not, making it difficult to interchange information between databases that use different aromaticity detection algorithms for SMILES generation
  
  Do any of the aromaticity detection algorithms employ Hückel's rule (the 4n+2 π-electron rule) for predicting aromaticity?
  
  2017OLCCModule2P3
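
  For reference, the counting part of Hückel's rule can be sketched as below. This is only the 4n+2 arithmetic; it is not a claim about how any particular toolkit detects aromaticity, which also depends on ring perception, planarity, and how π electrons are assigned. The function name is hypothetical.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True if the count has the form 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Benzene has 6 pi electrons (n = 1); cyclobutadiene has 4 and fails the count.
print(satisfies_huckel(6), satisfies_huckel(4))
```

  The disagreement described in the quoted passage arises before this step, in deciding which atoms and electrons belong to the ring system being counted.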
3. olccs11 18 Feb 2017
  
  in Public
  
  ASCII
  
  What does ASCII stand for?
4. apcornell 15 Feb 2017
  
  in Public
  
  Hashing is a one-way mathematical transformation typically used to calculate a compact fixed length digital representation of a much longer string of arbitrary length.
  
  This is very similar to how I used to protect passwords on my server before SSH keys became the standard. We used similar protocols under the MD5 standard, so it's very interesting to see the same technique used to make something easier to find with search engines as was used to keep something from being found as a security measure.
  
  2017OLCCModule2P3
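
  A minimal sketch of the "compact fixed length" property from the quoted sentence, using SHA-256 from Python's standard `hashlib` module as an illustrative hash. It does not reproduce the InChIKey algorithm; the helper name and the choice of example strings are assumptions made for the demonstration.

```python
import hashlib


def fixed_length_digest(text: str) -> str:
    """Hash a string of any length to a 64-character hexadecimal digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_input = "InChI=1S/CH4/h1H4"                   # methane
long_input = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"   # benzene

for text in (short_input, long_input):
    digest = fixed_length_digest(text)
    # The digest length is the same no matter how long the input is.
    print(len(text), len(digest))
```

  The transformation is one-way: the digest can be recomputed from the input and compared, but the input cannot be read back out of the digest.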
5. axnakarmi 14 Feb 2017
  
  in Public
  
  Another extension of SMILES is SMIRKS [28,29], which is a line notation for generic reactions.
  
  Can you provide more details of SMIRKS and SMARTS with examples? How can we generate any reaction using this extension?
  
  2017OLCCModule2P3
6. axnakarmi 14 Feb 2017
  
  in Public
  
  Actually, it is very common that there are a lot of SMILES strings that represent the same structure, whether it has a ring or not, because one can start with any atom in a molecule to derive a SMILES string. Therefore, it is necessary to select a “unique SMILES” for a molecule among many possibilities. Because this is done through a process called “canonicalization”, this unique SMILES string is also called the “canonical SMILES”.
  
  How can we do canonicalization to get unique SMILES?
  
  2017OLCCModule2P3
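
  To illustrate the idea of picking one string among several equivalent ones, here is a deliberately tiny toy, not a real canonicalization algorithm. It assumes an unbranched chain of organic-subset atoms with optional `=` or `#` bonds, which can only be written from one end or the other, and it picks the alphabetically smaller spelling. Real toolkits canonicalize by ranking atoms in the molecular graph; the function name and the tie-breaking rule here are hypothetical.

```python
import re

# Tokens for the toy: two-letter atoms first, then one-letter atoms, then bonds.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[=#]")


def toy_canonical(smiles: str) -> str:
    """Return one agreed spelling for an unbranched chain written from either end."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("toy supports only unbranched organic-subset chains")
    forward = "".join(tokens)
    backward = "".join(reversed(tokens))
    # Both spellings describe the same chain; always keep the smaller one.
    return min(forward, backward)


# Ethanol written from either end maps to the same string.
print(toy_canonical("OCC"), toy_canonical("CCO"))
```

  The point is only that a fixed rule applied to every equivalent spelling yields one unique string, which is what "canonical SMILES" means in the quoted passage.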
7. OLCCS17 14 Feb 2017
  
  in Public
  
  In SMILES, atoms are represented by their atomic symbols. The second letter of two-character atomic symbols must be entered in lower case. Each non-hydrogen atom is specified independently by its atomic symbol enclosed in square brackets, [ ] (for example, [Au] or [Fe]). Square brackets may be omitted for elements in the “organic subset” (B, C, N, O, P, S, F, Cl, Br, and I) if the proper number of “implicit” hydrogen atoms is assumed. “Explicitly” attached hydrogens and formal charges are always specified inside brackets. A formal charge is represented by one of the symbols + or -. Single, double, triple, and aromatic bonds are represented by the symbols, -, =, #, and :, respectively. Single and aromatic bonds may be, and usually are, omitted. Here are some examples of SMILES strings
  
  According to the SMILES specification rules, atoms with two-character symbols are enclosed in square brackets. Why are Cl and Br not included?
  
  2017OLCCModule2P3
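
  The quoted rules can be sketched as a small tokenizer. This is a simplified illustration that covers only the symbols named in the passage (bracket atoms, the organic subset, lowercase aromatic atoms, bond symbols, branches, and single-digit ring closures); the pattern and function name are assumptions, not a full SMILES grammar.

```python
import re

SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"     # bracket atom, e.g. [Au] or [Fe]
    r"|Cl|Br"         # two-letter organic-subset atoms, tried before single letters
    r"|[BCNOPSFI]"    # one-letter organic-subset atoms
    r"|[bcnops]"      # lowercase aromatic atoms
    r"|[-=#:()]"      # bond symbols and branch parentheses
    r"|\d"            # ring-closure digits
)


def tokenize(smiles: str) -> list:
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    return SMILES_TOKEN.findall(smiles)


print(tokenize("ClCCBr"))    # Cl and Br are read as single atoms without brackets
print(tokenize("[Au]"))      # elements outside the organic subset need brackets
print(tokenize("c1ccccc1"))  # aromatic benzene
```

  Because Cl and Br belong to the organic subset listed in the passage, a parser can recognize them without brackets, provided it tries the two-letter symbols before the one-letter ones.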
8. OLCCS17 14 Feb 2017
  
  in Public
  
  Line notations represent structures as a linear string of characters. They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN) [1], Sybyl Line Notation (SLN) [2,3] and Representation of structure diagram arranged linearly (ROSDAL) [4,5]. Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES) [6-9] and the IUPAC Chemical Identifier (InChI) [10-13], which are described below.
  
  In this context, does it mean that the WLN, SLN and ROSDAL line notations are no longer in existence, since SMILES and InChI are now widely used?
  
  2017OLCCModule2P3
9. OLCCS17 14 Feb 2017
  
  in Public
  
  The Simplified Molecular-Input Line-Entry System (SMILES) [6-9] is a line notation for describing chemical structures using short ASCII strings.
  
  What is the meaning of ASCII strings?
  
  2017OLCCModule2P3
10. olcc197 14 Feb 2017
  
  in Public
  
  Many databases such as PubChem [17], ChemSpider [18], ChEBI [19], and NIST Chemistry WebBook [20] accept InChI and InChIKey strings as queries to search for chemical structures. InChIs and InChIKeys can also be used as queries in UniChem [21] to produce cross-references between chemical structure identifiers from different databases.
  
  When all these databases (PubChem, ChemSpider, ChEBI and NIST) are used to look up the InChI or InChIKey of a particular structure, will they give the same result?
  
  2017OLCCModule2P3
11. olcc197 14 Feb 2017
  
  in Public
  
  the standard InChI
  
  Before OpenSMILES and the standard InChI existed, what did scientists do to avoid getting different results in their projects when they used different software?
  
  2017OLCCModule2P3
12. olcc197 14 Feb 2017
  
  in Public
  
  They are widely used in Cheminformatics because computers can more easily process linear strings of data. Examples of line notations include the Wiswesser Line-Formula Notation (WLN) [1], Sybyl Line Notation (SLN) [2,3] and Representation of structure diagram arranged linearly (ROSDAL) [4,5]. Currently, the most widely used linear notations are the Simplified Molecular-Input Line-Entry System (SMILES) [6-9] and the IUPAC Chemical Identifier (InChI) [10-13], which are described below.
  
  Since SMILES and InChI are the linear notations currently in use, does it mean that WLN, SLN and ROSDAL can yield wrong results if used now, since they are no longer in existence?
  
  2017OLCCModule2P3
13. olccs16 14 Feb 2017
  
  in Public
  
  Generic structures are commonly used in chemistry texts as well as in chemical patents in which the inventor claims a whole class of related compounds. Generic structures are more often called “Markush” structures after Dr. Eugene A. Markush, who was involved in a legal case which set a precedent in the USA for generic chemical structure patent filing.
  
  This might sound a bit dumb! Can an InChI be generated from a Markush structure? I know generic structures and InChI are quite contradictory to each other. Markush structures can be quite proprietary and InChI is open science.
  
  2017OLCCModule2P3
14. olccs16 14 Feb 2017
  
  in Public
  
  c1ccccc1 Benzene (C6H6)
  
  If there are different substituent groups on a benzene ring, will SMILES indicate their positions, such as ortho, meta, or para?
  
  2017OLCCModule2P3
15. apcornell 13 Feb 2017
  
  in Public
  
  As a result, different aromaticity detection algorithms often disagree with each other on whether a given molecule is aromatic or not, making it difficult to interchange information between databases that use different aromaticity detection algorithms for SMILES generation.
  
  What process or services would be needed for the databases to perform this interchange?
  
  2017OLCCModule2P3
16. OLCCS10 13 Feb 2017
  
  in Public
  
  There are currently six InChI layer types, each representing a different class of structural information: the main layer, a charge layer, a stereochemical layer, an isotopic layer, a fixed-H layer and a reconnected layer.
  
  I understand the first four layers of InChI, but what is meant by the last two, the fixed-H layer and the reconnected layer?
  
  2017OLCCModule2P3
17. OLCCS10 13 Feb 2017
  
  in Public
  
  SMARTS is useful for substructure searching, which finds a particular pattern (subgraph) in a molecule.
  
  Can you give some examples of the substructures that are searched for?
  
  2017OLCCModule2P3
Visit annotations in context

Tags

2017OLCCModule2P3

Annotators

apcornell

OLCCS10

olcc197

OLCCS17

olccs11

olccs16

axnakarmi

URL

olcc.ccce.divched.org/2017OLCCModule2P3
olcc.ccce.divched.org olcc.ccce.divched.org

InChI, the IUPAC International Chemical Identifier

1
1. olccs16 15 Feb 2017
  
  in Public
  
  The first block of 14 (out of total 27) characters for an InChIKey encodes core molecular constitution, as described by formula, connectivity, hydrogen positions and charge sublayers of the InChI main layer
  
  Can you search the web with the first block of the InChIKey and find all isomers of the compound?
  
  2017OLCCModule2P3
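
  A minimal sketch of splitting an InChIKey into its blocks and comparing first blocks. The 14-character first block and the 27-character total come from the quoted passage; the 14-10-1 hyphenated layout is taken from the InChIKey format as I understand it, and the function names are hypothetical. The two keys are the standard InChIKeys for ethanol and benzene to the best of my knowledge and should be verified before reuse.

```python
def split_inchikey(key: str) -> tuple:
    """Split a 27-character InChIKey into its three hyphen-separated blocks."""
    blocks = key.split("-")
    if len(key) != 27 or [len(block) for block in blocks] != [14, 10, 1]:
        raise ValueError("expected a 27-character InChIKey in 14-10-1 layout")
    return tuple(blocks)


def same_first_block(key_a: str, key_b: str) -> bool:
    """True when two keys share the block that hashes the core constitution."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


ETHANOL_KEY = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
BENZENE_KEY = "UHOVQNZJYSORNB-UHFFFAOYSA-N"

print(split_inchikey(ETHANOL_KEY))
print(same_first_block(ETHANOL_KEY, BENZENE_KEY))
```

  Since the first block covers formula and connectivity, a first-block search would be expected to group forms of a compound that differ only in the later layers, such as stereoisomers, rather than constitutional isomers with different connectivity.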
Visit annotations in context

Tags

2017OLCCModule2P3

Annotators

olccs16

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/InChI-The IUPAC Chemical Identifier.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

Microsoft Word - Introducing Cheminformatics V2.0 - Sample.docx

3
1. apcornell 15 Feb 2017
  
  in Public
  
  Line notations are not the only way of communicating structure: also popular are file-based formats such as MDL's MOL File [19] (and its variant, the SD File), and Chemical Markup Language [20] (CML, a variant of XML
  
  One of the big drawbacks that I often hear when using XML with databasing is that it quickly starts making very large files in terms of storage. Would the same hold true for using CML with large chemical databases? If so, what size reduction could be expected for the same size collection of chemicals stored using connection tables?
  
  2017OLCCModule2P3
2. axnakarmi 14 Feb 2017
  
  in Public
  
  All InChIs currently are prefixed with “INCHI=”. Following this, a designator of “1/” or “1S/” indicates whether the InChI is non-standard or standard (i.e. with fixed standardized options in the software)
  
  Can one compound or molecule have both standard and non-standard InChIs?
  
  2017OLCCModule2P3
3. OLCCS15 12 Feb 2017
  
  in Public
  
  . Ring aromaticity is handled in SMILES at the atomic level, not at the bond level (i.e. an atom is considered aromatic rather than a
  
  In SMILES, why is aromaticity not considered at the bond level, given that the bonds in the ring structure account for its aromaticity?
  
  2017OLCCModule2P3
Visit annotations in context

Tags

2017OLCCModule2P3

Annotators

apcornell

axnakarmi

OLCCS15

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/Introducing Cheminformatics - Wild.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

In Silico Medicinal Chemistry_1.pdf

13
1. apcornell 15 Feb 2017
  
  in Public
  
  ● Main layer
    ○ Chemical formula, no prefix
    ○ Atom connections, prefix ‘c’
    ○ Hydrogen atoms, ‘h’
  ● Charge layer
    ○ Proton sublayer, ‘p’
    ○ Charge sublayer, ‘q’
  ● Stereochemical layer
    ○ Double bonds and cumulenes, ‘b’
    ○ Tetrahedral stereochemistry of atoms and allenes, ‘t’ or ‘m’
    ○ Stereochemistry information type, ‘s’
  ● Isotope layer, ‘i’, ‘h’, and ‘b’, ‘t’ and ‘m’ for stereochemistry of isotopes
  ● Fixed-h layer, ‘f’
  
  Will a future release version of InChI include a layer to include inorganic molecules that can be standardized? I have come across things in the past saying that it really only works efficiently for organic molecules. I have the understanding that some inorganic things have been included, but not the more complex structures.
  
  2017OLCCModule2P3
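
  The layer prefixes listed above can be read off an InChI string by splitting on `/`. The sketch below is a simplified illustration, assuming a simple InChI in which each prefix appears at most once (repeated prefixes, as in isotopic or fixed-H sections, would overwrite each other here). The function name is hypothetical, and the example string is the standard InChI for ethanol as I know it.

```python
def split_inchi_layers(inchi: str) -> dict:
    """Split an InChI into version, formula, and single-letter-prefixed layers."""
    prefix, _, body = inchi.partition("=")
    if prefix != "InChI" or not body:
        raise ValueError("not an InChI string")
    # The first segment is the version ('1S' marks a standard InChI),
    # the second is the chemical formula, which carries no prefix letter.
    version, formula, *rest = body.split("/")
    layers = {"version": version, "formula": formula}
    for segment in rest:
        if segment:
            layers[segment[0]] = segment[1:]
    return layers


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi_layers(ethanol)
print(layers)
print("standard" if layers["version"].endswith("S") else "non-standard")
```

  For ethanol only the main layer is present: the formula, the ‘c’ connectivity sublayer, and the ‘h’ hydrogen sublayer.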
2. OLCCS199 14 Feb 2017
  
  in Public
  
  While the InChI representation is normally too complex for a human to decode, it is impossible for even a computer to extract the chemical structure from the InChIKey. Therefore, it is important that the InChI representation is also included in any database.
  
  In instances where chemical structures can't be determined from the provided InChI or InChIKey, are there any tips for searching with an InChI?
  
  2017OLCCModule2P3
3. olccs16 14 Feb 2017
  
  in Public
  
  WLN: Wiswesser Line Notation
  
  I haven't encountered this line notation at all. I wonder whether any database systems still use it, perhaps as a historical showcase? (I know it is already out of favor.)
  
  2017OLCCModule2P3
4. OLCCS10 13 Feb 2017
  
  in Public
  
  aromatic bonds are implied between aromatic atoms, but may be explicitly defined using the ‘:’ symbol.
  
  When would you use the colon instead of lowercase letters for aromatic bonds?
  
  2017OLCCModule2P3
5. OLCCS15 12 Feb 2017
  
  in Public
  
  Another language, based on conventions in SMILES, has also been developed for rapid substructure searching, called SMiles ARbitrary Target Specification (SMARTS). Similarly, SMIRKS has also been defined as a subset of SMILES that encodes reaction transforms. SMIRKS does not have a definition, but plays on the SMILES acronym. SMARTS and SMIRKS will be considered in more detail in later chapters.
  
  SMARTS used for rapid substructure searching is noted as another language based on SMILES conventions. What is the meaning of SMIRKS and is it used in a similar form as SMARTS?
  
  2017OLCCModule2P3
6. OLCCS18 09 Feb 2017
  
  in Public
  
  One of the most important applications of graph theory to chemoinformatics is that of graph-matching problems. It is often desirable in chemoinformatics to determine differing types of structural similarity between two molecules, or a larger set of molecules.
  
  I have been looking into Chematica, which looks to be graph theory (networks) applied to the goal of automating chemical synthesis. How ready for prime time is this technology? Do we envision it is likely to be militarized/weaponized, or will it remain in the corporate domain? What are your thoughts on how these developments will impact chemists in the near term (5 to 10 years)? Here is an interesting article from last year on the topic:
  
  http://blogs.sciencemag.org/pipeline/archives/2016/04/12/the-algorithms-are-coming
  
  2017OLCCModule1P1
7. OLCCS17 09 Feb 2017
  
  in Public
  
  The Molfile contains the atoms and the bonding patterns between those atoms, but also includes xyz coordinate information so the 3D structure can be explicitly encoded and stored for subsequent use. The file format was originally developed by MDL Information Systems, which through a number of acquisitions and mergers, Symyx Technologies and Accelrys, respectively, is now subsumed within Biovia, a subsidiary of Dassault Systèmes. The Molfile is split into distinct lines of information, referred to as blocks. The first three lines of any Molfile contain header information: molecule name or identifier; information regarding its generation, such as software, user, etc.; and the comments line for additional information, but in practice this is often blank. The next line always encodes the metadata regarding the connection table and must be parsed to identify the numbers of atoms and bonds, respectively. The first two digits of this line encode the numbers of atoms and bonds, respectively
  
  I need a clearer explanation of the use of the MDL format.
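
  As a concrete illustration of the counts line described in the quoted passage, here is a minimal sketch that reads the atom and bond counts. It assumes the V2000 layout, in which these two counts occupy fixed-width three-character fields at the start of the line; the sample line is a hand-written example for a six-atom, six-bond ring, and the function name is hypothetical.

```python
def parse_counts_line(line: str) -> tuple:
    """Read the atom and bond counts from a V2000 Molfile counts line."""
    atom_count = int(line[0:3])  # characters 1-3: number of atoms
    bond_count = int(line[3:6])  # characters 4-6: number of bonds
    return atom_count, bond_count


# Fourth line of a Molfile, after the three header lines.
counts_line = "  6  6  0  0  0  0  0  0  0  0999 V2000"
atoms, bonds = parse_counts_line(counts_line)
print(atoms, bonds)
```

  A reader then knows how many of the following lines belong to the atom block and how many to the bond block.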
8. OLCCS199 09 Feb 2017
  
  in Public
  
  These additional data, coupled with the additional metadata recorded in the SDF format, make this file format ideal
  
  In what instances is MDL ideal and/or when would you prefer MDL over SDF?
  
  OLCC
9. OLCCS199 09 Feb 2017
  
  in Public
  
  A key advantage of the Molfile and SDF formats is the inclusion of geometric information regarding the spatial arrangement of atoms in three-dimensional space.
  
  You mention advantages of using both the MDL and SDF formats. Are there any disadvantages that would make you use other formats?
  
  OLCC
10. olcc197 09 Feb 2017
  
  in Public
  
  MDL Molfile (extension *.mol) or structure-data file (SDF, extension *.sdf
  
  What is the main difference between MDL and SDF?
11. olcc197 09 Feb 2017
  
  in Public
  
  3.3
  
  Is the adjacency matrix still in use or is it outdated? Since it has up to two more advantages than the connection table, why is the connection table preferred over it?
12. olcc197 09 Feb 2017
  
  in Public
  
  ChEMBL and SureChEMBL
  
  I would like to understand whether ChEMBL and SureChEMBL do the same work or work differently.
13. OLCCS17 09 Feb 2017
  
  in Public
  
  systems can encode and decode systematic names such that they follow the naming conventions, but this representation is not commonly used in practice for computational work. Chemical structures are the explicit representation and the chemist’s lingua franca. Chemical structure drawings appear in many scientific publications,
  
  What is the difference between SMILES and aromatic SMILES?
Visit annotations in context

Tags

2017OLCCModule2P3

OLCC

2017OLCCModule1P1

Annotators

apcornell

OLCCS15

OLCCS10

OLCCS18

OLCCS17

olccs16

OLCCS199

olcc197

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/In Silico Medicinal Chemistry_1.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

ACMB100A-08.tex

1
1. apcornell 13 Feb 2017
  
  in Public
  
  aromatic bond type
  
  How would an aromatic bond be represented if needed to be used to bypass the multiple configuration possibilities?
  
  OLCC
Visit annotations in context

Tags

OLCC

Annotators

apcornell

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/Cheminformatics- an Intro for compter scientists.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

2.1 Chemical Representations on Computer: Part I | DivCHED CCCE: Cheminformatics OLCC

1
1. apcornell 13 Feb 2017
  
  in Public
  
  3D (x,y,z) coordinates can also be stored for each atom and used to display the conformation of a molecule. These coordinates may be determined experimentally (typically via x-ray crystallography), or calculated (using force-fields, quantum chemistry, molecular dynamics or composite models such as docking). Understanding a molecule's actual shape, whether it be in solution, in a vacuum, or in the binding site of a protein, opens up a whole new domain of computational chemistry. Most molecules have some flexibility, and even if a given conformation is the most stable, there are often a number of competing shapes to consider. Knowing how a particular set of coordinates was determined is crucial to making intelligent use of it for cheminformatics purposes.
  
  This would be a very good project if the 3D data for all the different conformations were indexed and compared based on the influence type and how it changed a molecule's shape.
  
  OLCC
Visit annotations in context

Tags

OLCC

Annotators

apcornell

URL

olcc.ccce.divched.org/2017OLCCModule2P1
olcc.ccce.divched.org olcc.ccce.divched.org

WCMS-36_LR

3
1. apcornell 13 Feb 2017
  
  in Public
  
  A substructure search query can be matched against a connection table atom-by-atom but the so-called subgraph isomorphism algorithm that is used in substructure search to compare one graph against another is slow and complex and it is likely that there may be many mismatches before a hit is found. A substructure search can be carried out faster if an initial screening stage is carried out to filter out quickly structures that could not possibly be matches. A common method is to use substructure fragments as the filter.
  
  It's very interesting that breaking apart a structure into fragments, performing a search and then rebuilding the structure as part of a screening process with a complex search would be more accurate than just using a connection table to compare each atom.
  
  OLCC
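
  A minimal sketch of the screening stage described in the quoted passage, under the assumption that each structure has already been reduced to a set of fragment labels. The fragment names, the `library` contents, and the function name are all hypothetical. The screen only discards structures that cannot match; the survivors would still go through the slower atom-by-atom comparison.

```python
def passes_screen(query_fragments: set, molecule_fragments: set) -> bool:
    """A molecule can only match if it contains every fragment in the query."""
    return query_fragments <= molecule_fragments


# Hypothetical precomputed fragment sets for three library structures.
library = {
    "mol_a": {"benzene_ring", "carboxylic_acid"},
    "mol_b": {"benzene_ring", "hydroxyl"},
    "mol_c": {"carboxylic_acid"},
}
query = {"benzene_ring", "carboxylic_acid"}

# Only candidates that survive the screen need the full subgraph comparison.
candidates = sorted(
    name for name, fragments in library.items() if passes_screen(query, fragments)
)
print(candidates)
```

  As the passage puts it, the benefit of the screen is speed: it cheaply removes impossible matches so the expensive subgraph isomorphism step runs on fewer structures.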
2. olcc197 08 Feb 2017
  
  in Public
  
  (InChI) and InChIKey
  
  Is there any difference between InChI and InChIKey?
3. olcc197 08 Feb 2017
  
  in Public
  
  Hydrogen atoms are not necessarily included explicitly in a connection table: they may be implicit.
  
  What happens if hydrogen atoms are included explicitly in a connection table? I would appreciate a clear explanation.
Visit annotations in context

Tags

OLCC

Annotators

apcornell

olcc197

URL

olcc.ccce.divched.org/sites/olcc.ccce.divched.org/files/Represent of chem structure-Warr.pdf
olcc.ccce.divched.org olcc.ccce.divched.org

Introduction | DivCHED CCCE: Cheminformatics OLCC

1
1. rebelford 09 Feb 2017
  
  in Public
  
  ol-water partition coefficient (ClogP). The descriptors tend to convolute any different properties into these simple scalar descriptors, but can be highly effective in certain circumstances and are widely appreciated for their interpretability in interactive systems. One such set of property descriptors that has gained wide acceptance is the Lipinski rule-of-five, which has been suggested as an heuristic for indicating the oral absorption of a potential drug based on marketed orally-dosed drugs. The rule-of-five applies four calculated properties and defined cut-offs
  
  test of tags
  
  2015OLCCModule1P3a
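
  The quoted passage is cut off before it lists the cut-offs, so the sketch below uses the commonly cited rule-of-five thresholds (molecular weight 500, ClogP 5, hydrogen-bond donors 5, hydrogen-bond acceptors 10) as an assumption rather than as values taken from the passage. The function name and the two sets of property values are hypothetical.

```python
def rule_of_five_violations(mol_weight, clogp, h_bond_donors, h_bond_acceptors):
    """Count how many of the four commonly cited cut-offs are exceeded."""
    checks = [
        mol_weight > 500,
        clogp > 5,
        h_bond_donors > 5,
        h_bond_acceptors > 10,
    ]
    return sum(checks)


# Hypothetical property values, for illustration only.
print(rule_of_five_violations(350.0, 2.1, 2, 5))   # within all four cut-offs
print(rule_of_five_violations(650.0, 6.2, 2, 5))   # exceeds weight and ClogP
```

  Each check is a simple scalar comparison, which is the interpretability the passage attributes to property descriptors.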
Visit annotations in context

Tags

2015OLCCModule1P3a

Annotators

rebelford

URL

olcc.ccce.divched.org/2017OLCCModule1P1
olcc.ccce.divched.org olcc.ccce.divched.org

2.2 Chemical Representations on Computer: Part II | DivCHED CCCE: Cheminformatics OLCC

6
1. axnakarmi 09 Feb 2017
  
  in Public
  
  Chirality
  
  How can we write the MOL file to encode chirality between two different atoms, such as between C-CH3 or C-OH?
  
  OLCC
2. axnakarmi 09 Feb 2017
  
  in Public
  
  Bonds Block
  
  It is confusing to me how this bonds block is built up. I am confused by the 4-5, 6-1 and 5-7 entries in the bonds block.
  
  OLCC
3. axnakarmi 09 Feb 2017
  
  in Public
  
  Resonance Run-of-the mill delocalization presents some of the same problems as aromaticity, but there is no conventional label for (non-aromatic) delocalized electrons, such as the delocalized negative charge and pi system in benzoate (VII and VIII). The connection tables will simply represent one resonance structure or another.
  
  In the figures for MOL VII and MOL VIII, a 5 is inserted in the V2000 file format. What is the significance of this value, and how can we know which value should be inserted for other resonance structures?
  
  OLCC
4. olccs16 08 Feb 2017
  
  in Public
  
  MOL files do indicate chirality. However, they can do so in two ways. A “1” or “6” in the fourth field of the bonds table indicates wedged and dashed bonds, respectively. A “1” or “2” in the stereochemistry field of the atom table represents the chirality of a stereocenter.
  
  Is there any indication in the MOL file of whether the structure has more than one chirality center?
5. OLCCS10 08 Feb 2017
  
  in Public
  
  To make things even more complicated, software may account for the chirality of a stereocenter atom when generating a MOL file but ignore it when rendering a MOL file!
  
  Is there a way to catch this if it happens?
  
  OLCC
6. OLCCS10 08 Feb 2017
  
  in Public
  
  Each of the two Kekulé structures for the benzene ring shows up as a different set of single and double bonds (MOL I, MOL IV). The Bond Tables are different: The MOL file format uses the number 4 to indicate bonds that are explicitly labeled as aromatic (MOL V). This has the advantage of differentiating aromatic bonds from single and double bonds without requiring the chemist to write a script to identify and label the alternating single and double bonds of a Kekulé structure.
  
  When would you use the aromatic structure instead of the Kekulé structures?
  
  OLCC
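
  The bond-table fields mentioned in this module (bond type 4 for aromatic, and 1 or 6 in the fourth field for wedged or dashed bonds) can be read with a short parser. This is a sketch assuming the V2000 layout of four fixed-width three-character fields at the start of each bond line; the sample lines are hand-written and the function name is hypothetical.

```python
BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}
BOND_STEREO = {0: "none", 1: "wedged", 6: "dashed"}


def parse_bond_line(line: str) -> dict:
    """Read the two atom indices, bond type, and stereo flag from a bond line."""
    bond_type = int(line[6:9])
    stereo = int(line[9:12])
    return {
        "atoms": (int(line[0:3]), int(line[3:6])),
        "type": BOND_TYPES.get(bond_type, "other"),
        "stereo": BOND_STEREO.get(stereo, "other"),
    }


aromatic_bond = parse_bond_line("  1  2  4  0")  # explicitly aromatic bond
wedged_bond = parse_bond_line("  2  3  1  1")    # single bond drawn as a wedge
print(aromatic_bond)
print(wedged_bond)
```

  With type 4 present, a program can recognize the aromatic ring directly instead of searching for alternating 1 and 2 entries in a Kekulé structure.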
Visit annotations in context

Tags

OLCC

Annotators

olccs16

axnakarmi

OLCCS10

URL

olcc.ccce.divched.org/2017OLCCModule2P2
olcc.ccce.divched.org olcc.ccce.divched.org

2.2.2. Anatomy of a MOL file | DivCHED CCCE: Cheminformatics OLCC

1
1. olcc197 09 Feb 2017
  
  in Public
  
  A connection table can represent multiple distinct compounds.
  
  Since a connection table can represent multiple compounds, is it possible to use it to represent a compound with 50 to 100 atoms?
Visit annotations in context

Annotators

olcc197

URL

olcc.ccce.divched.org/2017OLCCModule2P2TLO2
