- Mar 2017
-
www.ncbi.nlm.nih.gov www.ncbi.nlm.nih.gov
-
To confound the issue, some energy products (eg, 5-Hour Energy, Monster Energy, Rockstar) are marketed as dietary supplements, while others (eg, Red Bull) are marketed as conventional foods.4,5 Although the FDA regulates both dietary supplements and conventional foods under the Federal Food, Drug, and Cosmetic Act (FFDCA), the requirements for these products are different, including the process for marketing and reporting of adverse events post marketing.
It is very interesting that, depending on how a company markets or classifies its product, it can have different rules to follow, even though the product is intended to be consumed either way.
-
-
pubchemblog.ncbi.nlm.nih.gov pubchemblog.ncbi.nlm.nih.gov
-
PubChem is an archive. When a record is updated, it is versioned and the original record is retained and accessible. These dates are more prominent and now include a modification date table for easy access. For example, the Substance Record page for SID 12345 has multiple versions:
This is a really great feature. With all data, I believe things should be preserved even as new things replace the old.
Andrew
-
-
pubchemblog.ncbi.nlm.nih.gov pubchemblog.ncbi.nlm.nih.gov
-
A substance is a contributed chemical substance sample description from a particular PubChem data provider. A compound is a normalized chemical structure representation found in one or more contributed substance
In Module 4 #4, it states that databases often share information with each other. Does PubChem share information from the Substance database with an outside database, or do records first go through the process of being linked to PubChem Compound? Lyndsie
-
Data is provided by hundreds of contributors (http://pubchem.ncbi.nlm.nih.gov/sources/), including publishers, researchers, chemical vendors, pharmaceutical companies, and a number of important chemical biology resources. Each of these data sources contributes a description of chemical substance samples for which they have information.
How do you become a data contributor if you have things you would like to put into PubChem?
Surely, this would be a fairly simple process with millions of data sets already put into the system.
Andrew
-
A substance is a contributed chemical substance sample description from a particular PubChem data provider
To maintain the quality and integrity of the substance content, what is the validation process PubChem applies to providers and their substance submissions? Michael
-
The distinction is important as PubChem is organized in three separate databases: Compound, Substance, and BioAssay.
What is the connection between BioAssay and Compound and Substance? Amita
-
-
olcc.ccce.divched.org olcc.ccce.divched.org
-
(22) ToxNet (http://toxnet.nlm.nih.gov/) (Accessed on 2/19/2017).
I am really glad that this source was used in the write-up. I'm not sure why I was so unfamiliar with some of the things that this resource provides, given all of the articles I have read from TOXNET, but I will be using it more in the future.
Andrew
-
Therefore, it is very common that database groups exchange their information with each other.
Does this mean PubChem is sharing information between its BioAssay and Compound databases, or that PubChem is sharing information with ChemSpider, etc.? Lyndsie
-
Although the data provenance information is critical in the reliability of a data source (and its data), this information is not easy to manage.
How do we determine the reliability of a source? Say I've gone through and looked at the data provenance: how do I decide, based on provenance, that this piece of information is reliable but conflicting information in a different database is not?
Lyndsie
-
In turn, users should always pay attention to the data provenance issue when using a database.
Since we are focusing on PubChem and this sentence tells us, as users, to pay attention to the sources, how do we find the source on PubChem? Furthermore, say that source leads to another source: how far back do we go?
Lyndsie
-
TOXNET (http://toxnet.nlm.nih.gov/)22-25, maintained by the National Library of Medicine (NLM) at NIH, is a group of databases covering toxicology, hazardous chemicals, toxic releases, environmental and occupational health, risk assessment. Currently, 16 databases are integrated into the TOXNET system, and users can search all these databases either at once or individually. While all the 16 databases provide valuable information, three of them may be worth mentioning in the context of this course.
If I were a chemical supply vendor and wanted to update SDS sheets with important new information, would this be recommended as the first place to perform a search? It seems like a perfect source, but SDS sheets have a lot of entries, and they may state, for example, that something is a possible carcinogen. It's not really clear to me whether TOXNET includes possible hazards or only the ones that have been confirmed through scientific means.
Andrew
-
The error propagation issue is a serious, but very common, problem.39,40 Therefore, when using information in these databases, one should keep in mind various data accuracy and quality issues prevalent in these databases.
Over the years, we have seen some of the errors mentioned when pulling data sets and using APIs, especially when we would use database entries as the basis for performing conversions. The most difficult cases involved either identifiers that had multiple possibilities or a chemical's common name, which had a lot of variance across different databases.
Andrew
-
As of February 2017, PubChem contains more than 235 million depositor-provided substances, 94 million unique chemical structures, and one million biological assays, which cover about 10 thousand protein target sequences.
I know that PubChem houses a lot of data and also pulls data from many other sources. Due to this, would the 235 million deposited chemical structures mean that it holds that many in its own database or is that number a sum of entries held in many separate databases?
Andrew
-
2.5. NIST Webbook: thermodynamic and spectroscopic data of chemicals
I find it really fascinating that the NIST Webbook has versions that go back to 1996 for their database. Do they offer direct access to this database or would bulk data only be acquired through their product services or web scraping?
Andrew
-
It is very common that a primary database curates its data with information drawn from secondary databases.
I was unaware that this was very common. It's easy to wrap my head around secondary databases pulling from primary or even from other secondary sources, but I wonder if pulling secondary data into a primary database would then make it primary data, based on the way it may be used in the new database. Maybe the data exchange and integration mentioned in the following sentence creates a new way to use the data directly, in a primary way.
Andrew
-
TOXNET (http://toxnet.nlm.nih.gov/)22-25, maintained by the National Library of Medicine (NLM) at NIH, is a group of databases covering toxicology, hazardous chemicals, toxic releases, environmental and occupational health, risk assessment.
Does TOXNET also deal with nanomaterials and environmental pollution? Amita
-
As of February 2017, PubChem’s data are from more than 500 organizations, including government agencies, university labs, pharmaceutical companies, substance vendors, and other databases
How would PubChem curate and check the validity of the data coming from different sources? Phuc
-
Getting the most out of PubChem for virtual screening
What other tools for virtual screening of drugs are there in PubChem besides structure similarity? Phuc
-
The term “data provenance” refers to a record trail that describes the origin or source of a piece of data and the process by which it entered in a database.1
Is this the same as a substance record on PubChem? Emily
-
(d) Explain the reason why the “legacy” designation was introduced in PubChem in two or three sentences.
The best explanation for this is in 2.4 of the article at the bottom of the page. Emily
-
Some records in PubChem are “non-live”, meaning that they are “not searchable”, although they do exist in the database. This exercise is designed to help students better understand what non-live records are.
Here is a great explanation on what "non-live" records are in PubChem. Emily
-
Therefore, databases need to document the provenance of the data and devise a way to notify users of that information. In turn, users should always pay attention to the data provenance issue when using a database.
Since documenting the provenance of data is still a work in progress, can we say public databases are not completely reliable? Daniel
-
ChEMBL:
How does ChEMBL create the chemical structure for each compound? (NWUME)
-
PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (See Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases.
Table 1. Three inter-linked databases in PubChem.
Database    URL                                        Identifier
Substance   https://www.ncbi.nlm.nih.gov/pcsubstance   SID
Compound    https://www.ncbi.nlm.nih.gov/pccompound    CID
BioAssay    https://www.ncbi.nlm.nih.gov/pcassay       AID
Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.
How is the process called standardization used to normalize the Substance database?
Esther
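One way to picture the "non-redundant view" described in the quoted passage is the toy sketch below. It is not PubChem's standardization procedure, which involves chemistry-aware processing that is not shown here. The sketch assumes each deposited record already carries a normalized structure key (the `structure_key` strings, the SIDs, and the depositor names are all hypothetical placeholders) and only illustrates how several substance records can collapse onto one unique structure.

```python
from collections import defaultdict

# Hypothetical deposited records: (SID, depositor, structure_key).
# structure_key stands in for the outcome of standardization; it is a
# made-up placeholder, not a real PubChem identifier.
substances = [
    (101, "Vendor A", "KEY-ASPIRIN"),
    (102, "Lab B", "KEY-CAFFEINE"),
    (103, "Publisher C", "KEY-ASPIRIN"),
]


def group_by_structure(records):
    """Map each unique structure key to the list of SIDs that share it."""
    groups = defaultdict(list)
    for sid, _depositor, key in records:
        groups[key].append(sid)
    return dict(groups)


unique_structures = group_by_structure(substances)
for key, sids in unique_structures.items():
    print(key, "is described by substance records", sids)
```

In this picture, each key plays the role of one Compound record and the SID list plays the role of the Substance records linked to it.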
-
1.2. Primary databases vs. secondary databases
Can someone please classify the public databases mentioned in Module 4 as primary or secondary databases?
-
Table 1. Three inter-linked databases in PubChem.
Database    URL                                        Identifier
Substance   https://www.ncbi.nlm.nih.gov/pcsubstance   SID
Compound    https://www.ncbi.nlm.nih.gov/pccompound    CID
BioAssay    https://www.ncbi.nlm.nih.gov/pcassay       AID
Is it possible for someone to update substance data in PubChem? I hope that updating it will not affect the SID. (NWUME)
-
ChemIDPlus26,27 is a dictionary of over 400,000 chemical records (names, synonyms, and structures) and provides access to the structure and nomenclature files used for the identification of chemical substances in the TOXNET system and other NLM databases. The Hazardous Substances Data Bank (HSDB)28,29 focuses on the toxicology of potentially hazardous chemicals, providing information on human exposure, industrial hygiene, emergency handling procedures, environmental fate, regulatory requirements, nanomaterials, and related areas. All HSDB data are referenced and derived from a core set of books, government documents, technical reports and selected primary journal literature. Importantly, HSDB is peer-reviewed by the Scientific Review Panel (SRP), a committee of experts in the major subject areas within the data bank's scope. The Comparative Toxicogenomics Database (CTD)30,31 contains manually curated data describing interactions of chemicals with genes/proteins and diseases. This database provides insight into the molecular mechanisms underlying variable susceptibility for environmentally influenced diseases
Since 16 databases are currently integrated into the TOXNET system but only 3 are shown here, how can I find the remaining 13 databases? (NWUME)
-
All information in the Substance database is submitted by individual data depositors. However, the Compound database does contain information that is not submitted by data depositors,
Are there sample files for submitting substances to PubChem, since records in the Substance database can be submitted by individual data depositors? (NWUME)
-
1.2. Primary databases vs. secondary databases
Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebases.
Can we classify PubChem as a secondary database since it collects data from other sources?
Esther
-
PubChem organizes its data into three inter-linked databases: Substance, Compound, and BioAssay (See Table 1), which can be searched from either the PubChem home page (https://pubchem.ncbi.nlm.nih.gov) or the web page of one of the three PubChem databases.
Table 1. Three inter-linked databases in PubChem.
Database    URL                                        Identifier
Substance   https://www.ncbi.nlm.nih.gov/pcsubstance   SID
Compound    https://www.ncbi.nlm.nih.gov/pccompound    CID
BioAssay    https://www.ncbi.nlm.nih.gov/pcassay       AID
Individual data contributors deposit information on chemical substances to the Substance database (https://www.ncbi.nlm.nih.gov/pcsubstance). Different data contributors may provide information on the same molecule, hence the same chemical structure may appear multiple times in the Substance database. To provide a non-redundant view, chemical structures in the Substance database are normalized through a process called “standardization” and the unique chemical structures are identified and stored in the Compound database (https://www.ncbi.nlm.nih.gov/pccompound). The difference between the Substance and Compound databases is explained in more detail in this blog post.
What is the difference between Upload ID, Registry ID, SID, CID, and AID in PubChem? Esther
-
1.2. Primary databases vs. secondary databases
Databases are often categorized into primary and secondary databases. Primary databases contain experimentally-derived data that are directly submitted by researchers (also called “primary data”). In essence, these databases serve as archives that keep original data. Therefore, they are also known as archival databases. Secondary databases contain secondary data, which are derived from analyzing and interpreting primary data. These databases often provide value-added information related to the primary data, by using information from other databases and scientific literature. Essentially, secondary databases serve as reference libraries for the scientific community, providing highly curated reviews about primary data. For this reason, they are also known as curated databases, or knowledgebases.
Can I allow my associates or reviewers to access my on-hold assay data? Esther
-
ChEMBL: literature-extracted biological activity information
ChEMBL (https://www.ebi.ac.uk/chembl/)8,9 is a large bioactivity database, developed and maintained by the European Bioinformatics Institute (EBI), which is part of the European Molecular Biology Laboratory (EMBL). The core activity data in ChEMBL are “manually” extracted from the full text of peer-reviewed scientific publications in select chemistry journals, such as Journal of Medicinal Chemistry, Bioorganic & Medicinal Chemistry Letters, and Journal of Natural Products. From each publication, details of the compounds tested, the assays performed and any target information for these assays are abstracted. ChEMBL also integrates screening results and bioactivity data from other public databases (such as PubChem BioAssay) and information on approved drugs from the U.S. FDA Orange Book10 and the NLM’s DailyMed
Is it possible to use ChEMBL to search for chemical structures? Esther
-
Public Chemical Databases
These days many public online databases provide chemical information free of charge and the databases mentioned in this module are only a few examples of them. Note that these databases vary in size and scope.
2.1. PubChem: chemical information repository at the U.S. NIH
PubChem (https://pubchem.ncbi.nlm.nih.gov)2-4 is a public repository of information on small molecules and their biological activities, developed and maintained by the National Library of Medicine (NLM), an institute within the U.S. National Institutes of Health (NIH). Since its launch in 2004 as a component of the NIH’s Molecular Libraries Roadmap Initiatives, it has been rapidly growing, and now serves as a key chemical information resource for researchers in many biomedical science areas, including cheminformatics, chemical biology, and medicinal chemistry. Detailed information on PubChem can be found in these three papers:
If I want to submit a manuscript to a journal, which PubChem identifier can I use? Esther
-
The Protein Data Bank (PDB) is an archive of the experimentally determined 3-D structures of large biological molecules such as proteins and nucleic acids.
So, aside from proteins and nucleic acids, the Protein Data Bank cannot be used to find the 3-D structure of other biological molecules? (NWUME)
-
-
pubchemblog.ncbi.nlm.nih.gov pubchemblog.ncbi.nlm.nih.gov
-
Unique Ingredient Identifiers (UNIIs) and pharmacological classifications were added from the U.S. Food and Drug Administration (FDA).
"Ingredient identifiers": are these similar to InChIKeys, or in a league of their own? Lyndsie
-
For example, to access the old version of the Compound Summary page for Aspirin:
The old summary page and the new don't appear appreciably different. Is this link redirecting to the new compound summary format? Michael
-
PubChem is organized as three interconnected databases: Substance, Compound, and BioAssay
Why is the focus on Substance and Compound but not BioAssay? Daniel
-
-
academic.oup.com academic.oup.com
-
The BioAssay database (https://www.ncbi.nlm.nih.gov/pcassay)
I followed the link here to find out more. I do not know what BioAssay means. Googling gives bio screening programs, but can anyone give me the gist of what this is? Lyndsie
-
Alternatively, PubChemRDF data can also be loaded into RDF-aware graph databases such as Neo4j, and the graph traversal algorithms can be used to query the PubChem knowledge graphs.
Does anyone know the site for Neo4j for graph databases? Amita
-
In 2-D similarity search, the similarity between chemical structures is quantified using the Tanimoto equation (22–24) in conjunction with the PubChem substructure fingerprint
What is the Tanimoto equation, and what is its significance? Amita
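As a rough illustration of the quoted sentence: for two binary fingerprints A and B, the Tanimoto coefficient is commonly written as c / (a + b - c), where a and b are the numbers of bits set in A and B and c is the number of bits set in both. The minimal sketch below assumes fingerprints are represented as Python sets of "on" bit positions. The bit positions are made up for the example and are not real PubChem substructure fingerprints, and the function name `tanimoto` is only illustrative.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    common = len(bits_a & bits_b)                    # c: bits set in both
    denominator = len(bits_a) + len(bits_b) - common  # a + b - c
    if denominator == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return common / denominator


# Made-up bit positions standing in for two molecules' fingerprints.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
print(tanimoto(query, candidate))  # 3 shared bits / (4 + 5 - 3) = 0.5
```

The score ranges from 0 (no bits in common) to 1 (identical fingerprints), which is why a threshold on it can be used to decide whether two structures count as similar.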
-
PubChem standardization process in which unique chemical structures are extracted from the Substance database and stored in the Compound database.
This is a great idea that needs to be used more in other databases; many just give the data and do not actually try to make sure it is true. Emily
-
-
pubchem.ncbi.nlm.nih.gov pubchem.ncbi.nlm.nih.gov
-
178 deg C (sublimes)
Is it feasible that a data quality score could be calculated by applying an algorithm across all sources contributing verified property information? That score could then be displayed as a relative confidence indicator within aggregators like PubChem. Michael
-
-
edspace.american.edu edspace.american.edu
-
Sublimation is one way to purify the sample, because caffeine has the ability to pass directly from the solid to vapor and reverse to form a solid all without undergoing the liquid phase. Caffeine has the ability to undergo sublimation under different conditions than the impurities, and can thus be isolated
This may be a reason PubChem reports a sublimation point under boiling point for caffeine. Michael
-
-
pubchemblog.ncbi.nlm.nih.gov pubchemblog.ncbi.nlm.nih.gov
-
PubChem implements both manual and automated processes
If a Legacy tag doesn't necessarily mean the content is outdated, no longer relevant, or erroneous, how can its application to compounds reliably be automated? How often are contributions that are perfectly valid and current marked Legacy because their vendors are not updating PubChem? Michael
-
If a data contributor is designated as “legacy”, all records deposited by the contributor are also designated as “legacy”.
Are there cost implications for vendors regarding the legacy tag for their submissions? Michael
-
Therefore, some records in PubChem can persist with outdated (or incorrect) data. To help identify such cases, we are introducing a “legacy” indication for contributors and their records. Please note that this does not mean that data identified as “legacy” is without value.
This might be a trivial question, but would this be the case where certain compounds are not found in some databases but can be found in PubChem? Sooyah
-
Therefore, some records in PubChem can persist with outdated (or incorrect) data. To help identify such cases, we are introducing a “legacy” indication for contributors and their records. Please note that this does not mean that data identified as “legacy” is without value. Quite to the contrary, some legacy collections successfully collected valuable scientific data for the research community, and are simply no longer updating the information.
How can we determine whether data designated as Legacy are valuable or not? Amita
-
This “legacy” designation applies to project/contributors that appear to no longer be active, as well as to their individual records. This designation will help PubChem users quickly identify records that may have out-of-date information and/or hyperlinks.
This might sound a bit dumb. But, if it's out of date or not really validated, why not just remove it? Phuc
-
-
olcc.ccce.divched.org olcc.ccce.divched.org
-
Lipinski’s rule of 5 [50]. Among them, 10.3 million (12% of the total) are fragment-like ones, which satisfy Congreve’s rule of 3 [51].
What are the Lipinski and Congreve rules? Emily
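As a hedged sketch of an answer: both rules are simple threshold checks on a few molecular descriptors. In their commonly cited forms, Lipinski's rule of 5 flags molecular weight over 500, logP over 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors, while Congreve's rule of 3 asks for molecular weight under 300, logP of 3 or less, and at most 3 donors and 3 acceptors. The code below assumes the descriptors have already been computed elsewhere. The `Descriptors` class, the function names, and the example values are hypothetical, and exactly how the quoted paper applied the rules is not shown in the passage.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one molecule (values supplied by the caller)."""
    mol_weight: float
    log_p: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def satisfies_rule_of_3(d: Descriptors) -> bool:
    """Fragment-likeness check using the commonly cited rule-of-3 thresholds."""
    return (
        d.mol_weight < 300
        and d.log_p <= 3
        and d.h_donors <= 3
        and d.h_acceptors <= 3
    )


# Hypothetical descriptor values, chosen only to exercise both checks.
small = Descriptors(mol_weight=180.2, log_p=1.2, h_donors=1, h_acceptors=3)
large = Descriptors(mol_weight=650.0, log_p=6.1, h_donors=2, h_acceptors=8)

print(lipinski_violations(small), satisfies_rule_of_3(small))  # 0 True
print(lipinski_violations(large), satisfies_rule_of_3(large))  # 2 False
```

Returning a violation count rather than a yes/no answer leaves it to the caller to decide how strict to be, since the rule of 5 is often applied with some tolerance.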
-
Therefore, 3-D neighboring may offer complementary views on structural similarity between molecules with similar biological activities.
In researching the 2-D and 3-D neighbors, I couldn't find which is better because they both have their advantages and disadvantages. Since they rely on different methods, which one is used more often? Daniel
-
For example, ChEMBL [49] manually extracts bioactivity data from peer-reviewed papers published in journals in the medicinal chemistry and natural product domains.
Based on this passage in Section 2.2 on bioactivity, it would be preferable to use ChEMBL rather than PubChem for high-quality data extraction. Daniel
-
-
pubchem.ncbi.nlm.nih.gov pubchem.ncbi.nlm.nih.gov
-
The table of contents on a PubChem summary page lists the categories of information that are available for the particular substance or compound. A PubChem Substance summary page is based on the data submitted by an individual depositor. A PubChem Compound summary page, on the other hand, displays data organized by NCBI automated data processing, serving as a hub of information for each unique chemical structure. PubChem Compound summary pages therefore tend to list more topics in their table of contents.
Looking at the table of contents is a way to distinguish a PubChem Compound page from a PubChem Substance page. Daniel
-
-
support.ncbi.nlm.nih.gov support.ncbi.nlm.nih.gov
-
The PubChem Substance database contains substance information (often including the chemical structure) that was submitted to NCBI by individual submitters (depositors). The PubChem Compound records comprise a non-redundant set of standardized and validated chemical structures. A PubChem Compound record may link to more than one PubChem Substance record if different depositors supplied the same structure. Chemical names shown in PubChem Compound records are a composite derived from all linked substances, with default ranking of names by weighted frequency of use.
This is the best explanation I have found of the difference between the PubChem Substance and PubChem Compound databases. Emily
-