6 Matching Annotations
  1. Jun 2024
    1. The Cytoscape app DomainGraph (22) visualizes domain interactions simultaneously with protein interactions and analyzes the effect of differential exon usage. However, DomainGraph is limited to the output of the tool AltAnalyze (22). Ghadie et al. developed DIIP using a similar method to predict an isoform interactome (18). While their results were verified based on the experimentally validated isoform interactome reported by Yang et al. (1), their database covers only a fraction of the proteome with 2944 reference proteins and 4363 interactions. Exon Ontology (EXONT) characterizes protein domains and features that are affected by AS (23) but does not consider AS on the network level.

      The available isoform-resolved network tools. DomainGraph (Nathan's tool, deprecated) is pretty much the only one. Ghadie et al. made DIIP, but it is specific to the isoform. There are exon/splice event-centric tools, but these do not capture the gene-to-proteoform relationship. Characterization of exon events is fragmentary, because multiple variations can exist on the same proteoform (PF).

    1. Notably, the current interactome only includes a single isoform of each gene product with limited annotations about which splice isoform is examined.112 It is widely accepted that different spliced forms can lead to marked changes in phenotypes as well as altered PPIs.113 A disease-associated isoform of lamin A for Hutchinson-Gilford progeria syndrome is a striking example of how its interactions with other proteins differ from those of a non–disease-associated isoform.114 Davis et al115 hypothesize that most human PPIs could be modulated by alternate splicing. Systematic profiling of all biologically relevant isoforms116 and their PPIs at a proteomic level is, therefore, much needed.

      Loscalzo commentary on the importance of isoform-resolved networks. They highlight differential PPIs for lamin A as an example (does that make sense?). Could we use it as our example?

    1. Proteins are curated at the sequence level, using the UniProtKB database as the reference resource for proteins and peptides. The use of UniProtKB enables the curator, for each publication, to accurately describe the level of detail provided about the proteins and to use identifiers for the unambiguous annotation of each protein interactor. For instance, a publication may only give enough detail for an interactor to be mapped to any or all of the protein isoform products of a specific gene, or more specifically to a single protein isoform, or to a post-translationally cleaved peptide chain. UniProtKB supplies appropriate identifiers for all of these, and in each case supplies the corresponding underlying sequence. Binding regions can be aligned to known protein domains, as described by InterPro28. The effects of point mutations can be captured down to the amino acid level, using a CV to describe their effect on an interaction. To capture this level of detail, the use of a high-quality protein reference resource is essential. Reverse engineering protein to gene identifiers to enable network analysis of, for example, RNA-Seq data is a relatively trivial task but it is considerably more difficult, if not impossible, to map isoforms and binding domain data directly to a gene model or genomic sequence. Databases that curate PPI data directly to gene identifiers simply do not capture this wealth of information.

      Cross-referencing nodes to proteins (protein isoforms) annotated in UniProt.
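
A minimal sketch of what cross-referencing at the isoform level could look like in practice. It relies on the UniProtKB convention that an isoform is written as the parent accession plus a numeric suffix (for example `P02545-2`); the `-PRO_` suffix for cleaved chains is an assumption about how a given export labels them, and `Interactor`, `parse_interactor`, and `collapse_to_accession` are hypothetical names, not part of any library. The partner accession `Q00000` is a made-up placeholder.

```python
from typing import Iterable, NamedTuple, Optional, Set, Tuple


class Interactor(NamedTuple):
    accession: str           # parent UniProtKB accession, e.g. "P02545"
    isoform: Optional[int]   # isoform number when one is specified
    chain: Optional[str]     # processed-chain feature ID when one is specified


def parse_interactor(identifier: str) -> Interactor:
    """Split an interactor identifier into accession, isoform, and chain."""
    accession, _, suffix = identifier.partition("-")
    if not suffix:
        # No suffix: the publication only supports accession-level mapping.
        return Interactor(accession, None, None)
    if suffix.isdigit():
        return Interactor(accession, int(suffix), None)
    if suffix.startswith("PRO_"):
        # Assumed convention for post-translationally cleaved chains.
        return Interactor(accession, None, suffix)
    raise ValueError(f"Unrecognised identifier suffix: {identifier!r}")


def collapse_to_accession(
    edges: Iterable[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Collapse isoform-level edges to parent accessions (a lossy step)."""
    collapsed = set()
    for a, b in edges:
        pair = sorted((parse_interactor(a).accession,
                       parse_interactor(b).accession))
        collapsed.add((pair[0], pair[1]))
    return collapsed


# Illustrative edges only; these are not real interaction records.
example_edges = [("P02545-1", "Q00000"), ("P02545-2", "Q00000")]
gene_level = collapse_to_accession(example_edges)
```

Collapsing in this direction is easy, which matches the point in the quoted passage: two isoform-level edges become one accession-level edge, and the isoform distinction cannot be recovered afterwards.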

    2. Characterising protein isoforms and features. Most eukaryotic protein-coding genes transcribe more than one isoform. The different functions of isoforms are sometimes known or can be inferred (for example specific isoforms do/do not contain certain functional domains), but in many cases the biological significance of multiple isoforms derived from the same gene is not understood. However, the different interaction patterns of associated isoforms may provide an indication of their different biological functions by analysing their respective binding partners. In 2013, Talavera et al.57 published an editorial stating that “it is crucial to the advance of basic and medical research that interactions are reported on an isoform-to-isoform basis and that databases switch to a similar approach”. The IMEx databases curate this information, whenever the data is made available by authors, making isoform comparisons possible. UniProtKB identifiers enable curators to differentiate between transcripts being identified at the isoform or canonical (reference sequence) level. Over 100,000 interactions in IMEx (~12% of IMEx data) contain specific isoform information, with more than 11,000 records containing specific isoform–isoform interactions. The UniProtKB database recently (release 2020_02) refactored the Interaction section of their records to improve the display of isoform data imported from IMEx. It is anticipated that the availability of such data will increase as protein identification techniques improve or as authors realise the value of such data and include this level of detail in publications. IMEx also captures so-called negative interactions, which will be of increasing use in the future. These data largely pertain to isoform-specific interactors, and describe cases where certain isoforms of a gene bind to a bait protein, while other isoforms of the same gene do not bind to the same bait in the same assay system. IMEx curation rules mandate publication of the protein expression levels of the negative interactors to exclude poor protein expression as a reason for the lack of interaction. To fully comprehend protein interactions, researchers frequently need to identify the sequence region to which a molecule binds and any modification to that sequence. Any change to an amino acid sequence has the potential to influence the molecules with which the protein interacts. The IMEx Consortium captures these variations, thereby supporting the analysis of their downstream effects as shown in the examples below.

      Commentary on capturing isoform resolved information in networks.
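
The isoform comparison described in the quoted passage, including negative interactions, could be sketched roughly as follows. This assumes the records have already been reduced to `(isoform_id, partner_id, is_negative)` tuples; that layout, the function name `isoform_specific_partners`, and the identifiers `GENEA-1`, `BAIT1`, and so on are all hypothetical and do not reflect the actual IMEx export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (isoform_id, partner_id, is_negative)


def isoform_specific_partners(
    records: Iterable[Record],
) -> Dict[Tuple[str, str], Tuple[List[str], List[str]]]:
    """Find partners that bind some isoforms of a gene but not others.

    Returns {(parent, partner): (binding_isoforms, non_binding_isoforms)}
    for every pair with at least one positive and one negative record.
    """
    positive = defaultdict(set)
    negative = defaultdict(set)
    for isoform_id, partner_id, is_negative in records:
        # Group isoforms by their parent identifier (text before the hyphen).
        parent = isoform_id.split("-")[0]
        target = negative if is_negative else positive
        target[(parent, partner_id)].add(isoform_id)
    return {
        key: (sorted(positive[key]), sorted(negative[key]))
        for key in positive
        if key in negative
    }


# Made-up records for illustration only.
records = [
    ("GENEA-1", "BAIT1", False),  # isoform 1 binds BAIT1
    ("GENEA-2", "BAIT1", True),   # isoform 2 reported as not binding BAIT1
    ("GENEA-1", "BAIT2", False),
    ("GENEA-2", "BAIT2", False),  # both isoforms bind BAIT2
]
specific = isoform_specific_partners(records)
```

Only `BAIT1` is returned here, because it is the only partner with both a positive and a negative record across isoforms of the same parent.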

  2. May 2024
    1. Benchmarking tool-kit assays using reference sets. To develop a confidence score, we characterized assay performance using a positive reference set (PRS) and a random reference set (RRS) for protein interactions20. Our first version of a human PRS (hsPRS-v1) contains 92 interacting human protein pairs for which we found more than one peer-reviewed publication in multiple manually curated databases21–25 (details in Supplementary Methods online). Apart from verification of the curation reports26 and ensuring ORF availability in the human ORFeome1.127, we applied no additional filters, so interactions between membrane proteins, ligand-receptor pairs or those dependent on post-translational modifications (PTMs) were all included. HsPRS-v1 thus constitutes a reasonable representation of well-established human binary interactions. For our first version of a human RRS (hsRRS-v1), 92 protein pairs were chosen randomly from the human ORFeome1.1 (10^8 pairwise combinations) after removing all previously described interacting pairs21–25. Because there is no available gold standard for non-interacting proteins and because randomly chosen protein pairs are unlikely a priori to interact, our RRS serves as a negative control set. Alternative approaches for choosing negative training examples are possible, but introduce unacceptable biases28,29. We tested all pairs of the reference sets by tool-kit assays evaluating the effect of assay stringency on the detection of PRS and RRS pairs (Supplementary Fig. 2 online). The use of 184 controls, as opposed to the small number usually used to characterize interaction assays, increases robustness. The receiver-operating characteristic (ROC) curves of four tool-kit assays, illustrating the tradeoff between true and false positive rates as a function of stringency, are shown in Fig. 3a. For the analysis of assay performance we used a threshold that maximized detection of PRS while maintaining a low number of positive scoring RRS.

      Background on how the PRS and RRS were created.
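
The threshold choice described in the quoted passage can be sketched as below. This is not the authors' procedure: it assumes each pair gets a single numeric assay score where higher means stronger signal, and it formalises "a low number of positive scoring RRS" as a hypothetical `max_rrs_positives` cap. The function names and example scores are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple

RocPoint = Tuple[float, float, float]  # (threshold, RRS rate, PRS rate)


def roc_points(prs_scores: Sequence[float],
               rrs_scores: Sequence[float]) -> List[RocPoint]:
    """Detection rates for PRS and RRS at every observed score threshold."""
    thresholds = sorted(set(prs_scores) | set(rrs_scores), reverse=True)
    points = []
    for t in thresholds:
        prs_rate = sum(s >= t for s in prs_scores) / len(prs_scores)
        rrs_rate = sum(s >= t for s in rrs_scores) / len(rrs_scores)
        points.append((t, rrs_rate, prs_rate))
    return points


def pick_threshold(prs_scores: Sequence[float],
                   rrs_scores: Sequence[float],
                   max_rrs_positives: int = 0) -> Optional[RocPoint]:
    """Maximise PRS detection while capping the number of RRS positives."""
    best = None
    for t, rrs_rate, prs_rate in roc_points(prs_scores, rrs_scores):
        rrs_hits = sum(s >= t for s in rrs_scores)
        if rrs_hits > max_rrs_positives:
            continue  # too many random pairs score positive at this stringency
        if best is None or prs_rate > best[2]:
            best = (t, rrs_rate, prs_rate)
    return best


# Invented scores for four PRS pairs and four RRS pairs.
prs = [0.9, 0.8, 0.4, 0.2]
rrs = [0.5, 0.1, 0.1, 0.05]
strict = pick_threshold(prs, rrs, max_rrs_positives=0)
relaxed = pick_threshold(prs, rrs, max_rrs_positives=1)
```

Allowing one RRS pair to score positive lowers the chosen threshold and raises PRS detection, which is the stringency tradeoff an ROC curve summarises.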

    1. Given the high technical quality of these data sets shown here, interacting pairs that do not share known functional annotations could be promising candidates for biological discovery, particularly true biological interactions that involve proteins currently lacking adequate functional annotations, or they could be true biophysical interactions that do not occur physiologically. We call this latter class 'pseudointeractions', by analogy to pseudogenes. Pseudointeractions could correspond to ancient biological interactions that have evolved to lose physiological relevance and provide interesting insights into the evolution of the interactome.

      Another type of non-biologically relevant interaction: "pseudointeractions".