Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.
Learn more at Review Commons
Reply to the reviewers
Firstly, we would like to thank the reviewers for their time and efforts in critiquing this paper. The reviewers addressed our study to be significant, but also presented great suggestions to improve our manuscript, mainly the comparison of mRNA and eRNA for predicting subtype specificity and prognosis, the integration with independent validation datasets, etc. Our preliminary analyses showed that our classified mRNAs can predict subtypes better which is not surprising, as these subtypes were initially discovered using mRNA differences. Hence, we employed a novel approach of associating these classified mRNA and eRNA with distance and identified 71% classified eRNAs are associated with classified mRNAs. We also propose to integrate the datasets with PEGS (Briggs et al 2021) to achieve better mRNA-eRNA association and Perturb-seq validated regions to achieve functional validation of the eRNA loci. We believe that our potential improved integrative analyses will improve the novelty and power of our findings, as this is an unique approach which is employed in patient samples-based high resolution eRNA atlas for the first time. We have addressed most of the other major and minor comments of the reviewers and have provided the preliminary revised manuscript.
Reviewer #1
Evidence, reproducibility and clarity
Summary<br /> This study assesses eRNA activity as a classifier of different subtypes of breast cancer and as a prognosis tool. The authors take advantage of previously published RNA-seq data from human breast cancer samples and assess it more deeply, considering the cancer subtype of the patient. They then apply two machine learning approaches to find which eRNAs can classify the different breast cancer subtypes. While they do not find any eRNA that helps distinguish ductal vs. lobular breast cancers, their approach helps identify eRNAs that distinguish luminal A, B, basal and Her2+ cancers. They also use motif enrichment analysis and ChIP-seq datasets to characterize the eRNA regions further. Through this analysis, they observe that those eRNAs where ER binds strongest are associated with a poor patient prognosis.
Major comments:
Part of the rationale for this study is the previous observation that eRNAs are less associated with the prognosis of breast cancer patients in comparison to mRNAs and they claim that the high heterogeneity between breast cancer subtypes would mask the importance of eRNAs. In this study, the authors solely focus on eRNAs as a classification of breast cancer subtypes and prognostic tool and do not answer whether eRNAs or mRNAs are a better predictor of cancer subtypes and of prognosis. Since the answer and the tools are already in their hands, it would be important to also see a comparative analysis where they assess which of the two (mRNAs or eRNAs) is a better predictor.
Response: We appreciate the reviewer for this valid point about comparing the prognostic eRNAs vs mRNAs. Our study doesn’t imply that eRNA markers are better than mRNAs in predicting subtype specificity and/or prognosis, but our motivation for working with eRNAs is that they can be used to define relevant transcriptional regulators and prognosis generally if they are subtyped. As the molecular subtypes in breast cancers were established using gene expression datasets, mRNAs would perform better as predictors of subtypes and or prognosis. However, identifying regulatory networks with emphasis on transcription factor binding motif analyses is not achievable using mRNA datasets. Analysing the active enhancer regions with eRNA transcription will provide high resolution landscape of TF and epigenetic networks. These sorts of analyses usually require ATAC-seq or H3K27ac datasets, but these assays need fresh frozen tissue material and laborious experimental designs compared to RNA-seq datasets. Furthermore, eRNA-transcribing enhancers represent highly active enhancers, while ATAC and H3K27ac datasets can identify all enhancers, which can be inactive or poised, but captured due to the dynamic nature of enhancers. We demonstrate that traditional RNA-seq datasets mapped on active enhancer regions showing eRNA transcription would be sufficient to identify the highly active TF network and gene-enhancer regulatory frameworks in a subtype-specific manner, hence emphasising the potential of eRNA studies.
Hence, the scope of our study is not to establish which RNA can predict subtype and survival, but to demonstrate the potential of studying eRNAs in patient samples using traditional RNA-seq assays. This study would be beneficial for epigenetics biologists of how enhancer transcription can be associated with gene regulation through deregulated transcription factor networks in patients. The above section had been included in the discussion in the revised manuscript.
As the comparative analyses suggested by the reviewer will substantiate the potential of eRNAs being studied as cancer prognostic markers, we performed identical methodologies with our machine learning approaches on the published TCGA mRNA-seq datasets, identify the subtype-specific mRNAs as well as prognostic mRNAs and perform the comparative analyses of eRNAs and mRNAs. As we expected, mRNAs indeed perform better in associating with subtype specificity than eRNAs as we could identify more subtype-specific mRNAs with better statistics metrics. The results exhibit great separation across subtypes (Basal, Her2, LumA/B) as well as Ductal vs Lobular.
We believe that eRNA and mRNA are complementary but not comparative to predict subtype-specific survival. To address this in the revised manuscript, we performed an initial selection of the eRNAs associated with their corresponding subtype-specific mRNAs within 50 kb distance which can be integrated with the above analyses, based on the suggestion from reviewer 3. In our preliminary analysis, around 71% of eRNAs are associated with the subtype-specific mRNAs and we also observed an observable separation of ductal and lobular subtypes using this method.
Furthermore, we integrated our enhancer RNAs with the key enhancer regions which show significant impact on gene transcription, as shown in single cell CRISPRi screens (Perturb-seq) datasets derived from ATAC-matched H3K27ac datasets verified on one ER+ and one ER- breast cancer cell lines (Wang et al., Genome Biology 2025, https://genomebiology.biomedcentral.com/articles/10.1186/s13059-025-03474-0) . Our initial analyses identified at least 29 regions from the Perturb-seq datasets overlapping with 72 and 5 eRNAs of subtype classification and Her2 survival respectively.
For the revised manuscript, we will perform the mRNA-eRNA association in a detailed manner and include the data. We will also employ our well-established tool for associating mRNAs and noncoding elements, Peak set Enrichment in Gene Sets (PEGS, Briggs et al., F1000 research, 2021 https://f1000research.com/articles/10-570/v2 ). We hypothesise that this will improve the power of the classification models used in the study and will also provide gene-enhancer RNA interaction landscape in patient samples for the first time. Furthermore, we will integrate the activity of these eRNA-mRNA pairs with chromatin accessibility and enhancer activity using ATAC-seq and H3K27ac ChIP-seq datasets to establish more robust active regulatory networks in patient samples. We will also perform motif analyses on the published ATAC-seq peaks (performed on TCGA-BRCA patient samples, Corces et al., 2018) close to the eRNA loci to identify the TF networks with better precision, hopefully unravelling novel and relevant subtype-specific TFs in an efficient manner, better than our original work. Furthermore, as an experimental functional validation of our classified eRNAs, we will investigate the regulatory effect of 29 Perturb-seq overlapped regions. Hence, our revised manuscript will potentially provide a comprehensive validated list of enhancer RNA regions which are highly active, actively transcribing, subtype and survival specific regulatory networks in breast cancer patients for the first time.
The authors run the umaps of Fig. 1C only taking the predictor eRNAs. It is then somewhat expected to observe a separation. Coming from a single-cell omics field, what I would suggest is to take the eRNA loci and compute a umap with the highly variable regions, perform clustering on it and assess how the cancer subtypes are structured within the data. This would give a first overview of how much segregation and structure one can have with this data. Having a first step of data exploration would also strengthen the paper. If the authors have tried it, could the authors comment on it?
Response: We appreciate the reviewer for sharing their experience from single cell omics analysis. In our case, following the scRNA like pipeline is not appropriate, given the focus of our study on identifying markers on the already annotated subtypes. Basically, we aim to assess the quality of the identified markers (the quality is quantified by the statistics provided for random forest classification), and we see that the data is well-separated in PCA using only PC1 and PC2. We showed the umap (using PC1 and PC2) for better visualization in the original manuscript and we included the PCA plots in the revised manuscript.
'neither measures could classify any distinct eRNAs for invasive ductal vs lobular cancer samples' S1B. Just by eye, I can see a potential enrichment of ductal on the left and on the right while lobular stays in the center. This suggests to me that, while perhaps each eRNA alone does not have the power to classify the lobular vs ductal subtype, perhaps there is a difference - which could result from a cooperative model of eRNA influence - that would need further exploration. Would a PCA also show enrichments of ductal vs. lobular in specific parts of the plot? It may be worth exploring the PC loadings to see which eRNAs could play an influence. In this regard, a more unbiased visual examination, as suggested in my previous point, could help clarify whether there could be an association of certain eRNAs that cannot be captured by ML.
Response: The subtypes of cancer patients (Basal, Her2, LumA/B) possess clear differences in mRNA expression in breast cancer studies. Given the fixed annotations of the subtypes in the patient datasets, we applied our methodologies on mRNA datasets, and the results exhibited great separation across subtypes (Basal, Her2, LumA/B) as well as Ductal vs Lobular. In addition, 70% of subtype-specific eRNAs are located next to mRNA. This ensures that we detected proper eRNA markers. Furthermore, Random Forest is the standard and powerful non-linear classifier for these types of classifying questions. Therefore, we hypothesized that the data which can distinguish Ductal vs Lobular does not exist in the used eRNA dataset. We only detected 38 subtype-specific mRNAs using information gain with standard cutoff 0.05 which they have classifying power across ductal-lobular. With this standard cutoff only one eRNA-associated gene was detected. To explore more, we used low cutoff for information gain (0.01) and then took only the eRNAs which are located near classified mRNAs (up to 50KB). In this way, we detected 96 eRNA candidates linked to 8 classified mRNAs. These 96 eRNAs could, to some extent, classify ductal vs lobular (PCA plots attached above). This observation can further verify that if a more comprehensive eRNA dataset exists, we could detect better eRNA markers and cover more (probably all) mRNA markers. Hence, cooperative model of eRNA as suggested by the reviewer can't be achieved and random forest is one of the efficient tools to decipher the cooperation if it exists. Besides, as we demonstrated in this paper that eRNA is a complementary dataset to mRNA which can assist in the identification of regulatory networks. For the revision, we will provide more detailed eRNA-mRNA associations using integration with PEGS and Perturb-seq validated regions, in both subtype classification and survival and will motivate the potential similar studies for ductal vs lobular in the discussion.
"we employed machine learning approaches on 302,951 eRNA loci identified from RNA-seq datasets from 1,095 breast cancer patient samples from previous studies" - the previous studies from which the authors take the data [11,12] highlight the presence of ~60K enhancers in the human genome and they use less than that in their analysis. Could the authors please clarify the differences in numbers with previous studies and give a reasoning?
Response: ~300K enhancers are derived from ENCODE H3K27ac datasets which represents all active enhancer regions marked by H3K27ac (Hnisz et al., 2013). This is a high-resolution map of eRNA loci ever presented. In Chen et al 2020, 1,531 superenhancers representing 30K eRNA loci was utilised for exploratory analysis, and the findings were generalised back to the 300K set. 65K enhancer loci covers tissue-specific enhancers initially identified by FANTOM CAGE datasets and this subset provide limited regions of eRNA expression. Hence, our analyses on ~300K eRNA loci provide unbiased information on subtype specificity and gene-TF regulatory networks. The differences had been highlighted in the methods and results in the revised manuscript.
Also, from the methods section, they discard many patient samples due to low QC, so, from what I understand, the number of samples analyzed in the end is 975 and not 1,095.
Response: We thank the reviewer for pointing this out and we have updated the numbers in the revised manuscript.
Minor comments:
• Can the authors please state the parameters of the umap in methods? Although it could be intrinsic to the dataset, data points are grouped in a way that makes me think that the granularity is too forced. Could the authors please show how the umap would behave with more lenient parameters? Or even with PCA?
Response: We used ‘umap’ function from umap package (with default parameters) in R using only PC1 and PC2, hence the granularity is not forced. As suggested by the reviewer, we have now added PCA plots in the main figures (Fig. 1E) and moved all the umap plots to the Supplementary figures (Fig.S1B) in the revised manuscript.
• 'Majority of the basal' -> The majority of the basal.
Response: We thank the reviewers for noticing the typo and we corrected this in the revised manuscript.
Significance
This is a paper relevant in the cancer field, particularly for breast cancer research. The significance of the paper lies in digging into the breast cancer samples, taking the different existing subtypes into account to assess the contribution of eRNAs as a classifier and as a prognostic tool. The data is already available but it has not been studied to this degree of detail. It highlights the importance of characterizing cancer samples in more depth, considering its intrinsic heterogeneity, as averaging across different subtypes would mask biology. My expertise lies in gene regulation and single-cell omics. My contribution will therefore be more focused on the analysis and extraction of biological information. The extent of its specific relevance in cancer research falls beyond my expertise.
Response: We appreciate the reviewer for understanding our efforts to bring out the importance of subtyping and to explore the association of eRNA in breast cancer transcriptional gene regulatory networks.
Reviewer #2
Evidence, reproducibility and clarity
Summary<br /> Enhancer RNAs (eRNAs) are early indicators of transcription factor (TF) activity and can identify distinct molecular subtypes and pathological outcomes in breast cancer. In this study, Patel et al. analysed 302,951 polyadenylated eRNA loci from 1,095 breast cancer patients using RNA-seq data, applying machine learning (ML) to classify eRNAs associated with specific molecular subtypes and survival. They discovered subtype-specific eRNAs that implicate both established and novel regulatory pathways and TFs, as well as prognostic eRNAs -specifically, LumA and HER2-survival- that distinguish favorable from poor survival outcomes. Overall, this ML-based approach illustrates how eRNAs reveal the molecular grammar and pathological implications underlying breast cancer heterogeneity.
Major comments
1. The authors define 302,951 eRNA loci based on RNA-seq data, yet it is widely known that many enhancers reside in proximity to promoters or within intronic regions (examples presented in Fig. 3B and S3). Consequently, it seems likely that reads mapped to these regions might not truly represent eRNA signals but include mRNA contamination. Could the authors clarify how they ensured that the identified eRNAs were not confounded by mRNA reads? What fraction of these enhancer loci is promoter proximal or intronic? How does H3K4me3, a well-established and standardized active promoter histone mark, behave on these loci? The reviewer considers it important to confirm that the identified eRNAs are indeed of enhancer origin rather than promoter transcripts.
Response: For this study, we utilised pan cancer atlas-based published work (Chen et al 2018 and 2020) where the abundant RNA signals on intronic and intergenic regions are included, and promoter-based signals are excluded. These studies utilise the advantage of identifying eRNAs on large sample size and the possibility of mRNA being on introns in 1000s of patient samples is very low. A clarification of this concern had been discussed in the Introduction of these studies as follows: “because eRNA reads associated with real enhancer activity recurrently accumulate, whereas background transcription noise tends to occur stochastically. The large number of RNA-seq reads obtained would compensate for the statistical power compromised by the low eRNA expression level typically observed in a single sample.” We included clarification of this concern in the discussion. Furthermore, as per the reviewer’s suggestion, we examined the distribution of the eRNA loci across the genome and found that majority of eRNA regions are located on introns and intergenic regions. This figure had been included in the Supplementary Fig. S6A.
2. In Fig. 1B, the F measure (0.540) of the Basal subtype using the Logmc method contradicts its extremely high precision (1.000) and sensitivity (0.890). The authors need to clarify the exact formula or method used to compute F1 and the discrepancy in the reported metrics for this subtype and perhaps other subtypes as well.
Response: We apologise for the mistake in this section and thank the reviewer for pointing this out. We included the formulas for each statistical metric in the method section of the manuscript. The F-measure was mentioned wrong which led to the confusion here. The figure had been corrected with the F-measure of 0.94 in the revised manuscript.
3. As shown in Fig. 4C, S4B, and most, if not all, tracks of Fig. S3, ER binding regions are not annotated as eRNA loci. It seems, in this reviewer's opinion, very unlikely that this is because they generally lack eRNA expression, but rather they do not express polyadenylated eRNA (typically 1D eRNA), which is captured in this dataset. The reviewer posits that these enhancers produce more transient, non-polyadenylated 2D eRNA. It has been widely documented in prior studies that ER-bound enhancers exhibit bimodal eRNA expression patterns [e.g., Li, W. et al. Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516-520 (2013)]. Could the authors address this opinion and elaborate on how the restriction to polyadenylated transcripts might underrepresent enhancers regulated by ER and other TFs and whether this bias impacts the overall findings?
Response: The authors appreciate the reviewer’s suggestion to address the caveats of using polyadenylated eRNAs to identify the ER binding patterns. TCGA eRNA atlas with polyadenylated eRNAs indeed possesses this disadvantage of using polyadenylated eRNAs for this study, however currently there are no data available with bidirectional transcripts in any breast cancer patient samples. The tools to profile these RNAs are not robust enough to be performed on frozen cancer tissue samples which are extremely limited in their size and availability. By utilising the polyadenylated eRNA-seq datasets, we might not only lose the accuracy of ER binding patterns, but also for other transcription factors which activate/associate with bimodal expression around enhancers. However, our integrative analysis on stable polyadenylated eRNA loci can still identify the most-relevant TF networks of each subtype.
Furthermore, we validated this finding by analysing our own datasets of KAS-seq which represents any active transcribing bidirectional enhancers from MCF7 cell line. Independently, we also incorporated ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data on the gene profiles in Fig. S3 to associate the eRNA regions identified in polyadenylated RNA datasets with ER binding sites in patients and published bidirectional transcripts in the preliminarily revised manuscript. We observed that all the ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO- or CAGE positive) but they are not on the exact location of eRNA regions. Subtype-specific eRNA regions close to genes like MLPH and XBP1 possess both active bidirectional transcribing ER bound sites far away (around 1.5 kb) from subtype-specific eRNA loci and bidirectional transcribing ER unbound sites. However, these distal ER binding sites are close to the regions from the list of 300K eRNA loci and they were simply not identified as subtype-specific regions. Hence, it can be true that the occupancy of ER might not be present on all subtype-specific eRNA loci, but our subtype-specific eRNA sites are representative of bidirectional transcription.
Upon the suggestion from the reviewer, we discussed the potential of identifying TF networks by analysing the 1D eRNAs, in the revised manuscript.
4. Despite the unsatisfied performance of the ML approach on classifying Her2 subtypes, the hierarchical clustering performed in Fig. 2A and S2A appears to show a reasonable separation of Her2 subtypes, showing as a clustered green band. Could the authors quantitatively assess how effective this clustering results and compare that to the ML outcome? (OPTIONAL)
Response: The authors acknowledge this interpretation from the reviewers. Using both the measures, our ML platform can identify markers for Her2 subtype but some of the statistical metrics are poor. As the heatmaps were performed based on these identified Her2 markers, a separate analysis on this cluster would not be much informative. The poor metrics for Her2 classification was already justified, partly due to the low number of Her2+ patients in the cohort.
5. In Fig. 4 and S4, the authors reported to have enriched binding or motif of TFs, e.g., FOXA1, AP-2, and E2A, specifically at enhancer loci with low eRNA level, which conflicts with their established roles as transcriptional activators. The reviewer asks for an address as to why these factors would be associated with basal low-eRNA regions and whether any additional data might clarify their functional role in these contexts.
Response: The authors appreciate the reviewer’s concern, but we would like to clarify that eRNAs which are less expressed in basal subtype are classified as basal low. These regions show high expression in luminal patients. Hence, there is a strong overlap of basal low and luminal high regions. FOXA1 and AP2 factors are strongly established coactivators in luminal ER+ transcriptional signaling, hence they are associated with basal low eRNA regions. We clarified this in the discussion and provided more literature evidence in the revised manuscript to demonstrate the strong role of FOXA1 and AP2 factors in ER+ luminal breast cancer transcriptional response.
6. Regarding Fig. 4B, the authors state that "ER binding occupies only the strongest ssDNA and GRO-seq-positive sites". Firstly, the GRO-seq data quality is poor with indiscernible peaks. This may be insufficient for a qualified representation of nascent eRNA expression. More importantly, it appears each heatmap is ranked independently, so top loci for ssDNA are not necessarily top loci for GRO-seq, ER, Pol-II, or H3K27ac. The reviewer requests clarification on how the authors plot these heatmaps and questions whether the statement is supported by the analysis as presented.
Response: We acknowledge the reviewer’s concern and based on their suggestion, we utilised another set of GRO-seq datasets which is more deeply sequenced and published by the same lab. The average plot from these new datasets showed better profile. We also apologize for not providing enough details of how we generated the heatmaps in Fig. 4B. The heatmaps were made separately for each profile to auto scale with their own intensity levels but the order of the regions is based on KAS-seq intensity. The order of these regions was kept the same between each profile. Hence, top loci of ssDNA are not exact top loci of GRO, ER, H3K27ac and Polymerase but top loci of ssDNA also show similar high intensity in GRO, ER, H3K27ac and Polymerase, hence correlated. We also removed regions which belong to blacklisted regions of hg38 and the regions which were over-sequenced due to amplifications and showed weird signals. We provided the new heatmaps and profile plots in the revised manuscript with different clusters of KAS-seq intensity. We also updated the methods section to clarify how these heatmaps were made.
7. In Fig. S4B and the third plot of 4C, the averaged histogram of ER binding appears in multiple sharp peaks with drastic asymmetric positioning around the enhancer centre, which is highly atypical of most published ER ChIP-seq profiles. Could the authors discuss possible "spatial syntax" or directional patterns of ER binding in relation to eRNA loci and cite any literature showing a similar pattern? Further evidence is required to substantiate these observations, as they are remarkably unique.
Response: The authors agree with the reviewer’s point about asymmetric peaks of ER on the luminal specific eRNA regions. Due to the nature of the average profile plots and the number of regions explored here are so low, the profiles look asymmetrical and different than the published literature. Heatmaps lose their resolution when made on a very low number of regions. The focus of this analysis is to highlight that the ER is not binding to the centre of eRNA loci which is contradictory to the published findings from in vitro studies, but further away on these subtype-specific regions. We don’t have any solid evidence to demonstrate the directional patterns of ER binding related to this data. To avoid any confusion, we removed these average plots but focused on the already existing single gene profiles in Fig. S3 and discussed our interpretations in detail.
Minor comments<br /> 1. When introducing eRNAs, the reviewer recommends mentioning that 1) eRNA levels correlate with enhancer activity and 2) eRNA expression precedes target gene transcription, thus reflecting upstream regulatory events. Relevant references include: Arner, E. et al. Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010-1014 (2015); Carullo, N. V. N. et al. Enhancer RNAs predict enhancer-gene regulatory links and are critical for enhancer function in neuronal systems. Nucleic Acids Res. 48, 9550-9570 (2020); Kaikkonen, Minna U. et al. Remodeling of the Enhancer Landscape during Macrophage Activation Is Coupled to Enhancer Transcription. Mol. Cell 51, 310-325 (2013).
Response: These are great recommendations from the reviewer, and we included the suggested publications in the Introduction section of the revised manuscript.
2. H3K27ac is used initially to define these regulatory loci, and like eRNAs, H3K27ac also varies among patients. Which H3K27ac dataset(s) were used initially, and could this approach potentially overlook patient-specific enhancers? (OPTIONAL)
Response: This is a totally valid point from the reviewer. The idea of this project is to define common subtype-specific enhancers which can be regulatory and prognostic, hence can be developed further as biomarkers providing benefit for more patients in the future. Hence, investigating the common enhancers which are activated in multiple normal and cancer cell lines defined by ENCODE is more valid than patient-specific enhancers whose activity might be influenced by specific genetic alterations. There is very limited availability of H3K27ac ChIP-seq datasets from cancer patients to explore the patient-specific enhancers, and our analyses were totally based on the published work, hence not possible to fully address this concern. The source of the H3K27ac ENCODE datasets (from 86 human cell lines and tissue samples) is clarified in the revised manuscript.
3. In addition to the overall metrics displayed in Fig. 2B, could the authors provide precision and sensitivity values for LumA and LumB separately under the Logmc method, given the observation in Fig. 2E that LumA and LumB are not well separated in the UMAP projection?
Response: The authors appreciate the suggestion from the reviewer. We have included the metrics separately for LumA and LumB in the revised manuscript in Fig. S1D.
4. Could the author elaborate, in the discussion section, on why there is a substantial difference in ML performance depending on whether InfoGain or Logmc is used?
Response: We have included the following text in the discussion to explain the differences between these two measures.
“InfoGain measure work with the approach of binarization with k-means (k=2). It has the potential to capture both strongly expressed eRNAs which are differential between subtypes as well as low expressed sparser on and off eRNAs. In the first case, although eRNA is highly expressed in all patients, the higher expression mode becomes 1 and the lower expressed mode become 0. However, in case of low expression, more on and off expression, recentered logmc would not generate a striking high value. Furthermore, binarization is also a strong process to perform better clustering and classification, as distinguishing between data points gets better and clearer. “
5. How does the expression pattern of Basal high, Basal low, Her2, and Lum eRNA clusters behave differentially in Basal, Her2, and LumA/B subtypes? Are Basal high eRNAs downregulated in Her2 or Lum subtypes, and vice versa? Since many downstream analyses rely on these eRNA clusters, it is suggested to include a heatmap and/or boxplot that displays how each eRNA category is expressed in each subtype to confirm that these definitions are consistent.
Response: We thank the reviewers for this suggestion and apologise for not providing enough clarification on the expression of eRNAs in other subtypes. Indeed, Basal high expressed eRNA are expressed low in LumA and LumB and Basal low expressed eRNAs are expressed higher in lumA and lumB. Her2 subtype-specific eRNAs has a trend of expression between Basal and Lum, as it can be seen in the umap and PCA. Basically, the Basal high expressed eRNAs are Lum lower expressed eRNAs, and the Basal low expressed markers are Lum higher expressed markers. As per the suggestion from the reviewer, we provided heatmaps on eRNA expression of each subtype-specific with regulation in other subtype patients in figure S2F-K.
Referee cross-commenting
I share Reviewer #1's opinion that the manuscript should assess whether mRNA or eRNA is the stronger predictor of breast cancer subtypes and clinical outcomes. It will greatly improve the novelty if eRNA is shown to be a better indicator for cancer characterization.
Also, I strongly concur with Reviewer #3 that the current informatics approach is superficial and that several conclusions are contentious. The authors need to resolve the inconsistencies in their ML statistics and the potentially misleading interpretations of the ChIPseq and motif enrichment results.
It is further recommended that, building Reviewer #3's comment, the study integrate eRNA signatures with their proximal genes to address 1) whether genes located near these enhancers are differentially expressed-and correlated with enhancer activity-across cancer subtypes, and 2) whether it provides insights into understanding the enhancer-gene regulatory architecture in a subtype-specific context.
Response: We thank reviewer 2 for cross-commenting on reviewer 1 and 3’s suggestions. Indeed, these are interesting points to cover and will increase the novelty of the study. Based upon these suggestions and discussed earlier for reviewer 1’s comments, we will explore the comparison of mRNAs vs eRNAs as predictor of cancer subtypes and prognosis and the association of genes-eRNAs in cis as discussed in other reviewer’s comments. Our preliminary analyses show a strong association of eRNA and mRNA specific to subtypes and an observable separation on subtypes which were harder to classify markers using eRNAs alone. Hence, we will improve these analyses, and the manuscript further as discussed above in the final revision.
Significance
General Assessment
This study provides insights into the potential use of eRNA to classify breast cancer subtypes and refine prognostic markers. A strength is the integration of large-scale RNA-seq data with machine learning to identify eRNA signatures in biologically-meaningful patient samples, revealing both established and novel TF networks. The study also discovered eRNA clusters that correlate with the survival of patients, thus providing strong clinical implications. However, the ML approach yields several inconsistencies-for instance, unsatisfactory classification results for the Her2 subtype as well as the confused statistical metrics in the results. Furthermore, the ML model struggles to differentiate more nuanced molecular classes (e.g., LumA vs. LumB) and higher-level histological subtypes (e.g., lobular vs. ductal), thus limiting its power to dissect more delicate pathological and molecular mechanisms. Another limitation worth noting of this ML approach is the exclusive use of only polyadenylated eRNAs via RNA-seq, which excludes perhaps the more prominent 2D eRNA expressed in regulatory enhancers. Moreover, certain datasets appear to be of suboptimal quality, leading to assertions that would benefit from additional supporting evidence. Altogether, while the study offers a promising angle on eRNA-based tumor stratification, more robust experimental validations are needed to resolve inconsistencies and clarify the mechanistic underpinnings.
Advance<br /> Conceptually, the study highlights the potential for eRNA-based signatures to capture regulatory variation beyond classical markers. However, the utility of these signatures is constrained by the focus on polyadenylated transcripts alone, likely underrepresenting key enhancer regions, and certain evidence presented in this study is not substantial enough to support some statements. While the work adds an important dimension to the understanding of enhancer biology in breast cancer, the resulting insights are partly hampered by limitations in data coverage and quality.
Audience<br /> The primary audience includes cancer epigenetics, functional genomics, and bioinformatics researchers who are interested in leveraging eRNAs as biomarkers and dissecting complex regulatory networks in breast cancer. Clinically oriented scientists focusing on molecular diagnostics may also find relevance in the authors' approach to stratify subtypes and outcomes. The research is most relevant to a specialized audience within basic and translational cancer genomics, as well as computational biology groups interested in eRNA analysis.
Field of Expertise
I evaluate this manuscript as a researcher specializing in cancer epigenetics, functional genomics, and NGS-based data analysis. Parts of the manuscript touching on clinical outcome measures may require additional review from practicing oncologists.
Reviewer #3 (Evidence, reproducibility and clarity (Required)):
This study aims to classify prognostic and subtype-specific eRNAs in breast cancer, highlighting their potential as biomarkers.<br /> Data was analysed using existing machine learning algorithms,<br /> Data analysis is superficial and it is hard to understand the key significant findings.
This is an important topic and a highly relevant approach to identifying RNA-based biomarkers.<br /> They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).
Weaknesses include: Most of the findings are purely correlation-based and also based on a reanalysis of published datasets; it would benefit from experimental validation to support their findings. Differential expression analysis of large datasets likely yields some differences in the transcriptome. How significant are these changes?<br /> Does the expression of eRNAs affect the expression of genes in cis? Although this analysis would provide some associated gene expression differences, it can also provide some insights into subtype-specific differences in gene expression programs.<br /> If the authors find experimental validations are not feasible, I recommend validating the eRNA signature in an independent dataset.
Response: We acknowledge the weaknesses noticed by the reviewer from this study about the correlation-based analyses of published datasets. While the TCGA eRNA atlas datasets are reanalysed, these are the high-resolution maps ever published on eRNA expression on cancer patient samples, and our study is the first to establish the subtype specific classification of eRNAs. We believe that the eRNAs are biologically relevant, as they are strongly associated with the subtype-specific pathways and epigenetic regulators. Upon suggestion from the reviewers, we will explore the association of mRNAs and eRNAs in cis to establish further significance and relevance of the eRNAs we identified (discussed earlier in reviewer 1 comments).
We would like to focus on studying the functional relevance of eRNAs as a separate project. In vitro studies to establish the knockdown of eRNAs are not straightforward due to the toxicity and non-specific targeting of the locked nucleic acids approach or Cas13-based RNA targeting. siRNA-based approaches don't target the nuclear eRNAs effectively, even though they were widely used by other labs to target eRNAs. Hence, a lot of effort on optimisations are needed to establish functional validation of our eRNAs, hence not under the scope and time frame of this study/revision. To provide validation and significance using independent datasets, we will explore the association of these factors with the expression of subtype-specific eRNAs further in our final revised manuscript using the tools explained above for reviewer 1 (PEGS and Perturb-seq integration). Integration of our classified eRNAs with the published Perturb-seq validated regions from ER+ and ER- breast cancer cell lines will provide the functional validation of patient-associated classified enhancer/eRNAs. Hence, our study would be the first to demonstrate the validated gene-enhancer regulatory networks from breast cancer patient datasets.
Furthermore, we included the single gene visualisation profiles of independent datasets of ER ChIP-seq from different patients (Ross-Innes et al., 2012), ATAC-seq from TCGA patients (Corces et al., 2018), H3K27ac ChIP-seq datasets from cell lines (Theodorou et al., 2013 and Hickey et al., 2021) and GRO-seq and CAGE data published in MCF7 cells close to the eRNA regions and discussed their overlap with the eRNA regions in the revised manuscript. In the final revision, we will perform further detailed integration of all these profiles. Overall, our study will provide the integratory analysis of various independent epigenetic and functional profiles to validate our classified subtype and survival-specific eRNA regions.
Here are major points; addressing these points in the revised version is important.
From Figure 1B, what eRNAs were identified for LumB using log2MC?
Response: The authors acknowledge the lack of analyses on LumB eRNAs in the original version of the manuscript. In the final revised manuscript after associating with mRNAs, we will provide the heatmaps, pathway analyses and other functional annotations for LumB specific eRNAs.
Page 8 However, sensitivity and F-measure .... It would help to include the metrics for the number of patients in each subtype. The ratio of eRNAs/number of cases in each subtype would inform if the number of eRNAs is an outcome of no. of cases or subgroup-specific.
Response: This is a great suggestion from the reviewer, and we included the number of patients for each subtype in the table in Fig. 1D. We observed that the basal patients are low in number, but we identified more basal eRNAs. Hence, the number of eRNAs identified in subtype-specific manner is not correlated to the number of patients in the cohort.
Page 9 "Altogether, both measurements classify eRNAs efficiently based on subtypes, InfoGain allowed us to distinguish further samples based on high and low expression of eRNAs for basal subtype and performed better in statistical metrics" Based on statistical metrics, both models seem to be performing similarly except for Her2.
Response: We apologise for this wrong interpretation. We corrected this in the revised manuscript at page 9.
In Fig. 1B, the F-measure metrics are wrong for basal LogMC, as it is 0.94 rather than 0.54, which could lead to a misinterpretation of the model.
Response: We apologise for the mistake in this figure, and we included the corrected heatmap in the revised manuscript.
Many genome browser figures, including Figure S3. TFBS is not at the same site as eRNAs detected. Is there CAGE data to show that binding these TFs at these sites leads to the expression of eRNAs? That will give direct evidence that the eRNAs are transcribed due to these TFs
Response: This is a great suggestion from the reviewer. We incorporated ATAC-seq, H3K27ac ChIP-seq, CAGE and GRO-seq data on the gene profiles in Fig. S3 to validate the activity of these ER binding sites in the preliminarily revised manuscript. We observed that all the ER binding sites are accompanied by open and active enhancer marks with bidirectional transcription (either GRO- or CAGE positive) but they are not on the exact location of eRNA regions (250-1000 bps away from the centre of ER binding site). Subtype-specific eRNA regions close to genes like MLPH and XBP1 possess active bidirectional transcribing ER binding sites far away from subtype-specific eRNA loci and also ER unbound sites. However, these distal ER binding sites are close to the regions from the list of 300K eRNA loci and they were simply not identified as subtype-specific regions.
Page 10, There were 30 Her2-specific eRNA regions.... Do the same enhancers also regulate these genes as those from which eRNAs are transcribed? Is it cis-effect, or could these affect the trans-regulating of other genes?
Response: We acknowledge the concern from the reviewer, however this is hard to be validated, as functional experiments to explore the 3D interactions of enhancers and gene promoters are not robust enough to be performed in patient samples and can't be performed within the revision time frame. In the final revised manuscript, we will explore the association of enhancers and promoters of ERBB2 with PEGS association as discussed above and with available HiC datasets in Her2+ cell lines (HCC1954, GSE167150, Kim et al., 2022 https://pubmed.ncbi.nlm.nih.gov/35513575/ )
Minor comments:
Page 8 "InfoGain meausure..." Fig. S2A also shows high and low expressed eRNAs for the basal group
Response: We apologise for the lack of clarity here. InfoGain measure identifies both high and low expressed eRNAs in all patients showing similar pattern of regulation among patients. However, logmc derived eRNAs are highly expressed in most patients. Low expressed eRNAs could not be identified in logmc measure as strong as InfoGain regions. The text in the results had been edited in the revised manuscript to reflect better clarity on this point.
Page 11, Our analyses also identified the role of another..... The statement is misleading as it is the enrichment of these TFs with the eRNAs<br /> Response: We included the word “enrichment” to clarify this statement.
Page 13, "Around 90% of eRNAs are bidirectional and non-polyadenylated [53]. TCGA expression datasets are based on RNA-seq assays, which capture only non-polyadenylated RNAs. Thus, analysing the expression of eRNAs on mRNA-seq datasets might not be adequate". It is very confusing, please check<br /> Response: We apologise for the mistake, and this has been corrected in the revised manuscript.
Reviewer #3 (Significance (Required)):
This is an important topic and a highly relevant approach to identifying RNA-based biomarkers.<br /> They analyse published RNAseq datasets by focusing on molecular subtype-specific eRNAs, enhancing clinical relevance and thereby addressing the heterogeneity of the cancer type (strength of the study).


