Integrated transcriptional profiling and genomic analyses reveal RPN2 and HMGB1 as promising biomarkers in colorectal cancer


Colorectal cancer (CRC) is a heterogeneous disease that is associated with a gradual accumulation of genetic and epigenetic alterations. Among all CRC stages, stage II tumors are highly heterogeneous, and about 20–25 % of stage II CRC patients relapse following surgery. Thus, a comprehensive analysis of gene signatures to identify aggressive and metastatic phenotypes in stage II CRC is desired for a more accurate disease classification and outcome prediction. By utilizing a Cancer Array containing 440 oncogenes and tumor suppressors to profile mRNA expression, we identified a large number of differentially expressed genes in poorly differentiated stage II colorectal adenocarcinoma tissues, compared to their matched normal tissues. Gene Ontology and Ingenuity Pathway Analysis (IPA) indicated that these genes are involved in functional mechanisms associated with several transcription factors. Genomic alterations of these genes were also investigated through The Cancer Genome Atlas (TCGA) database, utilizing 195 published CRC specimens. The percentage of genomic alterations in these genes was ranked based on their mRNA expression, copy number variations and mutations. These data were further combined with published microarray studies from a large set of CRC tumors classified based on prognostic features. This led to the identification of eight candidate genes, including RPN2, HMGB1, AARS, IGFBP3, STAT1, HYOU1, NQO1 and PEA15, that were associated with the progressive phenotype. In particular, RPN2 and HMGB1 displayed a higher genomic alteration frequency in CRC, compared to eight other major solid cancers. Immunohistochemistry was performed on an additional 78 stage I–IV CRC samples, where RPN2 protein immunostaining exhibited a significant association with stage III/IV tumors, distant metastasis, and poor differentiation, indicating that RPN2 expression is associated with poor prognosis. Further, our study revealed significant transcriptional regulatory mechanisms, networks and gene signatures underlying CRC malignant progression and phenotype, warranting future clinical investigations.


Colorectal cancer (CRC) is the third most common cancer worldwide [1]. TNM staging is a standard pathology classification used for treatment strategy and outcome prediction, especially for most stage-I, -III, and -IV CRC patients. In the clinical setting, approximately 90 % of localized stage I CRC patients are cured by surgical removal of the tumor burden, so the prognosis and treatment plan for stage I CRC patients has been standardized [2, 3]. However, stage II CRC is highly heterogeneous, with 20–25 % of patients exhibiting recurrence or relapsed disease following surgery. The 5-year overall survival of patients with stage II tumors ranges from 58 to 85 % [4–6]. Although a wide variety of potential clinical and pathological risk factors have been examined for improved outcome prediction, such as T4 lesions, poorly differentiated histology or intestinal obstruction [7], the molecular mechanisms underlying the heterogeneous characteristics of stage II CRC are still not well established. In fact, previous publications indicated that clinical outcome prediction and treatment of stage II CRC remain controversial, and that a better molecular classification utilizing gene signatures and biomarkers is needed to complement TNM staging [8–10].

Over the past few years, there has been significant progress in identifying distinct molecular signatures to better define CRC subsets. The biological and clinical significance of overexpressed oncogenes (e.g. EGFR and MYC) and functional loss of tumor suppressor genes (e.g. TP53 and APC) have been well characterized [11–15]. Better understanding of the oncogenes and related signaling pathways has led to successful CRC therapies, especially in targeting the EGFR-RAS-MAPK signaling pathway [16, 17]. Further discoveries on TP53 and APC have been utilized to predict poor CRC prognosis, based on the presence of defective APC expression or point mutations in TP53 [18]. However, these gene signatures have not been successfully utilized as biomarkers to classify the heterogeneous stage II CRC for diagnosis and treatment. More comprehensive gene signatures and signal transduction pathways related to stage II CRC are needed to understand disease progression and to improve prognosis and treatment.

Recent development of high throughput technologies, such as gene expression profiling and genomic sequencing analysis, has enabled the identification of comprehensive cancer gene signatures and related signaling pathways, based on the genetic and expression alterations in multiple cancers [19, 20]. Previously published gene signatures using gene profiling, RT-PCR, or sequencing technologies varied considerably in terms of their gene composition, with little gene overlap [21]. The lack of concordant gene signatures could be related to several issues, including differences in (1) technological platforms, such as microarray, RT-PCR or sequencing technologies; (2) the sample types selected for analysis; and (3) the analytical tools used to generate the gene signatures [22]. Hence, an integrative approach combining the information derived from different technological platforms, summarizing different categories of genetic and expression alterations from a large number of samples, may more accurately identify the associations of clinical phenotypes with genetic and expression alterations.

In the current study, we performed an integrated data analysis, combining gene expression profiles from our collected paired stage II CRC patient samples with genomic alteration data of 195 CRC samples previously published by The Cancer Genome Atlas (TCGA) project, along with extracted results from more than 50 previously published microarray studies. This integrated approach identified eight gene candidates significantly associated with a progressive CRC phenotype. Among these genes, a significantly higher alteration profile for Ribophorin II (RPN2) and High-mobility group protein B1 (HMGB1) was observed in CRC tumors, compared to eight other major human solid cancer types currently available in the TCGA database. Immunohistochemistry was performed on 78 clinical stage I–IV CRC samples, where RPN2 exhibited a significant association with distant metastasis and poor differentiation. These gene signatures expand the current CRC biomarker pools for tumor progression and CRC outcome prediction. Further, these gene signatures warrant future validation as potential biomarkers in large clinical trials.


Identification of gene expression signatures, related pathways and upstream regulators

To investigate important gene signatures in stage II CRC tissues with implications in aggressive malignant phenotypes, we utilized a cancer-specific array containing known cancer-related gene probes (N = 440) for mRNA expression profiling. A total of 92 genes exhibiting at least a 1.5-fold difference (p < 0.05) in mRNA level between tumor and normal samples were identified (Additional file 1: Table S1). An unsupervised hierarchical clustering algorithm was used to classify sub-populations based on the list of differentially expressed genes. As expected, tumor and normal samples were clearly separated into two subgroups (Fig. 1a). We further identified two characteristic clusters in the tumor subgroup: an over-expressed gene cluster A and an under-expressed gene cluster B (Fig. 1b, c).

Fig. 1

Identification of CRC gene signatures using global expression profiles. Microarray analysis was performed using stage II CRC specimens (T1-4), compared to matched mucosal tissue samples (N1-4). a Unsupervised hierarchical clustering of differentially expressed genes (fold change ≥ 1.5, p < 0.05). Over-expressed and under-expressed genes are indicated by red and green colour, respectively. The expression level is proportional to colour brightness. Black bars on the left indicate gene clusters A and B, respectively. An expanded view of cluster A (over-expressed genes, b), or cluster B (under-expressed genes, c), with the indicated gene names is shown

To investigate their functional relevance, we annotated the differentially expressed genes in Fig. 1 according to Gene Ontology (GO) biological processes, using the DAVID software. We observed significant enrichment in apoptosis, phosphorylation, cell proliferation, protein kinase cascade, colorectal cancer metastasis and intracellular signaling cascade in stage II CRC (Additional file 1: Table S2a). We further applied IPA (Ingenuity Pathway Analysis) to identify enriched pathways and unveil the functional relevance of our differentially expressed genes in stage II CRC. Using this approach, six enriched sub-networks were identified. They were associated with NFKB, AP1, STAT3, TP53, HSP90 and CTNNB1 signaling pathways (Additional file 2: Figure S1). Furthermore, utilizing the IPA tool, we also investigated upstream regulatory molecules that are responsible for the identified pathways and altered gene expression in stage II CRC. As shown in Additional file 1: Table S2b, the three top transcription regulators TP53, TP63 and TP73 were significantly enriched in stage II CRC. In addition, several transcription factors and oncogenes ranked near the top of the list, such as NFKB, AP1, STAT3 and MYC, as well as other regulators (e.g. E2F1, HIF1A and ANR). These data suggest that aberrant regulation of these TFs (NFKB, AP1, STAT3, TP53, TP63 and MYC) could potentially influence stage II CRC progression.

Transcriptional regulatory gene network in CRC

To test how upstream oncogenic and tumor suppressor TFs regulate gene expression in stage II CRC, we applied a previously developed bioinformatics model [24] that predicts the regulation of target genes by these TFs. As shown in Additional file 1: Table S1, we generated a list of putative targets for the seven TFs, including NFKB1, RelA, TP53, TP63, STAT3, MYC and AP1. About 35–40 % of NFKB1 or TP53 targets were previously validated by experimental data (Additional file 1: Table S1). There were twenty-nine NFKB1 targeted genes, consisting of both predicted and validated gene candidates, as well as eight unique NFKB1 target genes, including RPN2 and HMGB1 (Additional file 1: Table S1). In addition, there were many genes under the regulation of multiple TFs (Additional file 1: Table S1). Subsequently, we constructed a transcriptional regulatory gene network, presented in Fig. 2. We predicted ten genes to be co-targeted by all seven TFs, including under-expressed (BAX, CDKN1A, CDKN2B, LDHA, MDM2, SLC16A1, WEE1) and over-expressed (HSP90AB, NQO1 and PTMA) genes. These genes are known to be involved in biological processes such as apoptosis, cell proliferation, and cell cycle. Our data suggest that these seven TFs may act alone or together to regulate the signaling pathways associated with the progression of stage II CRC.

Fig. 2

Inferred transcriptional regulatory gene network in CRC. A newly developed computational model was utilized to identify target genes of seven cancer-related TFs and to construct gene regulatory networks. The triangular nodes represent corresponding TFs. Circle nodes refer to the target genes of TFs. Arrow lines show regulatory relationships from TFs to their target genes. Purple or blue lines stand for TFs that act as tumor suppressor or oncogene, respectively. Red and green nodes refer to over- and under-expressed genes, respectively

Genomic and expression alterations of identified CRC-related genes in the TCGA database

To identify the genomic alterations for the differentially expressed genes discovered in this project, we took advantage of the TCGA database, which contains recently published large-scale mRNA expression and genomic alteration data derived from 195 stage I–IV CRC patients [20]. We analyzed the 92 differentially expressed genes identified in our study using the TCGA database and determined that a total of 24 genes (22 over- and 2 under-expressed genes) exhibited expression patterns consistent with the TCGA data (Additional file 2: Figure S2). Among the over-expressed genes, RPN2 was altered most frequently in 87 out of 195 (37 %) CRC cases at stage II and III, including significant gene amplification and mRNA overexpression. The second most altered gene was HMGB1 (13 %), which exhibited a significantly higher rate of mRNA up-regulation (Fig. 3a; Additional file 2: Figure S3). The remaining 19 over-expressed genes were altered at similar rates in the 195 CRC cases (5–10 %) and across all stages (Fig. 3a; Additional file 2: Figure S3a). Alterations of TNK1 included mRNA down-regulation or mutations in 22 out of 195 cases (13 %) across all tumor stages (Additional file 2: Figure S3). We further examined whether copy number variations (CNVs) are associated with mRNA expression for the 24 genes identified in the TCGA. The mRNA expression of 14 out of these 24 genes correlated significantly with CNVs (Fig. 3b). In addition, ≥25 % recurrent CNVs were observed as gains on chromosomes 20q, 13q, 6q, 16q, 10q, 11q, 12q, 14q, and 1q, as well as losses at 17p and 1p (Fig. 3b). Remarkably, RPN2 and HMGB1 had a higher percentage of CNV in these cases, including gains (RPN2 and HMGB1) and amplifications (RPN2, Fig. 3b).
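The ranking of genes by alteration rate can be illustrated with a minimal sketch. It assumes that a case counts once per gene regardless of how many alteration types it carries; the event table, gene labels and case identifiers below are hypothetical placeholders, not TCGA data, and the function name is introduced only for illustration.

```python
def alteration_frequencies(events, n_cases):
    """Rank genes by the percentage of cases carrying at least one alteration.

    events: iterable of (gene, case_id, alteration_type) tuples.
    A case with several alteration types in one gene is counted once.
    """
    altered_cases = {}
    for gene, case_id, _kind in events:
        altered_cases.setdefault(gene, set()).add(case_id)
    freqs = {g: 100.0 * len(cases) / n_cases for g, cases in altered_cases.items()}
    # Highest frequency first; ties broken alphabetically by gene label
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy cohort of 10 cases (illustration only)
events = [
    ("GENE_X", "case01", "amplification"),
    ("GENE_X", "case01", "mRNA up"),   # same case, second alteration type
    ("GENE_X", "case02", "mRNA up"),
    ("GENE_X", "case03", "gain"),
    ("GENE_Y", "case02", "mutation"),
]
ranking = alteration_frequencies(events, n_cases=10)
print(ranking)
```

Counting each case once per gene keeps a sample with both amplification and mRNA up-regulation from inflating the frequency.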

Fig. 3

Evaluation of genomic alterations in TCGA CRC dataset. a Genomic alterations, including copy number variations (CNVs), mutations and gene expression of each gene candidate were extracted from the TCGA colorectal cancer (CRC) database. The X- and Y-axis represent genomic alterations and case number, respectively. b Variations of mRNA expression versus CNVs and chromosomal locations for individual gene candidates are shown. CNV categories include homozygous deletions (Homodel), heterozygous deletions (Hetloss), diploid, gain and amplification (Amp). mRNA was expressed as 25th, 50th, and 75th percentile and whiskers represent minimal and maximal values, excluding the outliers. Red circle indicates missense mutations

Identification of CRC-specific gene signatures across multiple data sets

Previous evaluation of 29 microarray studies utilizing CRC tumor specimens identified 31 gene signatures with prognostic significance [31]. Out of these 31 genes, 28 overlapped with the differentially expressed genes identified from our microarray data, of which 8 genes (AARS, PEA15, NQO1, STAT1, IGFBP3, HYOU1, HMGB1 and IGF2R) also matched genes previously verified in the TCGA dataset (Table 1; Fig. 3; Additional file 2: Figure S3). Next, we manually searched for the presence of these 28 differentially expressed genes in 50 previously published microarray studies that used CRC tissues (Additional file 1: Table S3). The 50 published microarray studies were selected based on prognosis signature inclusion criteria. Sixteen out of these 28 genes overlapped with the 50 published microarray experiments. Further investigation revealed that 10 out of these 16 genes (RPN2, AARS, PEA15, NQO1, AKT1, STAT1, IGFBP3, HYOU1, HMGB1 and IGF2R) also matched the genes identified in the TCGA database (Table 1).
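The successive overlap steps described here amount to intersecting gene lists from different sources. The following is a minimal sketch, assuming each source has been reduced to a set of gene symbols; the three example sets are deliberately truncated and their membership is hypothetical, so the output does not reproduce the counts reported in this study.

```python
def shared_genes(*gene_sets):
    """Return the sorted gene symbols present in every supplied collection."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


# Hypothetical, truncated inputs (illustration only)
array_hits = {"RPN2", "HMGB1", "AARS", "TNK1", "AKT1"}
literature_hits = {"RPN2", "HMGB1", "AARS", "AKT1", "IGF2R"}
tcga_hits = {"RPN2", "HMGB1", "AARS", "TNK1"}

candidates = shared_genes(array_hits, literature_hits, tcga_hits)
print(candidates)
```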

Table 1 Comparison of differentially expressed genes with published CRC microarray studies

Finally, we selected 8 of the 10 genes listed above (RPN2, HMGB1, AARS, IGFBP3, STAT1, HYOU1, NQO1 and PEA15), based on the following criteria: (1) previously identified by CRC tumor microarray experiments; (2) previously published as prognostic markers; and (3) identified as candidates within the transcriptional regulation network presented in Fig. 2. The genomic alterations of these selected genes appeared to be more evident in stage II than in stage I CRC tumors, as observed in the TCGA database (Fig. 4a). Furthermore, we analyzed these eight genes for alteration frequency in nine major solid cancer types available in the TCGA database. A significantly higher alteration frequency for the RPN2 and HMGB1 genes was observed in CRC tumors than in other tumors (Fig. 4b).

Fig. 4

Identification of CRC-specific gene signatures across the most common solid tumors. a Case numbers with genomic alterations for each gene candidate, identified in stage I/II colorectal cancer from the TCGA dataset. b The frequency of genomic alterations for each gene candidate was analyzed across nine different solid cancer types (each with >150 tumor samples) from the TCGA database

Association of RPN2, HMGB1 and NFkB1 protein expression with CRC clinicopathological features

Based on our observations, where a higher percentage of CRC cases exhibited an association between genetic alterations and expression profiles in RPN2 and HMGB1, as well as NFkB1, a common transcriptional regulator of both genes, we performed immunohistochemistry to validate RPN2, HMGB1 and NFkB1 protein expression in a cohort of an additional 78 CRC specimens (Fig. 5). A total of 29.5 % of stage I/II and 51.4 % of stage III/IV specimens were positive for cytoplasmic RPN2, a significant association with tumor stage (p = 0.047). In addition, RPN2 staining was also strongly associated with distant metastasis (p = 0.0007) and histological differentiation (p = 0.015), but not with gender, age or tumor location (Fig. 5a, b). Characteristic cytoplasmic and nuclear staining was observed for NFkB1 in the majority of tumor specimens (stage I–IV). Cases with positive staining were most frequent in stage III/IV tumors, but the difference narrowly missed statistical significance (p = 0.055, Fig. 5a, b). No significant association of NFkB1 protein expression was observed for distant metastasis or other clinicopathological features (Fig. 5b). In addition, ~90 % of all examined CRC tumor samples exhibited HMGB1 immunoreactivity; however, no association with clinicopathological features was observed (Fig. 5b).

Fig. 5

Association of RPN2, NFkB1 and HMGB1 protein expression with clinicopathological features in a cohort of CRC. a Immunohistochemical analysis of RPN2, HMGB1 and NFkB1 protein expression was performed in a cohort of additional 78 CRC specimens. H&E refers to Hematoxylin and Eosin staining. Microscope images were taken at either ×100 or ×400 magnifications. The low-scale bar represents 200 μm and high-scale bar corresponds to 50 μm. b Correlation analysis of RPN2, NFkB1 and HMGB1 expression by gender, age, stages, metastasis, tumor location, and histological differentiation in 78 CRC samples. The cases with metastasis were divided into lymph node only and distant metastasis. *Corresponds to a p value <0.05 (Fisher’s exact test)


In this study, we performed an integrated data analysis combining differentially expressed genes from microarray (Fig. 1) with published literature (Additional file 1: Table S3) and available genomic and expression datasets from TCGA (Figs. 3, 4). The integrated analysis identified novel gene signatures in stage II CRC tumors, with strong implications for late stage aggressive and metastatic phenotypes (Table 1; Fig. 5). Bioinformatics analysis indicated that these genes were regulated by seven TFs, including NFKB1, RelA, TP53, TP63, STAT3, MYC and AP1 (Fig. 2), and further enriched in NFKB, AP1, STAT3, TP53, HSP90 and CTNNB1 pathways, known to be important regulators of cell proliferation, cell cycle, apoptosis, and intracellular signaling (Additional file 1: Table S1; Additional file 2: Figure S1). Furthermore, integrated evaluation of large published CRC datasets from TCGA identified eight candidate genes that were associated with the progressive phenotype of CRC (Table 1; Fig. 4a). A significantly higher alteration frequency for RPN2 and HMGB1 was also observed in CRC tumors, compared to eight other common solid tumor types. Finally, immunohistochemistry of RPN2, HMGB1 and NFkB1 revealed a significant association of RPN2 with CRC stage, metastasis and differentiation in a cohort of additional CRC samples (Fig. 5). Our data revealed an association of important gene signatures with aggressive stage II CRC and their underlying molecular regulatory mechanisms.

Among our differentially expressed genes, we observed several overexpressed genes that are tightly regulated by critical TFs controlling cell proliferation, survival and inflammation [32–34]. Using IPA to identify upstream regulators of the differentially expressed genes in the stage II CRC samples used in this study, we observed a strong enrichment for tumor suppressor TP53 family members and oncogenic TFs (e.g. NFKB, AP1 and MYC) that either individually or in combination regulate gene expression through shared or unique target genes (Fig. 2; Additional file 1: Table S1). The tumor suppressor TP53 family members and oncogenic TFs (NFKB1, AP1 and MYC) are thought to play a crucial role in controlling CRC progression, consistent with the results of the published TCGA report [20]. Our data suggest a strong link between the regulatory programs of these TFs in CRC. In particular, this study unveiled several potentially unique target genes for each TF, such as RPN2 and HMGB1 being targeted by NFKB1. These findings are consistent with previous studies of these important TFs in carcinogenesis [35–37]. Our experimental data from microarray are supported by both computational analysis and literature searches, where differentially expressed gene signatures are shown to promote the malignant CRC process.

Additional support for our experimental data comes from the analysis of gene signature expression profiles across 195 CRC samples of different CRC stages from the TCGA database (Additional file 2: Figure S2). We found that about 55 % of the over-expressed genes identified from our cancer array overlapped with published TCGA data. Interestingly, several genes exhibited recurrent CNVs (frequency > 25 %), which directly modulate their mRNA expression (Fig. 3b), hence providing a genetic mechanism for altered gene expression that is consistent with previous reports [38, 39]. These findings support our observation that this subset of overexpressed genes may play a critical role in genomic instability, which is significantly associated with progressive CRC phenotypes. We further examined gene signatures with prognostic CRC marker potential and, by utilizing an integrated analysis approach, identified eight gene candidates: RPN2, HMGB1, AARS, IGFBP3, STAT1, HYOU1, NQO1 and PEA15. Six of these eight genes have been previously implicated in deregulation of gene expression and associated with the prognosis of CRC and other cancer types [40–43]. Only RPN2 and AARS were novel genes, with no previous publication describing their functional contribution to CRC. The genetic alteration and expression profiles for these eight candidate genes were compared across different cancer types. Only RPN2 and HMGB1 were found to be most altered in CRC (Fig. 4b), further supporting their biological significance in CRC pathogenesis. We used a cohort of 78 independent CRC specimens (stages I–IV) to validate RPN2, HMGB1 and NFkB1 protein expression and their association with several clinicopathological features. We observed that only RPN2 expression was significantly associated with tumor stage, histological differentiation and distant metastasis (Fig. 5). HMGB1 was positively expressed in 90 % of cells in all tumor stages and across our selected CRC samples. HMGB1 has been previously implicated in CRC with controversial roles in cancer immunity and metastasis [44–48]. Although it was not significantly associated with clinicopathological features in our CRC samples (Fig. 5b), we still observed consistent mRNA and protein over-expression of HMGB1 (Figs. 1, 5), as well as a significant correlation between mRNA and CNV gains (Fig. 3b). Our data suggest that these gene markers can be identified in CRC samples as early as stage II, but are not solely stage II specific. The consistent overexpression of these gene markers further supports their functional importance as oncogenes in tumor progression.

Our bioinformatics analysis suggests that HMGB1 and RPN2 are both targeted by an oncogenic TF, namely NFkB1 (Fig. 2). In fact, aberrant NFkB1 expression was also observed in about 29.7 % of stage III/IV CRC cases (Fig. 5), which is consistent with previously published findings [43]. The RPN2 gene, located on chromosome 20q13, encodes a proteasome scaffolding protein that inhibits Bcl-mediated apoptosis and stabilizes mutated p53 protein expression through inactivation of GSK3β in breast cancer [49, 50]. Over-expression of RPN2 in stage II CRC was also reported by another microarray study that focused on CRC tumor metastasis (Table 1), supporting its implication beyond early staged tumors. In the TCGA CRC dataset, RPN2 up-regulation was observed in 65 out of 195 CRC cases (37 %) and was significantly associated with copy number gain (Fig. 3). This is consistent with previous findings, where >65 % of CRC cases have shown gains on chromosome and a strong association with liver metastasis and poor outcome [51–53]. In addition, we examined RPN2 and HMGB1 alteration profiles in 195 staged CRC patients available in the TCGA database, by ranking them according to their alteration rates (Additional file 2: Figure S2). Further evaluation revealed a slightly longer survival of patients without RPN2 alteration, but that did not reach statistical significance (p = 0.38, data not shown). This survival data may be valid, because the marker was not originally identified as a predictor for CRC survival in all stage cancers. It will be interesting to complement the genetic alteration and differential expression profiles of these molecules as biomarkers in future clinical trials for stage II CRC patients.

Furthermore, the importance of RPN2 in tumor prognosis and therapeutic implications has been documented in other solid cancers, including esophageal squamous cell carcinoma [54, 55], osteosarcoma [56, 57] and breast cancer [50]. We observed that RPN2 is also highly altered in head and neck squamous cell carcinoma (HNSCC), as well as lung squamous cell carcinoma (Fig. 4b), consistent with previously published studies [55, 58]. In various human malignancies, silencing of RPN2 was associated with increased apoptosis, reduced tumor growth and increased sensitivity of tumor cells to docetaxel [49, 54]. The value of RPN2, both as a prognostic marker and as a therapeutic target, merits future validation. In this manuscript, we have shown that the proportion of CRC cases with RPN2 staining was significantly higher in stage III/IV, distant metastatic, and poorly differentiated tumors, indicating that its expression is associated with worse prognosis. In addition, NFkB1 protein expression was most frequent in stage III/IV tumors, although this difference did not reach statistical significance (p = 0.055), and it was not significantly associated with distant metastasis. Our data present experimental evidence that RPN2 protein expression could serve as a potential biomarker to predict metastasis and worse prognosis. However, due to the lack of patient survival and outcome data in the current study, these molecules need to be further validated for their value as biomarkers in larger clinical trials.


In this manuscript, by utilizing mRNA profiling, we have identified a panel of differentially expressed gene signatures in stage II colorectal adenocarcinoma tissues. Through integrated analyses of the transcriptional regulation, The Cancer Genome Atlas database, and 50 published microarray studies of colorectal cancer specimens, we have identified eight candidate genes that are significantly associated with the aggressive phenotype, including RPN2, HMGB1, AARS, IGFBP3, STAT1, HYOU1, NQO1 and PEA15. Among those genes, RPN2 and HMGB1 displayed higher frequencies of genomic alterations in colorectal cancer, compared to other solid tumors. Furthermore, RPN2 protein expression, evaluated by immunohistochemistry in 78 independent (stage I–IV) colorectal cancer tissues, exhibited a significant association with stage III/IV tumors, distant metastasis, and poor differentiation. Our study identified important molecular signatures underlying malignant progression and phenotype of colorectal cancer, which warrant future clinical investigations.



This study was approved by the ethical committee at Inner Mongolia Medical University, and written informed patient consent was obtained. A total of 82 patients underwent surgical resection for CRC at Inner Mongolia Medical University Hospital from 2002 to 2006 and were diagnosed with pathological stages (Additional file 2: Figure S3). Patients with hereditary syndromes, e.g. familial adenomatous polyposis (FAP), Lynch syndrome or hereditary nonpolyposis colorectal cancer (HNPCC), or inflammatory syndromes were pre-screened and excluded from this study. Preoperative chemo-radiotherapy or chemotherapy could significantly influence the expression of biomarkers. Hence, none of the patients included in this study received treatment prior to surgery. Tumor staging was performed according to the TNM classification criteria and guidelines of the International Union Against Cancer (UICC) [23]. Histological differentiation was evaluated, as poorly differentiated carcinomas are known to have a high risk of recurrence or metastasis. Accordingly, we randomly selected 4 poorly differentiated adenocarcinoma (stage II) samples with matched adjacent normal mucosa tissues and performed microarray analysis. Histological evaluation confirmed the content of tumor or normal colon epithelium cells to be more than 50 % (Additional file 2: Figure S1b).

RNA extraction and microarray profiling analysis

Total RNA was isolated from frozen tissue samples and extracted using the RNeasy Mini Kit (QIAGEN, Maryland, USA) according to manufacturer’s instructions. Gene expression was analyzed using Oligo GEArray Human Cancer Microarray® (Cat# OHS-802SA, Biosciences, CA, USA), which contains a total of 480 probes for 440 genes, encoding for tumor suppressors, oncogenes, signal transduction molecules, growth factors and their corresponding receptors, as well as others associated with angiogenesis. Gene expression levels were normalized to the beta-actin housekeeping gene. The selection criterion of differentially expressed genes was based on at least 1.5-fold threshold between the CRC tumors and matched normal tissues.
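The normalization and fold-change selection can be sketched as follows. This is an illustrative sketch rather than the pipeline used in the study: it assumes one signal value per gene per sample, uses ACTB (the gene symbol for beta-actin) as the reference key, and treats a gene as differentially expressed when the tumor/normal ratio is at least 1.5 or at most 1/1.5. The p value criterion is omitted because the text does not name the statistical test used for it. Function names, gene labels and signal values are hypothetical.

```python
def normalize_to_reference(signals, reference_gene="ACTB"):
    """Divide each gene signal by the housekeeping-gene signal of the same sample."""
    ref = signals[reference_gene]
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {g: v / ref for g, v in signals.items() if g != reference_gene}


def fold_changes(tumor, normal, reference_gene="ACTB"):
    """Tumor/normal ratio of reference-normalized signals for genes in both samples."""
    t = normalize_to_reference(tumor, reference_gene)
    n = normalize_to_reference(normal, reference_gene)
    return {g: t[g] / n[g] for g in t if g in n and n[g] > 0}


def differentially_expressed(ratios, threshold=1.5):
    """Keep genes changed by at least `threshold`-fold in either direction."""
    return {g: r for g, r in ratios.items()
            if r >= threshold or r <= 1.0 / threshold}


# Hypothetical raw signals for one matched tumor/normal pair (illustration only)
tumor = {"ACTB": 1000.0, "GENE_A": 600.0, "GENE_B": 100.0, "GENE_C": 210.0}
normal = {"ACTB": 800.0, "GENE_A": 240.0, "GENE_B": 200.0, "GENE_C": 160.0}

ratios = fold_changes(tumor, normal)
hits = differentially_expressed(ratios)
print(sorted(hits))
```

In this toy pair, GENE_A doubles and GENE_B falls to 0.4-fold after normalization, so both pass the threshold, while GENE_C (about 1.05-fold) does not.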

Computational inference of transcription factor target genes

We have developed a mathematical model capable of identifying target genes of a particular transcription factor (TF) and thus of constructing a gene network regulated by that TF. By recursively applying this model to the identified network, multiple networks can be interconnected. Moreover, the model is able to infer how likely a gene is to be regulated by a particular TF, and it has been successfully applied to other cancer datasets [24–28]. In this study, this model was applied to the above-obtained CRC differential gene expression data, in order to investigate target genes regulated by seven common cancer-related TFs, including NFKB1, RELA, AP1, TP53, TP63, STAT3 and MYC.

Genomic alterations from TCGA database

The Cancer Genome Atlas (TCGA) project published the first “Marker” paper of colon cancer in 2012, which included genomic sequencing, epigenetic and mRNA expression profiling across 195 human colorectal cancer specimens [20]. The data are accessible through the cBio Cancer Genomics Portal [29], a web resource designed for the visualization of oncogenomic datasets. Using the differentially expressed gene list in our current study, we extracted genomic alteration profiles from 195 CRC specimens, which have been published and deposited in the TCGA database. Since the first “Marker” paper of colon cancer was published [20], more colon cancer samples have been continually submitted to the TCGA project for the generation of high throughput experimental data. However, not all of these data are complete or have gone through the confirmation and validation process, and thus they are not included in this study.

Immunohistochemical analysis

Thin sections of 10 % formalin-fixed, paraffin-embedded tissue specimens were treated with goat anti-human RPN2 antibody (sc-12165, Santa Cruz), rabbit anti-human HMGB1 antibody (#6893S, Cell Signaling), or rabbit anti-human NFKB1 antibody (sc-1190, Santa Cruz), followed by a peroxidase-conjugated goat anti-rabbit (sc-2018, Santa Cruz) or rabbit anti-goat (sc-2023, Santa Cruz) secondary antibody. Color was developed using avidin and biotin-conjugated horseradish peroxidase (ABC reagents) according to standard protocols. The percentage of positively stained cancer cells was determined under the microscope from more than four visual fields (at 400× magnification). Specimens were evaluated by two independent pathologists and classified into two groups: negative staining (no cells were intensely stained) and positive staining (at least 10 % of cells were intensely stained) [30].

Statistical analysis

Correlations between gene expression and clinicopathologic characteristics were analyzed using Fisher’s exact test. For all statistical analyses, a P value of <0.05 was considered significant.
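
For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be sketched with the Python standard library alone. The function name and the counts below are hypothetical and are not taken from this study. The sketch uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = staining positive/negative,
# columns = advanced stage / early stage.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
significant = p_value < 0.05
```

For this toy table the P value is about 0.035, so it would fall below the 0.05 significance level used here.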



Abbreviations

CRC: colorectal cancer

GO: gene ontology

IPA: ingenuity pathway analysis

TCGA: The Cancer Genome Atlas

CNV: copy number variations

TF: transcription factor

RPN2: ribophorin II

HMGB1: high-mobility group protein B1

References


  1. Kuipers EJ, Rosch T, Bretthauer M. Colorectal cancer screening–optimizing current strategies and new directions. Nat Rev Clin Oncol. 2013;10(3):130–42.

  2. Kuntz KM, Lansdorp-Vogelaar I, Rutter CM, Knudsen AB, van Ballegooijen M, Savarino JE, et al. A systematic comparison of microsimulation models of colorectal cancer: the role of assumptions about adenoma progression. Med Decis Mak Int J Soc Med Decis Mak. 2011;31(4):530–9.

  3. Labianca R, Merelli B. Screening and diagnosis for colorectal cancer: present and future. Tumori. 2010;96(6):889–901.

  4. Akiyoshi T, Kobunai T, Watanabe T. Recent approaches to identifying biomarkers for high-risk stage II colon cancer. Surg Today. 2012;42(11):1037–45.

  5. Koelzer VH, Lugli A. The tumor border configuration of colorectal cancer as a histomorphological prognostic indicator. Front Oncol. 2014;18(4):29.

  6. Lavery IC, De Campos-Lobato LF. How to evaluate risk and identify stage II patients requiring referral to a medical oncologist: a surgeon’s perspective. Oncol N Y. 2010;24(1):14–6.

  7. Fleming M, Ravula S, Tatishchev SF, Wang HL. Colorectal carcinoma: pathologic aspects. J Gastrointest Oncol. 2012;3(3):153–73.

  8. Furlan D, Carnevali IW, Bernasconi B, Sahnane N, Milani K, Cerutti R, et al. Hierarchical clustering analysis of pathologic and molecular data identifies prognostically and biologically distinct groups of colorectal carcinomas. Mod Pathol. 2011;24(1):126–37.

  9. Madhavan S, Gusev Y, Natarajan TG. Genome-wide multi-omics profiling of colorectal cancer identifies immune determinants strongly associated with relapse. Front Genet. 2013;20(4):236.

  10. Sharif S, O’Connell MJ. Gene signatures in stage II colon cancer: a clinical review. Curr Colorectal Cancer Rep. 2012;8(3):225–31.

  11. Chu D, Zhang Z, Li Y, Wu L, Zhang J, Wang W, Zhang J, et al. Prediction of colorectal cancer relapse and prognosis by tissue mRNA levels of NDRG2. Mol Cancer Ther. 2011;10:47.

  12. Vicuna B, Benson AB 3rd. Adjuvant therapy for stage II colon cancer: prognostic and predictive markers. J Nat Compr Cancer Net JNCCN. 2007;5(9):927–36.

  13. Walther A, Johnstone E, Swanton C, Midgley R, Tomlinson I, Kerr D. Genetic prognostic and predictive markers in colorectal cancer. Nat Rev Cancer. 2009;9(7):489–99.

  14. Roth AD, Tejpar S, Delorenzi M, Yan P, Fiocca R, Klingbiel D, et al. Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC-3, EORTC 40993, SAKK 60-00 trial. J Clin Oncol Off J Am Soc Clin Oncol. 2010;28(3):466–74.

  15. Ogino S, Nosho K, Irahara N, Shima K, Baba Y, Kirkner GJ, et al. CpG island methylator phenotype, microsatellite instability, BRAF mutation and clinical outcome in colon cancer. Gut. 2009;58(1):90–6.

  16. Amado RG, Wolf M, Peeters M, Van Cutsem E, Siena S, Freeman DJ, et al. Wild-type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer. J Clin Oncol. 2008;26(10):1626–34.

  17. Razis E, Pentheroudakis G, Rigakos G, Bobos M, Kouvatseas G, Tzaida O, et al. EGFR gene gain and PTEN protein expression are favorable prognostic factors in patients with KRAS wild-type metastatic colorectal cancer treated with cetuximab. J Cancer Res Clin Oncol. 2014;140(5):737–48.

  18. Smith FM, Stephens RB, Kennedy MJ, Reynolds JV. P53 abnormalities and outcomes in colorectal cancer: a systematic review. Br J Cancer. 2005;92(9):1813.

  19. Ribic CM, Sargent DJ, Moore MJ, Thibodeau SN, French AJ, Goldberg RM, et al. Tumor microsatellite-instability status as a predictor of benefit from fluorouracil-based adjuvant chemotherapy for colon cancer. N Engl J Med. 2003;349(3):247–57.

  20. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487(7407):330–7.

  21. Shibayama M, Maak M, Nitsche U, Gotoh K, Rosenberg R, Janssen KP. Prediction of metastasis and recurrence in colorectal cancer based on gene expression analysis: ready for the clinic? Cancers. 2011;3:2858–69.

  22. Watanabe T. Biomarker for high-risk patients with stage II colon cancer. Lancet Oncol. 2013;14(13):1247–8.

  23. Sobin LH, Fleming ID. TNM classification of malignant tumors, fifth edition (1997). Union Internationale Contre le Cancer and the American Joint Committee on Cancer. Cancer. 1997;80(9):1803–4.

  24. Yan B, Li H, Yang X, Shao J, Jang M, Guan D, et al. Unraveling regulatory programs for NF-kappaB, p53 and microRNAs in head and neck squamous cell carcinoma. PLoS One. 2013;8(9):e73656.

  25. Ethayathulla AS, Nguyen HT, Viadiu H. Crystal structures of the DNA-binding domain tetramer of the p53 tumor suppressor family member p73 bound to different full-site response elements. J Biol Chem. 2013;288(7):4744–54.

  26. Martynova E, Pozzi S, Basile V, Dolfini D, Zambelli F, Imbriano C, et al. Gain-of-function p53 mutants have widespread genomic locations partially overlapping with p63. Oncotarget. 2012;3(2):132–43.

  27. Kim J, Lee JH, Iyer VR. Global identification of Myc target genes reveals its direct role in mitochondrial biogenesis and its E-box usage in vivo. PLoS One. 2008;3(3):e1798.

  28. Zeller KI, Jegga AG, Aronow BJ, O’Donnell KA, Dang CV. An integrated database of genes responsive to the Myc oncogenic transcription factor: identification of direct genomic targets. Genome Biol. 2003;4(10):R69.

  29. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4.

  30. Huqun, Ishikawa R, Zhang J, Miyazawa H, Goto Y, Shimizu Y, et al. Enhancer of zeste homolog 2 is a novel prognostic biomarker in nonsmall cell lung cancer. Cancer. 2012;118(6):1599–606.

  31. Sanz-Pamplona R, Berenguer A, Cordero D, Riccadonna S, Solé X, et al. Clinical value of prognosis gene expression signatures in colorectal cancer: a systematic review. PLoS One. 2012;7(11):e48877.

  32. Oikawa T. ETS transcription factors: possible targets for cancer therapy. Cancer Sci. 2004;95(8):626–33.

  33. Gronemeyer H, Gustafsson JA, Laudet V. Principles for modulation of the nuclear receptor superfamily. Nat Rev Drug Discov. 2004;3(11):950–64.

  34. Sharma HW, Perez JR, Higgins-Sochaski K, Hsiao R, Narayanan R, et al. Transcription factor decoy approach to decipher the role of NF-kappa B in oncogenesis. Anticancer Res. 1996;16(1):61–9.

  35. Hasselblatt P, Gresh L, Kudo H, Guinea-Viniegra J, Wagner EF. The role of the transcription factor AP-1 in colitis-associated and beta-catenin-dependent intestinal tumorigenesis in mice. Oncogene. 2008;27(47):6102–9.

  36. Nakao K, Mehta KR, Fridlyand J, Moore DH, Jain AN, Lafuente A, et al. High-resolution analysis of DNA copy number alterations in colorectal cancer by array-based comparative genomic hybridization. Carcinogenesis. 2004;25(8):1345–57.

  37. Gordziel C, Bratsch J, Moriggl R, Knösel T, Friedrich K. Both STAT1 and STAT3 are favourable prognostic determinants in colorectal carcinoma. Br J Cancer. 2013;109(1):138–46.

  38. Camps J, Nguyen QT, Padilla-Nash HM, Knutsen T, McNeil NE, Wangsa D, et al. Integrative genomics reveals mechanisms of copy number alterations responsible for transcriptional deregulation in colorectal cancer. Genes Chromosom Cancer. 2009;48(11):1002–17.

  39. Henrichsen CN, Chaignat E, Reymond A. Copy number variants, diseases and gene expression. Hum Mol Genet. 2009;18(R1):R1–8.

  40. Yi JM, Dhir M, Van Neste L, Downing SR, Jeschke J, Glöckner SC, et al. Genomic and epigenomic integration identifies a prognostic signature in colon cancer. Clin Cancer Res Off J Am Assoc Cancer Res. 2011;17(6):1535–45.

  41. Yao X, Zhao G, Yang H, Hong X, Bie L, Liu G. Overexpression of high-mobility group box 1 correlates with tumor progression and poor prognosis in human colorectal carcinoma. J Cancer Res Clin Oncol. 2010;136(5):677–84.

  42. Slaby O, Sobkova K, Svoboda M, Garajova I, Fabian P, Hrstka R. Significant overexpression of Hsp110 gene during colorectal cancer progression. Oncol Rep. 2009;21(5):1235–41.

  43. Southern SL, Collard TJ, Urban BC, Skeen VR, Smartt HJ, Hague A, et al. BAG-1 interacts with the p50-p50 homodimeric NF-kappaB complex: implications for colorectal carcinogenesis. Oncogene. 2012;31(22):2761–72.

  44. Kang R, Zhang Q, Zeh HJ 3rd, Lotze MT, Tang D. HMGB1 in cancer: good, bad, or both? Clin Cancer Res Off J Am Assoc Cancer Res. 2013;19(15):4046–57.

  45. Süren D, Yıldırım M, Demirpençe Ö, Kaya V, Alikanoğlu AS, Bülbüller N, et al. The role of high mobility group box 1 (HMGB1) in colorectal cancer. Med Sci Monit. 2014;31(20):530–7.

  46. Ueda M, Takahashi Y, Shinden Y, Sakimura S, Hirata H, Uchi R, et al. Prognostic significance of high mobility group box 1 (HMGB1) expression in patients with colorectal cancer. Anticancer Res. 2014;34(10):5357–62.

  47. Zhang CC, Gdynia G, Ehemann V, Roth W. The HMGB1 protein sensitizes colon carcinoma cells to cell death triggered by pro-apoptotic agents. Int J Oncol. 2015;46(2):667–76.

  48. Zhu L, Li X, Chen Y, Fang J, Ge Z. High-mobility group box 1: a novel inducer of the epithelial-mesenchymal transition in colorectal carcinoma. Cancer Lett. 2015;357(2):527–34.

  49. Honma K, Iwao-Koizumi K, Takeshita F, Yamamoto Y, Yoshida T, Nishio K, et al. RPN2 gene confers docetaxel resistance in breast cancer. Nat Med. 2008;14(9):939–48.

  50. Takahashi RU, Takeshita F, Honma K, Ono M, Kato K, Ochiya T. Ribophorin II regulates breast tumor initiation and metastasis through the functional suppression of GSK3 beta. Sci Rep. 2013;3:2474.

  51. Vilar E, Gruber SB. Microsatellite instability in colorectal cancer-the stable evidence. Nat Rev Clin Oncol. 2010;7(3):153–62.

  52. Nanashima A, Yamaguchi H, Yasutake T, Sawai T, Kusano H, Tagawa Y, et al. Gain of chromosome 20 is a frequent aberration in liver metastasis of colorectal cancers. Dig Dis Sci. 1997;42:1388–93.

  53. Loo LW, Tiirikainen M, Cheng I, Lum-Jones A, Seifried A, Church JM, et al. Integrated analysis of genome-wide copy number alterations and gene expression in microsatellite stable, CpG island methylator phenotype-negative colon cancer. Genes Chromosomes Cancer. 2013;52(5):450–66.

  54. Kurashige J, Watanabe M, Iwatsuki M, Kinoshita K, Saito S, Nagai Y, et al. RPN2 expression predicts response to docetaxel in oesophageal squamous cell carcinoma. Br J Cancer. 2012;107(8):1233–8.

  55. Lian M, Fang J, Han D, Ma H, Feng L, Wang R, et al. Microarray gene expression analysis of tumorigenesis and regional lymph node metastasis in laryngeal squamous cell carcinoma. PLoS One. 2013;8(12):e84854.

  56. Fujiwara T, Takahashi RU, Kosaka N, Nezu Y, Kawai A, et al. RPN2 gene confers osteosarcoma cell malignant phenotypes and determines clinical prognosis. Mol Ther Nucleic Acids. 2014;3:e189.

  57. Bielack SS, Kempf-Bielack B, Delling G, Exner GU, Flege S, Helmke K, et al. Prognostic factors in high-grade osteosarcoma of the extremities or trunk: an analysis of 1702 patients treated on neoadjuvant cooperative osteosarcoma study group protocols. J Clin Oncol. 2002;20(3):776–90.

  58. Fujita Y, Takeshita F, Mizutani T, Ohgi T, Kuwano K, Ochiya T. A novel platform to enable inhaled naked RNAi medicine for lung cancer. Sci Rep. 2013;3:3325.

Authors’ contributions

Conception and design: JLZ, ZC, XLS and YYB. Subject recruitment: JLZ, HQ, XLS, and YYB. Data acquisition: BY, DGG, JFS, HQ, XLS and YYB. Analysis and interpretation of data: JLZ, BY, SSS, HQ and ZC. Drafting of the manuscript: JLZ, BY, SSS, SC, YYB, XLS, CVW, QH and ZC. Study supervision: ZC, XLS and YYB. All authors read and approved the final manuscript.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant no. 81160253, China) and Natural Science Foundation of Inner Mongolia (Grant no. 2011MS1158, China). JLZ was supported by Natural Science Foundation of Inner Mongolia. ZC and CVW are supported by NIDCD intramural project ZIA-DC-00016. We thank Dr. Jiro Kato at NIH/NHLBI and Han Si at NIH/NEI for critical reading of the manuscript and for valuable discussions.

Compliance with ethical guidelines

Competing interests The authors declare that they have no competing interests.

Author information

Corresponding authors

Correspondence to Zhong Chen, Xiulan Su or Yongyi Bi.

Additional files

Additional file 1:

Table S1. List of differentially expressed genes with transcriptional regulation. Table S2. Gene Ontology (GO) and Ingenuity Pathway Analysis (IPA) of differentially expressed genes based on biological processes, canonical pathways and upstream regulators. (a) GO biological processes enriched (P<0.01, FDR<5%). (b) Upstream regulator analysis using Ingenuity Pathway Analysis 2013 (P<0.00001). Table S3. Comparison of differentially expressed genes with published CRC microarray studies.

Additional file 2:

Figure S1. Ingenuity Pathway Analysis (IPA) identifies enriched top pathway networks in CRC. Figure S2. Oncoprint summary of genomic alterations in CRC from the TCGA database. Figure S3. Clinicopathological characteristics of CRC samples.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Zhang, J., Yan, B., Späth, S.S. et al. Integrated transcriptional profiling and genomic analyses reveal RPN2 and HMGB1 as promising biomarkers in colorectal cancer. Cell Biosci 5, 53 (2015).
