Alterations of RNA splicing patterns in esophagus squamous cell carcinoma

Alternative splicing (AS) is an important biological process for regulating the expression of various isoforms from a single gene and thus to promote proteome diversity. In this study, RNA-seq data from 15 pairs of matched esophageal squamous cell carcinoma (ESCC) and normal tissue samples as well as two cell lines were analyzed. AS events with significant differences were identified between ESCC and matched normal tissues, which were re-annotated to find protein coding genes or non-coding RNAs. A total of 45,439 AS events were found. Of these, 6019 (13.25%) significant differentially AS events were identified. Exon skipping (SE) events occupied the largest proportion of abnormal splicing events. Fifteen differential splicing events with the same trends of ΔΨ values in ESCC tissues, as well in the two cell lines were found. Four pathways and 20 biological processes related to pro-metastasis cell junction and migration were significantly enriched for the differentially spliced genes. The upregulated splicing factor SF3B4, which regulates 92 gene splicing events, could be a potential prognostic factor of ESCC. Differentially spliced genes, including HNRNPC, VCL, ZNF207, KIAA1217, TPM1 and CALD1 are shown with a sashimi plot. These results suggest that cell junction- and migration-related biological processes are influenced by AS abnormalities, and aberrant splicing events can be affected by splicing factor expression changes. The involved splicing factor SF3B4 was found to be a survival-related gene in ESCC and is presumed to regulate AS in multiple cancers. In summary, we identified significant differentially expressed AS events which may be related to the development of ESCC.


Introduction
Esophageal cancer is the sixth leading cause of cancer fatalities in the world [1], with a five-year survival of less than 20% [2]. Esophageal cancer is mainly classified into two types that differ in epidemiology and pathology: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC). About 90% of esophageal cancer cases worldwide are ESCC [1]. Studies committed to the molecular mechanism of ESCC development need to be carried out further to help improve the diagnosis, treatment and prognosis of this detrimental disease.
Pre-mRNA splicing is a biological process, carried out by a spliceosome consisting of five snRNP complexes and more than 200 auxiliary proteins [4], that removes intron lariats and joins exons together to produce mature mRNAs [3]. In the process of alternative splicing, alternative splice site selection within pre-mRNAs results in the production of various mature mRNAs from a single gene [5]. It has been reported that alternative splicing plays important roles in normal organ development and cell differentiation [6]. However, splicing pattern abnormalities caused by mutations of cis-acting sequences and spliceosomal proteins can also trigger human diseases, including cancer [7,8]. Altered splicing patterns influence multiple aspects of cancer development including cell proliferation, programmed cell death, metabolism, angiogenesis, and tumor metastasis [5,9]. Genome-wide studies indicate that tumor development often involves large-scale variations in alternative splicing [10].
Recent developments in high-throughput sequencing technologies and computational tools have revolutionized our capacity to investigate AS at the genome-wide level. In 2018, changes in exon usage were found across the full TCGA cohort, where tumor samples had an increase in abnormal splicing events as compared to normal samples for the majority of the investigated cancers [11]. Using PSI (percent spliced in) and DEXSeq methods, 28 pairs of colorectal cancer and normal tissues were analyzed, where 9 significant AS genes were found with two splicing events associated with patient overall survival [12]. Applying public data from TCGA, more than 200 DEGs (Differentially Expressed Genes) with AS events were identified and a risk score based on 12 genes associated with overall survival was established in colon adenocarcinoma [13]. In kidney renal clear cell carcinoma, 13 survival-specific genes were significantly correlated with AS events from SpliceSeq database [14].
The MISO model was originally developed to assess the expression of alternatively spliced exons and transcripts [15]. MISO uses confidence intervals to assess exon and transcript abundance, but also detects differentiallyspliced exons or transcripts. This is accomplished by calculating the posterior distribution of unobserved random variables from the RNA-Seq data [15]. The combination of RNA-seq data and computational tools enables AS analyses at a genome-wide level, which provides insight into AS mechanisms and potential clinical application. In this study, MISO was utilized to characterize the splicing patterns and try to identify the regulatory roles of splicing factors on splicing abnormalities in ESCC.

Source of sequencing data and routine analysis
RNA-seq data of 15 pairs of ESCC tissues was generated from our laboratory, which has been published in the previous study [16]. FastQC was used to measure the quality of the sequences [17]. The paired-end sequences were then aligned against the UCSC (http://genom e.ucsc.edu/) human genome (hg19) [18] using TopHat2 v2.0.13 [19] and Bowtie2 [20] using default parameters. Alignments were further analyzed with Cufflinks [21] to calculate the abundance (FPKM) of each gene and isoform using the hg19 gene annotations, and with DESeq [22] to identify differentially-expressed splicing factors. To validate the differentially-expressed splicing factors, two microarray datasets, named GSE53624 [23] and GSE23400 [24], were utilized to perform differential expression analysis with the siggenes package in R2.15.3. The RNA-seq data from a pair cell lines, an immortalized human esophageal epithelial cell line (SHEE) and its malignant transformed cell line (SHEEC) [16], was applied as the test cohort in this study.

Identification of AS events in ESCC
The splicing patterns of ESCC were analyzed with MISO [15] using exon-centric annotations V.2.0, and the results were filtered with a cut-off of ΔΨ > 0.2, Bayes-factor > 5, and a sum of inclusion and exclusion reads number > 10. Sashimi plots were used to visualize AS events with significant differences which occurred in ≧ 3 pairs of ESCC tissues, and between the SHEE and SHEEC cell lines as well.

Re-annotation of differential AS events
AS events used in the previous analysis were annotated with human genome (hg19) alternative events v2.0, and their annotations were derived from Ensembl genes (Ensembl database), knownGenes (UCSC database) and RefSeq genes (NCBI database). In order to further verify the AS event annotations and find out their represented genes and transcripts, AS events with significant differences in at least 3 pairs of ESCC tissue samples were re-annotated. Homo_sapiens.GRCh37.75.gtf from the Ensembl database [25] and GRCh37_latest_genomic. gff from the RefSeq database [26] were chosen for reannotation. Since the mRNA sequences and gene-related annotations of the UCSC database were derived from GenBank and RefSeq databases, those annotations were not adopted for re-annotation of AS events with several criterion described as follows: (i) if three exons in an SE (Skipped exon) event appear in the same transcript, the SE event is judged to be corrected, (ii) if two exons in an RI (Retained intron) event are present in the same transcript and there are no other exons between these two exons, then the RI event annotation is determined to be corrected, (iii) if adjacent exons, for example exons 1-2-4 and 1-3-4, occur in the same transcript in an MXE (Mutually exclusive exon) event, and exons 1-4 do not appear in any one transcript, the MXE event is judged as correct, (iv) if the variable exon (the longer or the shorter exon) and constitutive exons in an A3SS (Alternative 3'splice site) event occurs in the same transcript, and between these two exons there are no other exons, the A3SS event annotation is determined to be corrected, (v) if the variable exon and the constitutive exon in an A5SS (Alternative 5'splice site) appear in the same transcript, and between the two exons there are no other exons, the A5SS event annotation is judged to be correct. Since there are no chromosome numbers in the RefSeq exon annotations, the chromosomal locations of exons using RefSeq annotations should be checked later.

Functional enrichment analysis
DAVID online tools [27] were utilized to perform functional enrichment analysis towards GO biological process and KEGG pathways. The significant cut-off was set to P < 0.05. Dot plots from R package named ggplot2 [28] were then applied to visualize the significantly enriched GO terms and KEGG pathways.

Differential AS events occurred both in ESCC tissues and SHEE/SHEEC cell lines
After the results of differential AS events from the SHEE and SHEEC cell lines were filtered, the following information, including AS event names, Ψ values from the SHEE cell line, Ψ values from the SHEEC cell line and ΔΨ values from SHEE/SHEEC cells were selected. The reannotation results of the AS events in ESCC tissues using gene annotations were overlapped with SHEE/SHEEC AS events, then the similar trend of ΔΨ values of each overlapped AS event in tissues and in cell lines was judged.

Splicing regulatory network establishment
Gene expression of each sample was combined and merged with the splicing factor list, which is available from the SpliceAid2 database, which contains 2,220 target sites of 62 human splicing proteins and their expression data in 320 tissues [29]. Spearman correlation tests were performed between splicing factors and the splicing event/genes, by the calculation of the expression levels and the PSI values, accordingly. The significant pairs of splicing factors and splicing genes were chosen to establish a splicing regulatory network. Cytoscape software was used to visualize the splicing regulatory network, which is an open source platform for visualization of molecular interaction [30].

Survival analysis
The expression and clinical data of 87 ESCC patients was obtained from the Xena browser database (https :// xenab rowse r.net/datap ages/). The log-rank (Mantel-Cox) test was used to perform survival analysis and a Kaplan-Meier curve was plotted using GraphPad software.

Statistical analysis
Statistical analyses were performed using SPSS 19.0 (IBM, Chicago, IL, USA) or R 3.15.3 (Auckland, New Zealand) for Windows. Comparisons of the relative expression of between paired tumor and non-tumor tissues, were performed using a paired t-test. Overall survival time was calculated by the Kaplan-Meier method and analyzed by the log-rank test. A two-tailed P-value < 0.05 was considered to be statistical significant.

Overall mapping rate after genome alignments
Reads from the RNA-seq of each esophageal clinical case were mapped to human genome sequences. The results are shown in Table 1, where the total alignment number of the sequenced fragments ranged from 15 to 39 million, and the comparison rate ranged between 83 and 92%, suggesting the high reliability of these RNA-seq data.

Changes in AS patterns between ESCC tissue/matched normal tissue and two cell lines
From the results of splicing analysis, Using MISO, a total of 45,439 AS events were detected. Of these, 6,019 AS events were significantly differently spliced in ESCC tissues as compared to normal tissues, with a cut-off of BF > 5, |ΔΨ|> 0.2. These significant AS events included a number of inclusion and exclusion reads > 10, accounting for 13.25% of the total number of AS events, suggesting a relative high splicing index in ESCC (Fig. 1a). After filtering out the AS events with inconsistent ΔΨ trends in the paired samples, there were 5,150 differential splicing events, including 2,324 exon skipping (SE) events, 1,014 intron retention (RI) events, and 693 mutually exclusive exon (MXE) events, 592 alternative 3' splice site (A3SS) events, and 527 alternative 5' splice site (A5SS) events. The AS results of ESCC tissue samples showed that the proportion of SE events that could be detected were the most abundant at about 33.6%, RI events at 6.9%, MXE 6.8%, A3SS 8.5%, and A5SS 6.4% of the total events. The percentage of SE events in differential splicing events was still the highest as 4.8%, while RI events comprising 2.5%, MXE events comprising 1.5%, A3SS events comprising 1.2%, and A5SS events comprising approximately 1.1%. Individual differences were detected during MISO analysis which were more obvious in the differential splicing events (Fig. 1b). From the volcano plot, it could be observed that the ΔΨ values and log2(BF) values in detectable AS events and the differentially spliced events were symmetrically distributed ( Fig. 1c), Also, it could be seen from the and bar plot that the ΔΨ trend did not have a significant difference between AS event type (Fig. 1d).s On the other hand, a total of 32,891 AS events were detected in the SHEE/SHEEC cell lines, of which a total of 928 AS events were significantly differentially spliced, accounting for approximately 2.8% of the total detectable AS events (Fig. 1e). The threshold for judging the differences in AS events in the SHEE/SHEEC cell lines was the same as applied in ESCC AS event filtration. In the detectable AS events, the proportion of SE was the largest, at 55.9%, with RI at 9.2%, MXE at 10.9%, A3SS accounting for 13.6%, and A5SS at 10.3%. Of the differential splicing events, the number of SE, RI, MXE, A3SS, and A5SS events were 455, 131, 152, 96, and 94, accounting for 49%, 14.1%, 16.4%, 10.3%, and 10.1% of the total number of differential splicing events, separately (Fig. 1f ). The distribution of both total and differential AS is similar in ESCC tissues and ESCC cell lines.

Re-annotation of AS events in paired ESCC tissues
To identify the genes and transcripts representing the differential AS events and to filter out incorrect AS events, as described in the methods, we re-annotated all differentially splicing events found in at least 3 paired samples. The number of filtered significant AS events changed across the different categories of AS due to the re-annotation, but the event trend had minimal changes (Fig. 2a). For SE events, there were a total of 222 and 151 significant alternatively spliced events before and after re-annotation, reducing the number of AS events to approximately 68% of all events. There were a total of 134 RI significant AS events before reannotation, and 115 after re-annotation, with a reduction of 14.2%. A total of 62 significant MXE events were identified before re-annotation and only 6 remained after re-annotation, reducing the total number to 9.7%. 57 significant A3SS events were found before re-annotation while 42 remained after re-annotation. 35 out  (Fig. 2b). Through we found different annotations from these two popular databases, the majority of our annotation was found in both databases. We have also calculated the fractions of AS events representing protein-coding mRNA and ncRNA (non-coding RNA) (Fig. 2c). The percentage for ncRNA ranged from 10% to more than 20% in total events, suggesting alternative splicing is prevalent for ncRNA and its critical role in cancer [31]. The top 10 most prevalent splicing events that occurred in the paired samples are shown in Table 2. To better illustrate the different splicing patterns, HNRNPC, CALD1, KIAA1217, TPM1, VCL and ZNF207 were selected from the top genes and their splicing patterns are visualized by Sashimi plots (Fig. 3). These included representative examples for MXE, A5SS, RI and SE splicing types, which indicates the splicing diversity in ESCC.

Overlapping AS events in ESCC tissues and SHEE/SHEEC cell lines
A total of 37 significant AS events were identified in both ESCC tissues and cell lines, 15 of which showed the same trend of ΔΨ values. Among these 15 AS events, there were 4 SE events, 7 RI events, 2 MXE events, and 2 A5SS events (Additional file 1: Table S1). Though the overlapping rate is quite low, we presume this might reflect the consensus and difference between clinical samples and cultured cell lines.

Classification of AS events according to representative transcript types
After the re-annotation of the 349 AS events from ESCC tissues, 312 events represented at least one protein-coding transcript, which accounted for 89.4% of the total AS events. The remaining 37 AS events did not represent any protein-coding transcripts, including protein coding transcripts, lncRNAs, and micro-RNAs (Additional file 2: Table S2). Among the 235 AS events re-annotated by Ensembl transcript annotations, 200 events represented at least one proteincoding transcript, which accounted for 85.1% of the total AS events, and the remaining 35 AS events represented non-protein-coding transcripts. Of the 283 variable splicing events re-annotated by RefSeq transcript annotations, 266 events represented at least one protein-coding transcript, which accounted for 94% of the total AS events, and the remaining 17 AS events represented non-protein-coding transcripts. These results suggest that splicing of ncRNA transcripts could be just as prevalent as protein-coding transcripts.

Functional enrichment revealed the potential roles of AS transcripts
To identify the biological processes and pathways affected by splicing abnormalities in ESCC, we performed the enrichment analyses. A total of 270 protein-coding genes were selected for functional and pathway enrichment analysis from the results of alternative splicing events after re-annotation of differential AS events with gene annotations from the Ensembl and RefSeq databases. A total of 234 genes were enriched in 543 biological processes. There were 382 biological processes satisfying the threshold of P < 0.05, and 39 genes were enriched in 12 pathways, of which 10 pathways satisfied P < 0.05. Interestingly, there were 4 KEGG pathways and 20 biological processes that potentially related to invasion and metastasis of cancer cells, such as adherens junction, ECM-receptor interaction, focal adhesion, regulation of cell mobility, regulation of cell migration and cell-matrix adhesion ( Fig. 4). These results indicated that AS in ESCC might involve multiple cellular functions to promote tumor progression.

Splicing regulatory network in ESCC
To find the regulators of the splicing alterations in ESCC, we constructed a splicing regulatory network by calculating the FPKM values of splicing factors and PSI values of AS events. Filtered with a cut-off of P < 0.05 after Spearman correlation analysis, 5,999 splice factorsplicing event pairs and 5,511 splice factor-target gene pairs were chosen to build a splicing regulatory network. This network involved 81 splicing factors and 223 differential AS events (Fig. 5). Most of splicing factors could regulate dozens of target genes. On the other hand, one target gene could be processed by several splicing factors, suggesting the co-operation between these factors. Next, Splicing factors that were differentially expressed in ESCC and matched normal tissues were measured in two public microarray datasets (GSE53624 and GSE23400) and the RNA-seq dataset generated from our laboratory [16]. From these analyses, the splicing factor SF3B4 was significantly differentially expressed in ESCC with a 1.5 fold-change in three different datasets (Table 3). For the splicing factors in the network, SF3B4 (splicing factor 3b subunit 4) caused us great interest. The splicing regulatory network for SF3B4 (splicing factor 3b subunit 4) and its target genes (with significant Spearman correlation) were also illustrated, which contained 92 target Fig. 2 Distribution of RNA splicing events with significant differences in ESCC after re-annotation with Ensembl and RefSeq gene annotations. a Plot in the left panel indicates exon models of the types of AS assessed by MISO analysis. In the right panel, the different types of AS events before and after re-annotation were quantified with transcript annotations from Ensembl and RefSeq database, respectively. b A total of 283 and 235 splicing events with significantly altered PSI values were identified using transcript annotations from the RefSeq and Ensembl databases, respectively. In total, 169 splicing events were identified in both database transcript annotations. c Percentage of protein coding transcripts (blue) and non-coding transcripts (yellow) representing AS events after re-annotation analysis genes and 102 abnormal AS events (Fig. 6a), as some genes generated more than one AS event. Moreover, SF3B4 was found to be a survival-related gene in ESCC. In TCGA dataset, ESCC patients with a lower expression level of SF3B4 had a longer survival time (Fig. 6b). The PSI value of SRSF5's RI event was found to have the smallest p-value with Spearman correlation analysis with the FPKM of SF3B4 (Fig. 6c). To understand the potential  functions of SF3B4, functional enrichment analysis was also performed to identify the biological processes for SF3B4's target genes, resulting in a dot plot of 9 significant biological processes, which involves cell-cell adhesion, NF-KB signaling and regulation of translation (Fig. 6d). Many evidences indicated AS are often linked to the occurrence of cancer driver mutations in genes encoding either core components or regulators of the splicing machinery [32]. However, we searched cBio-Portal database (http://www.cbiop ortal .org/), but no mutation for SF3B4 in current TCGA ESCC clinical samples was found (data not shown).

SF3B4 affects AS after its knockdown
To validate the regulatory role of SF3B4 on AS events in tumor samples, we have downloaded two RNA-seq datasets with SF3B4 knockdown, and calculated the inclusive levels of AS events with IncLevelDifference between control shRNA and SF3B4 knockdown (methods described in Additional file 3). From the 102 SF3B4 regulated AS events described above. We selected 36 AS events with

Discussion
As an important step in post-transcriptional regulation, AS greatly contributes to proteome diversity, as well as cancer progression and development. As a common phenomenon in cancers, it has been reported that an endogenous soluble form of VEGFR-2, a product of alternative splicing, is present in humans and mice [33]. Our previous research found that endogenous soluble VEGFR-2 is down-regulated in ESCC, but its high expression is an independent prognostic factor for poor survival [33]. We identified a novel splice variant of LOXL2 (Lysyl oxidase-like 2), named LOXL2Δ72, which lacks 72 nucleotides encoding 24 amino acids [34]. Though LOXL2Δ72 has dramatically reduced enzymatic activity, it promotes greater cell migration and invasion than its wild type counterpart [34]. However, alternative splicing studies in ESCC remain at a low-throughput level, and analyses of AS events in ESCC on a genome-wide scale are still rare. Although aberrant splicing events in ESCA samples have been found and related to overall survival, detailed splicing patterns of ESCC are still unknown, and splicing-mode changes between ESCC and matched normal tissues have not been characterized [35]. In this paper, previously published next-generation sequencing data was used to study AS in ESCC. Because our RNAseq data was generated from ESCC tissue and matched normal tissue of the same patient, our data is suitable for MISO analyses and splicing-mode change characterization. There are RNA-seq data generated from a pair of cell line samples, an immortalized human esophageal epithelial cell line (SHEE) and its malignant transformed cell line (SHEEC), which was also used as a validating dataset. This comparison reflects the consistence and difference between cancer tissues and cultured cells while considering their AS pattern.
In total, 6,019 significant AS events were identified in 15 pairs of ESCC tissues. It is also noteworthy that a large proportion of the AS events were found to only have significant differences in a small number of paired samples, which suggests that AS differences in many genes might have a similar impact on the development of ESCC. The proportion of AS events with ΔΨ > 0 and ΔΨ < 0 in ESCC tends to be consistent, which indicates that the current AS classification system cannot promise all the true AS types, and the real AS modes may be more complicated. We have noticed that the AS annotations in MISO are based on existing transcripts, so alternative splicing analysis can not discover new transcripts, but unknown AS regulatory mechanisms based on known genes and transcripts could be revealed by the MISO analysis. Even so, there are many problems with the official MISO AS annotations, in which several exons in mutually exclusive exon events are found present in the same transcript, making the accuracy of the annotation confusing. In this study, transcriptional annotations from Ensembl and RefSeq database were used to re-annotate the previously analyzed AS events and to remove incorrect splicing events to make the subsequent analysis more accurate.
Many laboratories have reported some clues on the mechanism of alternative splicing affecting cancers. For example, the normal transcript for CD44 was expressed in normal gastric tissue, and the upregulation of its splice variant is associated with invasion and metastasis of gastric tumors [36]. Higher expression of CD44 variant 6 has been reported in gastric tumor with lymph node metastasis, suggesting that CD44 variant 6 plays a role in the metastasis of gastric cancer [37]. BP1, a splicing isoform of DLX4, is highly expressed in endometrial cancer, which correlates with patient poor prognosis, cancer pathological grade, tumor invasion and metastasis [38]. For the first time, from the perspective of systems biology, we show the biological processes and KEGG pathways affected by alternative splicing in ESCC development. By functional enrichment analyses, multiple GO terms and pathways are observed in relation to cell junction and cell migration, indicating that altered splicing may promote invasion and metastasis in ESCC. For the top differential AS genes generated from ESCC in this study, many of them have been reported to play an important role in tumor progression. HNRNPC is a typical splicing factor, and its regulation by alternative splicing has not been reported. Alternative splicing of ZNF207 has been reported to be regulated by SRSF11 [39], but its AS pattern has not been reported. VCL encodes an F-actin-binding cytoskeletal protein, compared to VCL-001, VCL-201 contains an extra exon named exon 19. The skipped exon 19 is reported to be increasingly preferred in colon tumor cells compared to normal mucosa [40], which is consistent with its Sashimi plot model shown in this study. CALD1 is a calmodulin-and actin-binding protein with regulatory functions, and has been reported to play an important role in cell movement. In colon cancer tissues, the longest CALD1 transcript is reported to be reduced, with the increase of its shorter splicing isoform [40]. However, Sashimi plot results in this study show that there is an alternative 5' splice site event in CALD1, which also involves a relatively down-regulated expression of its longest isoform. Intron retention of KIAA1217 has been found in nonsmall cell lung cancer, but RT-PCR results do not support this result [41]. Our results indicate that there is an RI event for KIAA1217, but the retained intron tends to be included in matched normal tissues. Regarding TPM1, it has been reported that exon 6 of TPM1 has two types, TPM1-6A and TPM1-6B, but are the same 76 bp size. In bladder cancer and prostate cancer, TPM1 exon 6A and 6B are reported to have opposite splicing trends [42]. We also show there is an MXE event in TPM1s, and both the mutually exclusive exons are 76 bp long. Though our top AS genes are generated from paired ESCC tissues and cell lines, they need further verification, both in clinical samples and cultured cell lines.
Splicing factors, which participate in the ribonucleoprotein (RNP) complexes that play key roles in AS regulation often drive different AS patterns in a tissue-and cell-type-specific manner, both in normal and cancerous conditions [43]. A key role for SF3B4 in diseases in regards to its splicing function has been revealed in recent years. A heterozygosity for SF3B4 mutations is found in Rodriguez syndrome, which leads to reduced SF3B4 synthesis and defects in mRNA splicing, primarily exon skipping [44]. Overexpressing SF3B4 resulted in a mis-splicing of Kruppel-like factor 4 (KLF4), a tumor suppressor-encoding gene, into a non-functional transcript in cancer cells, and thus promoting tumorigenesis in HCC [45].
But the relationship between SF3B4 and ESCC has not been identified. In this paper, we found that SF3B4 is up-regulated in ESCC, which is similar to the results of an HCC study, in which SF3B4 is an up-regulated diagnostic marker [46]. Moreover, according to the log-rank (Mantel-Cox) test, SF3B4 can be identified as a prognostic marker of ESCC. Consistently, a new paper found over-expression of SF3B4 may play a crucial role in the lymphatic progression of ESCC, during the submission of our study [47]. Spearman correlation analysis was utilized to calculate the correlation between the degree of splicing events and the expressions of splicing factors in ESCC, according to a previous method [48]. SF3B4 was also found to play an important role in regulating 92 protein coding genes and their aberrant splicing events in ESCC. Functional enrichment analysis indicates that SF3B4's downstream targets may play a role in cell-cell junction and nuclear factor kB (NF-kB) processes.
To identify the target genes of SF3B4 in tumors, significant AS events were investigated in SF3B4 knockdown cells. Six AS events regulated by SF3B4 in tumor samples overlapped with the top SF3B4 regulated AS events in ESCC. The oncogenic role of SRSF5 [49], PHF6 [50], SNHG12 [51] and KIAA1217 [52] have been reported in different tumors, but we firstly discovered that intron retention in these genes are repressed by SF3B4. Retained intron (RI) repression potentially leads to productive transcripts, because retained introns usually produce premature termination codons, which may generate unproductive, unstable transcripts or truncated proteins [53]. These results reveal that SF3B4 regulates the expression pattern of many cancer-associated genes through AS mechanisms. However, it is still necessary to verify their regulatory mechanism and subsequent biological effects through validation experiments.

Conclusions
In this study, AS events were analyzed from RNA-seq data from matched normal and ESCC tissue samples\ pairs and two cell lines where 13.25% significant differential AS events were identified, which are involved in cell junction and migration. The splicing factor SF3B4 is a potential prognostic factor and a splicing regulator in ESCC. In summary, we identified thousands of significant differential AS events which might be related to the development of ESCC.