Time-course single-cell RNA sequencing reveals transcriptional dynamics and heterogeneity of limbal stem cells derived from human pluripotent stem cells

Background Human pluripotent stem cell-derived limbal stem cells (hPSC-derived LSCs) provide a promising cell source for corneal transplants and ocular surface reconstruction. Although recent efforts in the identification of LSC markers have increased our understanding of the biology of LSCs, much more remains to be characterized in the developmental origin, cell fate determination, and identity of human LSCs. The lack of knowledge hindered the establishment of efficient differentiation protocols for generating hPSC-derived LSCs and held back their clinical application. Results Here, we performed a time-course single-cell RNA-seq to investigate transcriptional heterogeneity and expression changes of LSCs derived from human embryonic stem cells (hESCs). Based on current protocol, expression heterogeneity of reported LSC markers were identified in subpopulations of differentiated cells. EMT has been shown to occur during differentiation process, which could possibly result in generation of untargeted cells. Pseudotime trajectory analysis revealed transcriptional changes and signatures of commitment of hESCs-derived LSCs and their progeny—the transit amplifying cells. Conclusion Single-cell RNA-seq revealed time-course expression changes and significant transcriptional heterogeneity during hESC-derived LSC differentiation in vitro. Our results demonstrated candidate developmental trajectory and several new candidate markers for LSCs, which could facilitate elucidating the identity and developmental origin of human LSCs in vivo.


Background
Human limbal stem cells (LSCs) are located at a narrow area around the cornea and connect directly to the sclera [1][2][3]. Other than self-renewal capability for homeostasis maintenance, LSCs have unipotency to differentiated into corneal epithelial cells and play vital roles in corneal regeneration and repair [4]. However, internal or external factors, such as genetic mutations, chemicals, burns, bacteria etc., could result in limbal malfunction, and limbal stem cells deficiency (LSCD), and lead to reduced vision and blindness [5][6][7].
Among different treatment options, LSC transplantation is currently the best curative treatment that can improve both vision and quality-of-life in patients with ocular surface disorders caused by LSCD [8]. But the potential risks of infection during LSCs harvesting from donors and immunological rejection after transplantation hinder its broad application in the clinic. One promising alternative is the use of non-immunogenic limbal stem cells induced from engineered pluripotent stem cells, such as embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs) [9][10][11][12][13]. However, the complexity and heterogeneity of the LSC niche, and our limited knowledge about the development of human LSCs prevent us from developing reliable and efficient methods for LSCs differentiation [13,14].
Although the developmental origin of LSC remains enigmatic, most studies considered that the corneal epithelium descend from surface ectoderm (SE) [14,15], which also give rise to the epidermis and ectodermal associated appendages such as hair, ears, and the mammary glands etc. [16]. However, developmental surface ectodermal cells and their derivatives are difficult to isolate and study in human. Our understanding on cell-fate specification of the limbal stem cells in vivo are largely from studies of classic model organisms, such as mice [17,18] and Xenopus frogs [19]. But it is well-known that final maturation pathways are significantly different between humans and other model animals, though their pre-implantation development appear relatively similar [20]. Thus, the directed differentiation of human pluripotent stem cells (hPSCs) to LSCs could offer an alternative model system to explore these cells' identity and fate decisions for basic and clinical applications [9-12, 16, 21]. However, available differentiation protocols are still inefficient and suffer from excessive heterogeneity [22]. The lack of specific markers for LSCs, and our limited knowledge about intrinsic signaling cascades and developmental mechanisms of human LSCs hindered the clinical application of LSCs [14,21].
Single-cell RNA sequencing (scRNA-seq) is a powerful tool to quantify transcripts in individual cells to understand gene expression changes at single-cell resolution [23]. Since the first publication in 2009 [24], scRNA-seq has been increasingly utilized in many fields, such as in developmental biology to delineate cell lineage relationships and developmental trajectories [25][26][27]. In this study, we performed a time-course single-cell transcriptomic analysis of LSCs derived from human embryonic stem cells to investigate their transcriptional heterogeneity and expression changes during differentiation process in vitro.

Single-cell RNA sequencing revealed expression heterogeneity in hESC-derived LSCs
H9 human embryonic stem cells (hESC) were converted to LSCs via a surface ectodermal stage according to previous published protocols [15,28] (Additional file 1: Fig.  S1a). To characterize obtained hESC-derived LSCs, we performed scRNA-seq at four time points: Day 0 before induction, Day 7, Day 14, and Day 21 after induction. In total, 18 541 cells were sequenced, and data from 14 241 cells were used for the following analysis after filtering out low quality cells, including 4 687 cells, 4 784 cells, 3 210 cells, and 1 560 cells from Day 0, Day 7, Day 14, and Day 21, respectively (Additional file 1: Fig. S1b-S1e).
We also found high variability in the expression of some epithelial progenitor and candidate LSCs markers between Day 14 and Day 21. Percentage of cells expressing TP63 decreased from 11.21% at Day 14 to 2.46% at Day 21 (Fig. 1e). In contrast, most cells (85.67%) at Day 21 expressed ABCG2, one of the widely used markers of LSCs [14,[30][31][32], while only 21.82% of cells at Day 14 had ABCG2 expression. Furthermore, several markers of terminally differentiated LSCs, such as KRT3 and KRT12 (data not shown), were not detected in any cells at Day 14 and Day 21, indicating that these cells were still at immature differentiation stages.

Time-course Single-cell RNA-seq profiling showed specific changes of gene expression during hESCs-derived LSCs differentiation
To investigate transcriptional changes during hESCs to LSCs differentiation, we integrated data from the four time points for dimension reduction and visualization. Results showed that the cells were grouped into 11 clusters (Fig. 2a). Among the clusters, cluster 2 and 3 were from Day 0 (Fig. 2b). Not surprisingly, these cells exhibited highest expression level of pluripotent genes POU5F1, SOX2, NANOG, and DNMT3B (Fig. 2d). In contrast, expression of surface ectodermal genes, such as TFAP2A, TFAP2B, TFAP2C, HAND1, GATA3, IFR6, WISP1, and NR2F2, were upregulated throughout differentiation (Fig. 2d). Unexpectedly, epithelial genes such as CDH1, EPCAM, KRT8, and KRT18, were lowly expressed in cluster 1, while mesenchymal genes such as CDH2, COL1A1, COL1A2, and FBN1 were highly expressed, indicating that cluster 1 were mesenchymal cells. In addition, neural genes, such as COL2A1, SOX11, OTX1 and SIX1, were upregulated in cluster 9 (Fig. 2d). These results demonstrated that during this LSCs induction process, hESCs gave rise to cells with none epithelial characteristics.
Notably, differential gene expression (DGE) analysis showed that genes related to cell cycle and programmed cell death were highly expressed in cluster 8 and cluster 4, respectively (Fig. 2d). In cluster 8, expression of genes related to cell cycle such as TOP2A, MKI67, TPX2, BUB1B, and CEP55 were significantly upregulated, while SQSTM1, DDIT3, PPP1R15A, H1F0, and TRIB3 etc., which are involved in programmed cell death, showed  (Fig. 2d). To avoid the potential bias from cell cycle effects, we assigned cell cycle phase to each cell. Then, we only extract cells in G2M phase to compare the expression of cycle related genes. Results demonstrated that cycle related genes, such as TOP2A, MKI67, TPX2, BUB1B, and CEP55 etc., were highly expressed in cluster 8 as well (Additional file 1: Fig. S2d, e). These results demonstrated that there were no obvious cell cycle effects on data dimension reduction and cluster 8 were indeed rapidly proliferating cells.

Pseudotime analysis revealed unique hESC-derived LSCs developmental trajectory
To investigate hESC-derived LSCs developmental trajectory, we performed pseudotime analysis to study the path and progress of individual cells undergoing hESCsderived LSCs differentiation [34]. The resultant trajectory indicated that a trifurcation point in cluster 0 could lead to cells fate commitment toward cluster 1 (Branch 1), cluster 4 (Branch 2), and cluster 8 that further differentiate to cells in cluster 7, 10, 5 and 6 (Branch 3) (Fig. 3a, b).
In Branch 1, CDH1 (E-cadherin) and CDH2 (N-cadherin), two well-known cadherins, were differently expressed between cells in cluster 0 and cluster 1 (Additional file 1: Fig. S3b, c). Specifically, CDH1 was upregulated in cluster 0 while CDH2 was expressed significantly higher in cluster 1. The loss of epithelial surface marker CDH1 and the acquisition of mesenchymal marker CDH2 is considered as the hallmark of epithelial-mesenchymal transition (EMT), which play pivotal role in developmental regulation, such as neural crest formation [35]. Additionally, upregulated genes in cluster 1 were significantly overrepresented in nervous system development (Additional file 1: Fig. S3d), indicating the possible generation of neural crest like cells during the hESCs to LSCs differentiation process. In Branch 2, cells were undergoing programmed cells death (apoptosis) as mentioned in the section above (Additional file 1: Fig. S3d).
Apoptosis is a positive regulator of stem cells populations, it plays fundamental roles in development and tissue homeostasis [18,36].

Transcriptional difference of subpopulations in hESCs-derived LSCs
According to the expression comparisons, cluster 7 expressed most reported candidate LSC markers, including TP63 [37], KRT14 [38], KRT15 [39], ITGA6 [40] etc. (Additional file 1: Fig. S4a and Additional file 2). To investigate expression differences among subpopulations in hESCs-derived LSCs, we further focused on two-two comparisons among cluster 5, cluster 6, and cluster 7 (Additional file 1: Fig. S4b). Differential expression analysis demonstrated that upregulated genes in cluster 5 and cluster 6 showed significant enrichment in cell cycle process. In addition, genes involved in cell migration regulation were highly expressed in cluster 5 compared to cluster 6, including cadherin genes CDH5 and CDH13, integrin genes ITGA2, ITGA6, ITGA3, ITGB6, ITGB1, ITGA5 and ITGAV, collagen genes COL4A1, COL4A2, COL1A2 and COL3A1, and transcription factors SOX9, MYC, STAT3 etc. (Additional file 2). "X, Y, Z hypothesis" of corneal epithelial maintenance suggested that proliferation of basal cells (X) and migration of centripetal cells can replace lost cells from the ocular surface (Z) to support the corneal epithelial homeostasis [41]. Within the cornea, nuclear p63 (TP63) is expressed by the basal cells of the limbal epithelium, but not by the transit amplifying cells (TACs)s covering the corneal surface [37]. Therefore, these results suggested that cells in cluster 7 (TP63 + hESCs-derived LSCs) give rise to TACs (TP63 − hESCs-derived LSCs) in cluster 5 and cluster 6 ( Fig. 3a), both of which are the progeny of LSCs exhibiting higher, but limited proliferative activity [37,42].

Discussion
In this study, we performed a time-course transcriptome profiling of hESC-derived LSCs to revealed their gene expression changes and developmental trajectory at the single-cell level. Previous studies have shown that bona fide LSCs have the potential to establish and maintain long-term corneal repair. The identity of human LSCs have been investigated, and several candidate LSCs markers have been identified, such as TP63 [37], KRT14 [38], ITGA6 [40], NTRK1 [43], ABCG2 [30,31], KRT15 [39], ABCB5 [44]. Besides, the terminally differentiated markers KRT3 and KRT12 were absent in LSCs [14,45]. However, according to our single cell expression profiling data, hESC-derived LSCs showed significantly cellular heterogeneity using current protocols in our laboratory. For example, TP63 expressed cells only accounted for 6.034% of cells at Day 7, 11.21% of cells at Day 14, and 2.46% at Day 21 (Fig. 1e). Although several studies have successfully established candidate LSCs derived from human ESCs or iPSCs [10,15,28,[46][47][48][49][50][51], these protocols of differentiation are still time-consuming, expensive, or inefficient [52]. In addition, the reproducibility of differentiation is still one of the major challenges for in vitro LSCs induction, which may be affected by many factors, such as different differentiation propensities of cell lines or clones, batch effects in materials, different markers for efficiency and potency testing, etc. [53]. In this study, LSCs were differentiated from human ESCs according to the improved protocols [15,28], which were based on replicating signaling cues by small-molecule inhibitors and activators to promote ocular surface ectoderm development. This protocol is more compliant to good manufacturing practice standards for the future possible clinical applications. Although high differentiation efficiency was reported in the protocols we referred, our data indicated a complex developmental trajectory and the existence of heterogenic subpopulations which need to be further characterized. And the current hESCsderived LSCs differentiation methods with different cell lines need to be optimized.
Until recently, the developmental origin of LSCs remained elusive [14], and LSCs could be developmental descendants of the surface ectoderm as well as the periocular mesenchyme. Our scRNA-seq data revealed that EMT program were activated in the cluster of cells with neural crest characteristics at early hESC-derived LSCs differentiation stage (Additional file 1: Fig. S3). During organogenesis, epithelial cells can give rise to mesenchymal cells through EMT while the reverse process, mesenchymal-epithelial transition (MET), can generate epithelial cells [54], suggesting LSCs could be differentiated from the periocular mesenchyme through MET. However, our pseudotime trajectory analysis showed that induced mesenchymal cells did not generate LSCs under current culture conditions, and whether the periocular mesenchyme could give rise to LSCs remain to be confirmed. Meanwhile, we found excessive cell detachment occurred in cells cultured in the medium beyond 20 days, indicating the medium used need to be improved for LSCs generation and maintenance. Nevertheless, our pseudotime analysis identified a hESC-derived LSCs developmental trajectory. According to the trajectory, cell cycle related genes, such as CCNB1, CDC20, MKI67, and TOP2A, showed variable expression across hESCderived LSCs developmental pseudotime (Fig. 3f ). During organogenesis, cell cycle modulation is important for cell fate determination [55].
For long term restoration of visual function caused by LSCD, LSCs based transplantation either through autologous or allogenic grafting of limbal tissue, or cultured and expanded limbal cells have already shown effectiveness in the treatment [8]. However, so far, only TP63 positive LSCs were reported to be associated with therapeutic success [29]. But TP63 could not be applied to sort pure population of LSCs, and isolation of pure LSCs is still the bottleneck concerning the clinical application of LSCs. Therefore, other molecular markers are needed for successful prospective enrichment of LSC cells capable of long-term corneal restoration [14]. Identification of specific biomarkers for isolating and characterizing LSCs is crucial for both understanding their basic biology and translating in clinical application [14,19]. According to our scRNA-seq data, TP63 expressed LSCs showed relative quiescence compared to their progenies, and genes related to cell cycle were significantly upregulated in highly proliferative progenies (TACs), which are in line with previous reports that epithelial stem cells are relatively quiescent and give rise to TACs [56]. Besides reported markers-TP63 and ITGA6, TFs such as CXXC5, IRF6, SKIL, NR2F2, IRX4 etc., and CD genes Fig. 4 Transcriptional difference of subdipopulations in hESCs-derived LSCs. a-c Barplots showing GO biological process enrichment for upregulated genes compared between cluster 5 and cluster 7 (a), between cluster 6 and cluster 7 (b), and between cluster 5 and cluster 6 (c). Five terms with lowest p-value were presented. d Heatmap representing differentially expressed TFs among cluster 5, cluster 6, and cluster 7. e Heatmap representing differentially expressed top CD genes among cluster 5, cluster 6, and cluster 7. Wilcoxon Rank Sum test were performed for significant test and ten genes with lowest p_val_adj were presented such as SDC1, CD9, IGF1R, ALCAM etc., were newly identified as potential markers that highly expressed in TP63 expressed hESC-derived LSCs (Fig. 4c, d). Thus, these data provided valuable sources for characterization of LSCs and optimization of hESC-derived LSCs differentiation protocols.

Conclusions
In summary, we studied the time-course gene expression changes during hESC-derived LSCs differentiation in vitro at the single-cell level, and revealed significant transcriptional heterogeneity. Based on current differentiation protocol used in this study, expression heterogeneity of reported LSC markers were identified in subpopulations of differentiated cells. EMT has been shown to occur during differentiation process, which could possibly result in generation of untargeted cells. Pseudotime trajectory revealed transcriptional changes and signatures of commitment for LSCs and their progeny (TACs) that derived from pluripotent stem cells. Furthermore, some new potential markers for LSCs were identified, which are valuable for future investigation to elucidate identity and developmental origin of human LSCs.

Cell culture
The Ethics Committee of BGI-IRB approved this study. Human ESC lines H9 were cultured as previous description [57]. Briefly, cells were retrieved from liquid nitrogen tank and cultured in hESC medium (DMEM/F12 basic medium (Life Technologies), 20% knockout serum replacement (KSR, Life Technologies), 1 × L-glutamine (Life Technologies), 1 × MEM NEAA (Life Technologies), 0.1 mM 2-Mercaptoethanol (Life Technologies) and 50 ng/mL human FGF-2 (Life Technologies)) on mitomycin C (Sigma) treated murine embryonic fibroblasts (MEFs). To sustain undifferentiated states, cells were fed daily with fresh medium. For passaging, colonies were dispersed into small clumps with 1 mg/mL Collagenase IV (Life Technologies) for 20 min at 37℃, then plated onto Matrigel hESC-qualified Matrix (Corning)-coated dishes in mTeSR1 medium (Stemcell Technologies) at a ratio of 1:3 to 1:6. In the feeder-free medium, ReLeSR ™ (Stemcell Technologies) were used for dissociation and passaging according to the manual.

scRNA-seq library construction and sequencing
scRNA-seq experiments were performed by Chromium Single Cell 5′ Library & Gel Bead Kit (10 × Genomics), according to the manufacturer's protocol. Briefly, cells were digested with TrypLE ™ Select (ThermoFisher Scientific) and single cell suspension were harvested, washed with PBS twice, and filtered by 40 μm cell strainers (BD Falcon) before Gel Bead-In Emulsions (GEMs) generation and barcoding. Single-cell RNA-seq libraries were obtained following the 10 × Genomics recommended protocol, using the reagents included in the kit. Libraries were sequenced on the BGISEQ-500 (BGI) instrument [58] using 26 cycles (cell barcode and UMI [59]) for read1 and 108 cycles (sample index and transcript 5′ end) for read2.

Quality control
The scRNA-seq data were processed using cellranger-3.0.2 for each sample with default parameters mapping to the human GRCh38 genome expect the number of recovered cells (-expect-cells option) was set to 8 000.
For each library, we filtered outlier cells using the median absolute deviation from the median total library size (logarithmic scale), total gene numbers (logarithmic scale), as well as mitochondrial percentage, as implemented in scran, using a cutoff of 3 (isOutlier, nmads = 3) [60]. For filtering lowly or none expressed genes, genes expressed across all the cells detected in less than 10 cells were removed, and totally 22 501 genes were kept for downstream analysis. Then, clean gene-cell UMI count matrix was loaded as Seurat object using R package Seurat 3.0 [61] or cds object using R package monocle 3 [62] to manage our dataset for the further analysis with default parameters otherwise will be mentioned in detail.

Cell cycle phase assignment
To assign cell cycle phase, cell cycle scores (i.e., G2/M scores and S scores) and phases (i.e. G1, G2/M, and S) for each cell on the basis of scores using function CellCycle-Scoring from R package Seurat based on the expression levels of a panel of phase-specific marker genes [63].

Normalization and dimension reduction
The quality controlled dataset was then analyzed using the Seurat v.3.0 pipeline with NormalizeData function to normalize our data, FindVariableFeatures funtion to assign top 2000 highly variably expressed genes, Scale-Data function of argument vars.to.regress to remove confounding sources of variation (variables to be regressed out including mitochondrial mapping percentage, number of UMI). Following normalization and scaling, Run-PCA function were performed to capture principal components using the top 2000 highly variably expressed genes. UMAP was applied to visualize and explore data in two-dimensional coordinates, generated by RunU-MAP function in Seurat.

Cell cluster
For cell clustering, a graph-based clustering approach [61] were used to cluster the cells into candidate subpopulations. The first 50 PCs in the data were applied to construct an SNN matrix using the FindNeighbors function in Seurat v3 with k.param set to 20. We then identified clusters using the FindClusters command with the resolution parameter set to 0.5.

Differential expression analysis
To find differential expressed genes (DEGs), Wilcoxon Rank Sum test were performed for significant test using Seurat function FindAllMarkers for each cluster compared to all remaining cells and FindMarkers for distinguishing each other. Genes with average natural log fold change more than 0.25 and FDR less than 0.01 were assigned as DEGs.

Pseudotime trajectories analysis
For pseudotime trajectories analysis, the quality control dataset with cell clustering information were analyzed using the monocle3 (http://cole-trapn ell-lab.githu b.io/ monocle3/) pipeline. The new_cell_data_set function in the package was used to create cds object, and preproc-ess_cds function was applied for data normalization and principal component analysis with num_dim setting to 50. Then, reduce_dimension, cluster_cells, and learn_graph functions were used for data reduction, cell clustering, and pseudotime trajectories construction, respectively. UMAP was applied to visualize and explore data in two-dimensional coordinates using plot_cells function.
Additional file 1: Figure S1. Overview of the experimental procedure and data quality. Figure S2. Cell cycle phase assigned for cells throughout hESCs-derived LSCs differentiation. Figure S3. Pseudotime analysis characterizes expression changes throughout hESCs-derived LSCs differentiation. Figure S4. Transcriptional difference of subpopulations in hESCs-derived LSCs.
Additional file 2: Table S1. List of DEGs between cluster 8 and cluster 0. Table S2. List of DEGs between cluster 7 and cluster 0. Table S3. List of DEGs between cluster 6 and cluster 0. Table S4. List of DEGs between cluster 5 and cluster 0. Table S5. List of DEGs between cluster 7 and cluster 6. Table S6. List of DEGs between cluster 6 and cluster 8. Table S7. List of DEGs between cluster 5 and cluster 8. Table S8. List of DEGs between cluster 6 and cluster 7. Table S9. List of DEGs between cluster 5 and cluster 7. Table S10. List of DEGs between cluster 6 and cluster 5.