Single-cell DNA methylation sequencing by combinatorial indexing and enzymatic DNA methylation conversion

Background DNA methylation is a critical molecular mark involved in cellular differentiation and cell-specific processes. Single-cell whole genome DNA methylation profiling methods hold great potential to resolve the DNA methylation profiles of individual cell-types. Here we present a method that couples single-cell combinatorial indexing (sci) with enzymatic conversion (sciEM) of unmethylated cytosines. Results The sciEM method facilitates DNA methylation profiling of single-cells that is highly correlated with single-cell bisulfite-based workflows (r2 > 0.99) whilst improving sequencing alignment rates, reducing adapter contamination and over-estimation of DNA methylation levels (CpG and non-CpG). As proof-of-concept we perform sciEM analysis of the temporal lobe, motor cortex, hippocampus and cerebellum of the human brain to resolve single-cell DNA methylation of all major cell-types. Conclusion To our knowledge sciEM represents the first non-bisulfite single-cell DNA methylation sequencing approach with single-base resolution. Supplementary Information The online version contains supplementary material available at 10.1186/s13578-022-00938-9.


Introduction
The covalent addition of a methyl group to cytosine bases in mammalian DNA (DNA methylation) is one of the most highly studied epigenetic modifications [1]. Primarily occurring in the CpG context, DNA methylation is critical for organism development [2] and plays an essential role in regulating gene expression during cellular differentiation [3]. Cell-types have highly specific DNA methylation patterns [4] necessitating the analysis of DNA methylation in pure cellular populations, however limited cell surface markers or highly interconnected tissue networks prohibit cell isolation from tissues such as the human brain.
Single-cell whole genome bisulfite sequencing techniques have recently been described [5][6][7][8] that can produce single-base resolution DNA methylation information from which cell-specific whole genome DNA methylation profiles (methylomes) can be reconstructed bioinformatically. However, these sequencing library preparations are prohibitively expensive for most labs because of high reagent costs associated with single-cell single-well reactions. Recently a single-cell combinatorial indexing (sci-) bisulfite sequencing approach (termed sciMET) was described in which nuclei are sorted and tagged with sequencing indexes over multiple rounds, forming unique combinations of indexes per nuclei [9]. The sciMET approach allows multiple-nuclei single-well reactions, thus reducing reagent costs.
Single-cell DNA methylation sequencing is still in its infancy, however the technique's importance cannot be overstated as it enables the DNA methylation profiling of cell-types that lack cell surface markers that can be used for their isolation, such as those in the brain [10]. Further, defining cell-specific DNA methylation patterns have been critical for understanding mechanisms of disease, such as cancer [11], and represent important markers for cell-of-origin molecular diagnostic assays [12]. Notably, targeted single-cell epigenomics is emerging as a promising approach for high throughput functional screening of disease relevant gene regulation [13]. To our knowledge high throughput functional screens using single-cell DNA methylation sequencing have yet to arrive, however, single-cell reduced representation bisulfite sequencing workflows can lower experimental costs by enriching for dense regions of CpG dinucleotides (CpG Islands) that are important in gene regulation at the expense of whole genome coverage [14].
Bisulfite sequencing is the gold standard method for DNA methylation analysis [15,16] but is not without limitations. It has been estimated that 84-96% of DNA is degraded during the bisulfite conversion reaction [17]. Additionally, methylated DNA is overrepresented in WGBS libraries leading to an over-estimate of DNA methylation levels [18] particularly in CHG and CHH contexts [19]. This is especially relevant in the analysis of DNA methylation in embryonic stem cells and neurons that have been reported to exhibit high levels of CHG and CHH methylation [20][21][22][23][24][25].
Unmethylated cytosines can also be deaminated by APOBEC enzymes, resulting in base changes analogous to bisulfite conversion (sequenced as T) [26]. Notably, enzymatic conversion is less degradative to DNA and can produce high quality single-base resolution DNA methylation data [26]. Such attributes may be particularly beneficial in situations where DNA content is limited such as the single-cell analysis of DNA methylation. However, enzymatic conversion in single cells is challenging due, in part, to multiple reaction cleanup steps required that results in DNA loss. A major advantage of the sciapproach over single-cell/single-well methods is the ability to perform deamination reactions of multiple cells per-well, thus increasing per-well DNA content. Here we combine sci-with enzymatic conversion (termed sciEM, Fig. 1a) and show application by characterizing singlebase DNA methylation profiles of human brain cell-types without the need for cell-type markers (e.g. NeuN). The sciEM approach accurately captures CpG methylation dynamics across annotated regulatory features of the human genome. Both CpG and non-CpG (CpH) methylation estimates are lower than bisulfite conversion and we find no evidence of higher global CpH DNA methylation within neurons from the temporal lobe, motor cortex, hippocampus or cerebellum of the human brain. The sciEM approach represents an economical method for single-cell single-base resolution DNA methylation analysis.

Library construction and sequencing read processing
Using frozen post-mortem brain tissue from mouse (NextSeq, n = 1) and human (NextSeq, n = 1 and NovaSeq, n = 4) we were able to construct both sci-MET and sciEM single-cell libraries in parallel (Additional file 1: Fig. S1). Within the sciEM workflow we use a G-depleted (mg) random linear primer that we observed to improve CpH mapping within preliminary experiments (NextSeq, Additional file 1: Fig. S2). Both bisulfite and enzymatic conversion efficiencies were high, 99.99% and 99.94% respectively, however the sci-MET method produced ~ 10× the amount of library than sciEM (518 nM vs. 53 nM by RT-qPCR). Post-sequencing (NovaSeq), single-nuclei were identified by unique barcode combinations (Tn5, i5 and i7 barcodes). Singlenuclei with > 100 unique mapped reads were observed to have significantly higher mapping efficiency, more paired-reads, larger insert sizes and a lower proportion of reads mapped using local alignment (Students t-test p-value's < 5 × 10 −17 , Additional file 1: Fig. S2a-d), representing high-quality single-nuclei. Following k-means clustering of unique mapped reads, a total of 710 and 64 high quality single-nuclei from sciEM and sciMET workflows were retained for analysis (Additional file 1: Fig. S3e, f ), representing 54 and 58% of the nuclei fluorescently sorted for each workflow respectively. The mapping efficiency was 58 ± 5% and 64.9 ± 10% for sci-MET and sciEM libraries respectively (Fig. 1b), but we note the sciMET mapping efficiencies were lower than previously reported [9], results that are partially attributed to the removal of 4% of reads that contained substantial linear primer sequences (Fig. 1b). The number of mapped reads were higher in sciMET (mean 254,194 ± 191,560) than sciEM (150,515 ± 147,388) (Students t-test p-value = 7.25 × 10 −5 , Fig. 1b) leading to higher coverage of mappable cytosine dinucleotides (e.g., CpG p = 1.91 × 10 −10 , Additional file 1: Fig. S4a). However, the library loading concentration largely influences read counts and, proportional to mapped reads, the sciEM method covered a greater number of all cytosine dinucleotides, particularly CpT and CpC dinucleotides (p < 6.1 × 10 −10 , Fig. 1c).

DNA methylation
Global DNA methylation levels were observed to be lower in sciEM than sciMET libraries for CpG (54% vs. 71%), CHG (1.0 vs. 1.2%) and CHH (1.2 vs. 2.2%) cytosine contexts (p-value < 7.10 × 10 −8 ). WGBS library preparation methods have previously been reported to have an over-representation of methylated fragments [18]. In line with these reports, we observed an increase in the 5mC levels from 5′ to 3′ of sciMET reads for CpG, CHG and CHH contexts (student t-test p-value < 0.003, first 50% bases vs. last 50% bases). In contrast, 5mC levels were stable across sciEM reads (Fig. 1d). Of note, the 5mC levels of chromosome 21 were significantly lower than other chromosomes, an effect that was pronounced within sciEM libraries. We observed a very high correlation between the CpG methylation of sciEM and sci-MET across annotated genomic features (e.g., R 2 = 0.996, ± 5 kb Ensemble genes, Pearson's p-value = 6.9 × 10 −120 ).
CpG hypomethylation is generally associated with chromatin accessibility and gene activity and sciEM successfully captured global DNA methylation dynamics across regulatory regions e.g. hypomethylation of gene promoter regions, CpG Islands and open chromatin (DNase-seq) (Fig. 2a-c). We observed CpG hypomethylation across annotated regions enriched for active histone marks (e.g. H3K4me3, H3K9ac, H3K27ac) and H3K4me1 boundaries ( Fig. 2d-g). The H3K27me3 histone modification is a marker of bivalent polycomb regulated promoters in which dynamic crosstalk between DNA methylation controls gene expression [27] and we observed CpG hypomethylation across genomic regions enriched for H3K27me3 (Fig. 2h). Conversely, hypermethylation was observed within annotated regions enriched for repressive histone modifications (H3K36me3 and H3K9me3) (Fig. 3i, j). Of note, sciEM CpG methylation levels are significantly lower than sciMET measurements e.g., 14% lower across annotated genes (± 5 kb) (paired t-test p-value = 3.46 × 10 −65 ). The difference in CpG DNA methylation between sciEM and sciMET was highly (adapted from [9]). b Per cell read processing metric's. c Cytosine dinucleotides covered as percentage of mapped bases. d, e DNA methylation bias plots for sciEM and sciMET methods respectively for reads mapping to autosome's and the X-chromosome, with close-up of CHG and CHH DNA methylation (bottom). H = A, C or T correlated to the underlying CpG methylation levels ( Fig. 2k).

Single-cell clustering
To assess the ability of sciEM to discriminate cell types we summarized CpG DNA methylation of regulatory regions (Ensembl Regulatory Build) and CpH methylation across 100 kb genomic bins (used to cluster Neuronal cell-types [7]) across 710 high quality single nuclei (methods). The DNA methylation information of each single nuclei was combined using NMF and projected into 2-dimensional space (tSNE) from which 19 clusters were identified (Fig. 3a). We observed distinct patterns of DNA methylation within established Differentially Methylated Regions (DMRs) distinct to Neurons and Non-Neuronal cell-types from the human brain [22] which enabled the identification of 11 neuron (n = 430) and 8 non-neuronal clusters (n = 280) (Fig. 3b). As gene promoter hypomethylation is associated with gene activity, the promoter DNA methylation status of established Differentially Expressed Genes (DEG's) of non-neuron cell-types [28] enabled the identification of Astrocytes (n = 30), Endothelial cells (n = 12), Microglia (n = 45), Oligodendrocyte precursor cells (OPC's, n = 5), Oligodendrocytes (n = 130) and Pericytes (n = 8) (Fig. 3c). Using the CpG DNA methylation levels at annotated neuronal cell-type CpG DMRs [7] we identified 6 excitatory (n = 279) and 5 inhibitory (n = 151) neuronal cell-type clusters (Fig. 3d). We found no significant difference in the per-nuclei CpG, CHG or CHH are defined by unique colors. b Heatmap of summarized CpG methylation z-score's of clusters across annotated Neuron and Non-neuronal DMR's. c Heatmap of summarized CpG methylation z-score's of non-neuronal cell clusters across annotated non-neuronal DEG's. d Heatmap of summarized CpG methylation z-score's of neuronal clusters across annotated neuronal subtype CpG DMR's. e Boxplots of CpG, CHG and CHH DNA methylation of each cell-type. Ast astrocyte, Exc excitatory neuron, End endothelial cells, Inh inhibitory neuron, Mic microglia, Olig oligodendrocyte, OPC oligodendrocyte precursor cells, Per pericyte DNA methylation levels between neuron and non-neuronal cell-type clusters (Fig. 3e).

Discussion
To our knowledge we present the first enzyme based single-cell DNA methylation method with single-base resolution. The original sciEM method extends singlecell combinatorial indexing approaches developed using sodium bisulfite (sciMET). Bisulfite sequencing is problematic for single-cell sequencing as it degrades the limited amount of DNA in each cell [29], however enzymatic based conversion of unmethylated cytosines has been shown to be less degradative to the DNA, resulting in more genomic coverage, even at 100 pg amounts [26]. However, the enzymatic conversion of ~ 132pg (22 nuclei) produced ~ 10× less library than bisulfite conversion, likely due to the increased number of wash steps. We further theorized that a greater reduction in the library would have occurred if the DNA content of each reaction was further reduced via a single-cell single-well system. Hence, our adoption of the combination of the enzymatic conversion process with a multiple-nuclei single-well approach, such as single-cell combinatorial indexing, has likely contributed to the success of creating the first enzymatic DNA methylation single-cell libraries.
To generate sciEM libraries we replaced the 9-N linear amplification primer of the sciMET protocol with a G-depleted random primer which we discovered to have bound more efficiently to genomic fragments devoid of cytosines following conversion (enzymatic or bisulfite converts ~ 95% of cytosines to uracils). G-depleted random primers have been previously shown to improve library complexity, coverage uniformity and reduce artefactual reads [8]. Generally, the sciEM approach results in higher library loss during construction, and a lower amount of library input into the second barcoding PCR (i5 and i7), preferential amplification of smaller molecules (reduced insert size), and a higher rate of duplicate sequences compared to sciMET. However, the data quality is high. We show high correlation of DNA methylation between sciEM and sciMET (r 2 = 0.99) approaches, accurate recapitulation of DNA methylation dynamics across gene features and the ability to resolve single base resolution of single-cell types.
Since the first whole genome bisulfite sequencing (WGBS) study, a multitude of techniques have been developed to characterize genome-wide methylation [30]. However, as DNA methylation patterns are unique to single-cell-types, it is essential to move towards single-cell DNA methylation profiling [31]. In disease states, DNA methylation patterns are known to be altered, leading to aberrant cascades of molecular changes [3]. Hence it is important to have accurate base level estimates of DNA methylation levels. Whole genome bisulfite sequencing approaches have been reported to over-represent methylated fragments [18]. We observed 5mC bias within sci-MET sequencing reads, elevating from 5′ to 3′, that were not observed within sciEM. Further, 5mC levels were lower in sciEM, an effect that was greater within CHH loci (typically unmethylated) relative to CpG loci (typically methylated). Whilst we cannot rule out over conversion of 5mC within sciEM, our observations of a 5mC bias within sciMET sequencing reads (elevating from 5′ to 3′), comparatively higher 5mCHH (typically unmethylated) then 5mCpG (typically methylated) and a slightly higher conversion efficiency indicate a potentially overrepresentation of methylated fragments in sciMET.
Neurons exhibit distinct DNA methylation patterns, particularly in non-CpG (CpH) loci, compared to nonneuronal brain cells [22,32] as well as between neuronal subtypes [7,33]. To our knowledge we present the first whole genome DNA methylation assessment of brain cell-types using enzymatic DNA methylation assessment. We did not observe significantly higher CpH methylation within neurons as previously reported [22]. Further, our single-cell analysis of brain cell types omits the use the NeuN antibody for neuron selection, hence we cannot rule out the possibility that NeuN-positivity (nuclei surface marker) of Neurons relates to CpH DNA methylation.

Conclusion
The brain is a highly heterogeneous environment, comprising multiple neuronal and glial cell types with unique functions and at various stages of differentiation. Previously, the understanding of disease mechanisms progressed by studying cell populations in bulk which revealed only the average features of the population's constituents and can obscure the cell-to-cell variability. Since the first whole genome bisulfite sequencing (WGBS) study, a multitude of techniques have been developed to characterize genome-wide methylation [30] at singlebase resolution. Moreover, as DNA methylation patterns are unique to cell-types, it is essential to move towards single-cell DNA methylation profiling [31]. Single-cell strategies have already yielded novel mechanistic insights into brain function [34]. The sciEM approach represents an invaluable tool in assessing CpH DNA methylation function within cell-types with reportedly increased CpH DNA methylation, such as the brain and stem cells [35].
Many DNA methylation signatures have been described for distinct cellular phenotypes including celltype, pluripotency [36], age [37] and disease state [11]. As sci-based approaches offer a lower cost to entry for single-cell DNA methylation analysis, we anticipate that combining sciEM with high-content screening, such as library of small molecules and CRISPR pooled screens, will create powerful new tools to evaluate mediators and mechanisms of cellular phenotypes in human health and disease and beyond into non-medical fields such as agriculture. The sciEM method represents an economical, high-throughput approach for single-cell DNA methylation at single-base resolution.

Brain sample and nuclei isolation
NextSeq-Post-mortem flash-frozen prefrontal cortex tissue was obtained from a 93-year-old female donor with no diagnosis of neurological disease. Post-mortem flashfrozen cortex was obtained from a genetically modified (C9orf72) mouse. Following the protocols of Mulqueen et al., Brain tissue sections were resuspended in 5 mL of ice-cold NIB-HEPES solution (20 mM HEPES, 10 mM NaCl, 3 mM MgCl 2 ). The tissues were equilibrated (5 min) and then dounce homogenized (10 loose strokes and 5 tight strokes) and filtered through 35-40 μm cell strainers (BD Biosciences, 352235). Nuclei were pelleted (600 g) and were transferred to a fresh tube containing 5 mL ice cold NIB-HEPES solution.
NovaSeq-Post-mortem flash-frozen tissue from the Primary Motor Cortex (BA4), Banks of the superior temporal sulcus (BA 22,41/42 [BA22]), Cerebellum (CRB) and Hippocampus (HIP) were obtained from a 47-yearold female donor with no diagnosis of neurological disease. Brain tissue was acquired from the NeuroBioBank (NIH) and approved by the Research Integrity and Ethics Administration of the University of Sydney. A high amount of cellular debris was observed by nuclei isolation protocols described in Mulqueen et al. [9], therefore nuclei were isolated instead following protocols described in Matevossian et al. [39] and resuspended in 5 mL ice cold NIB-HEPES solution.

Bisulfite conversion
Prior to bisulfite conversion, 35 pg of pre-tagmented unmethylated lambda DNA was spiked into wells receiving 10 nuclei (sort 2). NextSeq; Each well was made up to 50 µL with H 2 O and bisulfite conversion was performed following manufacturer protocols using the EZ-96 DNA Methylation Kit (Zymo, Cat. D5004) and eluted twice (12.5 µL each using elution buffer) for a final volume of 25 µL. NovaSeq; Each well was made up to 20 µL with H 2 O and bisulfite conversion was performed following manufacturer protocols using the MethylCode BC conversion kit (Applied Biosystems, Cat. MECOV50) and eluted twice (12.5 µL each using elution buffer) for a final volume of 25 µL.

Enzymatic DNA methylation (EM) conversion
Prior to the EM conversion, 35 pg of pre-tagmented unmethylated lambda DNA and 14 pg of the tagmented CpG methylated pUC19 DNA were spiked into wells receiving 10 nuclei (sort 2). Sample volumes were made up to 20 µL with H 2 O and cleanup was performed using 1.8× using AMPure XP cleanup beads (Beckman Coulter, Cat. A63881) following manufacturer protocol with the exception that samples were incubated for 10 min at room temperature followed by a single wash step using 60 µL 80% EtOH and eluted 29 µL of elution buffer. Enzymatic conversion was then performed using the NEBNext Enzymatic Methyl-seq Conversion Module (New England Biolabs, Cat. E7125L) following manufacturers protocol (steps 1.5 to 1.9.11) for inserts 370-420 bp. Briefly, 5-Methylcytosines and 5-Hydroxymethylcytosines were oxidized using TET2 enzyme. DNA was cleaned using AMPure XP cleanup beads in place of NEBNext Sample Purification Beads. DNA was denatured in 0.1 N NaOH and cytosines were deaminated by APOBEC3A, cleaned and eluted using 25 µL of Elution Buffer.

Library indexing and quantification
Indexing PCRs were performed in a 96-well plate to incorporate i5 and i7 indexes. The full elution's from the linear amplification reaction were mixed with 2 µL each of the 10 µM forward and reverse indexing primers [9], 25  . The libraries were then pooled, cleaned using AMPure XP beads (0.8×) and eluted in 20 µL of 10 mM Tris-HCl (pH 8.5) as previously described [9]. Quantification of each sciMET(n9), sciMET(mg) and sciEM (combined n9 and mg) were performed using the KAPA qPCR Illumina library quantification kit (Kapa Biosystems Cat. KR0405) and the mean of each sciMET result (n9 = 397 nM, and mg = 638 nM) was calculated.

Bioinformatics
All scripts used for the processing and analysis of sci-MET/sciEM data have been deposited and documented within https:// github. com/ zchatt/ sciem_ scrip ts.

Single-cell discrimination and quality control
Single nuclei with < 100 unique mapped reads were removed. The unique read counts of single nuclei have previously been used to discriminate high quality single cells [9]. Briefly, k-means clustering (k = 3) of unique aligned reads per barcode (k-means, k = 3) was performed and normal distributions were fitted to each cluster (Additional file 1: Fig. S3). Barcodes with unique read counts passing 95% confidence interval threshold (cluster 1) were retained (64 sciMET and 710 sciEM). Bisulfite and enzymatic conversion efficiencies were, calculated as the 5mC % of reads aligned to unmethylated lambda phage genome. Mapping efficiencies were assessed (reads aligned/reads assigned per barcode). We determined the number of uniquely mappable cytosine dinucleotides by intersecting the within the hg38 reference genome with umap files (k = 100) downloaded https:// bismap. hoffm anlab. org/ [44] using bedtools software with options "getfasta" and umap files (k = 100). NextSeq reads were processed as above with the exception that no unique mapped read thresholds were applied, and single nuclei were assigned as mouse or human based on the greatest read mapping efficiency to gr39 and hg38 genomes independently.

Cell-type clustering analysis
We performed non-negative matrix factorization (NMF) on summarized CpH methylation across 100 kb genomic bins and CpG methylation across the Ensembl Regulatory Build [45] setting k = 12, as previously described [9]. CpH and CpG NMF matrices were weighted, merged by cell, and plotted into two-dimensional space using students t-distributed stochastic neighbor embedding (t-SNE). Cell clustering was performed using DBSCAN, as previously described [9] using an epsilon value of 1.3. Clustering analysis was performed using all sciEM single-cells (n = 710) identifying 16 clusters (Fig. 3A). In addition, we evaluated clustering using both sciMET and sciEM single-cells using summarized CpH and CpG (Additional file 1: Fig. S5), summarized CpG alone (Additional file 1: Fig.  S6), as well as summarized CpG for sciEM single-cells alone (Additional file 1: Fig. S7). To identify the celltype of each cluster, sequencing reads of all cell-types within a cluster were collapsed and CpG methylation