Transposome production
Recombinant transposase enzyme (Tn5) was grown (pTXB1-Tn5 vector) and purified following the protocols described in Picelli et al. [36]. Cytosine depleted sciMET transposase-loaded oligonucleotides (1–96) were annealed (10 µL each 100 µM) to 10 µL 5′-[Phos]-CTGTCTCTTATACACATCT-3′ oligonucleotide (100 µM) within 80 µL EB buffer (Qiagen), incubating 2 min at 95 °C and cooled to room temperature (0.1 °C/s), following protocols [37]. Annealed oligonucleotides were dilute 2:5 (EB buffer), mixed with glycerol (50% final solution) and loaded (equal volume) into the recombinant Tn5 (15 μm) by incubation for 20 min at room temperature. Annealed oligonucleotide loading was confirmed by gel-shift assay and fragmentation efficiency of each transposome was confirmed (> 50%) by qPCR analysis [38].
Brain sample and nuclei isolation
NextSeq—Post-mortem flash-frozen prefrontal cortex tissue was obtained from a 93-year-old female donor with no diagnosis of neurological disease. Post-mortem flash-frozen cortex was obtained from a genetically modified (C9orf72) mouse. Following the protocols of Mulqueen et al., Brain tissue sections were resuspended in 5 mL of ice-cold NIB-HEPES solution (20 mM HEPES, 10 mM NaCl, 3 mM MgCl2). The tissues were equilibrated (5 min) and then dounce homogenized (10 loose strokes and 5 tight strokes) and filtered through 35–40 μm cell strainers (BD Biosciences, 352235). Nuclei were pelleted (600 g) and were transferred to a fresh tube containing 5 mL ice cold NIB-HEPES solution.
NovaSeq—Post-mortem flash-frozen tissue from the Primary Motor Cortex (BA4), Banks of the superior temporal sulcus (BA 22,41/42 [BA22]), Cerebellum (CRB) and Hippocampus (HIP) were obtained from a 47-year-old female donor with no diagnosis of neurological disease. Brain tissue was acquired from the NeuroBioBank (NIH) and approved by the Research Integrity and Ethics Administration of the University of Sydney. A high amount of cellular debris was observed by nuclei isolation protocols described in Mulqueen et al. [9], therefore nuclei were isolated instead following protocols described in Matevossian et al. [39] and resuspended in 5 mL ice cold NIB-HEPES solution.
Nucleosome depletion
Following the protocols of Mulqueen et al., nuclei were cross-linked using 135 µL of 37% formaldehyde, quenched with 400 µL of 2.5 M glycine and resuspended in 5 mL of ice-cold NIB (10 mM Tris HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Igepal (v/v), 1× protease inhibitors) solution, pelleted (500 g for 5 min), and washed using 900 µL of 1× NEBuffer 2.1 (NEB, B7202). To denature proteins, nuclei were mixed with 800 µL of 1× NEBuffer 2.1 and 12 µL SDS solution (20%) and incubated at 42 °C for 30 min with vigorous shaking. Nuclei were then mixed with 20 µL of 10% Triton-X (Sigma, 9002-93-1) and incubated at 42 °C for 30 min solubilize proteins/ increase nuclei permeabilization.
Fluorescent activated nuclei-sorting (FANS) and tagmentation
The nuclei were stained using 8 µL of 5 mg/mL DAPI dye (Thermo-Fisher, Cat. D1306) and filtered through a 35–40 μm cell strainer. FANS was performed on BD InFlux-7 L (sort 1), separating 1000 single nuclei per well in a 96-well plate containing 5 µL of 2×TB buffer [20 mM Tris(hydroxymethyl)aminomethane, 10 mM MgCl2 and 20% (v/v) dimethylformamide (DMF)] and 5 µL of NIB solution. To each well, 4 µL of 4.56 µM unique transposome (1–96) was added and incubated at 55 °C for 15 min with gentle shaking, adding the “Tn5 index”. All wells were then pooled, re-stained with 8 µL of 5 mg/mL DAPI and filtered. FANS was performed again (sort 2), separating 22 or 10 (control wells) single nuclei per well in a 96-well plate containing 2.5 µL of M-digestion buffer (Zymo, Cat. D5020-9), 0.25 µL of Proteinase K (Zymo, D3001-2-5), and 2.5 µL of H2O. Nuclei were then digested at 50 °C for 20 min with gentle shaking and the plate was then spun at 600 g for 5 min at 4 °C.
Bisulfite conversion
Prior to bisulfite conversion, 35 pg of pre-tagmented unmethylated lambda DNA was spiked into wells receiving 10 nuclei (sort 2). NextSeq; Each well was made up to 50 µL with H2O and bisulfite conversion was performed following manufacturer protocols using the EZ-96 DNA Methylation Kit (Zymo, Cat. D5004) and eluted twice (12.5 µL each using elution buffer) for a final volume of 25 µL. NovaSeq; Each well was made up to 20 µL with H2O and bisulfite conversion was performed following manufacturer protocols using the MethylCode BC conversion kit (Applied Biosystems, Cat. MECOV50) and eluted twice (12.5 µL each using elution buffer) for a final volume of 25 µL.
Enzymatic DNA methylation (EM) conversion
Prior to the EM conversion, 35 pg of pre-tagmented unmethylated lambda DNA and 14 pg of the tagmented CpG methylated pUC19 DNA were spiked into wells receiving 10 nuclei (sort 2). Sample volumes were made up to 20 µL with H2O and cleanup was performed using 1.8× using AMPure XP cleanup beads (Beckman Coulter, Cat. A63881) following manufacturer protocol with the exception that samples were incubated for 10 min at room temperature followed by a single wash step using 60 µL 80% EtOH and eluted 29 µL of elution buffer. Enzymatic conversion was then performed using the NEBNext Enzymatic Methyl-seq Conversion Module (New England Biolabs, Cat. E7125L) following manufacturers protocol (steps 1.5 to 1.9.11) for inserts 370–420 bp. Briefly, 5-Methylcytosines and 5-Hydroxymethylcytosines were oxidized using TET2 enzyme. DNA was cleaned using AMPure XP cleanup beads in place of NEBNext Sample Purification Beads. DNA was denatured in 0.1 N NaOH and cytosines were deaminated by APOBEC3A, cleaned and eluted using 25 µL of Elution Buffer.
Linear amplification
Full elution’s from both the bisulfite converted and EM converted libraries were transferred to a plate prepared with the following: 16 µL of PCR-clean H2O, 5 µL of 10×NEBuffer 2.1, 2 µL of 10 mM dNTP mix (New England Biolabs, Cat. N0447), and 2 µL of 10 µM of either the 9-nucleotide random primer (n9) previously described in the sciMET protocols [9] or the G-depleted (mg) random primer [8], containing a partial Illumina Standard Read 2 sequencing primer 5′-GGAGTTCAGACGTGTGCTCTTCCGATCT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1)-3′. Four rounds of linear amplification were performed using 10 U of Klenow (3′–5′ exo) polymerase (Enzymatics, Cat. P7010-LC-L) followed by AMPure XP cleanup (1.1×) and elution in 21 µL of 10 mM Tris-HCl (pH 8.5) as previously described [9].
Library indexing and quantification
Indexing PCRs were performed in a 96-well plate to incorporate i5 and i7 indexes. The full elution’s from the linear amplification reaction were mixed with 2 µL each of the 10 µM forward and reverse indexing primers [9], 25 µL of 2xKAPA NEBNext Q5 Hot Start HiFi PCR Master Mix (New England Biolabs, Cat. M0543L), and 0.5 µL of 100× SYBR Green I dye (FMC BioProducts, Cat. 50513). Real-time PCR was performed on a QuantStudio 6 Flex real-time thermocycler (Applied Biosystems) with the following thermocycling conditions: 95 °C for 2 min, 20 cycles of 94 °C for 80 s, 65 °C for 30 s and 72 °C for 30 s [image]. The libraries were then pooled, cleaned using AMPure XP beads (0.8×) and eluted in 20 µL of 10 mM Tris-HCl (pH 8.5) as previously described [9]. Quantification of each sciMET(n9), sciMET(mg) and sciEM (combined n9 and mg) were performed using the KAPA qPCR Illumina library quantification kit (Kapa Biosystems Cat. KR0405) and the mean of each sciMET result (n9 = 397 nM, and mg = 638 nM) was calculated.
Library sequencing
NextSeq—sciMET (n9), sciMET (mg) and sciEM (combined n9 and mg libraries) were quantified separately by High Sensitivity D1000 ScreenTape (Agilent, Cat. 5067-5584). Libraries were pooled and sequenced on the Illumina NextSeq 500 (v2 2 × 75 bp cycle Mid-Output Kit) using a 0.9 pM loading concentration, 30% PhiX and custom Read 1 and Index 2 (i5) oligonucleotides matching chemistry temperatures [9]. Sequencing was performed using custom chemistry (Read1: 100 imaged cycles; Read2: 10 imaged cycles; Index1: 10 imaged cycles; Index2: 11 imaged cycles, 9 dark cycles, and 9 imaged cycles).
NovaSeq—sciMET(n9) and sciEM(mg) libraries were pooled and then quantified (as above). Libraries were sequenced on the NovaSeq 6000 (v1.5 2 × 300 bp SP Kit) using 116 pM loading concentration, 10% PhiX and custom Read 1 (as above). Sequencing was performed using custom chemistry (Read1: 142 imaged cycles; Index1: 10 imaged cycles; Index2: 7 dark cycles, 10 imaged cycles, 16 dark cycles, and 11 imaged cycles; Read2: 142 imaged cycles).
Bioinformatics
All scripts used for the processing and analysis of sciMET/sciEM data have been deposited and documented within https://github.com/zchatt/sciem_scripts.
Sequence read demultiplexing
NextSeq—BCL files were converted to fastq format using bcl2fastq “--create-fastq-for-index-reads --with-failed-reads --use-bases-mask Y*,I10,I20,Y*” generating 2 Read files (R1[100 bp] & R2[25 bp]) and 2 Index files (I1[10 bp, i7 index] & I2[20 bp; Tn5 & i5 indexes]) for each sequencing lane. Each R1, R2, I1 and I2 from multiple sequencing lanes were combined by linux cat command and I2 was split into individual Tn5[11 bp] and i5[9 bp] index files using linux awk command. Fastq files were demultiplexed if all 3 indexes (i5, i7 and Tn5) had a Hamming distance < 3 from the reference, as previously described [9].
NovaSeq—BCL files were converted to fastq format using bcl2fastq “--create-fastq-for-index-reads --use-bases-mask Y*,I10,I21,Y*” generating 2 Read files (R1[142 bp] & R2[142 bp]) and 2 Index files (I1[10 bp, i7] & I2[21 bp; Tn5 & i5]) for each sequencing lane. Reads were processed as described above with the exception I2 was split into individual Tn5[11 bp] and i5[10 bp] index files.
Sequence read trimming, alignment and DNA methylation extraction
Reads were firstly trimmed (trim 1) using TrimGalore! Software (v0.38.0) with options “--illumina --stringency 3”. A high number of sequences corresponding to the Linear Primer and read-through of the P7 flow-cell were observed, therefore reads were trimmed again (trim 2) using cutadapt software (v1.8.3) with options “--anywhere = AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --anywhere=GAAGAGCACACGTCTGAACTC --anywhere=ATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGG --minimum-length=20 --times=2” [40]. Read 2 sequences from the NovaSeq 6000 instrument displayed increasing “G” content > 60 bp that were largely poly-G sequences (1.2% reads) indicative of low signal intensity. Read 2 sequences were truncated using fastp software (v0.19.6) with options “--max_len1 60”. The human (GRCh38) or mouse (GRCm39) reference genomes were each combined with the lambda phage reference genome that is used for bisulfite/ enzymatic conversion control. Alignment of reads were performed using scBS-map software using the options “-l 9 -p 12 -n 10” [41]. Aligned reads were deduplicated using samtools software with options “rmdup” [42]. DNA methylation information was extracted from aligned deduplicated BAM files using cgmaptools with options “convert bam2cgmap” [43].
Single-cell discrimination and quality control
Single nuclei with < 100 unique mapped reads were removed. The unique read counts of single nuclei have previously been used to discriminate high quality single cells [9]. Briefly, k-means clustering (k = 3) of unique aligned reads per barcode (k-means, k = 3) was performed and normal distributions were fitted to each cluster (Additional file 1: Fig. S3). Barcodes with unique read counts passing 95% confidence interval threshold (cluster 1) were retained (64 sciMET and 710 sciEM). Bisulfite and enzymatic conversion efficiencies were, calculated as the 5mC % of reads aligned to unmethylated lambda phage genome. Mapping efficiencies were assessed (reads aligned/reads assigned per barcode). We determined the number of uniquely mappable cytosine dinucleotides by intersecting the within the hg38 reference genome with umap files (k = 100) downloaded https://bismap.hoffmanlab.org/ [44] using bedtools software with options “getfasta” and umap files (k = 100). NextSeq reads were processed as above with the exception that no unique mapped read thresholds were applied, and single nuclei were assigned as mouse or human based on the greatest read mapping efficiency to gr39 and hg38 genomes independently.
DNA methylation across genomic annotations
CpG and CpH methylation were summarized (3% window) for each single-cell across (± 5 kb) Ensembl gene annotations, CpG Islands (CGI) as well as ChIP-seq and DNase-seq annotations from the middle frontal cortex (ENCFF146VKE, ENCFF225RTW, ENCFF600AYY, ENCFF724XKK, ENCFF727KZF, ENCFF729EZH, ENCFF835ZYG, ENCFF860MVH from https://www.encodeproject.org) using cgmaptools with options “mfg” [43]. Frontal gyrus NeuN+/− CpG Differentially Methylated Regions (DMRs) generated by Lister et al. [22] were downloaded from http://brainome.ucsd.edu/BrainMethylomeData/CG_DMR_lists.tar.gz and converted to hg38 using rtracklayer and hg18ToHg38.over.chain. Neuron CpG DMRs for each of the 21 Neuron cluster described by Luo et al. [7] were converted to hg38 using rtracklayer and hg19ToHg38.over.chain. The hg38 locations of Differentially Expressed Genes (DEG’s) across non-neuron cell-types identified by Lake et al. [28] were extracted using R software biomart package (v 2.46.3) and were separated into gene body and promoter (1.5 kb upstream TSS). CpG and CpH methylation were summarized for each DMR and DEG across each single-cell using cgmaptools with options “mtr” [43].
Cell-type clustering analysis
We performed non-negative matrix factorization (NMF) on summarized CpH methylation across 100 kb genomic bins and CpG methylation across the Ensembl Regulatory Build [45] setting k = 12, as previously described [9]. CpH and CpG NMF matrices were weighted, merged by cell, and plotted into two-dimensional space using students t-distributed stochastic neighbor embedding (t-SNE). Cell clustering was performed using DBSCAN, as previously described [9] using an epsilon value of 1.3. Clustering analysis was performed using all sciEM single-cells (n = 710) identifying 16 clusters (Fig. 3A). In addition, we evaluated clustering using both sciMET and sciEM single-cells using summarized CpH and CpG (Additional file 1: Fig. S5), summarized CpG alone (Additional file 1: Fig. S6), as well as summarized CpG for sciEM single-cells alone (Additional file 1: Fig. S7). To identify the cell-type of each cluster, sequencing reads of all cell-types within a cluster were collapsed and CpG methylation was summarized for NeuN+/− DMR’s, as described above. Broad subtypes of non-neuronal cells were further classified by CpG methylation summarization of non-neuron DEG promoters, as described above. Non-neuron cell subtypes were defined by lowest (hypomethylated promoters) z-score (annotation × cluster matrix). Broad excitatory and inhibitory neuron subtypes were classified by CpG methylation summarization of promoter CpG DMRs of 21 neuron subtypes, as described above, and defined by hierarchical clustering of z-scores. We performed a linear regression analysis between neuron (n = 430) and non-neuronal (n = 280) cell-types using per-nuclei CpG, CHG and CHH DNA methylation levels controlling for read depth using R statistic software.