This manuscript was published in Nature Genetics in February 2021.
Raw sequencing data for this paper were deposited in NCBI GEO under accession number GSE137193.
Data used to train, evaluate and interpret the BPNet models are available on Zenodo at https://doi.org/10.5281/zenodo.3371215. Trained BPNet models and all the model interpretation results are available on Zenodo at https://doi.org/10.5281/zenodo.3371163. The BPNet model trained on ChIP–nexus data is available on Kipoi under the name BPNet-OSKN (http://kipoi.org/models/BPNet-OSKN/). Genome browser tracks showing observed/predicted ChIP–nexus signal and contribution scores for all factors are available at https://genome.ucsc.edu/s/mlweilert/mesc_OSKN_tracks. ATAC-seq data in mouse ESCs used in Fig. 2 and Supplementary Fig. 7 were obtained from GSE134680. Blacklisted regions used to filter genomic coordinates throughout the analysis are available at https://www.encodeproject.org/files/ENCFF547MET. RepeatMasker mm10 annotations were obtained from http://www.repeatmasker.org/genomes/mm10/RepeatMasker-rm405-db20140131/mm10.fa.out.gz. The nuclear magnetic resonance structure 1O4X used to render Sox2 and Oct1 in Fig. 3 is available at https://www.rcsb.org/structure/1o4x. TRANSFAC (v.7.0) was used to identify the TFIIIC B-box discussed in Fig. 3. The PH0134.1 Pbx PWM used for motif validation in Supplementary Fig. 8 and Extended Data Fig. 5 was obtained from JASPAR at http://jaspar.genereg.net/api/v1/matrix/PH0134.1.jaspar. The MA0141.1 Esrrb PWM used in Extended Data Fig. 5 was obtained from JASPAR at http://jaspar.genereg.net/api/v1/matrix/MA0141.1.jaspar. The transfer RNA database GtRNAdb (v.2.0, release 17.1) annotations and associated tRNAscan-SE scores used in Extended Data Fig. 5 were obtained from http://gtrnadb.ucsc.edu/GtRNAdb_archives/release17/genomes/eukaryota/Mmusc10/mm10-tRNAs.tar.gz. Source data are provided with this paper.