Stay updated with breaking news from Genome biol. Get real-time updates on events, politics, business, and more. Visit us for reliable news and exclusive interviews.
AIRE is an unconventional transcription factor that enhances the expression of thousands of genes in medullary thymic epithelial cells and promotes clonal deletion or phenotypic diversion of self-reactive T cells1–4. The biological logic of AIRE’s target specificity remains largely unclear as, in contrast to many transcription factors, it does not bind to a particular DNA sequence motif. Here we implemented two orthogonal approaches to investigate AIRE’s cis-regulatory mechanisms: construction of a convolutional neural network and leveraging natural genetic variation through analysis of F1 hybrid mice5. Both approaches nominated Z-DNA and NFE2–MAF as putative positive influences on AIRE’s target choices. Genome-wide mapping studies revealed that Z-DNA-forming and NFE2L2-binding motifs were positively associated with the inherent ability of a gene’s promoter to generate DNA double-stranded breaks, and promoters showing strong double-stranded break gen ....
Divergence of cis-regulatory elements drives species-specific traits1, but how this manifests in the evolution of the neocortex at the molecular and cellular level remains unclear. Here we investigated the gene regulatory programs in the primary motor cortex of human, macaque, marmoset and mouse using single-cell multiomics assays, generating gene expression, chromatin accessibility, DNA methylome and chromosomal conformation profiles from a total of over 200,000 cells. From these data, we show evidence that divergence of transcription factor expression corresponds to species-specific epigenome landscapes. We find that conserved and divergent gene regulatory features are reflected in the evolution of the three-dimensional genome. Transposable elements contribute to nearly 80% of the human-specific candidate cis-regulatory elements in cortical cells. Through machine learning, we develop sequence-based predictors of candidate cis-regulatory elements in different species and demonstrate t ....
Type 2 diabetes mellitus (T2D), a major cause of worldwide morbidity and mortality, is characterized by dysfunction of insulin-producing pancreatic islet β cells1,2. T2D genome-wide association studies (GWAS) have identified hundreds of signals in non-coding and β cell regulatory genomic regions, but deciphering their biological mechanisms remains challenging3–5. Here, to identify early disease-driving events, we performed traditional and multiplexed pancreatic tissue imaging, sorted-islet cell transcriptomics and islet functional analysis of early-stage T2D and control donors. By integrating diverse modalities, we show that early-stage T2D is characterized by β cell-intrinsic defects that can be proportioned into gene regulatory modules with enrichment in signals of genetic risk. After identifying the β cell hub gene and transcription factor RFX6 within one such module, we demonstrated multiple layers of genetic risk that converge on an RFX6 ....
In diploid organisms, biallelic gene expression enables the production of adequate levels of mRNA1,2. This is essential for haploinsufficient genes, which require biallelic expression for optimal function to prevent the onset of developmental disorders1,3. Whether and how a biallelic or monoallelic state is determined in a cell-type-specific manner at individual loci remains unclear. MSL2 is known for dosage compensation of the male X chromosome in flies. Here we identify a role of MSL2 in regulating allelic expression in mammals. Allele-specific bulk and single-cell analyses in mouse neural progenitor cells revealed that, in addition to the targets showing biallelic downregulation, a class of genes transitions from biallelic to monoallelic expression after MSL2 loss. Many of these genes are haploinsufficient. In the absence of MSL2, one allele remains active, retaining active histone modifications and transcription factor binding, whereas the other allele is silenced, exhibiting loss ....
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Beside the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant g ....