'Dark genes' hidden in human DNA just revealed
There may still be thousands of 'dark' genes missing from our record of the human genome. These hard-to-identify sequences of genetic material can code for tiny proteins, some of which are involved in cancer and in immune processes, a global consortium of researchers has confirmed. They could explain why past estimates of the number of genes in our genome were much higher than the count the Human Genome Project arrived at 20 years ago.
The new international study, still awaiting peer review, shows that our catalog of human genes is still a work in progress: advances in technology are picking up subtler genetic features, and ongoing searches continue to uncover gaps and errors in the record.
These overlooked genes are hidden in regions of our DNA long thought not to code for proteins. These regions were once dismissed as 'junk DNA', but it turns out that small stretches of them are still used as instructions for making mini-proteins.
Institute for Systems Biology proteomicist Eric Deutsch and colleagues found a large cache of them by searching data from 95,520 experiments for fragments of protein-coding sequences. These included studies that used mass spectrometry to investigate small proteins, as well as a catalog of protein snippets detected by our own immune system.
Rather than beginning with the long, well-known sequences that mark where the instructions for making a protein start, these 'dark' genes are preceded by shortened versions, which has allowed them to slip past scientists.
Despite their truncated start sequences, these non-canonical open reading frames (ncORFs) are still used as templates for making RNA, and some of that RNA is then used to make small proteins built from only a few amino acids. Previous studies have shown that cancer cells contain hundreds of small proteins.
"We believe that the identification of these newly-confirmed ncORF proteins is of great importance," the team wrote in their paper. "Their proteins... may have direct biomedical relevance, highlighting the growing interest in targeting such cryptic peptides through cancer immunotherapy, including cellular therapies and therapeutic vaccines."
Some of the genes that encode these hidden peptides are transposons, sequences that move around our genomes, including sequences inserted into our DNA by viruses. Others are what researchers call spurious. For example, some of the proteins detected in the mass spectrometry data were found only in cancer samples, so their corresponding genes may not be part of the body's normal workings.
“So, it’s possible that some of the ncORF peptides reflect aberrant proteins whose existence is thought to be out of context with the canonical proteome,” Deutsch and team explain.
Of the 7,264 non-canonical genes the team examined, the researchers found that at least a quarter of them, or more than 1,800, can make proteins. These are candidates to be added to the list of protein-coding genes in the human genome, and the team suspects there are thousands more that previous proteomic techniques have missed.
“It’s not every day that you open a research direction and say, ‘We might have a whole new class of drug targets for patients,’” neuro-oncologist John Prensner of the University of Michigan tells Elizabeth Pennisi in Science.
The tools the team has developed will help other researchers continue exploring this dark corner of our genome.
The study is awaiting peer review on bioRxiv.