
Charting thousands of novel gene loci in the human and mouse genomes through targeted full-length long-read RNA sequencing


guigolab/CLS3_GENCODE


The GENCODE CLS Project

massively expanding the lncRNA catalog through capture long-read RNA sequencing

Tamara Perteghella1,2,*, Gazaldeep Kaur1,*, Sílvia Carbonell-Sala1,*, Jose Gonzalez-Martinez3,*, Toby Hunt3,*, Tomasz Mądry4, Irwin Jungreis5,6, Fabien Degalez1, Carme Arnan1, Ramil Nurtdinov1, Julien Lagarde1,7, Beatrice Borsari8,9, Cristina Sisu10, Yunzhe Jiang8,9, Ruth Bennett3, Andrew Berry3, Marta Blangiewicz4, Daniel Cerdán-Vélez11, Kelly Cochran12, Covadonga Vara13, Claire Davidson3, Sarah Donaldson3, Cagatay Dursun8,9, Silvia González-López1,2, Sasti Gopal Das4, Kathryn Lawrence14, Daniel Nachun14, Matthew Hardy3, Zoe Hollis3, Mike Kay3, José Carlos Montañés13, Pengyu Ni8,9, Emilio Palumbo1, Carlos Pulido-Quetglas15,16, Marie-Marthe Suner3, Xuezhu Yu8,9, Dingyao Zhang8,9, Francois Aguet6, Kristin Ardlie6, Stephen B. Montgomery14,17,18, Jane E. Loveland3, M. Mar Albà13,19, Mark Diekhans20, Andrea Tanzer21, Jonathan M. Mudge3, Paul Flicek3, Fergal J Martin3, Mark Gerstein8,9, Manolis Kellis5,6, Anshul Kundaje12,14, Benedict Paten20, Michael L. Tress11, Rory Johnson15,16, Barbara Uszczynska-Ratajczak4, Adam Frankish3, Roderic Guigó1,2

1. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain.
2. Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF).
3. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
4. Department of Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.
5. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA 02139, USA.
6. The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA.
7. Flomics Biotech, SL, Carrer de Roc Boronat 31, 08005 Barcelona, Catalonia, Spain.
8. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA.
9. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.
10. Department of Life Sciences, Brunel University London, Uxbridge, London, UB8 3PH, UK.
11. Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain.
12. Department of Computer Science, Stanford University, Stanford, CA, USA.
13. Hospital del Mar Research Institute, Dr. Aiguader 88, Barcelona 08003, Spain.
14. Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA.
15. Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland.
16. School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland.
17. Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA.
18. Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA.
19. Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain.
20. UC Santa Cruz Genomics Institute, 2300 Delaware Avenue, University of California, Santa Cruz, CA 95060, USA.
21. University of Vienna, Department of Biochemistry and Cell Biology, Vienna, Austria.

* Equal contribution
Correspondence should be addressed to R.G. ([email protected])

GENCODE is a 20-year international project that produces high-quality annotations of the human and mouse genomes, which are crucial for understanding gene function. While the human catalog of protein-coding genes is nearly complete, long non-coding RNA (lncRNA) annotations have remained inconsistent across catalogs. To address this, GENCODE used targeted RNA sequencing to unify and expand lncRNA annotations in human and mouse, applying full-length long-read sequencing across diverse tissues. This effort yielded 16,817 new human genes and 22,210 new mouse genes, substantially enlarging the lncRNA catalog and improving orthology mapping between the two species. The new annotations enhance the functional interpretation of genome data by linking previously unannotated regions to biological functions.

In this repository:

Summary of the steps taken to process the long-read data after sequencing and before running LyRic. The quality checks performed on the data before downstream processing are also detailed here.

List of the files used in this work, with descriptions of the steps taken to generate them, direct download links, and detailed information about formats and tags.

Datasets used in this work, with information about the files and how they were processed before analysis.

Code used in the various downstream analyses.
