Machine learning and gene regulation

Genome dynamics

High-throughput experiments such as next-generation sequencing are often used to answer simple biological questions, for example: “Which genes are more highly expressed in breast cancer than in normal tissue?”
Given the huge amount of information generated by each experiment, this is equivalent to having privileged access to an oracle and asking “What time is it?”.
Machine learning is an excellent tool for discovering hidden information in large amounts of data. Machine learning methods not only allow life scientists to get better answers but also help them generate novel hypotheses.
Our lab looks for opportunities in medical and fundamental biology data where information theory and machine learning can make a substantial impact.
A few examples of our discoveries include using Shannon’s entropy to discover transcriptional disorder in cancer (PLoS CB, 2008), simulating a biologist’s behavior to identify a method for detecting microRNA targets (Nature Methods, 2009) and using novel bioinformatics strategies to discover the impact of introns on gene expression (Cell, 2013; Genome Biology, 2017; Nature Communications, 2017).
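
As a rough illustration of the entropy idea mentioned above, the sketch below computes the Shannon entropy H = -Σ p_i log2(p_i) of an expression profile. It assumes the profile is a list of non-negative abundance values that is normalised into a probability distribution. The function name `shannon_entropy` and the input format are hypothetical choices made for this example, and this is not necessarily how the published analysis was carried out.

```python
import math
from collections.abc import Sequence


def shannon_entropy(expression: Sequence[float]) -> float:
    """Return the Shannon entropy, in bits, of an expression profile.

    Assumption: `expression` holds non-negative abundance values (for
    example read counts), which are normalised to sum to 1 before use.
    """
    if any(x < 0 for x in expression):
        raise ValueError("expression values must be non-negative")
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    # Zero entries contribute nothing to the entropy (0 * log 0 := 0).
    probs = [x / total for x in expression if x > 0]
    return -sum(p * math.log2(p) for p in probs)


# Expression spread evenly over four transcripts: maximal disorder.
even_profile = shannon_entropy([5, 5, 5, 5])
# Expression concentrated on a single transcript: no disorder.
focused_profile = shannon_entropy([20, 0, 0, 0])
```

With this definition, a profile spread evenly over four transcripts has an entropy of 2 bits, while a profile concentrated on a single transcript has an entropy of 0 bits, so higher values indicate a more disordered profile.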

Symposium “Artificial Intelligence in Biology and Health”, October 2018: a look back at the symposium, with videos of all the speakers

We are holding a one-day symposium that will bring together world-class experts from the fields of Artificial Intelligence, Biology and Health. The speakers have been selected for their expertise in their field and their ability to speak to a broad audience. Our objective is to enable the audience to step outside their comfort zone and discover recent breakthroughs in a different field of science. The theme of the day, and the subject of our roundtable discussion, will be how to get the fields of AI, biology and health to work together in science and academia. This event is free but requires registration, as places are limited.

Videos of the Symposium

Dr. Hervé Seitz - Biology by numbers

Dr. Gregory Beurrier - Genetic algorithms in biology

Dr. Mohammad Afshar - 'Genomic driven precision medicine paradigm ...'

Dr. Felix Balazard - 'Addressing the challenges of privacy and AI literacy for healthcare'

Dr. Fabien Michel - Individual based modelling

Dr. Chedy Raissi - Deep learning and data-mining of rare events

Dr. Thomas Walter - Artificial Intelligence for Imagery

Pr. Van Parunak - Multi-agent systems

Pr. Pierre Le Coz - Public lecture in French: 'L'homme et la technique : de l'artefact artisanal à l'intelligence artificielle'


iMOKA: k-mer based software to analyze large collections of sequencing data

Claudio Lorenzi, Sylvain Barriere, Jean-Philippe Villemin, Laureline Dejardin Bretones, Alban Mancheron, William Ritchie


TALC: Transcription Aware Long Read Correction

Lucile Broseus, Aubin Thomas, Andrew J Oldfield, Dany Severac, Emeric Dubois, William Ritchie


GECKO is a genetic algorithm to classify and explore high throughput sequencing data

Aubin Thomas, Sylvain Barriere, Lucile Broseus, Julie Brooke, Claudio Lorenzi, Jean-Philippe Villemin, Gregory Beurier, Robert Sabatier, Christelle Reynes, Alban Mancheron & William Ritchie


Exploring the Roles of CREBRF and TRIM2 in the Regulation of Angiogenesis by High-Density Lipoproteins.

Wong NKP, Cheung H, Solly EL, Vanags LZ, Ritchie W, Nicholls SJ, Ng MKC, Bursill CA, Tan JTM

Intron retention enhances gene regulatory complexity in vertebrates.

Schmitz U, Pinello N, Jia F, Alasmari S, Ritchie W, Keightley MC, Shini S, Lieschke GJ, Wong JJ, Rasko JEJ

microRNA Target Prediction

Ritchie W

IRFinder: assessing the impact of intron retention on mammalian gene expression

Middleton R, Gao D, Thomas A, Singh B, Au A, Wong JJ, Bomane A, Cosson B, Eyras E, Rasko JE, Ritchie W.

Intron retention is regulated by altered MeCP2-mediated splicing factor recruitment

Wong JJ, Gao D, Nguyen TV, Kwok CT, van Geldermalsen M, Middleton R, Pinello N, Thoeng A, Nagarajah R, Holst J, Ritchie W, Rasko JEJ


An NF90/NF110-mediated feedback amplification loop regulates dicer expression and controls ovarian carcinoma progression.

Barbier J, Chen X, Sanchez G, Cai M, Helsmoortel M, Higuchi T, Giraud P, Contreras X, Yuan G, Feng Z, Nait-Saidi R, Deas O, Bluy L, Judde JG, Rouquier S, Ritchie W, Sakamoto S, Xie D, Kiernan R
2018 - Cell Res, 28(5):556-571 29563539
Lead department: Gene regulation




Ultra-long sequencing to detect cancer-associated intron retention

(William Ritchie, Aubin Thomas)

Intron retention (IR) occurs when an intron is included in a mature mRNA. IR was previously regarded as a by-product of faulty splicing, and transcripts with retained introns are often rapidly degraded by a surveillance mechanism called nonsense-mediated decay (NMD). We discovered that numerous cell types make use of this mechanism, increasing the number of transcripts with retained introns that are targeted for degradation, in granulopoiesis (Cell, 2013), pluripotent stem cells (Nature, 2014) and erythrocyte differentiation (Blood, 2016). IR was recently found to have a major role in modulating tumour suppressor genes in hundreds of different cancers (Nature Genetics, 2015). However, because IR could not previously be identified correctly, numerous studies have overlooked potential biomarkers and therapeutic targets linked to this novel type of gene regulation. In this project we will combine new long-read RNA sequencing with classical Illumina sequencing to define IR with unprecedented accuracy. This will enable us to define the IR features that contribute to normal development and disease.

Programming genetic networks to extract hidden information in sequencing data

(William Ritchie, Aubin Thomas, Sylvain Barriere)

Advances in next-generation sequencing (NGS) methods have revealed that transcription is more pervasive, more diverse and more cryptic than expected. Despite this heterogeneity, and despite the fact that our understanding of transcript architecture is incomplete, bioinformatics analyses of these data are frequently initiated through a common, biased procedure: the reads are mapped to a reference genome or transcriptome. This step accounts neither for major changes in the genome or transcriptome, as can occur in many cancers, nor for the small sequence variations that are common between individuals. As a result, only a portion of the transcriptional information measured by NGS is used to discover meaningful signatures between different biological samples.
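
To make the contrast with reference mapping concrete, the sketch below counts k-mers directly from raw reads, in the spirit of the k-mer based software listed above (iMOKA). No reference genome or transcriptome is involved, so sequences that would fail to map still contribute to the counts. This is a minimal illustration under stated assumptions, not the method used in this project: the function name `count_kmers`, the choice to skip k-mers containing `N`, and the toy reads are all hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in a collection of reads.

    Assumption: reads are plain nucleotide strings. K-mers containing
    the ambiguous base 'N' are skipped (an illustrative choice).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    counts: Counter = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


# Toy example: two short reads, counted without any reference sequence.
toy_counts = count_kmers(["ACGTAC", "CGTACG"], k=3)
```

Comparing such count tables between groups of samples is one way to look for signatures without first committing to a reference sequence.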