9.30 - 9.45 · Hannes Svardal, Wellcome Trust Sanger Institute
Africa-wide whole genome sequencing of vervet monkeys reveals strong polygenic selection on known HIV-interacting genes and on genes up-regulated after infection with the simian immunodeficiency virus (SIV)
Hannes Svardal (1,4); Anna Jasinska (2); Wesley C Warren (3); Nelson B Freimer (2); Magnus Nordborg (4)
(1) Wellcome Trust Sanger Institute, Cambridge, UK
(2) University of California Los Angeles, Los Angeles, USA
(3) Washington University in St. Louis, St. Louis, USA
(4) Gregor Mendel Institute, Austrian Academy of Sciences, Vienna, Austria
With their abundance in savannahs and riverine forests of sub-Saharan Africa, vervet monkeys (genus Chlorocebus) are amongst the most widespread non-human primates and show considerable phenotypic diversity. Besides serving as a model for human disease traits, vervet monkeys are of interest as a natural host of the simian immunodeficiency virus (SIV), with high viral prevalence across most of the species range. We use whole genome sequencing data from 163 monkeys of five sub-taxa sampled across the whole continent to infer subspecies relationships and demonstrate cross-taxon gene flow. Identifying more than 50 million single nucleotide polymorphisms, we find high diversity within sub-taxa, differentiation between sub-taxa and a substantial amount of shared variation. A scan for diversifying selection across sub-taxa is highly enriched for viral response genes and genes that have been shown to interact with HIV, pointing to candidate loci for adaptation to SIV and other viral pathogens. Furthermore, selection scores are highly elevated in genes that respond to SIV infection in vervet monkeys but not in macaques.
9.45 - 9.50 · Jonathan Coleman, Institute of Psychiatry, Psychology and Neuroscience, King’s College London
The contribution of polygenic risk to the relationship between depression and body mass index in the UK Biobank
Jonathan R. I. Coleman (1), Thalia C. Eley (1,2), Gerome Breen (1,2)
(1) MRC Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, UK
(2) National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Trust and Institute of Psychiatry, Psychology and Neuroscience, UK
Body mass index (BMI) is increased on average in depression cases, but the relationship is complex: the relative contributions of genetic and non-genetic factors are unclear, and the direction of causality is unknown. Recent findings suggest that BMI may share a genetic component with psychiatric disorders. The explanatory power of polygenic risk scores (as proxies for the genetic component of variance) was investigated in a bidirectional analysis between BMI and depression, using participants from the UK Biobank cohort (N = 21,039). Participants from the first wave of genotyping released from the UK Biobank were assigned depression case or control status according to self-report and inpatient hospital episode data. Subtype (typical and atypical depression) diagnoses were unavailable. Polygenic risk scores were derived using the latest published meta-analyses of large genetic consortia, and linear and logistic models were constructed to assess the independent and interactive effects of polygenic risk and trait status on each trait, correcting for covariates including age, sex, socioeconomic status and geographic location. A small but significant positive correlation between depression status and BMI was observed. Polygenic risk contributed significantly to within-trait variance and did not substantially alter the observed phenotypic correlation, but no cross-trait associations between polygenic risk and depression or BMI survived correction for multiple testing. The genetic correlation between BMI and depression was non-significant, and the genetic influences on BMI did not differ between depression cases and controls (genetic correlation = 1). Individuals with depression in the first wave of the UK Biobank data have a higher BMI than control individuals. This relationship does not appear to arise from a shared genetic basis, suggesting an effect of factors not controlled for within the analysis.
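As a minimal sketch of what a polygenic risk score is, the snippet below computes it as the sum of per-allele effect sizes (betas) from GWAS summary statistics multiplied by an individual's effect-allele dosages. SNP identifiers and values are hypothetical, and steps that real pipelines typically add (LD clumping, p-value thresholding, standardisation of scores) are omitted.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float], betas: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over SNPs present in both inputs.

    dosages: effect-allele dosage per SNP (0, 1 or 2; imputed dosages may be fractional)
    betas:   per-allele effect size per SNP from GWAS summary statistics
             (e.g. log odds ratio for depression, or BMI units per allele)
    """
    shared = sorted(dosages.keys() & betas.keys())  # sorted for a deterministic summation order
    return sum(betas[snp] * dosages[snp] for snp in shared)


# Hypothetical summary statistics and one individual's genotype dosages.
betas = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0, "rs9999": 1}  # rs9999 has no weight and is ignored
score = polygenic_score(dosages, betas)
print(round(score, 6))
```

Scores computed this way would then enter the linear or logistic models as predictors alongside the covariates.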
9.50 - 10.05 · Stefan Dentro, Wellcome Trust Sanger Institute
Large-scale pan-cancer subclonal reconstruction analysis of whole genome sequences reveals widespread intra-tumour heterogeneity
Stefan C. Dentro (1), Kerstin Haase (2), Keiran M. Raine (1), Jonas Demeulemeester (2), Inigo Martincorena (1), Ludmil B. Alexandrov (1), Henry Lee-Six (1), Kevin Dawson (1), David J. Adams (1), Peter Van Loo (2), David C. Wedge (1), for the Evolution and Heterogeneity Working Group of the ICGC Pan-Cancer Analysis of Whole Genomes initiative
(1) Wellcome Trust Sanger Institute, (2) The Francis Crick Institute
Tumours evolve through a series of clonal expansions. Over time, changes accumulate in the DNA of tumour cells, and these can be measured through massively parallel sequencing. The International Cancer Genome Consortium Pan-Cancer Analysis of Whole Genomes contains whole genome sequences of 2,900 tumours spanning 46 different cancer types. We extended previously developed methods to obtain allele-specific subclonal copy number based on haplotype phasing of 1000 Genomes SNPs, and to reconstruct the subclonal architecture of tumours by clustering point mutations using a Bayesian Dirichlet process. Here we apply this suite of subclonal reconstruction methods to 1,700 tumours, after rigorous quality control of subclonal copy number profiles. After correcting for the power to detect subclonal populations, we observe that intra-tumour heterogeneity is nearly universal across most cancer types. We infer that in the majority of cancers the most recent common ancestor cell emerges late, that selection occurs throughout a tumour's life history and that mutational signatures can change during tumour evolution. We observe clear differences between cancer types. In the typical cancer, approximately 80% of point mutations and 65% of copy number changes are clonal. Pancreatic endocrine tumours acquire most of their copy number changes early, with 85% of changes appearing fully clonal. Haematological cancers acquire most of their copy number changes late, as only 30% of changes are clonal. In sharp contrast, melanomas appear to be mostly clonal based on point mutations, but continue to acquire copy number changes. Our large-scale analysis of whole genomes shows that cancers continue to evolve, and that individual cancer types each show particular characteristics in their evolutionary history and subclonal architecture.
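The clonal/subclonal percentages above rest on converting each point mutation's variant allele fraction (VAF) into a cancer cell fraction (CCF) after adjusting for tumour purity and local copy number. The sketch below uses the commonly used conversion formula; the hard threshold in `label_mutation` is a hypothetical simplification, since the methods described here instead cluster mutations with a Bayesian Dirichlet process and account for sampling uncertainty.

```python
def cancer_cell_fraction(vaf: float, purity: float, total_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Convert a variant allele fraction into the fraction of cancer cells carrying the mutation.

    vaf:          fraction of reads supporting the variant
    purity:       fraction of cells in the sample that are tumour cells (0 < purity <= 1)
    total_cn:     total copy number of the locus in tumour cells
    multiplicity: number of mutated copies per tumour cell that carries the mutation
    normal_cn:    copy number of the locus in normal cells (2 for autosomes)
    """
    # Average number of copies of the locus per cell, across tumour and normal cells.
    copies_per_cell = purity * total_cn + (1 - purity) * normal_cn
    return vaf * copies_per_cell / (purity * multiplicity)


def label_mutation(ccf: float, clonal_threshold: float = 0.9) -> str:
    """Hypothetical hard cut-off: mutations in (nearly) all cancer cells are called clonal."""
    return "clonal" if ccf >= clonal_threshold else "subclonal"


# Heterozygous mutations in a diploid region of a sample with 60% tumour purity.
for vaf in (0.30, 0.15):
    ccf = cancer_cell_fraction(vaf, purity=0.6, total_cn=2)
    print(vaf, round(ccf, 3), label_mutation(ccf))
```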
10.05 - 10.10 · Eva Krapohl, King’s College London
The nature of nurture: Education-associated single nucleotide polymorphisms explain variation in children's home environments and in their associations with child outcomes
Eva Krapohl (1); Paul F O’Reilly (1); Robert Plomin (1)
(1) MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Understanding the complex relationships between environmental factors and developmental outcomes is a fundamental goal of epidemiology. Genetics can help elucidate cause and effect, because inherited genetic variants cannot be subject to reverse causation. Using genome-wide polygenic models in a UK-representative sample of 6,710 children, we investigated the effect of education-associated single nucleotide polymorphisms (a) on children’s home environments and (b) on the covariance between children’s home environments and child outcomes. Variation in education-associated alleles was significantly associated with variation in children’s home environments (e.g. breastfeeding: 2.1%; household income: 3.2%; television: 2.9%; number of books in household: 2.6%) and explained covariance between home environments and child outcomes, independently of population stratification. Three examples: the association between breastfeeding and child IQ, that between number of books and child educational achievement, and that between television and child conduct disorder were significantly tagged by education-associated alleles. These findings highlight the importance of taking genetics into account when investigating the association between environment and developmental outcomes.
10.10 - 10.25 · Hannah Meyer, European Bioinformatics Institute
Understanding cardiac structure and function in humans using 4D imaging genetics.
Hannah V Meyer (1), Antonio de Marvao (2), Timothy JW Dawes (2), Wenzhe Shi (2) , Tamara Diamond (2), Daniel Rueckert (2), Enrico Petretto (2), Leonardo Bottolo (2), Declan P O’Regan (2), Ewan Birney (1), Stuart A Cook (2)
(1) European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, United Kingdom
(2) Medical Research Council, Clinical Sciences Centre, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
Human health depends on the long-lasting function of many organ systems; these in turn develop through complex genetic programs and are maintained over a lifespan. Many human diseases are related to cardiac structure and function, from relatively common myocardial infarctions through to rarer but serious diseases such as the various cardiomyopathies. Understanding the biology of the human heart is informative for both basic and translational research. We have created the first at-scale cohort of 1,500 detailed cardiac images from healthy volunteers. We used a 1.5T Philips MRI scanner to acquire detailed 4D images of the heart in a single breath hold. This provides far more detailed and consistent cardiac measurements than the traditional combination of 2D planar cardiac images. We map these 4D images onto a consistent volumetric reference and derive over 27,000 measurements per individual representing the heart. The individuals were also genotyped on a modern SNP array and imputed using a combination of 1000 Genomes and UK10K known variants, yielding 9.4 million variants for use in association studies. We used a dimension reduction process to compress the large set of image-based metrics into a more compact latent variable space (100 dimensions). Using this projection, we find a number of genetic loci that show strong association with heart structure. Interestingly, some of these hits lie in enhancers of known heart development genes, and pre-existing knockout studies in mice confirm a heart phenotype. Inspired by the model organism data, we have shown that a similar phenotype, measured as the non-compacted to compacted ratio in the heart at specific points, is also present in the human population. This work shows that imaging genetics provides an unbiased discovery process for exploring the underlying biology of human organs, with an impact on our understanding of both healthy and disease physiology.
10.25 - 10.30 · Richa Gupta, University of Helsinki
Neuregulin Signaling Pathway in Smoking Behavior
Richa Gupta (1,2); Beenish Qaiser (1,2); Liang He (2,3); Tero Hiekkalinna (1,4); Miina Ollikainen (1,2); Samuli Ripatti (1,2,5); Markus Perola (4); Pamela A. F. Madden (6); Tellervo Korhonen (1,4); Jaakko Kaprio (1,2,4); Anu Loukola (1,2)
(1) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; (2) Department of Public Health, University of Helsinki, Helsinki, Finland; (3) Duke Population Research Centre, Duke University, North Carolina, USA; (4) National Institute for Health and Welfare, Helsinki, Finland; (5) Wellcome Trust Sanger Institute, Cambridge, UK; (6) Department of Psychiatry, Washington University School of Medicine, Saint Louis, Missouri, USA
Smoking is a major risk factor for many somatic diseases and is also emerging as a causal factor for neuropsychiatric disorders. Understanding the molecular processes that link comorbid conditions such as tobacco smoking and mental disorders can provide new therapeutic targets. Neuregulin signaling pathway (NSP) genes have previously been implicated in schizophrenia, a neurodevelopmental disorder with high comorbidity with smoking. Recently, we performed a genome-wide association study in a Finnish twin family sample (N=1104) and detected association between DSM-IV-defined nicotine dependence and ERBB4, a neuregulin receptor (Loukola 2014 Mol Psychiatry). Using a subset of the same sample, we have previously identified linkage for regular smoking at 2q33, overlapping the ERBB4 locus (Loukola 2008 Pharmacogenomics J). Further, Neuregulin 3 has been shown to associate with nicotine withdrawal in a behavioral mouse model (Turner 2014 Mol Psychiatry). In this study we scrutinized association and linkage between common and rare genetic variants (22,450 SNPs) in ten NSP genes and the regular smoking, nicotine dependence and nicotine withdrawal phenotypes. Using an extended Finnish twin family sample (N=1998), we detected 183 significantly associated SNPs (FDR p<0.05). Diligent annotation of these associations using expression (eQTL) and methylation quantitative trait loci (meQTL) analysis in a Finnish population sample, as well as available eQTL and splicing quantitative trait loci (sQTL) databases, revealed plausible functional roles for several associated variants. Our results further support the involvement of the NSP in smoking behavior and highlight the utility of functional annotations.
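The abstract reports significance as FDR p<0.05 without naming the procedure. One standard way to control the false discovery rate is the Benjamini-Hochberg step-up procedure, sketched below on hypothetical p-values; whether the authors used exactly this procedure is an assumption.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the (sorted) indices of hypotheses rejected at FDR level alpha.

    Step-up rule: sort p-values ascending, find the largest rank k with
    p_(k) <= k / m * alpha, and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank  # keep the largest passing rank, even if some smaller ranks failed
    return sorted(order[:k_max])


# Hypothetical per-SNP association p-values.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

The step-up design matters: in a call such as `benjamini_hochberg([0.03, 0.04])` the smallest p-value fails its own threshold (0.025), yet both hypotheses are rejected because the second passes at rank 2.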
12.00 - 12.15 · Robert Beagrie, Max Delbruck Centre for Molecular Medicine
Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM), a novel ligation-free approach
Robert A. Beagrie (1,2,3); Antonio Scialdone (4); Markus Schueler (1); Dorothee C.A. Kraemer (1); Mita Chotalia (2); Sheila Q. Xie (2); Ines de Santiago (2); Liron-Mark Lavitas (1,2); Miguel R. Branco (2); Laurence Game (5); Niall Dillon (3); Paul A.W. Edwards (6); Mario Nicodemi (4); Ana Pombo (1,2)
(1) Epigenetic Regulation and Chromatin Architecture Group, Berlin Institute for Medical Systems Biology, Max-Delbrueck Centre for Molecular Medicine, Robert-Raessle Strasse, Berlin-Buch 13092, Germany; (2) Genome Function Group, (3) Gene Regulation and Chromatin Group and (5) Genomics Laboratory, MRC Clinical Sciences Centre, Imperial College London, Hammersmith Hospital Campus, London W12 0NN, UK; (4) Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, CNR-SPIN, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy; (6) Hutchison/MRC Research Centre and Department of Pathology, University of Cambridge, Cambridge, United Kingdom
Mutations that alter the behaviour of enhancers are known to be important contributors to a number of human diseases, but many disease-linked sequence variants that overlap putative enhancers remain otherwise uncharacterised. Target genes can be identified from the physical interactions formed by enhancers, but current genome-wide approaches based on chromosome conformation capture (3C) require the ligation of two restriction-digested DNA ends to identify a chromatin interaction. This limits their ability to identify contacts between more than two loci interacting simultaneously in the same cell. Capturing the full complexity of enhancer interactions in single cells may be crucial to uncovering their regulatory functions. We present Genome Architecture Mapping (GAM), a new ligation-free method for determining chromatin interactions on a genome-wide scale, which is capable of detecting simultaneous interactions between three or more genomic loci. In contrast with 3C-based approaches, GAM data present less intrinsic bias, whilst requiring a smaller number of cells. We generate a genome-wide dataset of chromatin interactions in mouse ES cells using GAM, which we compare with published Hi-C data and analyse using a tailor-made statistical model. We identify preferential chromatin contacts spanning tens of megabases, including especially prominent interactions between enhancers and active genes, and validate these contacts by independent FISH experiments. By exploiting the unique ability of GAM to interrogate high-multiplicity interactions, we detect a striking pattern of abundant, simultaneous three-way contacts genome-wide. These “triplet” contacts include interactions between highly transcribed topological domains (TADs) and/or TADs containing super-enhancers, identifying the simultaneous association of multiple regulatory regions in the same nucleus as an important aspect of genome architecture.
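GAM infers proximity from how often genomic loci are detected together in thin nuclear sections (nuclear profiles). As a rough sketch, assuming a binary matrix with one row per nuclear profile and one column per genomic window (1 = window detected), co-segregation frequencies and a simple excess-over-independence score for pairs and triplets can be computed as below. This is not the tailor-made statistical model mentioned above; the scoring is a hypothetical simplification and the matrix is toy data.

```python
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[int]]  # rows: nuclear profiles, columns: genomic windows (0/1 detection)


def cosegregation(matrix: Matrix, loci: Sequence[int]) -> float:
    """Fraction of nuclear profiles in which all the given windows are detected."""
    return sum(all(row[j] for j in loci) for row in matrix) / len(matrix)


def excess_cosegregation(matrix: Matrix, loci: Sequence[int]) -> float:
    """Observed co-segregation minus the value expected if windows were detected independently.

    For a pair this is the linkage-disequilibrium-like quantity D = f_ab - f_a * f_b;
    the same comparison applies to triplets (f_abc - f_a * f_b * f_c).
    """
    expected = 1.0
    for j in loci:
        expected *= cosegregation(matrix, [j])
    return cosegregation(matrix, loci) - expected


# Toy data: 5 nuclear profiles x 4 genomic windows.
profiles = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
for pair in combinations(range(4), 2):
    print(pair, round(excess_cosegregation(profiles, pair), 3))
print((0, 1, 2), round(excess_cosegregation(profiles, (0, 1, 2)), 3))
```

Positive scores flag windows found together in more sections than their individual detection rates would predict; the triplet score is what a ligation-based pairwise assay cannot measure directly.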
12.15 - 12.20 · Karishma D’Sa, UCL
An insight into gene regulation in human brain with allele specific expression
Karishma D’Sa*(1,2), Jana Vandrovcova*(1,2), Adaikalavan Ramasamy*(1,2,3), Sebastian Guelfi*(1,2), Juan A. Botia(1,2), Daniah Trabzuni(1,4), J. Raphael Gibbs(5), Colin Smith(6), Mar Matarin(1), Vibin Varghese(2), Paola Forabosco(2,7), The UK Brain Expression Consortium (UKBEC), John Hardy(1), Michael E. Weale(2) & Mina Ryten(1,2)
(1) Reta Lila Weston Institute and Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, UK; (2) Department of Medical & Molecular Genetics, King’s College London SE1 9RT, UK; (3) Jenner Institute, University of Oxford, Oxford OX3 7DQ, UK; (4) Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia; (5) Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA; (6) MRC Sudden Death Brain Bank Project, University of Edinburgh, Department of Neuropathology, Edinburgh, EH8 9AG; (7) Istituto di Ricerca Genetica e Biomedica, Cittadella Universitaria di Cagliari, 09042 Monserrato, Sardinia, Italy.
Allele-specific expression (ASE) is the differential expression of the two alleles at a transcribed locus. Being a within-individual comparison, it helps avoid potential confounding factors and can be used to study gene regulation in single individuals or in small datasets from rare tissues. We examined 84 substantia nigra and putamen samples from 53 neuropathologically normal control post-mortem human brains of the UKBEC dataset for ASE. Gene expression, including both pre-mRNA and mRNA, was investigated using mRNA-enriched total RNA and exome sequencing data. 7.8% of the heterozygous variants we studied were identified as ASE signals at a false discovery rate < 5%. Validation of our signals with an independent dataset of lymphoblastoid cell lines showed strong concordance, and also revealed brain-specific signals that are not detected even with 10 times the number of individuals. We observed multiple underlying causes of ASE: (1) highly deleterious variants, (2) imprinting and (3) expression quantitative trait loci (eQTLs). 25% of the protein-truncating variants we studied had significant ASE signals, compared to only 3% of intronic sites. We saw that a drop in expression caused by nonsense-mediated decay was compensated by increased expression of the common allele. Imprinted genes were enriched among ASE signals whose direction was reversed between individuals. We also observed common variants with unidirectional ASE signals that tagged eQTLs. Thus ASE is an efficient way of finding gene regulatory processes in small datasets, underlining its power.
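At a heterozygous site, ASE is commonly tested by asking whether the reference and alternative read counts depart from the 50:50 split expected under balanced expression. The sketch below uses an exact two-sided binomial test on hypothetical counts; the abstract does not state which test the authors used, and real analyses often also model reference-mapping bias and overdispersion (for example with a beta-binomial model).

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance treats floating-point near-ties as equally likely outcomes.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def ase_test(ref_reads: int, alt_reads: int) -> float:
    """P-value for allelic imbalance at one heterozygous site (null: equal allelic expression)."""
    return binomial_two_sided_p(ref_reads, ref_reads + alt_reads)


# Hypothetical read counts at two heterozygous sites: (reference, alternative).
sites = {"site_A": (52, 48), "site_B": (85, 15)}
for name, (ref, alt) in sites.items():
    print(name, ase_test(ref, alt))
```

Per-site p-values like these would then be corrected for multiple testing before reporting ASE signals at a given false discovery rate.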
12.20 - 12.35 · Kaur Alasoo, Wellcome Trust Sanger Institute
Fine-mapping condition-specific regulatory variants in human macrophages using ATAC-seq
Kaur Alasoo, Julia Rodrigues, Subhankar Mukhopadhyay, Gordon Dougan, Daniel Gaffney
Wellcome Trust Sanger Institute, Hinxton, UK
Quantitative trait loci (QTL) mapping studies of cellular phenotypes such as gene expression can provide mechanistic insights into the functions of disease-associated variants. However, many molecular QTLs are cell type- and context-specific. This is particularly relevant for immune cells, where external cues can substantially alter cellular function and behavior. In addition, fine-mapping causal regulatory variants is challenging, which often limits mechanistic understanding. In this study we differentiated macrophages from induced pluripotent stem cells of 85 unrelated, healthy individuals, derived as part of the Human Induced Pluripotent Stem Cells Initiative (HipSci.org). We generated gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) data from these cells in four experimental conditions: naive, treated with interferon-gamma (IFNg) for 18h, infected with Salmonella for 5h, and IFNg treatment followed by Salmonella infection. Across these four conditions we detected expression QTLs (eQTLs) for 4326 genes, over 900 of which affected gene expression in a condition-specific manner. Many of these eQTLs overlapped known disease associations, including some that were only detectable in stimulated cells. Intersecting associated eQTL variants with ATAC-seq signal from the same individuals and cell population allowed us to greatly reduce the set of credible causal variants, often pinpointing a single most likely variant. In addition, joint analysis of eQTLs with chromatin accessibility QTLs (caQTLs) revealed that approximately 50% of stimulation-specific eQTLs manifest at the chromatin level in naive cells prior to stimulation. These analyses provide insight into the principles of condition-specific gene regulation and highlight putative trans-acting factors involved.
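A simple version of the fine-mapping step described above is to keep only those credible-set variants that fall inside chromatin accessibility (ATAC-seq) peaks from the same cells. The sketch assumes peaks are merged (non-overlapping), half-open [start, end) intervals; variant IDs and coordinates are hypothetical, and the joint eQTL/caQTL analysis in the abstract goes further than this filter.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]      # (chromosome, start, end), half-open
Variant = Tuple[str, str, int]   # (variant id, chromosome, position)


def build_peak_index(peaks: List[Peak]) -> Dict[str, List[Tuple[int, int]]]:
    """Group peaks by chromosome and sort them by start so they can be binary-searched."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_peak(index: Dict[str, List[Tuple[int, int]]], chrom: str, pos: int) -> bool:
    """True if pos lies inside a peak; assumes peaks on one chromosome do not overlap."""
    intervals = index.get(chrom, [])
    # Position of the last peak whose start is <= pos (tuples compare by start first).
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]


def prioritise(credible_set: List[Variant], peaks: List[Peak]) -> List[str]:
    """Return the ids of credible-set variants that overlap an accessible-chromatin peak."""
    index = build_peak_index(peaks)
    return [vid for vid, chrom, pos in credible_set if in_peak(index, chrom, pos)]


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
credible_set = [("rsA", "chr1", 150), ("rsB", "chr1", 300), ("rsC", "chr1", 599), ("rsD", "chr2", 150)]
print(prioritise(credible_set, peaks))
```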
12.35 - 12.40 · Karel Brinda, LIGM Universite Paris-Est Marne-la-Vallee
BWT-based indexing structure for metagenomic classification
Karel Brinda, Gregory Kucherov, Kamil Salikhov, Maciej Sykulski
LIGM Universite Paris-Est Marne-la-Vallee
Metagenomics is a powerful approach to studying the genetic content of environmental samples, and has been strongly promoted by NGS technologies. One of its main tasks is the assignment of metagenomic reads to taxonomic units, and the subsequent abundance estimation. Most recently developed programs for this task (such as LMAT, KRAKEN and KALLISTO) perform the assignment based on k-mers shared between reads and references. In such an approach, two major algorithmic subproblems can be distinguished: designing a k-mer index for a huge database of reference genomes and a given taxonomic tree, and designing an algorithm for assigning reads to taxonomic units from information on shared k-mers. In this talk, we consider the problem of index design and present a novel data structure that provides the full list of genomes containing a queried k-mer. The structure is based on a BWT-index applied to sequences encoding the k-mers specific to each node of the taxonomic tree. We analyse the usefulness of this index and evaluate it in terms of speed and memory requirements.
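To illustrate how a BWT-based index can report every genome containing a queried k-mer, the sketch below builds a plain FM-index over concatenated reference genomes and runs backward search. It is a simplified stand-in, not the structure presented in the talk (which indexes sequences encoding the k-mers specific to each taxonomic node); genome names and sequences are hypothetical, and the naive suffix-array construction is only suitable for toy inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FMIndex:
    names: List[str]            # genome names, in input order
    sa: List[int]               # suffix array of the concatenated text
    owner: List[int]            # owner[pos] = index of the genome containing text position pos
    C: Dict[str, int]           # C[c] = number of text characters lexicographically smaller than c
    occ: Dict[str, List[int]]   # occ[c][i] = occurrences of c in bwt[:i]
    n: int                      # length of the concatenated text


def build_fm_index(genomes: Dict[str, str]) -> FMIndex:
    names = list(genomes)
    parts: List[str] = []
    owner: List[int] = []
    for gi, name in enumerate(names):
        seq = genomes[name] + "#"  # separator, so no match can span two genomes
        parts.append(seq)
        owner.extend([gi] * len(seq))
    text = "".join(parts) + "$"
    owner.append(-1)
    # Naive suffix array (O(n^2 log n)); real tools use much faster construction algorithms.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] == "$" wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    C: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return FMIndex(names, sa, owner, C, occ, len(text))


def backward_search(index: FMIndex, pattern: str) -> Tuple[int, int]:
    """Half-open suffix-array interval [lo, hi) of suffixes that start with pattern."""
    lo, hi = 0, index.n
    for ch in reversed(pattern):
        if ch not in index.C:
            return 0, 0
        lo = index.C[ch] + index.occ[ch][lo]
        hi = index.C[ch] + index.occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def genomes_containing(index: FMIndex, kmer: str) -> List[str]:
    lo, hi = backward_search(index, kmer)
    hits = {index.owner[index.sa[i]] for i in range(lo, hi)}
    return sorted(index.names[g] for g in hits)


idx = build_fm_index({"g1": "ACGTACGT", "g2": "TTGCA", "g3": "ACGGA"})
print(genomes_containing(idx, "ACG"))
```

The separator character is why the query `"GTT"` finds nothing here even though `g1` ends in `GT` and `g2` starts with `TT`.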
12.40 - 12.55 · Tommaso Leonardi, EMBL-EBI
Positional conservation identifies topological anchor point (tap)RNAs linked to developmental loci
Tommaso Leonardi (1,2), Paulo P. Amaral (3), Namshik Han (3), Emmanuelle Vire (3), Dennis Gascoigne (3), Raul A. Carrasco (4), Magdalena Buescher (3), Anda Zhang (5), Stefano Pluchino (2), Vinicius Maracaja-Coutinho (4), Helder I. Nakaya (6), Martin Hemberg (7), Ramin Shiekhattar (5), Anton J. Enright (1), Tony Kouzarides (3)
1. EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
2. Department of Clinical Neurosciences; Wellcome Trust-Medical Research Council Stem Cell Institute, University of Cambridge, Clifford Allbutt Building-Cambridge Biosciences Campus, Hills Road, Cambridge, CB2 0PY, UK.
3. The Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK.
4. Centro de Genómica y Bioinformatica, Facultad de Ciencias, Universidad Mayor, Chile.
5. University of Miami Miller School of Medicine, Sylvester Comprehensive Cancer Center, Department of Human Genetics, Biomedical Research Building, Miami, FL 33136, USA.
6. School of Pharmaceutical Sciences, University of São Paulo, Av. Prof. Lineu Prestes 580, São Paulo 05508, Brazil.
7. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK.
The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider positional conservation across mammalian genomes as an indicator of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human genomes that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNAs are primarily associated with developmental transcription factors, with which they are co-expressed in a tissue-specific manner. Strikingly, a substantial proportion of positionally conserved RNAs have features linked to chromatin organization: they overlap binding sites for the CTCF chromatin organizer and are located at chromatin loop anchor points and topologically associating domains (TADs). These topological anchor point (tap)RNAs possess conserved sequence domains that are enriched in potential recognition motifs for zinc finger proteins. Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other’s expression and influence metastatic phenotypes of cancer cells in vitro in a similar fashion. Thus, interrogation of positionally conserved lncRNAs identifies a subset of tapRNAs with shared functional properties, which are linked to chromatin topology and the regulation of developmental transcription factor loci.
12.55 - 13.00 · Lucy van Dorp, UCL
The Genetic Legacy of the Kuba Kingdom in the present-day Democratic Republic of Congo
Lucy van Dorp (1,2), Nathan Nunn (3), James A Robinson (4), Jonathan Weigel (5), Joseph Henrich (6), Mark G Thomas (1), Garrett Hellenthal (1)
(1) Department of Genetics, Evolution and Environment. University College London.
(2) Centre for Mathematics and Physics in the Life Sciences and EXperimental Biology (CoMPLEX). University College London.
(3) Department of Economics. Harvard University.
(4) Harris School of Public Policy. University of Chicago.
(5) Department of Political Economy and Government. Harvard University.
(6) Department of Evolutionary Biology. Harvard University.
The pre-colonial centralized state of the Kuba Kingdom was founded by King Shyamm in the 17th century in the present-day Democratic Republic of Congo. The Kuba Kingdom had the characteristics of a centralized state, with an enforced taxation system, elected political office, a police force and a formal court system with trial by jury, but is considered unusual in that these socio-political institutions developed without Western influence. As part of a collaboration with the Department of Economics at Harvard, we explore the genetic structure in a novel data collection consisting of over 250,000 SNPs in each of 788 individuals from 29 present-day groups living both inside and outside the former Kuba Kingdom, relating genetics to cultural belief systems and oral traditions involving the Kingdom. We demonstrate that genetic structure in the region is subtle, so that standard population-genetic techniques such as principal component analysis (PCA) and FST do not reveal clear patterns. Instead we describe a haplotype-based technique that exploits associations among neighbouring SNPs to increase power, and which here reveals a clear correlation between genetics and geography. In preliminary work we demonstrate that the group most genetically differentiated from the other Congolese tribes is the Lele, who live outside the geographic span of the former Kuba Kingdom and are documented to have had different political and economic institutions from geographically proximal tribes. Using this and further statistical modelling, we provide insight into how historical socio-political factors can impact present-day human genetic diversity.
14.45 - 15.00 · Kieran Campbell, Wellcome Trust Centre for Human Genetics, University of Oxford
Incorporating prior knowledge in single-cell trajectory learning using Bayesian nonlinear factor analysis
Kieran Campbell (1); Christopher Yau (1,2)
(1) Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN (2) Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford, OX1 3LB
The transcriptomes of single cells undergoing diverse biological processes – such as differentiation or apoptosis – display remarkable heterogeneity that is averaged over in bulk sequencing. Single-cell sequencing itself offers only a snapshot of these processes by capturing cells of variable and unknown progression through them. Consequently, one outstanding problem in single-cell genomics is to find an ordering of cells (known as their pseudotime) that best reflects their progression, for which several computational methods have been proposed. Such methods emphasise an unsupervised “data-driven” approach that typically involves dimensionality reduction on a large gene-set followed by curve fitting in the reduced space. Here we present an alternative approach for pseudotime inference that allows the user to specify the desired behaviour of a set of marker genes. Using a Bayesian generative model, such knowledge – such as a given gene turning on or off at a specified point in the trajectory – is incorporated through informative priors. Our novel method solves several problems in single-cell trajectory learning including pseudotime orientation, implicit length scales and robustness to gene selection and noise. We demonstrate the superiority of our method on synthetic data before examining several real-world use cases.
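A drastically simplified illustration of using marker-gene knowledge to place a cell in pseudotime: each marker is assumed to follow a sigmoidal switch (turning on or off at a known point t0 with known steepness k), and a cell's pseudotime is taken as the grid value in [0, 1] that best explains its marker expression under Gaussian noise. In the approach described above such knowledge enters as informative priors within a Bayesian model rather than as fixed parameters; the gene names, parameter values and fixed-parameter grid search here are all hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical marker genes: (amplitude, steepness k, switch point t0).
# Positive k = gene turns on around t0; negative k = gene turns off around t0.
MARKERS: Dict[str, Tuple[float, float, float]] = {
    "GENE_ON": (5.0, 20.0, 0.3),
    "GENE_OFF": (5.0, -20.0, 0.7),
}


def sigmoid_mean(t: float, amplitude: float, k: float, t0: float) -> float:
    """Expected expression of a switch-like gene at pseudotime t."""
    return amplitude / (1.0 + math.exp(-k * (t - t0)))


def estimate_pseudotime(expression: Dict[str, float], sigma: float = 0.5, grid: int = 101) -> float:
    """Maximum-likelihood pseudotime on a uniform grid over [0, 1], assuming Gaussian noise."""
    best_t, best_ll = 0.0, -math.inf
    for i in range(grid):
        t = i / (grid - 1)
        ll = sum(
            -((expression[g] - sigmoid_mean(t, *params)) ** 2) / (2 * sigma**2)
            for g, params in MARKERS.items()
        )
        if ll > best_ll:
            best_t, best_ll = t, ll
    return best_t


# A hypothetical cell simulated without noise at pseudotime 0.5.
cell = {g: sigmoid_mean(0.5, *p) for g, p in MARKERS.items()}
print(estimate_pseudotime(cell))
```

Because the switch directions are specified up front, the inferred ordering has a fixed orientation (start to end), which is one of the issues the abstract says prior knowledge helps resolve.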
15.00 - 15.05 · Marc Williams, UCL & Barts Cancer Institute, QMUL
Cancer genome sequencing reveals only the earliest events in cancer development
Marc Williams (1,2,3), Benjamin Werner (4), Chris Barnes (3), Andrea Sottoriva (4), Trevor Graham (2)
(1) Centre for Mathematics and Physics in the life sciences and experimental Biology (CoMPLEX), UCL
(2) Tumour Biology, Barts Cancer Institute, QMUL
(3) Cell and Developmental Biology, UCL
(4) Institute of Cancer Research
Clonal evolution, the acquisition of selectively advantageous mutations followed by their fixation in the population, has long been the traditional view of tumour evolution. Using mathematical modelling, we recently showed that sequencing data from primary human cancers often (~30% of cases) exhibit a signature of neutral evolutionary dynamics (Williams et al. 2016). Under this model, following the acquisition of a full set of genetic alterations sufficient for malignancy, tumours grow as single clonal expansions, with all subsequent mutations being effectively neutral, i.e. having no effect on the growth of subpopulations of cells within the tumour. Here, using a branching-process simulation of tumour growth and a multi-stage sampling scheme to generate synthetic data sets that share the characteristics of real sequencing data, we explore the consequences of relaxing some of the assumptions of this neutral model, and thereby what type of evolutionary dynamics may explain the ~70% of cases that do not fit neutral evolutionary dynamics. We find that, owing to the expanding population and the limited resolution of sequencing data, selection events must happen early and have relatively large fitness effects to be detectable in typical sequencing of bulk tissue samples. This demonstrates that sequencing of cancer samples reveals only the earliest events post-transformation. Using our model together with approximate Bayesian computation, we then infer the evolutionary dynamics of individual samples that do not conform to the neutral model. By linking the dynamics of tumour growth to NGS data, our theoretical framework provides a powerful new way to interpret genomic studies of cancer and opens up opportunities to decipher functional vs non-functional heterogeneity, measure in vivo mutation rates and infer mutational timelines.
Williams et al. (2016). Identification of neutral tumor evolution across cancer types. Nature Genetics.
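Under the neutral model of Williams et al. (2016), as we understand it, the cumulative number of subclonal mutations with allele frequency between f and an upper bound f_max grows linearly in 1/f: M(f) = (μ/β)(1/f − 1/f_max), where μ is the mutation rate and β the fraction of effective cell divisions. The sketch below computes M(f) from a list of VAFs and fits this line through the origin, reporting the slope (an estimate of μ/β) and R². The VAFs are hypothetical, and the frequency grid, integration range and any R² cut-off used to call a tumour neutral are assumptions, not values taken from the abstract.

```python
from typing import List, Sequence, Tuple


def cumulative_counts(vafs: Sequence[float], freqs: Sequence[float], fmax: float) -> List[int]:
    """M(f): number of mutations with allele frequency in [f, fmax), for each f in freqs."""
    return [sum(1 for v in vafs if f <= v < fmax) for f in freqs]


def fit_neutral(freqs: Sequence[float], counts: Sequence[float], fmax: float) -> Tuple[float, float]:
    """Least-squares fit of M(f) = a * (1/f - 1/fmax) through the origin.

    Returns (a, r_squared); under the neutral model, a estimates mu / beta.
    """
    x = [1.0 / f - 1.0 / fmax for f in freqs]
    sxy = sum(xi * yi for xi, yi in zip(x, counts))
    sxx = sum(xi * xi for xi in x)
    a = sxy / sxx
    mean_y = sum(counts) / len(counts)
    ss_res = sum((yi - a * xi) ** 2 for xi, yi in zip(x, counts))
    ss_tot = sum((yi - mean_y) ** 2 for yi in counts)
    return a, 1.0 - ss_res / ss_tot


# Hypothetical VAFs from one tumour; 0.3 and 0.45 lie above fmax and are not counted.
vafs = [0.11, 0.12, 0.13, 0.13, 0.14, 0.15, 0.16, 0.18, 0.19, 0.21, 0.23, 0.3, 0.45]
freqs = [0.12 + 0.01 * i for i in range(12)]  # grid from 0.12 to 0.23
slope, r2 = fit_neutral(freqs, cumulative_counts(vafs, freqs, fmax=0.24), fmax=0.24)
print(f"mu/beta ~ {slope:.1f}, R^2 = {r2:.3f}")
```

A poor linear fit (low R²) is what marks a sample as not conforming to neutral growth, the cases the abstract then analyses with simulation and approximate Bayesian computation.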
15.05 - 15.20 · Phelim Bradley, WTCHG
Mykrobe predictor: rapid antibiotic-resistance predictions from genome sequence data using de Bruijn graphs
Phelim Bradley (1), N. Claire Gordon (2), Timothy M. Walker (2), Laura Dunn (2), Simon Heys (1), Bill Huang (1), Sarah Earle (2), Louise J. Pankhurst (2), Luke Anson (2), Mariateresa de Cesare (1), Paolo Piazza (1), Antonina A. Votintseva (2), Tanya Golubchik (2), Daniel J. Wilson (1,2), David H. Wyllie (2), Roland Diel (5), Stefan Niemann (6,7), Silke Feuerriegel (6,7), Thomas A. Kohl (6), Nazir Ismail (8), Shaheed V. Omar (8), E. Grace Smith (4), David Buck (1), Gil McVean (1), A. Sarah Walker (2,3), Tim E.A. Peto (2,3), Derrick W. Crook (2,3,4), Zamin Iqbal (1)*
(1) Wellcome Trust Centre for Human Genetics, University of Oxford, UK. (2) Nuffield Department of Medicine, University of Oxford, UK. (3) NIHR (National Institute for Health Research) Oxford Biomedical Research Centre, Oxford, UK. (4) Public Health England, UK. (5) Institute for Epidemiology, University Medical Hospital Schleswig-Holstein, Kiel, Germany. (6) Research Centre Borstel, Borstel, Germany. (7) German Centre for Infection Research, Partner Site Borstel, Borstel, Germany. (8) National Institute for Communicable Diseases, Johannesburg, South Africa.
Since bacterial species, drug-susceptibility profiles and virulence factors are encoded in the genome, this information can be recovered from whole-genome sequence data. Transforming genome-sequencing data into clinically useful information currently requires hours of processing on a powerful computer, followed by expert analysis. Our goal was to remove this bottleneck. Our approach (Mykrobe predictor) starts with a curated knowledge base of resistant/susceptible alleles; combining these with diverse genetic backgrounds and many examples of resistance genes, we assemble a de Bruijn graph, which forms our reference graph. Our approach then directly compares the de Bruijn graph of the sample with the reference graph (similar to “pseudoalignment”). This yields statistical tests for the presence of resistance alleles that are unbiased by choice of reference or assumptions of clonality. We sequenced 987 S. aureus and 1900 M. tuberculosis isolates on Illumina platforms and applied our method to predict the antimicrobial resistance profile of each sample. For S. aureus, our results show sensitivity/specificity of 99.1%/99.6% across 12 drugs. For M. tuberculosis, sensitivity was 82.6%, limited by our understanding of the genetics, and specificity was 98.5%. Importantly, detection of minor alleles improved sensitivity for second-line drugs (capreomycin, amikacin, ofloxacin) by >12%. This has great public health potential for distinguishing MDR- from XDR-TB. Finally, we apply our method to the new Oxford Nanopore MinION USB sequencer. We show that full concordance with phenotype is achievable for both gene- and SNP-based resistance.
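To illustrate the idea of comparing sample k-mers with allele-specific reference k-mers, here is a minimal Python sketch. It is not Mykrobe's implementation: it uses plain k-mer sets rather than a de Bruijn graph, ignores reverse complements and error models, and the names `genotype_site` and `alt_fraction` (and the default `k=21`) are hypothetical. For one site it keeps only the k-mers unique to the susceptible or the resistant allele, reports what fraction of them are seen at least `min_count` times in the reads together with their mean depth, and derives a crude resistant-allele fraction, the kind of quantity that makes minor-allele detection possible.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_read_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mer occurrences across reads (forward strand only, for brevity)."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts


def genotype_site(
    susceptible: str, resistant: str, reads: Iterable[str], k: int = 21, min_count: int = 2
) -> Dict[str, Tuple[float, float]]:
    """Per allele: (fraction of allele-specific k-mers with depth >= min_count, mean depth)."""
    counts = count_read_kmers(reads, k)
    s_kmers, r_kmers = kmers(susceptible, k), kmers(resistant, k)
    result = {}
    # k-mers shared by both alleles (flanking sequence) carry no genotype information
    for name, specific in (("susceptible", s_kmers - r_kmers), ("resistant", r_kmers - s_kmers)):
        if not specific:
            raise ValueError("alleles share all k-mers; choose a smaller k")
        depths = [counts[km] for km in specific]
        covered = sum(d >= min_count for d in depths) / len(depths)
        result[name] = (covered, sum(depths) / len(depths))
    return result


def alt_fraction(result: Dict[str, Tuple[float, float]]) -> float:
    """Crude resistant-allele fraction from mean k-mer depths."""
    s_depth, r_depth = result["susceptible"][1], result["resistant"][1]
    return r_depth / (s_depth + r_depth) if s_depth + r_depth > 0 else 0.0
```

For example, with eight susceptible reads and two resistant reads spanning a single SNP (`k=5`), the resistant allele is fully covered at mean depth 2 and its estimated fraction is 0.2, which a caller could report as a minor resistant allele rather than discard.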
15.20 - 15.25 · Matteo Fumagalli, University College London
Inference of ploidy from short read sequencing data with application to fungal pathogenicity
Matteo Fumagalli (1); Simon O’Hanlon (2); Trenton Garner (3); Rasmus Nielsen (4); Matthew Fisher (2); Francois Balloux (1)
(1) Department of Genetics, Evolution and Environment, University College London, UK; (2) School of Public Health, Imperial College London, UK; (3) Institute of Zoology, Zoological Society of London, UK; (4) Department of Integrative Biology & Statistics, University of California, Berkeley, USA
High-throughput sequencing machines now provide researchers with massive amounts of DNA data. However, these data typically have high sequencing error rates, and inferring individual genotypes and variants is challenging when a low-depth strategy is employed. Recently, statistical methods that take genotype uncertainty into account have been introduced in population genetics, allowing accurate estimation of nucleotide diversity even when little data is available. However, most available software and approaches rest on the classic assumptions of random mating and diploidy. To address this issue, we propose a novel statistical framework to estimate ploidy from sequencing data, taking base qualities and depth into account, through a composite likelihood ratio test. We also show how this method can be used to perform variant and genotype calling for arbitrary ploidy directly from genotype likelihoods, and lay the groundwork for estimating summary statistics for population genetics analyses. We further propose an extension of this method for when more than one sample is available. Behavior and accuracy are assessed through simulations, and dedicated software is currently under development. Finally, we demonstrate the utility of this method for estimating chromosomal copy-number variation in Batrachochytrium dendrobatidis (Bd) from whole-genome sequencing data. Bd is an amphibian fungus that imposes a huge burden on its hosts. Genomes of Bd strains have been shown to be highly dynamic, with changes in ploidy observed even over short timescales. Unveiling how ploidy variation relates to fungal pathogenicity might hold the key to effective molecular monitoring.
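The core of a genotype-likelihood approach to ploidy can be sketched in a few lines of Python. Assuming variable sites summarised as (depth, alternative-allele count), a single symmetric sequencing error rate in place of per-base qualities, independence between sites (hence a composite likelihood) and a uniform prior over the ploidy + 1 possible genotypes, the sketch below scores each candidate ploidy and computes a composite log-likelihood ratio against diploidy. These modelling choices and all function names are ours for illustration, not the authors' framework.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Site = Tuple[int, int]  # (read depth, alternative-allele read count)


def site_log_likelihood(depth: int, alt: int, ploidy: int, error: float = 0.01) -> float:
    """log P(reads | ploidy), averaging over genotypes g = 0..ploidy with a uniform prior.

    The binomial coefficient is dropped because it is identical for every ploidy.
    """
    terms = []
    for g in range(ploidy + 1):
        frac = g / ploidy
        # chance that a read shows the alt base, allowing symmetric sequencing error
        q = frac * (1 - error) + (1 - frac) * error
        terms.append(alt * math.log(q) + (depth - alt) * math.log(1 - q))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms)) - math.log(ploidy + 1)


def composite_log_likelihood(sites: Iterable[Site], ploidy: int, error: float = 0.01) -> float:
    """Sum over sites, treating them as independent (a composite likelihood)."""
    return sum(site_log_likelihood(d, a, ploidy, error) for d, a in sites)


def best_ploidy(
    sites: Sequence[Site], candidates: Iterable[int] = range(1, 7), error: float = 0.01
) -> Tuple[int, Dict[int, float]]:
    """Return the highest-scoring ploidy and the score for every candidate."""
    scores = {p: composite_log_likelihood(sites, p, error) for p in candidates}
    return max(scores, key=scores.get), scores


def composite_lr_vs_diploid(sites: Sequence[Site], ploidy: int, error: float = 0.01) -> float:
    """2 * (log CL under `ploidy` - log CL under diploidy)."""
    alt_ll = composite_log_likelihood(sites, ploidy, error)
    return 2.0 * (alt_ll - composite_log_likelihood(sites, 2, error))
```

With sites whose allele fractions sit near 1/3 and 2/3 (for example `[(30, 10), (30, 20)] * 5`), triploidy scores highest. The candidate ploidies are not nested models, so a p-value for the ratio would need calibration, for example by simulation, rather than a standard chi-squared reference.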
15.25 - 15.40 · John Lees, Wellcome Trust Sanger Institute
Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes
John A. Lees (1); Minna Vehkala (2); Niko Välimäki (3); Simon R. Harris (1); Claire Chewapreecha (4); Nicholas J. Croucher (5); Pekka Marttinen (6,7); Mark R. Davies (8); Andrew C. Steer (9,10); Stephen Y. C. Tong (11); Antti Honkela (12); Julian Parkhill (1); Stephen D. Bentley (1); Jukka Corander (2)
(1) Pathogen Genomics, Wellcome Trust Sanger Institute, Cambridge, UK; (2) Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland; (3) Department of Medical and Clinical Genetics, Genome-Scale Biology Research Program, University of Helsinki; (4) Department of Medicine, University of Cambridge, Cambridge, UK; (5) Department of Infectious Disease Epidemiology, Imperial College, London, UK; (6) Department of Computer Science, Aalto University, Espoo, Finland; (7) Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland; (8) Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Australia; (9) Centre for International Child Health, Department of Paediatrics, University of Melbourne, Australia; (10) Group A Streptococcal Research Group, Murdoch Children’s Research Institute; (11) Menzies School of Health Research, Darwin, Australia; (12) Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
Bacterial genomes vary extensively in both gene content and gene sequence, and this plasticity hampers the use of traditional SNP-based methods for identifying all genetic associations with phenotypic variation. Here we introduce a computationally scalable and widely applicable statistical method (SEER) for identifying sequence elements that are significantly enriched in a phenotype of interest. SEER scales to tens of thousands of genomes by counting variable-length k-mers with a distributed string-mining algorithm. It provides robust options for association analysis that also correct for the clonal population structure of bacteria. Using large collections of genomes of the major human pathogens Streptococcus pneumoniae and Streptococcus pyogenes, SEER identifies relevant, previously characterised resistance determinants for several antibiotics and discovers potential novel factors related to the invasiveness of S. pyogenes. We thus demonstrate that our method can answer important biologically and medically relevant questions.
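A stripped-down version of k-mer association can be written in plain Python. This sketch counts fixed-length k-mers (SEER uses variable-length k-mers and distributed string mining), records presence or absence per genome, and applies a 2×2 chi-squared test of each k-mer against a binary phenotype. Unlike SEER it applies no correction for clonal population structure, which in real bacterial data would produce many spurious hits; `kmer_association` is a hypothetical name.

```python
import math
from typing import List, Sequence, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of one genome (fixed k, unlike SEER's variable-length k-mers)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_association(
    genomes: Sequence[str], phenotypes: Sequence[int], k: int
) -> List[Tuple[str, float]]:
    """Chi-squared test (1 df, no continuity correction) of k-mer presence vs a 0/1 phenotype.

    No population-structure correction is applied.
    Returns (k-mer, p-value) pairs sorted by p-value.
    """
    presence = [kmer_set(g, k) for g in genomes]
    n = len(genomes)
    n_cases = sum(phenotypes)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need both cases and controls")
    results = []
    for km in set().union(*presence):
        a = sum(1 for s, y in zip(presence, phenotypes) if y == 1 and km in s)  # cases with k-mer
        c = sum(1 for s, y in zip(presence, phenotypes) if y == 0 and km in s)  # controls with k-mer
        b, d = n_cases - a, n_controls - c
        if b + d == 0:  # k-mer present in every genome: uninformative
            continue
        chi2 = n * (a * d - b * c) ** 2 / (n_cases * n_controls * (a + c) * (b + d))
        # survival function of the chi-squared distribution with 1 df
        results.append((km, math.erfc(math.sqrt(chi2 / 2))))
    results.sort(key=lambda item: item[1])
    return results
```

In a toy set of four case genomes carrying `GATTACA` and four controls lacking it, that k-mer gets χ² = 8 and p ≈ 0.0047, while a k-mer shared by all eight genomes is skipped.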
15.40 - 15.45 · Vladimir Kiselev, Sanger Institute
SC3 - consensus clustering of single-cell RNA-Seq data
Vladimir Yu. Kiselev (1), Kristina Kirschner (2), Michael T. Schaub (3,4), Tallulah Andrews (1), Tamir Chandra (1,5), Kedar N Natarajan (1,6), Wolf Reik (1,5,7), Mauricio Barahona (8), Anthony R Green (2), Martin Hemberg (1)
(1) Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
(2) Cambridge Institute for Medical Research, Wellcome Trust/MRC Stem Cell Institute and Department of Haematology, University of Cambridge, Hills Road, Cambridge, UK
(3) Department of Mathematics and naXys, University of Namur, Belgium
(4) ICTEAM, Université catholique de Louvain, Belgium
(5) Epigenetics Programme, The Babraham Institute, Babraham, Cambridge, UK
(6) EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK
(7) Centre for Trophoblast Research, University of Cambridge, Cambridge, UK
(8) Department of Mathematics, Imperial College London, London, UK
Using single-cell RNA-seq (scRNA-seq), the full transcriptome of individual cells can be acquired, enabling quantitative cell-type characterisation based on expression profiles. Due to the large variability in gene expression, assigning cells to groups based on the transcriptome remains challenging. We present Single-Cell Consensus Clustering (SC3), a tool for unsupervised clustering of scRNA-seq data. SC3 achieves high accuracy and robustness by consistently integrating different clustering solutions through a consensus approach. Tests on nine published datasets show that SC3 outperforms four existing methods while remaining scalable to large datasets, as shown by the analysis of a dataset containing ~45,000 cells. Moreover, an interactive graphical implementation makes SC3 accessible to a wide audience of users. Importantly, SC3 aids biological interpretation by identifying marker genes, differentially expressed genes and outlier cells. We illustrate the capabilities of SC3 by characterising newly obtained transcriptomes of subclones of neoplastic cells collected from patients.
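The consensus idea can be illustrated with a co-association (consensus) matrix, whose entry (i, j) is the fraction of clustering runs that place cells i and j together. The minimal Python sketch below builds that matrix from several label vectors and then groups cells whose consensus exceeds a threshold by taking connected components. In the published SC3 pipeline the consensus matrix is instead clustered hierarchically; the thresholding step here is a simpler substitute, and both function names are hypothetical.

```python
from typing import List, Sequence


def consensus_matrix(clusterings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Entry (i, j) is the fraction of clusterings that put cells i and j together."""
    n = len(clusterings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(clusterings)
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(clusterings: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Connected components of the graph linking cells with consensus > threshold."""
    cons = consensus_matrix(clusterings)
    n = len(cons)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first search over strongly co-clustered cells
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and cons[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels
```

For three runs labelling five cells as `[0,0,1,1,2]`, `[1,1,0,0,0]` and `[0,0,1,1,1]`, cell 4 co-clusters with cells 2 and 3 in two of three runs (consensus 2/3) and the result is `[0, 0, 1, 1, 1]`. Connected components can chain loosely related cells together, which is one reason a hierarchical cut is usually preferred on larger data.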
16.45 - 17.00 · Stefano Nardone, Bar Ilan University (Faculty of Medicine), Israel (IL)
DNA methylation profile of cortical neurons in autism spectrum disorder
Stefano Nardone (1,2), Dev Sharan Sams (1), Nili Avidan (3), Milana Frenkel-Morgenstern (1), Liat Linde (3), Evan Elliott (1)
(1) Bar Ilan University, Faculty of Medicine, Safed, IL
(2) Department of Experimental Pharmacology, University of Naples Federico II, Naples, IT
(3) Rappaport Faculty of Medicine & Research Institute, Technion-Israel Institute of Technology, Haifa, IL
Autism Spectrum Disorder (ASD) is a complex neuropsychiatric syndrome of largely unknown aetiology. The potential for non-genetic influences to mediate part of the risk of ASD has prompted several studies to date, all showing evidence of epigenetic alterations in autistic subjects. Establishment of DNA methylation during brain development is widely accepted as a key factor in defining neuronal molecular identity. However, one of the most challenging tasks in epigenetic studies is cellular mosaicism, particularly in the brain. To improve the quality of methylation data and unravel the contribution of the neuronal population to the overall epigenetic signature in ASD, we combined two techniques: Fluorescence-Activated Cell Sorting (FACS) followed by hybridization on the 450K Methylation Array (Illumina), which profiles around 485,000 CpG sites throughout the genome. We identified 12 Differentially Methylated Regions (DMRs) at FDR <0.01. Interestingly, several of the associated genes are part of the GABAergic system, whose involvement in ASD has been strongly suspected. Weighted Gene Co-Expression Network Analysis (WGCNA) pinpointed three co-methylation modules correlated with autism/control status at p <0.0001. Two of them were inversely correlated with autism/control status and enriched for synaptic and neuronal genes, while the third module showed a direct correlation and was enriched for immune response processes. Finally, we established the specificity of these three modules to ASD by assessing their enrichment in GWAS data sets for other psychiatric and non-psychiatric disorders. This study identifies alterations of DNA methylation in cortical neurons as a possible factor in the aetiopathogenesis of ASD and promotes more systematic use of cell-specific approaches in psychiatry.
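The abstract reports DMRs at FDR <0.01 without naming the procedure. The Benjamini-Hochberg step-up adjustment is a common way to control the false discovery rate, and a minimal Python version is sketched below as an assumption, not as a description of the authors' pipeline; regions whose adjusted value falls below 0.01 would be the ones reported.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, so adjusted values never decrease with rank
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For p-values `[0.01, 0.04, 0.03, 0.5]` this gives approximately `[0.04, 0.0533, 0.0533, 0.5]`: the two middle tests share one adjusted value because of the monotonicity step.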
17.00 - 17.05 · Alexander Young, University of Oxford
Discovery of non-additive loci affecting body mass index using a heteroskedastic linear mixed model
Alexander Young (1), Fabian Wauthier (1,2), Peter Donnelly (1,2)
(1) Wellcome Trust Centre for Human Genetics, University of Oxford
(2) Department of Statistics, University of Oxford
A major open question is how important gene-gene and gene-environment interaction effects are in the genetic architecture of human diseases and traits. The controversy remains unresolved partly because of a lack of powerful methods for detecting these effects and partly because of the lack of suitably sized datasets. The imminent availability of large population-based studies, including biobanks, will for the first time offer the sample sizes required to properly address this question. While most genetic association studies model how the phenotypic mean changes with genotype, they ignore any change in phenotypic variance with genotype. Changes in variance with genotype are characteristic of loci involved in non-additive effects, including gene-gene and gene-environment interactions. To improve power to detect loci involved in non-additive effects, we introduce a test statistic that jointly tests for mean and variance effects. To better control for confounding and to increase power, we incorporate this test statistic into a linear mixed model whose residual error variance is influenced by an arbitrary vector of covariates, which we term the heteroskedastic linear mixed model, and we give a novel algorithm for fitting this model whose complexity scales linearly with sample size. We apply this to a subsample of the UK Biobank (n~145,000) to search for non-additive loci affecting body mass index. We find eight novel loci and five previously known loci. Three of the novel loci would not have been discovered by additive association testing, demonstrating that some types of loci have been missed by additive testing. Building on this, we discovered a novel interaction between the TCF7L2 risk allele and diabetes treatment affecting BMI. We anticipate that more non-additive loci will be discovered at larger sample sizes and that the genome-wide test statistics will give insight into the importance of non-additivity for different traits.
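The idea of jointly testing mean and variance effects can be illustrated for a single SNP without the mixed-model part. The numpy sketch below assumes a normal phenotype whose mean and log-variance are both linear in genotype dosage, fits it by maximum likelihood (alternating weighted least squares for the mean with a Fisher-scoring step for the log-variance), and compares it with a constant mean and variance model in a 2-degree-of-freedom likelihood ratio test. It omits the random polygenic effect and the linear-time fitting algorithm described in the abstract; the function names are hypothetical.

```python
import numpy as np


def fit_heteroskedastic(y, X, Z, n_iter=200, tol=1e-10):
    """Maximised log-likelihood of y ~ N(X @ beta, exp(Z @ gamma)).

    Assumes the first column of Z is an intercept. Alternates weighted least squares
    for beta with one Fisher-scoring step for gamma (log-linear variance model).
    """
    gamma = np.zeros(Z.shape[1])
    gamma[0] = np.log(np.var(y))
    for _ in range(n_iter):
        w = np.exp(-Z @ gamma)  # inverse residual variances
        beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        r2 = (y - X @ beta) ** 2
        eta = Z @ gamma
        working = eta + r2 * np.exp(-eta) - 1.0  # working response for the log-variance
        gamma_new = np.linalg.lstsq(Z, working, rcond=None)[0]
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    w = np.exp(-Z @ gamma)
    beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    eta = Z @ gamma
    r2 = (y - X @ beta) ** 2
    return -0.5 * np.sum(np.log(2 * np.pi) + eta + r2 * np.exp(-eta))


def joint_mean_variance_test(y, g):
    """Likelihood ratio test of mean and variance effects of dosage g (2 df)."""
    ones = np.ones_like(g)
    X1 = np.column_stack([ones, g])
    ll_alt = fit_heteroskedastic(y, X1, X1)
    ll_null = fit_heteroskedastic(y, ones[:, None], ones[:, None])
    stat = 2.0 * (ll_alt - ll_null)
    p = float(np.exp(-stat / 2.0))  # chi-squared survival function with 2 df
    return float(stat), p
```

Because the chi-squared distribution with 2 degrees of freedom has survival function exp(−x/2), the p-value needs no special functions. A variance-only effect, such as a standard deviation of `exp(0.25 * g)`, is detected here even though the mean does not change with genotype.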
17.05 - 17.20 · Goran Micevic, Yale University
The role and targets of DNA methylation in melanoma formation and progression
Goran Micevic (1), Marcus Bosenberg (1)
(1) Yale University School of Medicine, New Haven, CT 06510, United States of America
Melanoma is the deadliest form of skin cancer, with an enormous toll on human life and health. It is estimated that nearly 10,000 deaths and 74,000 new cases of melanoma occurred in the United States alone in 2015, while 132,000 new melanoma cases were reported worldwide. Genetic changes in melanoma have been well described over the past decade, but epigenetic changes and their functional roles in melanoma formation remain comparatively poorly understood. DNA methylation is an epigenetic mark that is almost universally abnormal in melanoma. However, the specific roles of individual DNMT enzymes, their methylation targets in melanoma, and the signaling pathways they affect remain largely elusive. Here, we used a mouse model of melanoma to investigate the roles, signaling changes and targets of DNA methyltransferases during melanoma formation and progression. Our results suggest that DNMT3B is the crucial methyltransferase during melanoma formation and may be a target for melanoma therapy. Specifically, Dnmt3b inactivation led to a striking prolongation of median survival and was associated with loss of mTORC2 signaling. We found that DNMT3B is overexpressed in human melanoma, is associated with shorter 5-year overall survival, and allows long-term activation of mTORC2 by silencing repressive miRNAs. Using RNA-Seq and RRBS, we found that Dnmt3b methylates genes marked by the histone modification H3K27me3, is an important regulator of global methylation in melanoma, and targets many genes well recognized to be aberrantly methylated in melanoma. Beyond mechanistic insights and potential therapeutic targets, we uncovered a methylation-based gene signature that is associated with overall patient survival and may be a valuable biomarker. Collectively, our studies shed light on the role of DNA methyltransferases in melanoma, uncover target pathways and genes, and contribute to our overall understanding of DNA methylation in melanoma.
17.20 - 17.25 · Tiphaine Martin, King’s College London
MetDiff: a novel computational method for detecting differential DNA methylation regions from MeDIP-seq data in unique and repetitive mapping regions
Tiphaine C. Martin (1), Catalina Vallejos (2,3), Gwenael Leday (2), Tim Spector (1), Sylvia Richardson (2)
(1) King’s College London, The Department of Twin Research & Genetic Epidemiology, St Thomas’ Hospital, 4th Floor, Block D, South Wing, SE1 7EH, London, United Kingdom
(2) University of Cambridge, Biostatistics unit, Cambridge Institute of Public Health, Forvie Site, Robinson Way, Cambridge Biomedical Campus, Cambridge, CB2 0SR, United Kingdom
(3) EMBL European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
One of the first steps in the analysis of high-throughput sequencing data, such as MeDIP-seq data, is to discard reads with low mapping quality. Most of these discarded reads fall in repetitive elements, since roughly 60% of human DNA is composed of repetitive sequences and over 50% of CpG dinucleotides lie within them. However, these sequences are of significant biological interest for their functional properties, such as structural organisation of the chromosome, gene regulation and the evolutionary dynamics of the genome. We propose a two-step computational method, inspired by methodologies developed for RNA-seq datasets, to analyse both unique and multiple mapping regions. The first step detects methylation regions in the genome from uniquely mapping reads and estimates the level of methylation for each chimeric assembly of repetitive-element subfamilies. The second step identifies differentially methylated regions associated with the phenotype of interest using a Bayesian method. We show that about 58% of single-end 42nt reads fall in or overlap repetitive elements, of which 37% map uniquely to the reference human genome. The detected methylation regions show a broad size distribution from 100nt to 35,000nt, with a peak around the fragment size (here 350nt). This can explain why peak-detection and differential-enrichment methods developed for ChIP-seq data fail for DNA methylation data. In addition, we applied this method to an EWAS of autoimmune thyroid diseases in 43 discordant monozygotic twin pairs. The PRIMA4_LTR subfamily of HERV, which is believed to be a pathogenic family in several autoimmune diseases, and several uniquely mapping regions showed differential methylation. To our knowledge, this is the first time that differential methylation in both repetitive and non-repetitive regions has been studied in an EWAS using MeDIP-seq data. This study is currently being extended to a larger set of twins and other repetitive regions.
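One way to see why region sizes peak near the fragment length: in MeDIP-seq each read marks a fragment of roughly known size, so extending reads to that length and merging overlapping fragments yields regions no shorter than one fragment, with an isolated fragment giving exactly one fragment's width. The Python sketch below performs this merge, assuming 0-based half-open coordinates, each read given as its 5′ position plus strand, and the 350 nt fragment length quoted in the abstract. It illustrates region detection in general and is not MetDiff's algorithm; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fragment_interval(pos: int, strand: str, frag_len: int) -> Tuple[int, int]:
    """Half-open interval covered by the fragment behind a read's 5' position."""
    if strand == "+":
        return pos, pos + frag_len
    return pos - frag_len + 1, pos + 1  # '-' strand reads extend leftwards


def merge_fragments(reads: Sequence[Tuple[int, str]], frag_len: int = 350) -> List[Tuple[int, int]]:
    """Merge overlapping or touching extended fragments into candidate regions."""
    intervals = sorted(fragment_interval(pos, strand, frag_len) for pos, strand in reads)
    regions: List[List[int]] = []
    for start, end in intervals:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current region
        else:
            regions.append([start, end])  # start a new region
    return [(s, e) for s, e in regions]
```

Reads at positions 100 and 200 on the + strand merge into one 450 nt region, while an isolated read on either strand yields a region exactly 350 nt wide; dense coverage produces the long tail of much larger regions.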
17.25 - 17.40 · Rajbir Batra, Cancer Research UK, Cambridge Institute, University of Cambridge
Comprehensive sequencing-based characterisation of the DNA methylation landscape of 1300 breast tumours
Rajbir N Batra (1,2), Ana T Vidakovic (1), Suet-Feung Chin (1), Harry Clifford (1), Maurizio Callari (1), Ankita S Batra (1), Alejandra Bruna (1), Stephen-John Sammut (1), Elena Provenzano (3), Oscar M Rueda (1), Carlos Caldas (1,3)
(1) Cancer Research UK Cambridge Institute, University of Cambridge, UK
(2) Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, UK
(3) Department of Oncology, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge, UK
Introduction
Breast cancer is one of the leading causes of cancer death in women and is widely recognised as a heterogeneous disease with distinct therapeutic responses and outcomes. While recent advances have integrated the genomic and transcriptomic architecture of breast cancers to refine the molecular classification of the disease, the epigenetic landscape has received less attention. We are conducting a large next-generation sequencing-based breast cancer methylome study to provide a comprehensive investigation of the DNA methylation landscape of breast cancer.
Materials and Methods
Reduced Representation Bisulfite Sequencing (RRBS) was performed on 1300 primary breast tumours (and 300 matched normal tissue samples) from the METABRIC cohort. Statistical methods accounting for spatial correlation of neighbouring CpG sites were used to identify differentially methylated regions (DMRs) between tumours and normals, as well as between different tumour subtypes.
Results and discussion
We identified hyper- and hypomethylated DMRs between tumours and normals in different genomic features (such as gene promoters and enhancers) that illuminate the regulatory role of methylation alterations in tumorigenesis. We also determined that DNA methylation contributes to breast cancer heterogeneity by identifying DMRs between breast cancer subtypes. In addition, gene expression was used to functionally characterise the DMRs in these subtypes, which led to the identification of subtype-specific candidate targets in breast cancer. Our findings also revealed complementary patterns of epigenetic and genomic aberrations associated with transcription across breast cancer patients. Finally, I discuss the investigation of DNA methylation markers using RRBS in a panel of Patient-Derived Tumour Xenografts, which constitute one of the best pre-clinical models available today and recapitulate the inter- and intra-tumour heterogeneity observed in patients.
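A simple way to turn per-CpG methylation differences into candidate regions is to join runs of neighbouring CpGs that all change in the same direction by at least a threshold. The Python sketch below does this, assuming sorted CpG positions, methylation fractions for tumour and normal, and illustrative thresholds (`min_diff`, `max_gap`, `min_cpgs`, all hypothetical, as is the function name). It is a heuristic stand-in: unlike the methods described above, it does not model spatial correlation statistically and gives no measure of significance.

```python
from typing import List, Sequence, Tuple


def find_dmrs(
    positions: Sequence[int],
    tumour: Sequence[float],
    normal: Sequence[float],
    min_diff: float = 0.2,
    max_gap: int = 100,
    min_cpgs: int = 3,
) -> List[Tuple[int, int, str, int]]:
    """Return (first CpG, last CpG, 'hyper'/'hypo', number of CpGs) for each run."""
    dmrs: List[Tuple[int, int, str, int]] = []
    run: List[Tuple[int, int]] = []  # (position, direction) of CpGs in the current run

    def close_run() -> None:
        if len(run) >= min_cpgs:
            label = "hyper" if run[0][1] > 0 else "hypo"
            dmrs.append((run[0][0], run[-1][0], label, len(run)))
        run.clear()

    for pos, t, n in zip(positions, tumour, normal):
        diff = t - n
        if diff >= min_diff:
            direction = 1
        elif diff <= -min_diff:
            direction = -1
        else:
            direction = 0
        # a run ends at a non-changing CpG, a change of direction, or a large gap
        if run and (direction != run[-1][1] or pos - run[-1][0] > max_gap):
            close_run()
        if direction != 0:
            run.append((pos, direction))
    close_run()
    return dmrs
```

With three consecutive hypermethylated CpGs followed by three hypomethylated ones further along, the sketch reports one region of each type; an isolated changed CpG is ignored because it falls below `min_cpgs`.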
17.40 - 17.45 · Katie Burnham, Wellcome Trust Centre for Human Genetics
Inter-individual variation in the host transcriptomic response to sepsis
Katie L Burnham (1); Emma E Davenport (1); Jayachandran Radhakrishnan (1); Peter Humburg (1); Paula Hutton (2); Christopher Garrard (2); Charles J Hinds (3); Julian C Knight (1)
(1) Wellcome Trust Centre for Human Genetics, University of Oxford, UK; (2) Adult Intensive Care Unit, John Radcliffe Hospital, Oxford, UK; (3) William Harvey Research Institute, Barts and The London School of Medicine, UK
Sepsis remains a major global health issue with mortality rates >30%. Although conventionally considered a single unified disease, substantial clinical heterogeneity is seen. Investigation of this variation could yield insights into pathogenesis and provide opportunities for precision medicine. We therefore aim to use transcriptomic profiling to identify clinically relevant differences between patients upon admission to the intensive care unit (ICU). We present data for 505 patients with sepsis due to community acquired pneumonia (CAP) or faecal peritonitis (FP) recruited to the Genomic Advances in Sepsis study. Detailed phenotypic information was recorded and serial samples taken over the first five days following admission to ICU. Gene expression in leukocytes was quantified for 47,231 probes using Illumina HumanHT-12v4 Expression BeadChip arrays. We hypothesised that inter-individual patient heterogeneity would exist both within and between sepsis aetiology groups CAP and FP. We identified two subgroups with distinct immune response profiles in the CAP discovery cohort (n=265), one of which had higher mortality (14-day mortality following ICU admission p=0.005) and features of immunosuppression. We designed a classification model, in which gene expression was more informative than clinical covariates, and replicated our findings in a CAP validation cohort (n=106). We observed comparable groups within FP patients (n=117), with an immunosuppressed phenotype similarly associating with mortality (p=0.0096). Differential gene expression between CAP and FP patients indicated an anti-viral response unique to the CAP patients, who also demonstrated a stronger pro-inflammatory response. Our findings highlight the value of functional genomic approaches for identifying heterogeneity within patient cohorts and have important implications for clinical management and patient stratification.