Making science work for health

Can long-read sequencing transform genetic diagnoses?

PHG Foundation Season 1 Episode 5

Heather Turner, Policy Analyst at the PHG Foundation, explains long-read sequencing, its potential applications and how this new technology measures up against traditional short-read sequencing.

Welcome back to Making science work for health, the PHG Foundation podcast that explains the most promising developments in science and their implications for healthcare.
 
In each episode, host Ofori Canacoo discusses with a PHG Foundation policy analyst the underpinning science, the ambitions for improving population health and the impact it could have on patients, on society and on the people delivering your healthcare.
 
If you would like to find out more about what was discussed in this episode, you can find additional information on our website, phgfoundation.org.

Heather recently wrote two briefings on long-read sequencing. Both are freely available: Clinical long-read sequencing and Long-read sequencing: Clinical applications and implementation.

If you have any questions about the topic then you can email us at intelligence@phgfoundation.org

Ofori: 0:12

Welcome to 'Making science work for health', the PHG Foundation's podcast exploring developments in genomics and related emerging health technologies. Social media and the many digital news outlets now mean more of us than ever are aware of the progress being made by teams of intrepid scientists and researchers around the world. Many of the latest advances feature genomics and 'omics-related technologies, the field in which the PHG Foundation has 25 years of experience, helping policymakers get to grips with practical, on-the-ground delivery. 'Making science work for health' aims to strip away the gloss and explain what new science means for patients, health professionals and members of society. My name is Ofori Canacoo, part of the communications team at the PHG Foundation and host of 'Making science work for health'. For this episode, we're talking about long-read sequencing. Recent developments have highlighted the potential of long-read sequencing, and today we are discussing what this could mean for clinical genomics. Joining me for this episode is Heather Turner, policy analyst at the PHG Foundation. Hello, Heather.

Heather: 1:22

Hi, Ofori. Thanks for having me.

Ofori: 1:24

My pleasure. First of all, would you like to tell us a little bit about yourself?

Heather: 1:27

Sure. So I'm a policy analyst here at the PHG Foundation and I have a background in medical science from the University of Exeter and genomic medicine, which I studied as a master's here in Cambridge. And here at the PHG Foundation, I've worked on a wide range of projects, but notably I have been working on host genomics in response to infectious diseases, on our existing work program on polygenic scores, and I've worked on some commissioned projects around pathogen sequencing and diagnostics. More recently I wrote a pair of briefings on long-read sequencing and that's what I'm here to talk about today.

Ofori: 2:08

Great stuff. So could you briefly talk us through what DNA sequencing is?

Heather: 2:13

Sequencing is the process of reading part or all of the DNA of an organism, and to do this, DNA is broken into small fragments. These fragments need to be compared against a reference sequence to identify the original sequence and to identify any changes. Different sequencing providers are used in clinical genomics, and the main provider is Illumina, a short-read sequencing company that produces reads of approximately 75 to 500 base pairs, depending on the method that's used.
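
As a toy illustration of comparing a fragment against a reference: the sketch below places one short read at the position where it disagrees least with a reference string, then lists the differences. The sequences and function names are invented for illustration, and real aligners use far more sophisticated, indexed methods that also handle insertions and deletions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_position(read, reference):
    """Return (offset, mismatches) for the placement with the fewest mismatches."""
    candidates = range(len(reference) - len(read) + 1)
    offset = min(
        candidates,
        key=lambda i: hamming(read, reference[i:i + len(read)]),
    )
    return offset, hamming(read, reference[offset:offset + len(read)])


def call_differences(read, reference):
    """List (reference_position, reference_base, read_base) for each mismatch."""
    offset, _ = best_position(read, reference)
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if base != reference[offset + i]
    ]


# Hypothetical sequences: the read matches the reference except for one base.
reference = "ACGTTGCAAGCTTAGGCTA"
read = "GCAAGCATAGG"

print(best_position(read, reference))
print(call_differences(read, reference))
```

The exhaustive search over every offset is only workable because the example is tiny; it is meant to show the idea of "map, then compare", not how clinical pipelines do it.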

Ofori: 2:44

What is the reference sequence?

Heather: 2:47

So, the reference genome builds on the Human Genome Project, and this project built a model human genome, which is the reference sequence used for all human genome analysis. This genome is not static though, it's being constantly reviewed and updated to build a better and more representative genome.

Ofori: 3:07

And, as a reminder, how is DNA sequencing used in healthcare?

Heather: 3:11

So sequencing is used to provide a wide range of tests, and these can vary from more specific tests around a particular disease, for example, cystic fibrosis, through to whole genome sequencing, which is now available in the UK for a number of indications, particularly in rare disease and cancer.

Ofori: 3:29

So, you just mentioned short-read sequencing, and we're going to be talking about long-read sequencing too. What are the key differences between the two?

Heather: 3:38

So, where short-read sequencing produces reads of up to 500 base pairs, long-read sequencing can produce reads that are in excess of 10,000 base pairs, and simply put, these longer sequences can unlock the genome. They simplify the process of reconstructing the original sequence, and this has a number of opportunities. Perhaps the most important of these is identifying larger and more complex changes in the genome, and this is something that's currently a limitation in clinical genomics.

Ofori: 4:10

And could you tell me how people are using long-read sequencing for rare disease?

Heather: 4:16

Sure. So, one of the important things to understand about rare disease is the diagnostic odyssey. So, for many rare disease patients, it can take a very long time from the point at which a disease is identified to actually getting a diagnosis, which can then really change their care. And there have been significant improvements in the diagnostic odyssey and a number of projects, including the 100,000 Genomes Project, have contributed to this. However, there is still quite a large diagnostic gap. So the overall diagnostic rate from the 100,000 Genomes Project was 25% and there are many reasons for this, and now there is a lot of interest in thinking beyond whole genome sequencing as to what other technologies could be useful. Long-read sequencing is one of these. There are a number of opportunities that are being explored, but as I mentioned, large variants or structural variants are particularly of interest. Structural variants are variants resulting from deletions or duplications or other combinations of large structural changes to the chromosome. And because they're so large, they're much more likely to disrupt the original sequence, and this can result in disease. So identifying these variants is very important as a way of finding these diagnoses. From short-read sequencing, these variants can be very difficult to identify, so long-read sequencing can help with that analytical and interpretation process, improving diagnostic rates. There are, of course, other opportunities, which I highlighted in my briefing. Notably, you have repeat expansion variants, such as the one that causes Huntington's disease, which we can't identify very well using short-read sequencing data, and also pseudogenes, which are versions of a gene that do not produce a protein, but because they're similar to your gene of interest, you can get reads which map incorrectly, and this can make identifying true disease-causing variants very complicated.
Ultimately, all of this comes together in the potential of finding new diagnoses for patients.
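
A small sketch of why read length matters for repeat expansions: a repeat can only be counted directly when a single read covers the whole repeat plus recognizable sequence on both sides. The flanking sequences, the repeat count and the function name below are hypothetical, chosen only to make the point.

```python
import re


def count_repeat_units(read, left_flank, right_flank, unit="CAG"):
    """Count repeat units, but only if the read spans both flanking sequences."""
    pattern = (
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    match = re.search(pattern, read)
    if match is None:
        return None  # the read does not span the whole repeat
    return len(match.group(1)) // len(unit)


# Hypothetical flanks and an invented allele with 45 CAG units.
LEFT, RIGHT = "TTCCGA", "GGTACC"
allele = LEFT + "CAG" * 45 + RIGHT

long_read = "AAAT" + allele + "TTGA"  # covers the flanks and the full repeat
short_read = allele[10:60]            # 50 bases taken from inside the repeat

print(count_repeat_units(long_read, LEFT, RIGHT))
print(count_repeat_units(short_read, LEFT, RIGHT))
```

The short read consists of nothing but repeat units, so there is no way to tell from that read alone how long the repeat is; the long read anchors on both sides and gives the count in one go.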

Ofori: 6:29

So how is long-read sequencing being used to improve diagnoses?

Heather: 6:33

Long-read sequencing is being used in large genome sequencing projects to establish diagnoses for patients. One major publication came from the Genomic Answers for Kids program. This program sought to find diagnoses for pediatric rare disease patients, and through long-read sequencing they were able to increase their discovery rate of structural variants by more than fourfold.

Ofori: 6:55

So how could long-read sequencing be used for cancer diagnostics?

Heather: 7:00

In cancer diagnostics, the tumor is sequenced to identify any cancer-specific changes, and this is known as somatic sequencing. There are often many variants, and the challenge is identifying the variants that are driving the cancer. And currently, we have a variety of tests which we use to make clinical decisions, and this can be to inform a patient's prognosis and, ideally, improve treatment decisions. And as I've described for rare disease, long-read sequencing really has the advantage that you can identify more complex variants and overcome some of the complexity from the cancer genome to identify variants that are relevant to a patient's care. Cancers are very heterogeneous, however, and important mutations may only be present in a small part of the tumor. Long-read sequencing has the advantage that it maintains the integrity of the DNA and this can make it possible to identify mutations that are only present in part of the tumor.

Ofori: 8:00

And what do you mean by that?

Heather: 8:02

So really what I mean by this is that where with long-read sequencing you have quite long pieces of DNA, with short-read sequencing you fragment the DNA, and as part of that sequencing process, this can introduce artifacts, which are changes introduced as part of the sequencing process. And from a quality point of view, it can be difficult to distinguish variants which are only present in a small fraction of the tumor from these variants which are actually errors. And therefore, with long-read sequencing, you perhaps get more information, you're also not sequencing that strand again, and you're not introducing those errors. And therefore, you're more confident when you're calling some of these rarer variants. And this provides a real opportunity for cancer: where a variant is really significant, but only present in a small fraction of the tumor, you may be able to predict resistance, because that variant may be known to be important when you make a particular treatment decision for your patient. Generally within the context of long-read sequencing, this is one of the areas that people are really keen to explore. Another area where there is big interest is in the opportunity for omics, also in rare disease, because there are inherent advantages over short-read sequencing.

Ofori: 9:24

So how have researchers thought about using long-read sequencing as a cancer diagnostic?

Heather: 9:29

One challenge in cancer is monitoring patients to quickly respond when their cancer returns or progresses. One area where people are particularly interested in using long-read sequencing is chronic myeloid leukemia, because currently we have very good targeted therapies. However, our diagnostics rely upon PCR, which can't tell us what change or mutation drives resistance when it occurs. So researchers are looking to use long-read sequencing to do this monitoring. This would allow you to monitor disease progression and identify the variant driving the disease in one test, and fundamentally this would enable a more agile clinical response, for example, to change treatment.

Ofori: 10:16

Could you tell us more about opportunities using long-read sequencing for omics?

Heather: 10:22

There are two main areas that people have been exploring. The first is transcriptomics. Transcriptomics is sequencing all the RNA that is present in a sample at any given time. And this often focuses on mRNA, which is the intermediary between the DNA and the proteins it codes for. And therefore sequencing your transcriptome can tell you about gene activity. It tells us which genes are on, which are off. And this can tell us about the cell or tissue, what is happening, and in the context of disease, we can use this to help with the interpretation process. Most transcriptomics has been done with short-read sequencing, and this breaks up the RNA and tries to predict using algorithms what RNA has been expressed. However, one gene can produce multiple different versions, multiple different transcripts. Long-read sequencing has the advantage that it maintains this original sequence and it improves the accuracy when you're then trying to work out which transcripts have been produced. And in the context of disease, this really improves the analysis process and can inform both the interpretation and even the primary study of the disease.
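
To make the point about multiple transcripts concrete, here is a minimal sketch using an invented gene with four exons and three hypothetical transcript versions. A short fragment that only covers two neighboring exons fits more than one version, while a read covering the full transcript fits exactly one. The exon labels, isoform names and functions are assumptions for illustration only.

```python
# Hypothetical transcript versions (isoforms) of one gene, as ordered exon lists.
ISOFORMS = {
    "isoform_A": ("E1", "E2", "E3", "E4"),
    "isoform_B": ("E1", "E3", "E4"),
    "isoform_C": ("E1", "E2", "E4"),
}


def contains_run(isoform, observed):
    """True if the observed exons appear consecutively, in order, in the isoform."""
    n = len(observed)
    return any(
        isoform[i:i + n] == observed
        for i in range(len(isoform) - n + 1)
    )


def compatible_isoforms(observed, full_length=False):
    """Names of isoforms consistent with the exons seen in one read."""
    if full_length:
        # A full-length read must match the whole transcript exactly.
        return sorted(
            name for name, exons in ISOFORMS.items() if exons == observed
        )
    # A fragment only needs to match some consecutive stretch of exons.
    return sorted(
        name for name, exons in ISOFORMS.items() if contains_run(exons, observed)
    )


short_fragment = ("E1", "E2")       # covers a single exon junction
long_read = ("E1", "E2", "E4")      # covers the whole transcript

print(compatible_isoforms(short_fragment))
print(compatible_isoforms(long_read, full_length=True))
```

With short fragments, software has to infer statistically which mixture of transcripts best explains many ambiguous pieces; a full-length read removes that ambiguity for the molecule it came from.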

Ofori: 11:38

And what's the other area?

Heather: 11:39

The other area that people are really interested in is epigenomics, and this is the study of the factors that control gene activity, so they control whether a gene is on or off, and long-read sequencing can detect one of the key changes that people are interested in, known as DNA modifications. And these are chemical changes on the DNA, such as DNA methylation. These modifications change gene expression, and this is very important because our current methods are quite limited, and the game changer from this is that DNA modifications can be identified alongside the DNA sequence and this allows this information to be included as part of that analytical process. And while it's a little early to say what the significance of this is, some areas people are talking about are, for example, in the context of rare disease, where some diseases are caused by changes to the DNA modifications, such as imprinting diseases, or where you can identify signatures which correlate with the genetic change, and you can use this to increase your confidence in the cause of disease. In the context of cancer, epigenetics is also known to be very important. So, really we're only starting to scratch the surface, and I would say both transcriptomics and epigenomics are areas to watch in the context of long-read sequencing.

Ofori: 13:06

So if we just circle back for a moment with regards to the reference sequence or the reference genome, you mentioned that it's constantly being improved. Does long-read sequencing come into this?

Heather: 13:18

So the goal of building a genome is to represent the whole of the contained sequence, and at the moment, some regions of the genome are just less well characterized, and this is particularly true for some regions of chromosomes, particularly the center, known as the centromere, and the ends, known as telomeres. Long-read sequencing is really central to the efforts to build and improve on this reference genome so that we can have a telomere-to-telomere sequence, where we capture all of the sequence contained. And in the context of clinical genomics this is going to be critical: if we want to be better at identifying variants, we need to start with a better reference sequence, and this will lead to better diagnoses.

Ofori: 14:01

What are the limitations, if any, of long-read sequencing?

Heather: 14:05

Everything I've said until now has sounded very promising. However, in practice, when you try to bring a technology and implement it, it's never so easy. And the key disadvantages that people talk about are around accuracy, and then around cost and throughput. And I think the first thing to understand is this landscape is changing very, very fast. There have been significant improvements on all fronts, particularly around accuracy, where increasingly long-read sequencing is seen to be more comparable to Illumina sequencing, where before it was seen as being quite a lot less accurate. The other thing to consider is that there are different technologies available, and these have different specifications and different opportunities, but also limitations, and these would need to be considered on their own. For all sequencing technology, the infrastructure is key, and really, there needs to be the right tools, not just around the sequencing, but particularly around the analysis and interpretation of the data. And currently, this is actually really where the limitations are being seen. And now, as these technologies are used increasingly, the goal needs to be developing end-to-end workflows, and these need to be reliable and ready to implement in order for these technologies to be used in clinical genomics.

Ofori: 15:31

Where do you envisage long-read sequencing playing a part in the future?

Heather: 15:37

I think this is a very interesting question. Short-read sequencing has been very successful in clinical genomics, and I don't think it's realistic to think that long-read sequencing is going to replace this infrastructure. However, there are clear opportunities and limitations of short-read sequencing where long-read sequencing could really complement this existing infrastructure. Long-read sequencing has significant advantages, particularly, as I've highlighted, around interpreting structural variants and for some of these omics applications. And there is now a lot of activity exploring how long-read sequencing could be useful. In particular, thinking about Genomics England, they have new programs in both rare disease and in cancer, exploring how different long-read sequencing technologies could complement the existing landscape for both their diagnostic work and in the context of research. So really, we can only wait for the evidence for the value of long-read sequencing to grow, and hopefully in the future this could pave the way for adoption.

Ofori: 16:46

Thank you, Heather, for joining us today. Could you tell us what you're working on next?

Heather: 16:50

So, one area I wanted to highlight is that I'm continuing to work on host genomics, and this is really focused on unpacking this science and many of the challenges around it to help make it useful to patients. I will also continue to be involved in PHG's many projects, both commissioned and internal, to continue to make science work for health.

Ofori: 17:15

Wonderful. Heather, once again, thank you.

Heather: 17:17

Thanks for having me, Ofori.

Ofori: 17:23

Well, that brings us to the end of the episode. If you liked it, please leave us a rating and review, and make sure to subscribe. If you would like to find out more about what was discussed in this episode, there are useful links included in the podcast description, most notably two briefings that Heather recently wrote on long-read sequencing, which were generously supported by the WYNG Foundation. These discuss the potential in clinical genomics, as well as clinical applications and implementation of long-read sequencing. You can also find additional information on our website, phgfoundation.org. And if you have any further questions about the topic, then you can email us at intelligence@phgfoundation.org. Thank you for listening. My name is Ofori Canacoo and I look forward to bringing you a new topic in the next episode.