Supporting NGS-Based In Vitro Diagnostics: A Knowledge Sharing Approach

The below text is a transcript from the webinar. Because it is a transcript, there may be oddities that arise from the process of translating speech into text. We recommend accessing the recording, above, to gain full context.

Introduction

I'd like to start by talking about the agenda of the presentation today. We'll first describe large assays, including those that are RUO, or research use only, assays that are being validated as laboratory-developed tests (LDTs) in individual laboratories, and those that are moving to in vitro diagnostics, as Josh indicated. Second, we'll describe how we've solved and are continuing to solve challenges with LDTs using our knowledgebase. We'll then describe how our knowledgebase inferencing and sharing within the PierianDx network works. We'll then describe the applied framework on that knowledgebase that we will extend in order to support more ready-to-sign-out reports for IVDs, and then we will talk about knowledgebase inferencing and conclude by talking about how that advanced knowledgebase is now ready to support the more ready-to-sign-out reports that come out of in vitro diagnostics.

Complexities Associated with Larger Assays and In Vitro Diagnostics (IVD)

So first, are we ready to take on these large assays, these in vitro diagnostics? As Josh mentioned, there's a trend to move to larger and larger assays, from targeted hotspots to capture-based assays and whole exome. This includes coverage of a number of variants and variant types, from SNVs, indels, copy number variants and gene fusions to advanced biomarkers such as tumor mutation load and microsatellite instability, as well as evaluating the tumor's mutational signature and performing immunoprofiling.

As you guys are well aware, there are a large number of targeted therapies, and those continue to increase, including an FDA announcement yesterday of approval of the Loxo Oncology drug Vitrakvi, and these targeted therapies and immunotherapies are driving the need for these larger assays. Here we show the growth of these large assays that are first released as laboratory-developed tests, including those from Illumina, Memorial Sloan Kettering, PGDx, Foundation Medicine and Thermo Fisher. These assays are not only available today, in some cases as RUOs or LDTs, but most of them have public plans for in vitro diagnostics.

Larger Assays and IVDs in the CAP/AMP/ASCO Classification Scheme

Before we jump in and talk about the knowledgebase and its inferencing abilities, I wanted to flash up here the latest CAP/AMP/ASCO classification scheme that many of us are utilizing in our review and sign-out of somatic cancer NGS reports. This type of classification and tiering is more and more complicated as the assays get bigger. For example, very large assays like the TSO-500 would result in hundreds of variants, and it may take more than four to eight hours to review and sign out a case without much more automation.

Clinical Next Generation Sequencing (NGS) Workflow

Here I describe the clinical next generation sequencing workflow. It starts with sample processing from different sample types, such as FFPE or tubes of blood, and moves to DNA or RNA isolation, library preparation and sequencing. That is followed by the informatics process to call variants, visualize and do QC analysis on cases as well as individual variants and regions, and prioritize variants for clinical review. Then come clinical interpretation and classification of variants, and finally medical director sign-out, followed by integration of that report into the medical record or third-party systems through discrete data output from our systems.

The core product, the Clinical Genomics Workspace, which includes the clinical genomics knowledgebase, facilitates all parts of the workflow shown there. The orange sections are add-on services that we provide: support in validating assays as customers adopt current research use only assays and validate them within their environment; professional services that we provide as interpretation services through our variant analyst team and medical director team to review, classify and interpret variants and sign them out; and laboratory services through the distributive model that we described in a previous webinar.

The PierianDx Clinical Genomics Knowledgebase

Our knowledgebase, and a summary of that knowledgebase, is shown here. The knowledgebase first of all is 100% clinically focused. Over 1,000 genes have been curated with at least one curated rule, and we'll describe what that means in a minute. Over six megabases of sequence coverage is included in this knowledgebase, and our knowledgebase includes the ability to interrogate and identify articles of interest from over 18 million published articles. Our own experience in creating and curating this knowledgebase includes almost 50 PierianDx annotators who are dedicated to curating particular sources, as well as classifying and interpreting variants in patient cases as part of our interpretation services team; they are led by a board-certified molecular pathologist. And very importantly, our molecular pathologist customers also contribute to this knowledgebase through our partner sharing network.

I'd like to now describe some of the key components in our knowledgebase sharing and inferencing. I'll first describe the different sources. At the very lowest level we have publicly available sources that are versioned and kept locally. These include human genome builds, hg19 and hg38; gene, RNA and protein models on those builds that can then be utilized to report out on particular assays; population frequency databases that can be used to assess how common or how rare a variant is; as well as publicly available sources that we process, clean and normalize to ontologies. These include sources like COSMIC, TCGA and ClinVar, germline databases that can be used for inherited cancer risk, as well as the published literature, which can be searched much more robustly through the ontologies and vocabularies that we utilize. We also have sources that we highly curate; these are manual curations that are done by our dedicated curation team. These include FDA approved labels, practice guidelines from NCCN and ASCO, as well as active, recruiting clinical trials.

And then finally the knowledgebase includes content from our partner sharing network, wherein customers, as they sign out patient cases, are able to save variant classifications as well as interpretations in the context of the patient's tumor type or disease. This content then infers in future cases as that variant is encountered, or those variants are encountered, in the context of either the same disease or tumor type or a different disease or tumor type.

As I said before, our knowledgebase includes about six megabases of sequence coverage. It's not a simple variant lookup. Namely, there are inferences that can happen at the HGVS syntax level, so genomic, coding and protein syntax. There are rules that can be written based on genomic, coding or protein coordinates that include codon or protein domain ranges. There are functional characteristics of a variant that infer, for example, all in-frame indels in a particular gene in a particular exon. There is matching of known as well as novel fusion partners, when that may be the case, and limiting copy number variants to a range of copy number gains and losses, to distinguish low versus high copy gains. And there are rules that can be written that encompass one or more variants that are required to be present in the patient sample, across genes or within genes. These examples are very important in establishing the right clinical context and the interpretive context when the entire set of variants for a patient's case is evaluated.
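The predicate types described here (coordinate ranges, functional consequence, copy number ranges, and co-occurrence of several variants) can be illustrated with a minimal Python sketch. This is not the PierianDx rule engine; the `Variant` and `Rule` classes, their field names, the gene names GENE_A and GENE_B, and the copy number threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Variant:
    # Hypothetical, simplified view of one variant call in a patient case.
    gene: str
    kind: str                          # "snv", "indel", "cnv" or "fusion"
    codon: Optional[int] = None
    exon: Optional[int] = None
    in_frame: Optional[bool] = None
    copy_number: Optional[float] = None


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule predicate; any field left as None is not checked.
    gene: str
    kind: str
    codon_range: Optional[Tuple[int, int]] = None
    exon: Optional[int] = None
    in_frame: Optional[bool] = None
    copy_number_range: Optional[Tuple[float, float]] = None

    def matches(self, v: Variant) -> bool:
        if (v.gene, v.kind) != (self.gene, self.kind):
            return False
        if self.codon_range is not None:
            first, last = self.codon_range
            if v.codon is None or not (first <= v.codon <= last):
                return False
        if self.exon is not None and v.exon != self.exon:
            return False
        if self.in_frame is not None and v.in_frame != self.in_frame:
            return False
        if self.copy_number_range is not None:
            low, high = self.copy_number_range
            if v.copy_number is None or not (low <= v.copy_number <= high):
                return False
        return True


def combination_matches(rules: Sequence[Rule], case: Sequence[Variant]) -> bool:
    """True when every rule is satisfied by at least one variant in the case."""
    return all(any(r.matches(v) for v in case) for r in rules)


# An exon-level consequence rule matches a novel in-frame indel that was never
# curated individually, because only the exon and the reading frame are checked.
exon_rule = Rule("GENE_A", "indel", exon=11, in_frame=True)
novel_indel = Variant("GENE_A", "indel", exon=11, in_frame=True)

# A copy number rule limited to high-level gains (threshold is illustrative).
high_gain_rule = Rule("GENE_B", "cnv", copy_number_range=(8.0, float("inf")))
low_gain = Variant("GENE_B", "cnv", copy_number=3.0)
high_gain = Variant("GENE_B", "cnv", copy_number=12.0)
```

Leaving a predicate field unset makes the rule broader, which is the sense in which a single rule can cover many variants rather than acting as a one-to-one lookup.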

How the PierianDx Clinical Genomics Knowledgebase Works

Here I describe how the knowledgebase actually works with regard to annotation and curation source rules. On the left side I describe what I call the annotation rules. These annotation rules are both build and gene model specific and apply different minor allele frequency databases, cancer databases such as COSMIC and TCGA, computational predictions, and clinical variant databases like ClinVar in order to create what's called an annotation rule. These annotations then can automatically map to one or more variants as they are found in a patient case.

Similarly, there are curated rules that are written on drug labels, professional guidelines, clinical trial registries, as well as patient cases, wherein what I've described previously (syntax, ranges of coordinates, copy number, or a fusion partner) forms the predicate. The inferences are then specific to each source. For example, in drug labels and guidelines the inference may state whether the association is therapeutic, prognostic or diagnostic, and if it's therapeutic, whether the particular tumor that the patient has may be responsive or nonresponsive to a particular drug.

As you think about processing variants or biomarkers in a patient's case, which we show on the left there, those variants or biomarkers get processed through our entire knowledgebase of annotation and curation rules. The matching rules then get associated to a variant or biomarker, or a combination of variants or biomarkers, that I'm showing down below. And then these individual variants and biomarkers, associated to these annotation or curation rules, are evaluated. That's what I'm showing on this slide, where these variants or biomarkers, which have associated annotations or curations, are evaluated and manually ranked, asserted, classified and then interpreted by the medical directors, residents and fellows, variant scientists and genome analysts who are using our system, as well as our own PierianDx interpretation team.

This then results in assertions being made for each of these rules, which in turn result in the ranking. Those assertions, when combined, enable the individual to classify and interpret the variant or variants that they have been processing.

Technology and Human Acumen

Now in the next section I'd like to talk about how we currently integrate technology, through data science, and human acumen in reviewing, classifying and interpreting cases in the system. So once again, I'm showing the CAP/AMP/ASCO consensus guidelines and the classification scheme, with both tiers and evidence levels, in order to illustrate how the system currently enables users to either automatically or manually assign a variant to an appropriate tier and evidence level.

On this slide I start by talking about how tiers three and four are driven. Tier four, common variants or benign polymorphisms, is driven largely by population frequency databases. Tier three, variants of uncertain clinical significance, is driven by the fact that there's a lack of evidence in any annotation or curation rule that would otherwise place that variant in tier one or two.

Variants that are associated to either an NCCN, ASCO or FDA source using those curated rules I described can then be assessed by medical directors. Those variants can then be classified appropriately in either tier 1A or 2C, based on whether the curated rule was found in the patient's tumor type or in another tumor type. Here's an example of how that works. BRAF p.V600E with the patient having melanoma would automatically be classified as tier 1A, with vemurafenib being responsive in that context. Whereas for a patient having breast cancer with that same variant, the variant would be classified in tier 2C, because the patient's tumor type does not match the tumor type associated to the curated rule.

A third example that I'm showing is when combinations of variants are found, in CML for example, so the ABL1 point mutation as well as the BCR-ABL1 fusion together result in responsiveness to ponatinib in CML. Now in order to review evidence that comes from the published literature, we have a literature search tool that enables searching by gene, variant syntax, disease and drug. It allows users to filter by clinical relevance or by different study types that indicate the level of evidence, and that information is then used to draft interpretations as well as classify variants.
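The tiering logic in the BRAF example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual classification engine: the `CuratedRule` class, the `assign_tier` function, and the 1% population frequency cutoff are hypothetical, and a medical director would still review the result.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CuratedRule:
    # Hypothetical curated rule from an FDA label or practice guideline.
    gene: str
    protein_change: str
    tumor_type: str
    drug: str
    source: str


def assign_tier(
    gene: str,
    protein_change: str,
    patient_tumor_type: str,
    rules: Sequence[CuratedRule],
    population_frequency: Optional[float] = None,
    common_cutoff: float = 0.01,  # assumed cutoff, not stated in the webinar
) -> str:
    """Return a tier label following the logic described in the talk."""
    matching = [
        r for r in rules if (r.gene, r.protein_change) == (gene, protein_change)
    ]
    # Curated rule written for the patient's own tumor type: tier 1A.
    if any(r.tumor_type == patient_tumor_type for r in matching):
        return "1A"
    # Curated rule exists, but only for another tumor type: tier 2C.
    if matching:
        return "2C"
    # No curated evidence; common variants fall into tier 4.
    if population_frequency is not None and population_frequency >= common_cutoff:
        return "4"
    # Otherwise there is no evidence either way: uncertain significance, tier 3.
    return "3"


rules = [CuratedRule("BRAF", "p.V600E", "melanoma", "vemurafenib", "FDA")]

melanoma_tier = assign_tier("BRAF", "p.V600E", "melanoma", rules)      # "1A"
breast_tier = assign_tier("BRAF", "p.V600E", "breast cancer", rules)   # "2C"
```

Checking curated evidence before population frequency is a design choice of this sketch; the talk only says that tier three results from the absence of evidence that would place a variant in tier one or two.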

We also expose the shared medical content that's part of our knowledgebase, in order to help medical directors make decisions i.e., a prior case where that same variant was found and was classified and interpreted.

The clinical trials are also layered in, in order to identify trials that may be of significance and classify a variant in tier 2C. Those trial rules are associated to one or more variants in exactly the same way as FDA approved labels or practice guidelines are, through that curated rule process that I described previously.

Here I show you how evidence from prior cases is evaluated by our medical directors. For example, using data aggregation we can show how often a variant has been found in different tumor types. That's shown on the left side, with regard to the entire network as well as your own set of cases, along with how that variant has been classified with regard to the classification scheme within your own institution as well as across the network.

I am also showing on the right side, interpretations that have inferred automatically in the context of different tumor types, that then can be evaluated as you're reviewing your patient case and the tumor type for your patient.

One thing that I'd like to highlight here is that variants can have conflicting classifications, and that information can be gleaned from these types of interfaces. I'll show you how we can actually utilize this information in the curation that we'll be proposing here, and the process that we'll use to address and resolve conflicts in classifications.

Performance of the PierianDx Clinical Genomics Knowledgebase

I'd like to spend a few minutes talking about the performance of the current knowledgebase. This slide shows you the breadth of content. In this case we used the Illumina TruSight Tumor 170 gene set and showed that about a third of the genes have been reported in over 10,000 cases in our knowledgebase. Over half have been reported in 5,000 cases, over two thirds in 2,500 cases, and 100% of the genes have been reported in over 1,200 cases. This indicates that our knowledgebase has seen these genes, through the medical directors' eyes if you will, in thousands of cases and has a rich history of how to classify and interpret variants across these genes.

We also evaluated the performance of the knowledgebase through the determination of classification accuracy. What we did here is evaluate how variants were classified automatically by our knowledgebase, as well as how medical directors then either agreed or didn't agree with that classification across the set of cases that they signed out. This represents almost three quarters of a million variants across hundreds of cases. We calculated specificity, sensitivity, positive predictive value and accuracy, and what you can see here is that the system is extraordinarily specific, very highly sensitive, and has very, very high accuracy. As we evaluated variants that were either false positives or false negatives, what became clear was that broader rules, which would allow variants to infer automatically into the correct classification, would solve false positive issues, and that false negative issues were a range where medical directors actually conflicted with each other on classifications.

The next thing we did is use that exact same data set to evaluate the clinical management yield of the knowledgebase. What we looked at, for every case, was how often an FDA approved label, a practice guideline, or some other clinically actionable inference was made, and how often these sources were inferring in order for a medical director to determine the clinical actionability of a variant. What we show here is that there is a large amount of evidence, such that in over a quarter of the cases clinically relevant variants were inferred based on information that's available outside of FDA approved labels and guidelines. If you only utilize FDA approved labels and guidelines, then only two thirds of the cases have actionability, whereas over 95% of cases have some clinical management yield if additional sources are included, such as those in our knowledgebase.
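For reference, the four metrics named here follow from a standard two-by-two comparison of the automatic classification against the medical director's sign-out. The Python sketch below assumes that "positive" means the knowledgebase automatically placed a variant in a clinically significant tier and that the signed-out classification is treated as truth; the counts are invented for illustration and are not the webinar's results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics.

    tp: auto-classified significant, medical director agreed
    fp: auto-classified significant, medical director disagreed
    tn: not auto-classified significant, medical director agreed
    fn: not auto-classified significant, medical director classified it significant
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Illustrative counts only (hypothetical, not data from the presentation).
example = classification_metrics(tp=90, fp=10, tn=890, fn=10)
```

With these invented counts, sensitivity and positive predictive value are both 0.9, specificity is 890/900, and accuracy is 0.98.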
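Clinical management yield, as described here, is the fraction of cases in which at least one clinically actionable inference was made. A minimal Python sketch follows, assuming each case is represented simply as the set of source names that produced an inference; the function name, the source labels, and the toy cases are hypothetical.

```python
from typing import Dict, Optional, Set

# Assumed labels for the label-and-guideline sources named in the talk.
LABELS_AND_GUIDELINES = {"FDA", "NCCN", "ASCO"}


def management_yield(
    cases: Dict[str, Set[str]], allowed_sources: Optional[Set[str]] = None
) -> float:
    """Fraction of cases with at least one inference from an allowed source.

    When allowed_sources is None, an inference from any source counts.
    """
    if not cases:
        return 0.0
    hits = 0
    for sources in cases.values():
        if allowed_sources is None:
            hits += bool(sources)
        else:
            hits += bool(sources & allowed_sources)
    return hits / len(cases)


# Toy cases (hypothetical): case id -> sources that inferred for that case.
toy_cases = {
    "case1": {"FDA"},
    "case2": {"NCCN", "clinical_trial"},
    "case3": {"literature"},
    "case4": set(),
}

restricted = management_yield(toy_cases, LABELS_AND_GUIDELINES)  # 0.5
all_sources = management_yield(toy_cases)                        # 0.75
```

The gap between the two numbers corresponds to the cases whose only actionable evidence comes from outside labels and guidelines.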

Using the PierianDx Clinical Genomics Database in Support of IVDs

I'd now like to talk about how we're extending this knowledge framework and this application framework in order to support in vitro diagnostics and generate more complete ready-to-sign-out reports. As I described earlier, we have annotation and curation rules. We're extending the sources that we're curating to include additional professional guidelines and registries, as well as using curated rules directly on the published literature.

We're also going to profile gene and variant or biomarker frequencies in tumor types to identify particular biomarkers to preclassify and interpret. I show an example here for KIT mutations, wherein we profiled KIT mutations from COSMIC and understand that almost half of the KIT mutations from COSMIC maintain the reading frame. Those that maintain the reading frame are deletions as well as insertions, and these in-frame mutations are clustered in exons eight through eleven. The other set of variants that tend to predominate in KIT are mutational hotspots, and their frequencies are shown in section D there. So in sum, we can use this type of information to write rules at an exon level, using consequence predicates to cover things like in-frame indels in different exons, as well as hotspot rules, in order to capture the clinical intent of FDA approved labels and guidelines, as well as other levels of evidence.

Now one of the additional pieces of work that we'll do, as we're performing this activity in preparation for in vitro diagnostics and as we run gene and variant frequencies in tumor types, is to again make associations to the annotation rules as well as the existing and new curation rules, in order to then evaluate each variant or set of variants for its classification as well as its interpretation.

That activity will actually be performed by PierianDx, wherein PierianDx will go through the exact same process that I described earlier and will proactively rank, assert, classify and interpret one or more variants within the context of a tumor type.

The set of all of those variants or biomarkers, their rankings, their assertions and their final classifications and interpretations then forms the first version of our in vitro diagnostic ready knowledgebase. That knowledgebase can then improve over time, and I'm showing you here how we do that through consensus building, proactively assessing and resolving conflicts. What I'm showing you is that within our partner sharing network we can evaluate how variants have been classified previously. If we find variants that have conflicting classifications or interpretations from medical directors, those can be assessed by our PierianDx curation team, and a recommendation made by that curation team is then approved or modified by our knowledgebase clinical expert panel. This is a set of pathologists, oncologists, geneticists, counselors and others who can help review the resolution of conflicting classifications and interpretations, in order to reach a consensus classification as well as an interpretation that may describe the underlying source of the conflict and the decision made. That resolution for that variant is then put back into the in vitro diagnostics knowledgebase, in order to form version 2.0 of that knowledgebase.
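The kind of profiling described for KIT can be sketched as a simple tally of mutation records by consequence and exon, which is what would suggest where an exon-level in-frame indel rule is worthwhile. The Python sketch below uses invented toy records and hypothetical field names; it does not reproduce COSMIC data or the figures on the slide.

```python
from collections import Counter
from typing import Dict, List


def profile_mutations(mutations: List[Dict]) -> Dict:
    """Summarize what fraction of records are in-frame indels and where they fall."""
    total = len(mutations)
    in_frame = [m for m in mutations if m["consequence"].startswith("inframe")]
    by_exon = Counter(m["exon"] for m in in_frame)
    return {
        "in_frame_fraction": len(in_frame) / total if total else 0.0,
        "in_frame_by_exon": dict(by_exon),
    }


# Toy records (hypothetical values, for illustration only).
toy_mutations = [
    {"exon": 11, "consequence": "inframe_deletion"},
    {"exon": 11, "consequence": "inframe_deletion"},
    {"exon": 9, "consequence": "inframe_insertion"},
    {"exon": 17, "consequence": "missense"},
    {"exon": 13, "consequence": "missense"},
    {"exon": 11, "consequence": "frameshift"},
]

summary = profile_mutations(toy_mutations)
```

Exons where in-frame indels cluster are candidates for a single exon-level consequence rule, while recurrent single positions are candidates for hotspot rules.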
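The first step of this consensus process, finding variants that have been classified differently across the sharing network, can be sketched as a grouping operation. The Python below is a minimal illustration that assumes each prior sign-out is stored as a (variant, tumor type, site, tier) tuple; the record layout, site names and variant labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str, str]  # (variant, tumor_type, site, tier)


def find_conflicts(records: Iterable[Record]) -> Dict[Tuple[str, str], List[str]]:
    """Return variant and tumor type pairs that received more than one distinct tier."""
    tiers_seen = defaultdict(set)
    for variant, tumor_type, _site, tier in records:
        tiers_seen[(variant, tumor_type)].add(tier)
    return {
        key: sorted(tiers) for key, tiers in tiers_seen.items() if len(tiers) > 1
    }


# Toy sign-out history (hypothetical).
history = [
    ("GENE_A p.X1Y", "melanoma", "site_1", "2C"),
    ("GENE_A p.X1Y", "melanoma", "site_2", "3"),
    ("GENE_B p.Z2W", "melanoma", "site_1", "1A"),
    ("GENE_B p.Z2W", "melanoma", "site_2", "1A"),
]

conflicts = find_conflicts(history)
```

Each entry in `conflicts` would then go to curators and the expert panel for a recommendation; that review step is human and is not modeled here.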

Conclusion

So in conclusion, what I've shown you is that PierianDx is taking a leadership position in the cancer community by supporting current Research Use Only (RUO) and Laboratory Developed Test (LDT) case processing through our knowledgebase, through our workflow and through our supporting service solutions. I've described the key differentiators in our knowledgebase, including the sharing network as well as the inferencing concepts. I just showed you how we would facilitate consensus-based knowledgebase building for in vitro diagnostics, and a very key component of that is the continued sharing of classifications and interpretations by our medical directors. That in and of itself, together with updates to approved labels, practice guidelines and the literature that will be continually curated by our dedicated curator team, and the resulting updates to classifications and interpretations that will be made to create new versions of that IVD knowledgebase, will improve the quality of the report that's generated by our system. These reports as a result will become more comprehensive, more automated and closer to ready-to-sign-out, in order to meet the challenge of the larger, complex assays that are on the horizon and that will be in vitro diagnostics.

Questions and Answers

Question: How often is the knowledgebase updated? So it's kind of a generic or broad question, but maybe you can just give some insight into kind of the update process, what components get updated when and how we handle that.

Answer: As I described, our knowledgebase contains a number of base sources, including genome builds, gene, RNA and protein models, and other sources. Each of those sources is versioned and updated independently as updates are made available. Importantly, those updates are not required to be adopted by individual laboratories running their LDTs. So for example, a new human genome build may exist, but there's no requirement to utilize that new build as you process your cases. When you are ready to adopt that new build, you can utilize our test environment to validate and verify on that new build across your analytical and diagnostic samples, in order to determine that you have equal or greater performance, and then apply that to production. The same is true of other sources such as COSMIC, TCGA, minor allele frequency databases and the like.

Now curated knowledge sources, such as FDA approved labels, practice guidelines and trials are continually monitored and updated on a weekly basis by our team. They go through a rigorous QC process and an approval process and are then made available within the system such that cases that are run week to week can leverage and utilize the newest versions of those sources.

Question: There's a question here if I'm understanding correctly about an institution saving their own interpretations or classifications or what not will the work that we do when we update the knowledgebase, will that overwrite those interpretations or how does that, what's the difference there between the two?

Answer: That's a great question. Within our knowledge sharing network we provide the underlying evidence across all interpretations and classifications that have been written in the past and make them available such that they can be cross-compared. I showed an example of that across different tumor types earlier. So in short, no, we do not overwrite anyone's classifications or interpretations, and our inferencing engine will preferentially utilize a site's own classification and interpretation as the default value as it processes cases, although it'll make all other versions available for review by that medical director.
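The preference described in this answer, where a site's own saved classification is the default and everything else stays visible for review, can be sketched as follows. This is a minimal Python illustration, not the inferencing engine: the entry layout and the function name are hypothetical, and since the answer does not say what happens when a site has no saved entry of its own, the sketch simply returns no default in that case.

```python
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, str]  # hypothetical layout: {"site": ..., "tier": ..., "interpretation": ...}


def default_classification(
    entries: List[Entry], site: str
) -> Tuple[Optional[Entry], List[Entry]]:
    """Pick the requesting site's own entry as default; never discard the rest."""
    own = [e for e in entries if e["site"] == site]
    # Assumption: if a site saved several entries, the first one listed is used.
    default = own[0] if own else None
    alternatives = [e for e in entries if e is not default]
    return default, alternatives


network_entries = [
    {"site": "site_1", "tier": "2C", "interpretation": "text from site 1"},
    {"site": "site_2", "tier": "1A", "interpretation": "text from site 2"},
]

chosen, others = default_classification(network_entries, "site_2")
```

Nothing is overwritten: the function only selects which existing entry is presented first, and the full list remains unchanged.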

Question: Do you see such interpretation service or knowledgebase will go to FDA approval and be really ready-to-sign-out?

Answer: Yes, that's a great question. This isn't a new area of potential interest for the FDA; I think it is a changing and evolving space. The FDA historically has limited their purview to the reporting of companion diagnostics, as well as the reporting of analytically validated variants coming out of next generation sequencing tests, and has left the knowledgebase that would then result in a more comprehensive report to be in the realm of the practice of medicine, and not within their purview. However, the FDA has also published guidance on recognition of genetic variant databases and the process to apply for that recognition. That may be a path through which such knowledgebases can gain FDA recognition.

Question: Do we as PierianDx have tumor board capability to discuss cases by different specialists, pathologists, oncologists, geneticists etc.?

Answer: Great question. It is part of our interpretation services activities and capabilities. We're certainly able to participate in individual organizations' molecular tumor boards if we perform the interpretive services for that site. We can join molecular tumor boards and provide our input, scientific, medical, informatics and otherwise.

Question: So the next question has a couple related questions, having to do with variant types or assays that the knowledgebase supports. So you touched briefly on this at the beginning too but the question came in, does the knowledgebase support all variant types? Which assays does the software knowledgebase support? But maybe just give a general description there.

Answer: Sure, so the knowledgebase does support all the different variant types. Again [inaudible 00:38:02] type variants, indels, [inaudible 00:38:06] variants, fusions, either DNA or RNA based fusions, and biomarkers like TMB and MSI. Assays are supported based on the content that is curated within our system. That's greater than 1,100 genes, as I described: virtually every cancer gene that is in any assay that's being run by any of our customers or offered by assay vendors.

Question: How long will PierianDx store the patient identifier and patient data files submitted by the hospital?

Answer: That's a great question. We work with our customers to be the storage and repository site for both primary files, such as FASTQ, as well as any processed and intermediate files, such as BAMs and VCFs, and we would store that indefinitely based on our customers' requests and needs. We also make intermediate and final resulting files available for the customer to download and warehouse, should they choose to do that themselves. And we're very happy to do that in compliance with the applicable CAP checklist.