There were some headlines the other day about the “first AI-discovered drug”, so let’s go to the work itself to see what’s going on. The company behind it is called Deep Genomics, and here’s what its founder has to say:
“Making drugs has traditionally been a gambling game. Big Pharma is throwing a stick into the tree and seeing what happens,” Frey told FierceBiotech. “It’s like the Big Pharma companies come into a casino, put a million-dollar coin into a slot machine and with some probability like 10% or something, they get a win.”
Instead of gambling to get at the fruit higher up on the tree, Frey built Deep Genomics, a company using artificial intelligence to discover new disease targets as well as the best compounds to drug them. He calls it building a ladder.
This might be a good time to deal with that casino analogy (although I note that Frey abruptly relocates to a fruit orchard in the middle of it). I have a presentation that I give that starts off with some of the analogies to drug discovery (making a Hollywood blockbuster film, wildcat oil exploration, etc.), but I don’t have million-dollar slot machines on the list. My problem with it is that everyone has seen slot machines and knows how unlikely they are to deliver a huge payoff, and using them as a metaphor makes it sound like we in drug research just randomly throw things around until something delivers. That one-in-ten figure is indeed the success rate for drug candidates going into clinical trials and making it to eventual approval, but that’s our success rate after we’ve put as much thought as possible into possible targets, modes of action, screening cascades, and the drug candidates themselves. To use that metaphor, you’d have to imagine fighting your way through a jungle obstacle course just to get to that casino floor, a gauntlet that selects for the strong, the wary, the intelligent, the experienced, and the fortunate as well. That one-out-of-ten is what we get after we’ve spent vast amounts of effort trying to drain as much random chance out of the process as we can.
At any rate, Deep Genomics seems to have its machine-learning models trained on (as the name implies) human genomic sequences, and they’re searching for drug targets there. Here’s the paper authored by the Deep Genomics team with the details on what they’ve done in this case: the story is on Wilson’s disease, a rare copper-storage pathology known to be caused by various mutations in the ATP7B gene, which codes for a copper-transporting protein in hepatocytes. There are a number of these mutations; this paper focuses on one called NM_000053.3:c.1934T>G, which is a Met645Arg substitution. It’s been found in some Wilson’s patients, most commonly in those of Spanish descent and in a couple of other European lineages as well. But that actual mutated protein doesn’t seem to really be impaired for copper transport in functional assays, which has led to some confusion in the literature about the status of this mutation.
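For anyone who wants to check that the nucleotide name and the protein name of this variant agree with each other, here is a minimal sketch in Python. It assumes the usual convention that coding position 1 is the A of the start codon, and it relies on the fact that methionine has only one codon (ATG) in the standard genetic code. The function names and the two-entry codon table are mine, purely for illustration; nothing here comes from the Deep Genomics paper.

```python
def codon_position(c_pos: int) -> tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, 0-based offset in codon)."""
    codon_number = (c_pos - 1) // 3 + 1
    offset = (c_pos - 1) % 3
    return codon_number, offset


def substitute(codon: str, offset: int, ref: str, alt: str) -> str:
    """Apply a single-base substitution to a codon, checking the reference base first."""
    if codon[offset] != ref:
        raise ValueError(f"expected {ref} at offset {offset}, found {codon[offset]}")
    return codon[:offset] + alt + codon[offset + 1:]


# Only the two codons needed for this check (standard genetic code).
CODON_TABLE = {"ATG": "Met", "AGG": "Arg"}

codon_number, offset = codon_position(1934)   # c.1934T>G
reference_codon = "ATG"                       # Met is encoded only by ATG
variant_codon = substitute(reference_codon, offset, "T", "G")

print(codon_number, CODON_TABLE[reference_codon], "->", CODON_TABLE[variant_codon])
```

Position 1934 lands on the middle base of codon 645, and swapping that T for a G turns ATG into AGG, which is consistent with the Met645Arg label.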
Deep Genomics’ software is trained to look for potential splicing problems from sequence data, and it flagged this mutant for inducing exon-skipping. The variant protein is indeed functional, but not much of it even gets made because of this effect: exon 6 is skipped, which in this case leads to frameshift trouble and nonfunctional protein. The group validated this hypothesis in cells, showing that 60% or more of the protein is affected by just this problem, which does take you into pathology. It’s worth noting that other exon-skipping events on that same Wilson’s gene have also been recently reported.
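The reason a skipped exon can cause frameshift trouble is simple arithmetic: if the number of nucleotides removed is not a multiple of three, everything downstream is read in the wrong frame. A minimal sketch of that rule follows. The exon lengths in it are hypothetical values chosen for illustration, not the real length of ATP7B exon 6, and the function name is my own.

```python
def skipping_shifts_frame(exon_length: int) -> bool:
    """Return True if removing an exon of this length changes the reading frame."""
    return exon_length % 3 != 0


# Hypothetical exon lengths, for illustration only.
for length in (120, 121, 122):
    outcome = "frameshift" if skipping_shifts_frame(length) else "in-frame deletion"
    print(f"skipping a {length}-nt exon: {outcome}")
```

An in-frame skip deletes a block of amino acids but leaves the rest of the protein readable, while an out-of-frame skip scrambles everything after the junction, which is the situation described here.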
So where does this “first AI drug” part come in? The company’s press release talks about generating 12 candidate compounds, and those turn out to be antisense oligonucleotides, which are known to modify splicing behavior. That’s the mechanism for Spinraza (nusinersen), for spinal muscular atrophy, but it’s worth keeping in mind the (very mixed) results in Duchenne muscular dystrophy as well (see this link again for more on that issue). Deep Genomics says that they’re taking one of their 12 ASOs, DG12P1, towards filing for an IND to go to the clinic. So far, so good.
But allow me to play devil’s advocate for a bit. Deep Genomics has indeed done what they say they’ve done, but I’d like to put that in a wider context. First off, it needs to be noted that there are (so far) about 500 mutations known to be associated with Wilson’s disease, of which this is, of course, one. This situation leads to a wide range of phenotypes, as you would well imagine, and there seems to be some confusion about just how many Wilson’s patients there are. At any rate, it appears from the literature cited in the Deep Genomics paper above that (outside of Spain) perhaps one out of fifty or so Wilson’s patients have this mutation (there is a paper studying several dozen unrelated Spanish patients that found about half of them with this mutation, though, so there are very strong variations in this number).
So how many patients are there? The National Organization for Rare Disorders estimates that there are 2 to 3 thousand diagnosed Wilson’s patients in the US, and perhaps 9,000 total when you account for undiagnosed or misdiagnosed cases. If that 1/50 holds, then that means about 180 people in the US who could be helped by this treatment if it works out. Modeling rare disease therapy is not easy, but this would put the target population very far into the EU’s “ultra-orphan” category. And here comes a problem: those estimates of Wilson’s prevalence and of this particular variant are all over the place, because the only way to identify these patients is through genetic sequencing. And it’s not like everyone gets sequenced at this level.
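The back-of-the-envelope numbers above can be laid out in a few lines of Python. This is a rough sketch, not a real epidemiological model. The patient counts and the 1-in-50 fraction are the ones quoted in this post; the US population of roughly 330 million and the 1-in-50,000 ultra-orphan threshold are my own assumptions (the latter is the commonly quoted cutoff), and the variable and function names are made up for illustration.

```python
VARIANT_FRACTION = 1 / 50            # share of non-Spanish patients with this mutation (from the post)
US_POPULATION = 330_000_000          # assumption: approximate US population
ULTRA_ORPHAN_ONE_IN = 50_000         # assumption: commonly quoted ultra-orphan threshold


def estimate_carriers(wilson_patients: int, fraction: float = VARIANT_FRACTION) -> float:
    """Estimate how many Wilson's patients carry the variant."""
    return wilson_patients * fraction


diagnosed_low = estimate_carriers(2_000)     # diagnosed US patients, low end
diagnosed_high = estimate_carriers(3_000)    # diagnosed US patients, high end
total = estimate_carriers(9_000)             # including undiagnosed or misdiagnosed

one_in = US_POPULATION / total               # prevalence expressed as "1 in N"

print(f"diagnosed carriers: {diagnosed_low:.0f} to {diagnosed_high:.0f}")
print(f"all carriers: {total:.0f}, or about 1 in {one_in:,.0f} people")
print(f"rarer than the ultra-orphan cutoff by a factor of {one_in / ULTRA_ORPHAN_ONE_IN:.0f}")
```

Note that only the diagnosed patients are reachable in practice, so the realistic starting pool is the smaller range rather than the 180 figure, and every input here carries wide error bars.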
When you’re developing a rare disease therapy, some big considerations are: (1) is there an actionable target for therapy, (2) what are the chances of developing an agent that hits this target, (3) how many patients are in need of such a therapy, and (4) can these patients be identified (and identified in time to help them)? These are demanding criteria. Deep Genomics has nailed the first one, and because ASOs for exon-skipping diseases are precedented (and since you can piggyback on the known design of such agents from the targeted sequence and their known chemical properties that have been worked out by others), the second one is in reasonable shape. But those third and fourth considerations are going to be very hard indeed. If you can’t find the patients, you can’t treat them, so Deep Genomics is going to have to make sure that every new Wilson’s patient gets sequenced (which, to be sure, is increasingly common) and that they know about it and can find the patients they want (which is not common at all). I don’t even know if there is a genetic registry for such patients in the US. There is one in France (with 906 patients in it at the last count I’ve seen), which might represent a couple of dozen more potential patients for any Deep Genomics drug.
That last link notes that the mean age for diagnosis is about 18. The earlier you find out about something like this the better, naturally, since the damage that’s already been done will often be very hard to undo. The problem is that no one gets sequenced for Wilson’s until they’re already showing overt symptoms of the disease, and I would expect any drug that restores functional protein would mainly stop the damage from getting worse. I hope to be wrong about that, but that seems to be likely.
Putting all this together, here’s what I see. Deep Genomics has developed a machine-learning approach to finding potential exon-skipping mutations from sequence data. It identified a rare variant of a rare disease as a candidate. Since one can already predict a list of potential steric-blocking antisense oligos from such sequences, it is a relatively small step to generate candidates, at least compared to running a small-molecule discovery effort (especially one in this area!). But keep in mind that the development of such an agent is fraught with difficulty, although I will say that targeting the liver is a plus, since that’s where most of any such compound is going to end up anyway. All this makes the whole “AI-generated drug” part of the story rather less impressive to me; others may disagree. And as for a drug, that gets complicated. As detailed above, even outside of the trickiness of antisense oligos, this will be a very hard development program because of the small number of patients and the trouble in identifying them. Taken together, these obstacles are significant and will, I think, make it difficult to raise money for such a program at all. My own guess is that Deep Genomics will not be able to overcome them, but again, I’m open to being proven wrong.
But “Machine-learning approach identifies rare disease mutation that is still unlikely to lead to clinical trials, much less a marketed drug” is nowhere near as attention-getting a headline. I can see why they went with the one that they did. Deep Genomics, good luck to you, and I hope you find a way to make me completely mistaken.