Here’s a letter from Pat Walters and Mark Murcko of Relay Therapeutics on the September report from Insilico Medicine (blogged here) of a drug discovered by AI, specifically generative methods. Here’s their working definition of what that means, which I think most folks in the field can agree with:
. . .In this technique, a deep learning model is trained based on a corpus of existing molecules. The model typically ‘encodes’ a higher-dimensional representation, such as a SMILES (simplified molecular-input line-entry system)10, into a lower-dimensional representation, often referred to as a latent space. This latent space can then be ‘decoded’ back to the higher-dimensional representation to create new molecules. The exploration of this latent space can be coupled with a predictive model with the aim of discovering novel, active molecules. In a sense, generative models can be seen as a variation on the de novo design11 programs that were in vogue during the 1990s and early 2000s. As with de novo design, evaluating the significance of the output of these models is not straightforward. Although two groups have made initial efforts at developing methods for benchmarking generative models12,13, evaluating the novelty, and ultimately the significance, of the molecules generated by these methods remains an open question. . .
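The encode/explore/decode loop described in that passage can be sketched in miniature. To be clear, this is a toy illustration only: the two-number "descriptor", the four-molecule corpus, and the nearest-neighbor "decoder" are all invented for the example and bear no relation to the actual GENTRL model, which uses a learned neural encoder and decoder.

```python
# Toy sketch of the encode -> explore latent space -> decode loop.
# Everything here (descriptor, corpus, decoder) is illustrative, not GENTRL.

corpus = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]  # tiny SMILES "training set"

def encode(smiles):
    # Crude hand-made 2-D "latent" descriptor:
    # (string length, count of aromatic-atom characters)
    return (len(smiles), sum(ch in "cnos" for ch in smiles))

latent = {s: encode(s) for s in corpus}

def decode(z):
    # "Decode" a latent point back to a molecule by returning the
    # nearest known corpus member (L1 distance in descriptor space)
    return min(corpus,
               key=lambda s: sum(abs(a - b) for a, b in zip(latent[s], z)))

# Explore the latent space near ethanol and decode the perturbed point
z = encode("CCO")
neighbor = decode((z[0] + 1, z[1]))
```

A real generative model replaces the hand-made descriptor with a learned continuous embedding, so that decoding a perturbed latent point can yield a genuinely new structure rather than just the nearest training example; that difference is exactly where questions about novelty arise.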
As Walters pointed out at the time, the best structure from Insilico for their target DDR1 was actually rather similar to the marketed kinase inhibitor Iclusig (ponatinib). That drug is (as shown at right) an inhibitor of DDR1, but of plenty of other kinases as well, which is why it ended up with a black-box warning for toxicity. Now, the Insilico compound's activity was reported against 44 kinases, and Walters and Murcko point out that none of these overlap with the reported activities of ponatinib, which leaves the question of selectivity unresolved. A pointed (but important) question is brought up about the criteria for evaluating such work: “One has to ask whether a paper reporting work in which a team of chemists substituted an isoxazole for an amide carbonyl to generate a compound that is roughly equipotent with published compounds would be reviewed, let alone published.” And that’s true, but one counterargument is of course that this was software doing it, not humans. The counterargument to that, in turn, is that we don’t really need fancy software to tell us to try such an analog, and that maybe we should get more excited about generative models when they suggest something to us that’s a little less obvious.
A larger question is about training sets for such generative models. The Insilico team provided references for the sources of the data used for the model, but did not give a detailed breakdown of just how these structures were used. As generative drug analoging grows in importance, it’s going to be crucial for people to make the entire training set available in detail when such work is published. The letter suggests a set of rules for future publications in this area, which seem reasonable and worth following up on.
Two of the key authors on the paper (Alex Zhavoronkov and Alán Aspuru-Guzik) respond in a back-to-back letter, which is good to see. Some of their points:
The critique of Murcko and Walters, and many similar online commentaries, fails to recognize that, as we state in our paper, our goal was to provide the first demonstration of the effectiveness of a novel generative approach; as such, in-depth validation of the molecules produced was not the main goal of our paper. We readily acknowledge that the compounds require further optimization. . .Regarding the statement that “compound 1 is selective,” it should be emphasized that selectivity versus DDR2, as well as against the small panel of kinases provided by Eurofins, is exactly what was claimed in our paper. There were many structures generated by GENTRL that were substantially different and likely to be more selective, but these were more difficult to synthesize in the short self-imposed ‘race’ mode of our original work.
Perhaps the “race” mode is part of the problem here. One hopes that as such methods improve and become less newsworthy, they can be used less in what might be termed “make a big splash” mode, which is what we’ve been seeing a lot of. Once we’re past demonstrating first uses, we can get down to business.
To that point, readers will recall that I expressed skepticism that the startup EQRx would be able to generate new follow-on drugs at the ferocious pace claimed in their publicity (a pace that is said to be enabled by advances in computational drug discovery). I wanted to mention that my prediction on this is now posted at Longbets.org, ready for all comers to challenge. The next step is for anyone who wants to take the other side of that wager to make themselves known on the Long Bets site, at which point we will negotiate terms to have an officially posted wager, winnings to go to charities of our choice. Anyone want to take me up on it? I’m doing this because I have an allergy to the hype I detect in such cases, but on the other hand, if EQRx can prove me wrong, I will not be upset by that, either, because it will mean that we really have been making breakthroughs in how quickly drugs can be developed. But I don’t think that’s the case – not enough to enable ten new ones in ten years. Step right up!