Here’s how the press release starts, and I’ll say this for it, it does get the reader’s attention: “Atomwise Inc. seeks proposals from innovative university scientists to receive 72 potential medicines, generated specifically for their research by artificial intelligence.” As you’d imagine, this is the sort of thing that immediately engages my skeptical targeting systems, and probably many of yours, too, so let’s dive in and have a look. It goes on:
“It’s this easy: researchers tell us the disease and protein to target, we screen millions of molecules for them, and then they receive 72 custom-chosen compounds ready for testing,” said Dr. Han Lim, MD, PhD, Atomwise’s Academic Partnerships Executive. “As a former UC Berkeley principal investigator, I helped design the kind of program I wish existed for my own work.”
This puts me in mind of the scene from Henry IV, Part One, when Glendower says “I can summon spirits from the vasty deep”, and Hotspur answers him with “Why, so can I, or so can any man. But will they come when you do call for them?” I have no doubt that researchers who apply for this program will indeed receive 72 custom-chosen compounds. But will they work? Put another way, how much better will they work compared to 72 random compounds out of a screening deck, or (more stringently) 72 compounds imagined by a medicinal chemist after an hour sketching possibilities on a whiteboard?
The software is called AtomNet; here’s the paper on it. “AtomNet is the first deep convolutional neural network for molecular binding affinity prediction. It is also the first deep learning system that incorporates structural information about the target to make its predictions”. I will stipulate that I know nothing about deep convolutional neural networks. But here’s a nonmathematical description of what seems to be going on:
Chemical groups are defined by the spatial arrangement and bonding of multiple atoms in space, but these atoms are proximate to each other. When chemical groups interact, e.g. through hydrogen bonding or π-bond stacking, the strength of their repulsion or attraction may vary with their type, distance, and angle, but these are predominantly local effects. More complex bioactivity features may be described by considering neighboring groups that strengthen or attenuate a given interaction but, because even in these cases distant atoms rarely affect each other, the enforced locality of a DCNN is appropriate. Additionally, as with edge detectors in DCNNs for images, the applicability of a detector for e.g., hydrogen bonding or π-bond stacking, is invariant across the receptive field. These local biochemical interaction detectors may then be hierarchically composed into more intricate features describing the complex and nonlinear phenomenon of molecular binding.
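For readers who, like me, find this easier to grasp with a concrete toy in front of them: here’s a rough numpy sketch of what one of those “local interaction detectors” might look like mechanically. This is purely my own illustration of the general idea of a 3D convolution over a voxelized binding site — it has nothing to do with Atomwise’s actual code, and the grid values and filter are made up.

```python
import numpy as np

# Toy illustration only: a single hand-rolled "local detector" slid over a
# voxelized binding-site grid, in the spirit of the DCNN description above.
rng = np.random.default_rng(0)
grid = rng.random((8, 8, 8))          # made-up 3D occupancy grid (e.g., atom densities)
kernel = rng.random((3, 3, 3)) - 0.5  # made-up 3x3x3 local filter ("detector")

def conv3d_valid(grid, kernel):
    """Slide a small 3D filter over the grid. Each output voxel depends
    only on its local 3x3x3 neighborhood -- the 'enforced locality' --
    and the same filter is reused everywhere -- the invariance across
    the receptive field."""
    gx, gy, gz = grid.shape
    kx, ky, kz = kernel.shape
    out = np.zeros((gx - kx + 1, gy - ky + 1, gz - kz + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                patch = grid[i:i + kx, j:j + ky, k:k + kz]
                out[i, j, k] = np.sum(patch * kernel)
    return out

feature_map = conv3d_valid(grid, kernel)
print(feature_map.shape)  # (6, 6, 6): one response per local neighborhood
```

A real network stacks many such filters in layers, so that the local responses get composed into the “more intricate features” the paper mentions — but the locality assumption baked into each filter is the part worth noticing.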
Now, I have no problem with the local bonding calculations that they’re talking about doing, although they’re subject to the usual disclaimers about the accuracy of the calculations. But the assumption that “distant atoms rarely affect each other” does not seem to me to be valid. Medicinal chemists are quite used to seeing changes in a structure-activity relationship when a reasonably distant atom is changed – “You can get away with a methyl there as long as you don’t have one over there”. There are SARs that do work on the “greatest hits” principle, where you can independently mix-and-match various regions of the molecule, but the great majority of the projects I’ve worked on haven’t gone that way, or not quite. And if I’m interpreting that paragraph correctly, it’s explicitly aimed at the mix-and-match. I’d say that the most common situation is the one where you can get away with independent changes within a given range, which can be a rather narrow one, and then all bets are off. And the only way to discover that you’ve gone outside those ranges is to go outside them.
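To put some toy numbers on that non-additivity point: here’s a made-up four-compound matrix (the activities are invented purely for illustration, not from any real project) showing how two changes at distant positions can fail to be independent.

```python
# Invented pIC50 values, just to illustrate a non-additive SAR.
# Four combinations of a methyl substitution at two distant positions (A, B):
pIC50 = {
    ("H", "H"): 6.0,    # parent compound
    ("Me", "H"): 7.0,   # methyl at position A helps: +1.0
    ("H", "Me"): 6.5,   # methyl at position B helps: +0.5
    ("Me", "Me"): 5.8,  # both together: activity collapses
}

# If the two positions were truly independent, the effects would add:
additive_prediction = (pIC50[("H", "H")]
                       + (pIC50[("Me", "H")] - pIC50[("H", "H")])
                       + (pIC50[("H", "Me")] - pIC50[("H", "H")]))
print(additive_prediction)   # 7.5 predicted under mix-and-match
print(pIC50[("Me", "Me")])   # 5.8 observed: the "all bets are off" case
```

The gap between the additive prediction and the observed number is exactly the kind of thing you only discover by making the compound — and it’s the kind of thing a model built on locality assumptions can be expected to miss.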
As mentioned, AtomNet, to its credit, also brings in data about the binding target. But that’s a tricky business, too. As is well known, binding sites accommodate ligands by adjusting their own shapes – sometimes subtly, sometimes dramatically – and this is one of the hardest things to account for in virtual screening techniques. Likewise, the ligands themselves can adopt a range of conformations in response to a binding event, which also adds to the computational burden. I’m not at all sure how this software deals with these problems, particularly the protein mobility one, but if I come across more details, I’ll update this post.
From what I can see, the AIM program is screening databases of commercial compounds and furnishing the applicants with the 72 best purchasable hits. The compounds will be given an LC/MS quality check, diluted to an appropriate concentration, and plated out, which is a good service. “Custom-chosen”, though, does not mean “custom-synthesized”, as you’d imagine (I don’t think anyone will be taking that on for free). They’re asking that people come to them, ideally, with targets that have an X-ray protein structure and an identified small-molecule binding site, which is fair enough.
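Stripped of the AI language, the deliverable boils down to something like the following sketch — score a purchasable library with whatever model you have, keep the top 72. Again, this is my own caricature of the workflow, with invented compound IDs and random numbers standing in for model scores, not anything Atomwise has published.

```python
import heapq
import random

random.seed(1)

# Hypothetical library of purchasable compounds, with made-up model scores
# standing in for whatever the real binding-affinity predictor outputs.
library = [(f"CMPD-{i:06d}", random.random()) for i in range(1_000_000)]

# "Screen millions of molecules... receive 72 custom-chosen compounds":
top72 = heapq.nlargest(72, library, key=lambda c: c[1])
print(len(top72))  # 72
```

Whether those 72 beat 72 random picks from a screening deck is, of course, the whole question.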
I would very much like to know what the hit rates will be for these, and I suspect that Atomwise very much wants to know that, too, which is why they’re offering to do this for people. The awardees get some potentially interesting molecules to test, and the company gets a presumably diverse set of real-world examples to test their technology against. (I should note that they already have agreements with several academic groups, and one with Merck, for an unnamed project). Personally, I’ll be surprised if there’s much of an enhancement for many of these, but I wish the company luck, and I think that their commitment to putting their software to the test is admirable.
Is it “artificial intelligence”, though? That’s a topic I touched on in my talk last year in Manchester. I think that if you time-machined people from the 1950s into our present-day world and hit them with Google Maps (for example), they’d probably call that artificial intelligence. “Sure, that’s intelligent, although for some reason you only seem to have taught it about roads”. From that standpoint, Atomwise would also be called AI, but from a modern perspective, if that’s AI then so are the rest of the modeling and docking programs. I’ll put that one down to press release language, and hope that it doesn’t become a big part of their pitch.
The part that annoys me more is the “72 potential medicines” line. Screening hits are potential medicines in the same way that Atomwise is a potential Amazon.com – sure, they all start out this way, but not many make it through to the end. People are confused enough about where drugs come from and what it takes to get them there; I’m never happy to see more confusion being dumped in on top of what we’ve got.