I enjoyed this ACS Med. Chem. Letters perspective on AI and machine learning in medicinal chemistry. It has several good points to make, and it brought up one that I haven’t gone into here before: if you’re mining the literature, you will get what the literature can tell you. At the very best, at the high end of the scale (and we’re not there yet), the software can tell you things that you didn’t yet realize the literature was telling you, but all you’ll get is what was there in the first place.
And in some cases, that could lead to trouble. Consider retrosynthesis software:
One potential drawback of this machine learning and pattern matching methodology is that it has the propensity to become self-reinforcing in certain areas. Take, for example, the emerging platforms for reaction planning. . . in suggesting best routes, such systems also often prioritize those routes based on frequency of utilization, i.e., those most used in analogue generation, such as Pd-mediated couplings and amide formations. In doing so, it is not unreasonable to suggest that over time, the utilization of these reactions increases ever further, making them more likely to be suggested by such systems, and therefore potentially reducing the desired creativity and power of the AI systems to help us avoid over-reliance on certain reaction types.
I could definitely see that happening, especially since this sort of software is going to be used, for the most part, for bang-it-out let’s-just-make-some applications. We have a lot of those in med-chem. So in terms of “did you get the compound?”, things will work fine. But just being able to make the compound won’t advance the science of synthesis very much, although perhaps this is just an amplification of what already happens. We self-amplify by hand: as it is now, when it’s time to make some molecule any old way we can, we chemists tend to reach for reactions and routes that we already know and have used before. You can see this on a smaller scale in the way people do palladium couplings: there are better catalysts than the ones that most of us reach for, but since those old favorites do tend to deliver product, we just take what comes out and move on. I remember an email a few years ago from a colleague in the scale-up group, to the effect of “Stop using tetrakis all the time, you bozos”, but I’m not sure it did much good.
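The feedback loop in that quoted passage is easy to put in code, by the way. Here’s a minimal sketch, assuming (hypothetically) a planner that weights its suggestions by past frequency of use; the reaction names and starting counts are made up for illustration, not taken from any real system:

```python
import random

# Hypothetical usage counts for a few reaction types; the numbers are
# invented, but the imbalance mirrors the Pd-coupling/amide situation.
usage_counts = {
    "Pd-mediated coupling": 50,
    "amide formation": 40,
    "photoredox C-H functionalization": 5,
    "electrochemical coupling": 5,
}

def suggest_reaction(counts):
    """Pick a reaction with probability proportional to how often it has
    been used before (frequency-weighted suggestion)."""
    reactions = list(counts)
    weights = [counts[r] for r in reactions]
    return random.choices(reactions, weights=weights, k=1)[0]

# Every accepted suggestion feeds back into the counts, so the popular
# reactions get suggested, used, and re-counted ever more often.
for _ in range(10_000):
    usage_counts[suggest_reaction(usage_counts)] += 1

total = sum(usage_counts.values())
for reaction, count in sorted(usage_counts.items(), key=lambda kv: -kv[1]):
    print(f"{reaction:35} {count / total:6.1%}")
```

The point isn’t the particular numbers: it’s that nothing in this loop ever gives the low-count chemistry a way to gain share, so the initial imbalance simply gets locked in. And if the weighting were any steeper than linear, the old favorites would actively crowd everything else out.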
New reactions do catch on, of course, but they especially do so when they provide some sort of transformation that hasn’t been easy to do before. That’s why redox photochemistry has made inroads – it allows for bond formations that aren’t otherwise accessible. But a great new Pd coupling catalyst will take longer to catch on, because of the inertia just described. This means that there are two particular groups of synthetic customers who will adopt new reactions more quickly: total synthesis people in academia, who need to do difficult transformations in the highest yield possible, and (to a lesser extent) process chemists in industry, who need to optimize yield, reproducibility, cost, and suitability for large-scale work. They’ll jump on new reactions (catalytic ones especially) that look more scalable, but they’ll check them out thoroughly to make sure that they actually work the same way every time. Both of these cohorts are outside the “just get some compound, willya” demands of some other fields, and one wonders if that might make them less likely to use the earlier versions of the retrosynthesis programs.
But the piling-on aspect of machine learning could extend further. This is something that (some) people in the field have been thinking about, but those of us watching ML move into our own areas may not have internalized it. If you’ve been involved in handling large data sets in general, these concerns are (or should be) second nature to you, but I’m more worried about the eventual customers who haven’t been doing that. The challenge will be to keep such systems from becoming “conventional wisdom machines”. Sticking with the retrosynthesis example, perhaps there could be separate “exemplification” and “novelty” scores available if you want them. You could imagine setting a dial along that scale, generating routes that range from “bound to work, everyone’s done this” to “potentially a lot easier/shorter, if these papers are real”.
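To make that dial concrete, here’s one way it might look. Everything in this sketch is hypothetical (the routes, the scores, the linear blend); it’s an illustration of the interface I have in mind, not any real planner’s API:

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    exemplification: float  # 0..1: how heavily precedented in the literature
    novelty: float          # 0..1: how much easier/shorter it could be, if the papers are real

# Hypothetical candidate routes with made-up scores.
candidates = [
    Route("Pd coupling + amide formation", exemplification=0.95, novelty=0.10),
    Route("photoredox decarboxylative coupling", exemplification=0.40, novelty=0.70),
    Route("one-step electrochemical disconnection", exemplification=0.10, novelty=0.95),
]

def rank_routes(routes, adventurousness=0.0):
    """Blend the two scores with a user-set dial: 0.0 means
    'bound to work, everyone's done this'; 1.0 means
    'potentially a lot easier/shorter, if these papers are real'."""
    def blended(r):
        return (1 - adventurousness) * r.exemplification + adventurousness * r.novelty
    return sorted(routes, key=blended, reverse=True)

# Turning the dial changes which route comes out on top.
for dial in (0.0, 0.5, 1.0):
    best = rank_routes(candidates, adventurousness=dial)[0]
    print(f"dial={dial:.1f} -> {best.name}")
```

A team banging out analogues would leave the dial at zero and take the well-trodden routes; the groups willing to check out adventurous chemistry could turn it up.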