Here’s an interesting paper from the Cronin lab at Glasgow. It’s titled “Controlling an organic synthesis robot with machine learning to search for new reactivity”, and that title alone will make some of the readership here eager to hear more, while sending others fleeing in dismay. It seems difficult to be neutral about such topics, but here we go:
The idea is to find new reactions. There are, broadly speaking, three ways to do that. One is by sheer brainpower (human or software): predicting a reaction where none is known by calling your shot based on chemical intuition, quantum mechanical simulations, etc. The second, at the other end of the scale, is by sheer serendipity: mix things together, over and over, and see what happens. And the third is a hybrid, a sort of directed serendipity that allows for random chance but tries to narrow the search down to promising areas instead of sampling more randomly.
Now, there have been a number of reports of new-reaction searching by automated microscale synthesis, which sounds like the second method. But these are really more like the third, hybrid one, since they tend to start in areas (metal-catalyzed reactions, for example) where a good deal of interesting reactivity is already taken for granted, and with substrates that are likely to participate. This latest paper is another hybrid approach, but this time the plan is to evaluate a smaller set of test reactions and see if the system can predict what might happen across the rest of the reaction landscape.
It’s not a particularly high-throughput system – six reactions at a time, 36 a day, although that’s still more than any human could put up with. The robotic system is also hooked up to a flow NMR system and an IR spectrometer. The data from these instruments is used to tell the software “Was this reactive or not?”, that is, “Did something happen?” The model was trained on 72 reactive and non-reactive mixtures (as labeled by an actual human chemist), and the authors say that it was about 80% accurate in predicting reactivity in general after that. Note that the model itself is agnostic about chemical structure and known reactions – all it knows is that this is reactant one, that’s reactant two, and so on, and that these are binned into broad classes (which we humans would refer to as “aldehydes”, “aromatic amines” and so on).
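If you want to picture that kind of structure-agnostic model concretely, here’s a minimal Python sketch of a classifier along those lines, assuming a one-hot encoding over binned reagent classes. The bin names, network size, and toy training data are my own placeholders, not the paper’s actual architecture or dataset:

```python
# A minimal sketch of a reactivity classifier of the sort described above,
# assuming a one-hot encoding of binned reagent classes. Everything here
# (class bins, layer size, training rows) is an illustrative placeholder.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical reagent-class bins ("aldehydes", "aromatic amines", etc.)
REAGENT_CLASSES = ["aldehyde", "aromatic_amine", "boronic_acid", "base", "alkyne"]

def encode(combo):
    """One-hot encode a reagent combination by class membership only;
    the model never sees actual chemical structures."""
    vec = np.zeros(len(REAGENT_CLASSES))
    for cls in combo:
        vec[REAGENT_CLASSES.index(cls)] = 1.0
    return vec

# Toy stand-in for the chemist-labeled training mixtures:
# 1 = "something happened", 0 = "nothing happened".
X_train = np.array([encode(c) for c in [
    ["aldehyde", "aromatic_amine"],   # imine formation: reactive
    ["boronic_acid", "base"],         # no coupling partner: unreactive
    ["aldehyde", "alkyne", "base"],   # reactive
    ["aromatic_amine", "base"],       # unreactive
]])
y_train = np.array([1, 0, 1, 0])

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

# Predicted probability that an untested combination is reactive
print(clf.predict_proba([encode(["aldehyde", "aromatic_amine", "base"])])[0, 1])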
So now if you hand the system a new set of reactants, it starts out by running some random combinations of those and seeing if anything happens. It goes through a set of these and builds a model of what it thinks the “hot spots” for reactivity in the whole set might be, then runs the new combinations that seem most likely to do something based on that model. The results of these experiments are, naturally, fed back into that model for further refinement.
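That loop is the classic explore-then-exploit pattern, and a rough sketch of it looks like the code below. The run_reactions() oracle stands in for the robot plus its NMR/IR “did anything happen?” readout, clf and encode are any classifier and featurizer with the interfaces from the previous snippet, and the batch sizes are placeholders rather than the paper’s actual settings:

```python
# A rough sketch of the explore-then-exploit search loop described above.
import numpy as np

def active_search(all_combos, run_reactions, clf, encode,
                  n_random=6, n_rounds=5, batch=6):
    rng = np.random.default_rng(0)
    untested = list(all_combos)
    X, y = [], []

    # Exploration: run a few random combinations to seed the model.
    # (Assumes the seed round turns up at least one reactive and one
    # unreactive mixture, so the classifier has both labels to learn from.)
    seed_idx = sorted(rng.choice(len(untested), n_random, replace=False),
                      reverse=True)
    for i in seed_idx:           # pop highest indices first
        combo = untested.pop(i)
        X.append(encode(combo))
        y.append(run_reactions(combo))  # 1 = reactive, 0 = not

    # Exploitation: refit, rank the rest by predicted reactivity, run the
    # most promising, and feed the results back into the model.
    for _ in range(n_rounds):
        clf.fit(np.array(X), np.array(y))
        probs = clf.predict_proba(np.array([encode(c) for c in untested]))[:, 1]
        for i in sorted(np.argsort(probs)[::-1][:batch], reverse=True):
            combo = untested.pop(i)
            X.append(encode(combo))
            y.append(run_reactions(combo))
    return X, y
```

Note that nothing in this loop is chemistry-specific: the same pattern works for any “try it and see” oracle, which is presumably part of why the model can afford to stay agnostic about structure.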
A test of the system was based on a large literature set of Suzuki couplings. After building a model of reactivity from those (several thousand reactions), the robot was turned loose on a set of 576 potential reaction combinations (also from the literature, but not in the training set). And it did indeed pick out the more reactive combinations first, as evaluated by looking over the known results for the first 100 reactions selected, the second 100, and so on – the fraction of these giving actual products started out high and went down progressively, with the last batches consisting mostly of combinations that were predicted not to work (and which in fact did not).
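That batch-by-batch evaluation is easy to picture in code. In this toy sketch, the model scores and the “literature outcomes” are randomly generated stand-ins (deliberately correlated, so the hit rate falls off across batches the way the paper reports), not the actual Suzuki data:

```python
# A quick sketch of the evaluation described above: rank the candidate
# reactions by predicted reactivity, then check the known outcomes in
# batches of 100. All the numbers below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)
predicted_probs = rng.random(576)   # stand-in model scores
known_outcomes = (rng.random(576) < predicted_probs).astype(int)  # toy "truth"

order = np.argsort(predicted_probs)[::-1]   # most-confident picks first
for start in range(0, 576, 100):
    chunk = order[start:start + 100]
    hit_rate = known_outcomes[chunk].mean()
    print(f"reactions {start + 1}-{start + len(chunk)}: "
          f"{hit_rate:.0%} gave product")
```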
Of course, we’ve only been talking about “Does this reaction make a new product?”, without going into the details of what those products might be. Suzuki couplings are a lot easier to predict, but what about the set of random-ish ordinary reagents? The team went back to the combinations that were predicted to do something and then looked at what actually happened. Four new reactions were actually found this way, and one of these is shown at right (two of the others involve reaction with DBU’s heterocyclic ring system, interestingly). They have an X-ray of that one, actually, in case you’re wondering.
My take on this is that the software model developed here could be usefully combined with some of the higher-throughput approaches described by other groups. That way, you could potentially set up entire plates of microscale reactions with (presumably) enhanced hit rates for new reactions already built into them. There are bound to be other ways to set up the neural-net prediction models, too – this attempt would suggest that the approach is feasible, at any rate. And of course, you could also imagine a follow-up system that takes some of these new reactions and then does an automated optimization of the conditions (via design-of-experiments or some such approach) if you pick out a reaction of particular interest. And you would also want to take the most robust of these new reactions and make sure that your retrosynthesis software package knows about them, too, right? As one automated system hands its results on to another...