Behold the rise of the machines. It’s been going on for a while, but there are landmarks along the way, and we may have just passed another one with the publication of this paper. It’s open-access, from an interestingly mixed team: the Polish Academy of Sciences, Northwestern University, the University of Warsaw, the Ulsan Institute in South Korea, and. . .MilliporeSigma. Those who are into scientific computing may have already guessed that the Polish connection is to the Chematica retrosynthesis software. It’s the MilliporeSigma one that makes things of particular interest here (a company that to certain generations of chemists will always be Aldrich or Sigma-Aldrich in their hearts).
I’ll let the summary of the paper lay out the case:
Here, we describe an experiment where the software program Chematica designed syntheses leading to eight commercially valuable and/or medicinally relevant targets; in each case tested, Chematica significantly improved on previous approaches or identified efficient routes to targets for which previous synthetic attempts had failed. These results indicate that now and in the future, chemists can finally benefit from having an “in silico colleague” that constantly learns, never forgets, and will never retire.
All right, then. As advertised, what this paper has done is to pick out six molecules of interest to the MilliporeSigma folks, all chosen because they are of strong commercial interest but had troublesome syntheses (low or inconsistent yields, or failed routes altogether). In addition, the cardiovascular drug dronedarone is on the list because there are numerous process patents detailing routes to its preparation, making this a good reality check for the software, and there is also a natural product (engelheptanoxide C) that has been recently described in the literature but not yet synthesized. The structures of these are shown at right, and the chemists in the crowd will note that this is a perfectly reasonable test: these are real compounds, all the way. Medicinal chemists will note that several of these are hydroxylated metabolites of known drugs, which are valuable reference compounds from a commercial standpoint.
The software was turned loose on all these structures to come up with what it regarded as plausible retrosyntheses, with the starting materials defined as things easily available in the Sigma-Aldrich catalog (naturally). And these routes were put to an interesting real-world test (as suggested by the DARPA funding that went into the project): the routes were put into practice in the lab in the top four cases by chemists at MilliporeSigma (an experienced bunch), and in the bottom four cases by students with little or no practice in multistep organic synthesis, just to see if the routes were practicable by less-experienced hands.
The software generated routes in about 20 minutes for each of these. If the top-rated route was sufficiently different from what had been tried before, and if the starting materials were readily available, it was chosen as is. Otherwise, the second-ranked route was used (this happened in three cases). The reactions had to be taken as given, in their general form, although modifying conditions (temperature, solvent, etc.) was permitted. The MilliporeSigma targets were to deliver at least several hundred milligrams within 8 weeks, at 98% purity, while the student syntheses were more like three months but with similar purity.
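For those who think better in code, the selection rule described above can be sketched in a few lines of Python. This is only my illustration of the rule as stated, not anything taken from the paper or from Chematica itself: the `Route` class, its fields, and `pick_route` are all made-up names, and the "sufficiently different" call is reduced to a flag that a chemist would have to set by hand.

```python
from dataclasses import dataclass


@dataclass
class Route:
    """Hypothetical stand-in for one software-proposed route."""
    rank: int                  # 1 = top-rated by the software
    novel: bool                # judged sufficiently different from earlier attempts
    materials_available: bool  # starting materials are readily available


def pick_route(routes):
    """Take the top-ranked route as is if it passes both checks;
    otherwise fall back to the second-ranked route."""
    ordered = sorted(routes, key=lambda r: r.rank)
    top = ordered[0]
    if top.novel and top.materials_available:
        return top
    return ordered[1]


# Example: the top route is too close to what was tried before,
# so the second-ranked one gets used instead.
candidates = [
    Route(rank=1, novel=False, materials_available=True),
    Route(rank=2, novel=True, materials_available=True),
]
chosen = pick_route(candidates)
```

Note that the fallback in this sketch is unconditional, which matches the description: the second-ranked route is simply used, with no further checks spelled out.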
I won’t go into all the details of the syntheses, since the paper is open-access and you can read them there. But what I will say is that for the four MilliporeSigma targets, the existing routes were substantially improved in all cases. The improvements were of several kinds (shorter routes, fewer chromatography steps, higher yields, more reproducible) and came from several directions (completely different synthetic approaches, different starting materials, etc.). The improvements in the latter four compounds were similar, and in the case of the third one down on the right (a metabolite of lurasidone) the software not only improved the synthesis, but in doing so broke the patented route to the compound. One interesting feature that shows up several times is that the software predicted (based on its knowledge of the literature) some “don’t bother protecting that OH” reactions that some chemists might have worried about trying, but which can be gotten away with.
This is very impressive work. Even discounting for having to do more work on the suggested reactions, which I’m sure was the case, it’s still impressive. One thing to note is that the software may (in those three cases mentioned above) have suggested routes that were very close to the existing (problematic) ones, which highlights the well-known fact that what looks good on the board doesn’t always go so well in the fume hood. This isn’t explicitly addressed in the paper. But overall, this paper is a pretty strong argument for the whole approach.
And from a theoretical standpoint, it seems clear that this is how things are going to go. I recently read Garry Kasparov’s Deep Thinking, about his experience with IBM and its Deep Blue program, and one of the points he makes is that even when he beat an earlier version of the program, he knew that it was going to surpass human chess play. That’s because it was constantly improving, faster than humans do (or can). And working out a retrosynthesis, versus playing chess, is similar enough that the same considerations apply. Chematica (and its competition in the software field) is getting better all the time. More reactions are entered, existing ones are extended and curated more precisely, adjustments are made to the algorithms, the hardware gets more capable.
So the fact that the program – or any such program – does as well as it does here means, folks, that the handwriting is on the wall. Not this afternoon, and not next week, but in the easily foreseeable future retrosynthesis and synthetic organic chemistry planning are going to be taken out of the hands of chemists. At least, that’s how it’s going to seem to us, the chemists of the present. But to future chemists, the ones who will enter the science once this transformation is complete, it won’t seem like that at all. To them, synthesis planning will always have been something that you have machine help with – why would you do it any other way? Who can carry a zillion reaction examples around in their head?
Kasparov mentions the idea of “centaur” chess players, humans aided by software in their analysis of games and positions. We organic chemists have been centaurs for a long time now, considering how much help we get from our machines and instruments, and this is going to be another example. It is certainly different in degree, and may well feel a bit different in kind, but it’s coming no matter what we feel about it. Prepare yourselves.