Here’s an interesting new paper from Lilly (brought to my attention by Ash Jogalekar on Twitter). “Creating a virtual assistant for medicinal chemistry” is the title, but fear not: this is not something that’s come to elbow you aside at the bench. Well, not yet. What they’re talking about is a software agent that takes on the job of handing around project information as it’s generated – these compounds got made this week, these compounds got tested in these assays, these were the most active of the bunch, here are the ones that haven’t been run yet, etc.
That sounds pretty mundane, but it’s quite valuable. This is one of those irritating issues that teams wrestle with, and companies devote a lot of time to figuring out how to share data across large groups in some effective way. Thus you have your SharePoints and Microsoft Teams and Slacks and Yammers and mailing lists and newsletters and project team updates and meeting minutes and Spotfire files and databases and. . .it goes on. The tricky parts include: (1) the information to be shared is pretty heterogeneous, ranging from high-level “What assay should we run?” and “What compounds should we make next?” stuff down to piles of assay data, and (2) most projects are running several sorts of assays, at different frequencies and in different hands/locations, with different aims and priorities, so keeping track of it all is nontrivial. There have been many “Here are the compounds run this week” sorts of systems put in place over the years, but the more context these things can be put in, the better.
So the Lilly software (“Kernel”) sends out notices as the numbers hit the database, summarizing what’s going on. But it goes further, also trying to highlight compounds that behaved in unexpected ways by using metrics and models from the project so far (efficiency metrics, QSAR). I can certainly see why that part’s in there, but I’d be interested to hear candidly how valuable the recipients find that part of the output. I can imagine that it works better for some projects (or some SAR series within given projects) than others, and the problem with that situation is that people will tend over time to sort of slide over that part rather than put the effort in to see if it’s worth their time (in this instance) or whether it’s just something not to bother with. There’s a mention of trying to add automated binding mode predictions as well, to which I attach the same concerns.
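To make that concrete, here’s a rough Python sketch of what that kind of “unexpected behavior” flagging might look like. To be clear, this is not Lilly’s Kernel code, just an illustration of the general idea: compare each new measurement against a project model and a simple efficiency metric, and call out the disagreements in the weekly notice. The predict_pic50() stand-in model, the compound IDs, and the cutoffs are all made up for the example.

```python
# A minimal sketch (not Lilly's Kernel) of flagging compounds that behaved
# unexpectedly: compare measured activity to a project QSAR prediction and a
# ligand-efficiency estimate, and keep the ones that disagree strongly enough
# to mention in the alert. Model, IDs, and thresholds are all placeholders.

from dataclasses import dataclass

@dataclass
class Result:
    compound_id: str
    measured_pic50: float   # from the new assay upload
    heavy_atoms: int        # for the ligand-efficiency estimate

def predict_pic50(compound_id: str) -> float:
    """Placeholder for a project QSAR model trained on the SAR so far."""
    return {"CMP-001": 7.2, "CMP-002": 5.5, "CMP-003": 6.8}.get(compound_id, 6.0)

def ligand_efficiency(pic50: float, heavy_atoms: int) -> float:
    # LE ~ 1.37 * pIC50 / heavy atom count (kcal/mol per heavy atom)
    return 1.37 * pic50 / heavy_atoms

def flag_outliers(results, residual_cutoff=1.0, le_cutoff=0.3):
    """Return compounds whose results disagree with the model, annotated so
    the alert can say *why* each one was flagged, not just that it was."""
    flagged = []
    for r in results:
        residual = r.measured_pic50 - predict_pic50(r.compound_id)
        le = ligand_efficiency(r.measured_pic50, r.heavy_atoms)
        if abs(residual) >= residual_cutoff or le >= le_cutoff:
            flagged.append((r.compound_id, residual, le))
    return flagged

if __name__ == "__main__":
    week = [Result("CMP-001", 8.6, 28), Result("CMP-002", 5.4, 35), Result("CMP-003", 6.9, 22)]
    for cid, resid, le in flag_outliers(week):
        print(f"{cid}: residual {resid:+.2f} log units, LE {le:.2f}")
```

The part that would matter to the recipients, presumably, is that annotation step: a compound flagged with a reason attached is at least arguable, while a bare list of “interesting” structures invites exactly the sliding-over behavior I’m worried about.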
The software also takes active compounds and runs some matched-molecular-pair-type substitutions on the structures, evaluating those by the same criteria it uses for outlier flagging. The idea here is to put up ideas for further work and to identify what could be particularly interesting analogs to make. All I can say on that is that if the evaluative mode of these models is potentially problematic (with people finding it useful or not), the predictive mode is surely even more so. I wouldn’t object to seeing what the software came up with, but I would also be quite curious to see how seriously I could take its suggestions and whether it was coming up with anything that I hadn’t thought of myself. It looks like the Lilly team is wondering the same thing, naturally:
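For the curious, here’s the general shape of that kind of enumeration, sketched with RDKit rather than whatever Lilly actually uses: take an active structure, apply a small table of single-point substituent swaps, and rank the results with a stand-in scoring function where the project model would go. The swap table, the scoring function, and the example structure are all placeholders, not anything from the paper.

```python
# A minimal matched-pair-style idea generator (illustrative, not Kernel):
# apply a small table of substituent swaps to an active compound with RDKit,
# then rank the unique analogs with a stand-in for the project model.

from rdkit import Chem
from rdkit.Chem import AllChem

# (query SMARTS, replacement SMILES) pairs -- a toy transformation table
SWAPS = [
    ("[Cl]", "F"),        # chloro -> fluoro
    ("[Cl]", "OC"),       # chloro -> methoxy
    ("[Cl]", "C(F)(F)F"), # chloro -> trifluoromethyl
]

def score(smiles: str) -> float:
    """Stand-in for the project QSAR/efficiency model that would rank ideas."""
    return float(len(smiles) % 10)  # obviously not a real model

def enumerate_analogs(smiles: str):
    mol = Chem.MolFromSmiles(smiles)
    seen, ideas = {Chem.MolToSmiles(mol)}, []
    for query, replacement in SWAPS:
        patt = Chem.MolFromSmarts(query)
        repl = Chem.MolFromSmiles(replacement)
        if not mol.HasSubstructMatch(patt):
            continue
        for product in AllChem.ReplaceSubstructs(mol, patt, repl, replaceAll=True):
            Chem.SanitizeMol(product)
            smi = Chem.MolToSmiles(product)
            if smi not in seen:
                seen.add(smi)
                ideas.append((smi, score(smi)))
    return sorted(ideas, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    # Example: a para-chloro anilide, purely for illustration
    for smi, s in enumerate_analogs("Clc1ccc(CC(=O)Nc2ccccc2)cc1"):
        print(f"{s:4.1f}  {smi}")
```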
. . .As of this writing, 63 compounds have been suggested to a chemist by Kernel and later made by the same chemist (frequently not as a result of the suggestion from Kernel). We also looked for compounds that were predicted by Kernel, but made by a different chemist on the project. These cases most likely correspond to independent compound design (i.e. the chemist came up with the design idea on their own). In these cases, we can look at the time between Kernel’s prediction and when the compound was made independently to see what kind of speedup could be possible if the chemist had seen the prediction and selected those compounds from the much larger number sent to them (typically ~20-times more compounds than those made). On average, Kernel predictions were about 35 days (range of 4-72 days) ahead of the chemist’s. This represents the possibility of accelerating the progress of a discovery project which could be a significant saving in both time and money.
It might. Or it might not. It also depends on how useful those predictions were, and how they compared to the compounds made by the medicinal chemists that were *not* predicted by Kernel. It will do no one much good if people are nudged to make this set of randomly active compounds instead of that set, in other words. There are no data of this sort in the paper, although there is an interesting case from last year, where the team fed the Kernel recommendations back through automated retrosynthesis software, and the easier ones were then submitted for automated synthesis and sent in for testing, which as the paper notes is the first no-human-input SAR cycle tried at Lilly. As you read on, though, you find that “Our implementation is still suboptimal and in this case the synthesized compounds did not drive the project forward. . .”
That’s what I would have expected – after all, most human-evaluated and human-synthesized compounds don’t really drive the project forward much, either, on average. So it’s way too early to say how useful this is – I mean, you can picture a future where it’s extremely useful indeed (here’s a vision mostly on the chemistry end of things). But how long it will take to get to that future, well, that’s another thing entirely. I think that the automated data distribution is useful, and it’s the sort of thing that other organizations have looked into over the years as well (because it’s not that hard to implement, comparatively). Once you start to bring in modeling and prediction, though, that’s another world – you’ve moved from reporting to prediction/speculation, and honestly, our current systems aren’t that great at prediction (and their usefulness varies widely from situation to situation). That’s the weak link in this whole idea. And it’ll get better – I assume that all modeling will get better. But how much better will it have to get, and how long will that take, and how will we know when it’s there?
I would myself be tempted to split those email alerts in two – one for the facts and one for the predictions/evaluations, because one of those can be argued about (or dismissed, fairly or unfairly) a lot more easily than the other. Mixing them is, I think, a mistake – but clearly the Lilly team doesn’t see it that way, because this whole paper is about how all these things are just naturally intertwined. Thoughts?