My intent is to start mixing in some non-coronavirus posts along with my pandemic science coverage – you know, like the blog used to be way back earlier in the year (!) Today’s subject might be a good transitional one – it’s an article in the New England Journal of Medicine on coronavirus drug discovery, but the points it raises are generally applicable.
“How to Discover Antiviral Drugs Quickly” is the attention-getting title. The authors are all from Oak Ridge, not known as a center of drug discovery, but the connection is the massive computational resource available there. Their Summit supercomputer is recognized as currently the most powerful in the world, which is a moving target, of course – Oak Ridge itself is expecting an even larger system (Frontier) next year, and other labs in China, etc., are not sitting around idly, either.
The authors note that “The laborious, decade-long, classic pathway for the discovery and approval of new drugs could hardly be less well suited to the present pandemic.” I don’t think anyone would argue with that, but it slides past a key point: it could hardly be less well suited to any other disease we’re trying to treat, either. Right? Is there any therapeutic area that’s best served by these timelines as opposed to something quicker? So this is not a problem peculiar to the coronavirus situation, although it does make for a more dramatic disconnect than usual.
Docking and Screening
The paper makes the case for high-throughput ensemble docking of virtual compound libraries. Many readers here will be familiar with the concept, and some of you are very familiar indeed. If this isn’t your field, the idea is that you take a three-dimensional representation of a candidate molecule and calculate its interactions (favorable and unfavorable) with a similar three-dimensional representation of a protein binding site for it. You’re going to be adding those up, energetically, and looking for the lowest-energy states, which indicate the most favorable binding. If that sounds straightforward, that’s because I have grievously oversimplified that description. Let’s talk about that.
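Stripped to its bare essentials, that scoring idea can be sketched in a few lines. This is a toy illustration only – the distance-based pair energy, the coordinates, and the pose names below are all invented for the sketch, and real docking programs use far more elaborate scoring functions than this:

```python
import math

# Toy docking sketch: score each candidate pose of a ligand against a
# binding site by summing pairwise atom-atom interaction energies, then
# keep the lowest-energy (most favorable) pose. Everything here is
# invented for illustration; real scoring functions are far richer.

def pair_energy(d, sigma=3.4, epsilon=0.25):
    """Lennard-Jones-style interaction for two atoms at distance d (angstroms):
    strongly repulsive when jammed too close, weakly attractive at moderate range."""
    sr6 = (sigma / d) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

def pose_score(ligand_atoms, site_atoms):
    """Sum pairwise interaction energies between one ligand pose and the site."""
    total = 0.0
    for a in ligand_atoms:
        for b in site_atoms:
            total += pair_energy(math.dist(a, b))
    return total

site = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]   # two "site" atoms
poses = {
    "too_close": [(0.5, 0.0, 0.0)],   # clashes badly with the site
    "favorable": [(1.9, 3.5, 0.0)],   # sits in the attractive sweet spot
    "far_away":  [(1.9, 20.0, 0.0)],  # barely interacts at all
}
# Docking keeps the lowest-energy pose:
best = min(poses, key=lambda name: pose_score(poses[name], site))
print(best)  # → favorable
```

The real work, of course, is in making that `pair_energy` function mean something physically, and in generating sensible poses to score in the first place – which is exactly where the complications below come in.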
Among the biggest complications is that both the molecules of interest and their binding site can generally adopt a number of different shapes. That’s true even when they’re by themselves – some of the bonds can rotate (to one degree or another) at room temperature without much of an energetic penalty, and taken together that gives you a whole ensemble of reasonable structures, each with a somewhat different shape. A real kicker is that the relative favorability of these depends first on the compound’s (or the binding site’s) interactions with itself: a molecule could swivel around to the point, perhaps, where it starts to bang into itself, or you could rotate a bond to where nearby groups start to clash a bit, or you could cause a favorable interaction (or break one up) with such movements. And these energetic calculations are also affected by each partner’s interaction with solvent water molecules, which are numerous, highly mobile, and interacting with each other at the same time. Finally, the relative energies of each partner will be affected by the other partner. As a target molecule approaches a binding site, a dance begins, with the two partners shifting positions in response to each other. You can have situations (for example) where there might be a favorable binding arrangement at the end of such a process, but no good way to get to it by any step-by-step route. The whole field of “molecular dynamics” is an attempt to figure out this process frame-by-frame, and if you thought getting a static picture was computationally intensive, MD will eat all the computing cycles you can throw at it. (Here’s an older post on that topic; many of its issues are still relevant.) One thing that becomes clear is that there may well be some arrangements of either partner along the way that would be considered unfavorable if you calculated them alone in a vacuum or surrounded by solvent, but which make perfect energetic sense when they’re interacting with the other partner nearby.
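To give a feel for why MD is such a cycle-eater, here’s the frame-by-frame idea in miniature: a velocity-Verlet integrator (a standard MD integration scheme) pushing a single particle around in a harmonic well. Every frame requires a fresh force evaluation; real simulations do this for many thousands of atoms with elaborate force fields, millions of times over. The units and parameters below are arbitrary, chosen purely for illustration:

```python
# Toy molecular-dynamics sketch: velocity-Verlet integration of one
# particle in a harmonic well (force = -k*x). Each step of the loop is
# one "frame" and costs a force evaluation; scale that up to thousands
# of atoms and elaborate force fields and you see where the computing
# cycles go. All units and parameters are arbitrary.

def force(x, k=1.0):
    return -k * x  # harmonic restoring force

def velocity_verlet(x, v, dt=0.01, steps=1000, m=1.0):
    """Advance the system frame by frame, returning the position trajectory."""
    traj = [x]
    f = force(x)
    for _ in range(steps):
        x = x + v * dt + 0.5 * (f / m) * dt * dt   # update position
        f_new = force(x)                            # new force at new position
        v = v + 0.5 * (f + f_new) / m * dt          # update velocity
        f = f_new
        traj.append(x)
    return traj

traj = velocity_verlet(x=1.0, v=0.0)
# With k = m = 1 the exact answer is cos(t), so after 1000 steps
# (t = 10) the final position should track cos(10) ≈ -0.839 closely.
print(traj[-1])
```

The point of the sketch is the loop structure, not the physics: nothing about frame N+1 can be computed before frame N is done, which is why MD rewards exactly the kind of massive parallel hardware that Oak Ridge has.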
Practitioners in this area will also appreciate that all those energetic calculations that the last long paragraph relied on are not so straightforward, either. Binding energy involves both an enthalpic term and an entropic one, and these can work in the same direction or can largely cancel each other out (a common situation). Even such an apparently straightforward step as displacing a water molecule from a protein’s binding site (to make room for a candidate small molecule) can be enthalpically favorable or unfavorable and entropically favorable or unfavorable, too. These calculations involve (among other things) the interactions of hydrogen bonds (very important), of charged or potentially charged groups such as acids and amines, of local concentrations of electron density such as pi-electron clouds and around electronegative atoms, and of plain noncharged alkyl groups that can attract each other weakly or strongly repel each other if they’re jammed together too closely.
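The enthalpy/entropy bookkeeping in that paragraph is just ΔG = ΔH − TΔS, and a couple of lines of arithmetic show how near-total cancellation plays out. The numbers below are made up for illustration, but they’re in a realistic ballpark: a strongly favorable binding enthalpy, largely eaten by an unfavorable entropy term, leaving a modest net free energy and thus only moderate affinity:

```python
import math

# Toy numbers (invented for illustration) showing how enthalpy and
# entropy combine into a binding free energy, dG = dH - T*dS, and how
# substantial cancellation can leave only a modest net dG.
R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.0      # room temperature, K

def delta_g(dH, dS):
    """Free energy (kcal/mol) from enthalpy (kcal/mol) and entropy (kcal/mol/K)."""
    return dH - T * dS

def kd_from_dg(dG):
    """Dissociation constant implied by dG = RT*ln(Kd)."""
    return math.exp(dG / (R * T))

# Favorable enthalpy (-12) largely cancelled by unfavorable entropy:
dG = delta_g(dH=-12.0, dS=-0.02)   # -12.0 + 5.96 = -6.04 kcal/mol
print(f"dG = {dG:.2f} kcal/mol, Kd ~ {kd_from_dg(dG):.1e} M")
```

Run it and the −12 kcal/mol of enthalpy collapses to about −6 kcal/mol of free energy, a Kd in the tens of micromolar – a screening hit, not a drug. Small errors in either term move the answer a lot, which is part of why these calculations are so treacherous.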
There’s a lot going on, and dealing with all of these things computationally is always going to involve a list of tradeoffs and approximations, no matter what your hardware resources. Skilled molecular modelers will know their way around these, realize the weaker points in their calculations, and adjust their methods as needed to try to shore these up. Less skilled ones (and let me tell you, I am one of those) might be more likely to take some software’s word for it, whether that’s a good idea or not. These various software approaches all have their strong points and weak ones, which might reveal themselves to the trained eye as the molecules (and the relative importance of their interacting modes) vary across a screen.
Now, all this is to point out that while speeding up the calculations is a very worthy goal, speeding up calculations that have underlying problems or unappreciated uncertainties in them will not improve your life. The key is, as always, to validate your results by experiment – and to their credit, the Oak Ridge authors specifically make note of this. This is a good way to expose weaknesses in your approach that you wouldn’t have appreciated any other way, which sends you back for another round of calculation (improved, one hopes).
“Virtual screening” of this sort has been a technique in drug discovery for many years now, and its usefulness varies. Sometimes it really does deliver an enhanced hit rate compared to a physical assay screen, and sometimes it doesn’t (and sometimes you never really know, because you’re doing the virtual one because the real-world screen isn’t feasible at all). It’s definitely improved over the years, though – the methods for calculating the energies involved are better, and we can evaluate far more shapes and conformations more quickly. But it’s important to realize that the larger the screen, the more work needs to be done up front to set it up properly – here’s a post on a paper that goes into that topic.
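For what it’s worth, the usual yardstick for that “enhanced hit rate” claim is an enrichment factor: the hit rate among your top-ranked picks divided by the base hit rate of the whole library. A quick sketch, with all numbers made up for illustration:

```python
# Hedged sketch of the "enrichment factor" often used to judge a
# virtual screen: how much better the top-ranked subset's hit rate is
# than the library's base hit rate. All numbers below are invented.

def enrichment_factor(hits_in_top, n_top, hits_total, n_total):
    """(hit rate in top-ranked subset) / (hit rate in the whole library)."""
    return (hits_in_top / n_top) / (hits_total / n_total)

# Say a library of 100,000 compounds contains 200 true actives (a 0.2%
# base hit rate), and the top 1,000 docking-ranked compounds turn out
# to contain 30 of them (a 3% hit rate):
ef = enrichment_factor(hits_in_top=30, n_top=1000, hits_total=200, n_total=100000)
print(ef)  # ~15: picking from the docking list beats random picking 15-fold
```

Of course, you only learn the numerator by actually running the physical assay on your picks, which is the experimental validation step the Oak Ridge authors rightly emphasize.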
What Screening Gets You
And now we come to the bad news section, where we ask: how much time does one save in a drug-development process through improvements in high-throughput screening? Unfortunately, the answer is, mostly, “not all that much”. The laborious parts come after the screen is done, and they’re pretty darn laborious. Hits that come out of a screen have to be modified by medicinal chemists for potency, selectivity (against the things you know you should worry about, anyway), metabolic stability and clearance, toxicology (insofar as you understand it), and other factors besides, not all of which will be working in the same direction. Some of these things can be helped a bit by computational approaches, sometimes. But not all, and definitely not always.
And all this is before you even think about going into clinical trials. But those are the really hard part, where we have, for new investigational drugs, a 90% failure rate. None of the most common reasons for those failures are addressed by the supercomputer screen that started off the project. One big problem is that you may have picked the wrong target, and another big one is that your drug may end up doing something else to patients that you didn’t want. Neither of those problems is amenable – yet – to calculation, especially not the kind that the NEJM paper is talking about. You have to pick a target before you start your screen, of course, and you get ambushed later by toxicology that you never even knew was coming. It’s not that we don’t want a computational way to avoid such nasty surprises – that would be terrific – but nothing like that is on the horizon yet. Billions of dollars, big ol’ stacks of cash, are waiting for the people who figure out how to do such things. But no one can do them for you at the moment.
Now, I understand that the early computational screens against coronavirus proteins were for repurposing existing drugs, which is indeed a really good idea – it’s the way to get something into the front lines the quickest. But the Oak Ridge folks ran that screen back in February (and good for them for doing it). The last paragraph of the current article is a bit vague, but as it ascends into the clouds it seems to be aiming for something more than repurposing. That, though, will subject it to just those problems mentioned in the last paragraph. Virtual screening gets a lot of effort thrown at it because, honestly, it’s just a lot more amenable to a computational approach, so far, than the really hard parts are. People can do it, so they do.
In the end, though, screening is just not a rate-determining step. Making it faster is no bad thing, but it’s like cutting a couple of minutes off your trip to the airport to catch a six-hour flight.