This new paper on “ultra-large” virtual screening is well worth a look in detail. We find a great many lead compounds in this business by random screening of compound libraries, and virtual screening is (as the name implies) the technique of doing this computationally instead of with hundreds (thousands) of sample plates and tireless robot arms. All of that takes time and effort and money – accumulating such a compound collection, making sure that those compounds are (or are still) what you think they are, dispensing them in a useful form, coming up with an assay that’s strong enough to run in automated fashion and actually getting it done, etc. The idea of doing all this computationally by docking mathematical representations of molecules into mathematical representations of your target has always been appealing, and it gets more so every year as the hardware gets ever more capable. Even if you can’t predict de novo the compounds that will do the job, and we can’t, you can still run huge numbers of them, all varieties, and see which ones come out on top.
This paper (a large multi-center academic collaboration) reports what I believe is the largest publicly disclosed effort of this type. It takes as its starting point 70,000 commercially available building-block compounds, and elaborates those using a set of 130 known reactions. That gives you a "make-on-demand" library whose members have a good chance of actually being synthesizable in practice. The paper itself screens 99 million compounds against one target (the AmpC enzyme) and 138 million against another (the D4 receptor), and the library has grown much larger since then. Less than 3% of that library is itself commercially available; there are a lot of compounds to make in this world.
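To make that enumeration concrete, here's a minimal sketch of the general idea in Python with RDKit. This is my toy illustration, not the collaboration's actual pipeline: one amide-coupling reaction template stands in for their 130 curated reactions, and a handful of SMILES strings stand in for the 70,000-strong building-block catalog.

```python
# Toy "make-on-demand" enumeration: cross building blocks with reaction
# templates. One amide coupling stands in for the paper's ~130 reactions;
# four SMILES stand in for the ~70,000-block catalog. Illustration only.
from rdkit import Chem
from rdkit.Chem import AllChem

# Reaction template (SMARTS): carboxylic acid + amine -> amide
amide = AllChem.ReactionFromSmarts(
    "[C:1](=[O:2])[OX2H1].[NX3;H2,H1;!$(NC=O):3]>>[C:1](=[O:2])[N:3]"
)

acids  = [Chem.MolFromSmiles(s) for s in ("OC(=O)c1ccccc1", "OC(=O)C1CCCC1")]
amines = [Chem.MolFromSmiles(s) for s in ("NCc1ccncc1", "NC1CCOCC1")]

products = set()
for acid in acids:
    for amine in amines:
        for prods in amide.RunReactants((acid, amine)):
            mol = prods[0]
            Chem.SanitizeMol(mol)
            products.add(Chem.MolToSmiles(mol))  # canonical SMILES de-duplicates

print(f"{len(products)} virtual products from {len(acids)} x {len(amines)} blocks")
```

Scale those lists up and you get the combinatorial explosion described above, with every virtual member annotated by the recipe that should (with any luck) actually make it.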
The computational screening of this set is not a trivial exercise – to their credit, the authors did a pretty thorough job. You could play the game of run-a-quick-minimization-and-dock-it-as-if-it-were-rigid on these things, and you'd get through them pretty quickly, but to what end? Reality is more various than that. So for each compound, an average of around 4,000 orientations was checked (basically, which part of the molecule approaches the protein target, and at what angle), and for each of those, 280 conformations of the molecule were sampled. That adds up to a number of possibilities in the ten-to-the-thirteenth range, scored with DOCK3.7, which will take you tens of thousands of core-hours to chew through on typical hardware. Compounds resembling known ligands for these targets were deliberately filtered out in a search for new chemical matter.
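A quick back-of-envelope on those figures, using the paper's 1.2-days-on-1,500-cores throughput number quoted near the end of this post, gives a feel for the scale. (The naive product below overshoots the quoted pose count somewhat; DOCK 3.7 stores ligand conformations hierarchically so that shared-substructure placements aren't rescored from scratch.)

```python
# Back-of-envelope on the D4 docking workload, using figures from the text.
compounds     = 138e6   # molecules screened against D4
orientations  = 4_000   # average rigid-body placements per compound
conformations = 280     # average conformations sampled per orientation

poses = compounds * orientations * conformations
print(f"naive pose count: {poses:.1e}")      # ~1.5e14 before any pruning

core_hours = 1.2 * 24 * 1_500                # 1.2 calendar days on 1,500 cores
print(f"core-hours: {core_hours:,.0f}")      # 43,200: "tens of thousands"

per_pose = core_hours * 3600 / poses
print(f"implied time per pose: ~{per_pose * 1e6:.1f} microseconds")
```

About a microsecond per scored pose, in other words, which is why this is a job for a cluster and not a workstation.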
Now we get to some interesting numbers. From the AmpC hits, the team picked out 51 top-ranking molecules (each from a different scaffold class) to synthesize, and 44 of those efforts were successful. (These molecules, as with the D4 example coming up next, were selected both by docking scores and by human inspection – see below!) Of those 44, only five showed any activity in the enzyme assay (ranging from about 1 micromolar to about 400 micromolar). The best of that list represents a very good starting point indeed for this enzyme, and synthesizing analogs of its structure led to a 77 nM compound, which appears to be among the most potent non-covalent inhibitors reported for it. A crystal structure of the inhibitor/enzyme complex confirmed the predicted docking pose, which is always good to see.
As for the dopamine D4 effort, this one went a bit more in-depth. The team selected 589 structures, not just from the top ranks of the docking scores but from the middle and lower parts of the list as well. 549 of these could be synthesized, and 122 of those showed more than 50% radioligand displacement at 10 micromolar. 81 of the actives were run in dose-response, showing Ki values from 18 nanomolar to 8 micromolar. Not bad! Most of the potent compounds were full or partial agonists, but there were two antagonists in there as well. One of the potent agonists was synthesized as its four separate diastereomers, and one of those came in at 180 picomolar, with 2,500-fold selectivity against the related dopamine receptor subtypes, which is about as good as you're ever going to get.
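Running the raw numbers from both campaigns gives hit rates like these (simple arithmetic on the figures quoted above; keep in mind that the D4 selection deliberately reached down the score list, so the two rates aren't strictly comparable):

```python
# Hit-rate arithmetic from the numbers quoted above for the two campaigns.
campaigns = {
    # target: (compounds synthesized, active in primary assay)
    "AmpC": (44, 5),
    "D4":   (549, 122),   # >50% radioligand displacement at 10 uM
}
for target, (made, active) in campaigns.items():
    print(f"{target}: {active}/{made} = {100 * active / made:.0f}% hit rate")
# AmpC: 5/44 = 11% hit rate
# D4: 122/549 = 22% hit rate
```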
There are a lot of interesting take-aways from this work. For one thing, as the authors mention, it would be tempting to just dock representative members of each structural type/cluster, rather than having to do them all. But trying that really demolished the effectiveness of the screen, shedding active hits at an alarming rate. The current docking/scoring technology can get you as far as "compounds that look kind of like this", but it definitely cannot reach in and pick out the best representative of any given class. And even that level of discrimination comes with a lot of effort – note the number of predicted hits in both of the examples above that turned out to be completely inactive once synthesized. That definitely argues for setting up these virtual libraries according to expected ease of synthesis, because otherwise you could spend a lot of time making tough compounds that don't do anything. People have.
This also speaks to the importance of size. The D4 receptor has been the subject of virtual screening before, but not at this scale, and those earlier efforts turned up neither the best compounds found here nor anything of comparable potency. Size matters, and since we can't zero in on the best compounds, we'd better be prepared to evaluate as many of them as we can stand.
Another point concerns that high/middle/low selection in the D4 case. The paper plots the binding assay results against the docking scores, binned by score. You can see that the number of active compounds (better than 50% displacement) decreases as the scores get worse; the lowest-ranking bin has none at all. But at the same time, there are a few outliers with real binding activity down at pretty poor scores, and at the other end of the scale, the top three bins look basically indistinguishable from each other. So the broad strokes are there, but the details are, of course, smeared out a bit.
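For the curious, the bookkeeping behind that kind of binned comparison looks something like the sketch below. Everything in it (the scores, the score-activity correlation, the bin edges) is invented purely to show the shape of the analysis; it is emphatically not the paper's data.

```python
# Sketch of a score-bin analysis: bin compounds by docking score, then ask
# what fraction clears the 50%-displacement cutoff in each bin. All values
# here are synthetic stand-ins, generated just to make the trend visible.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.uniform(-60, -20, 549)          # fake DOCK scores (lower = better)
# Fake activities, loosely correlated with score so a trend shows up
displacement = np.clip(-scores + rng.normal(0, 12, scores.size), 0, 100)

edges = np.linspace(scores.min(), scores.max(), 7)   # six score bins
idx = np.digitize(scores, edges[1:-1])
for b in range(6):
    in_bin = idx == b
    rate = (displacement[in_bin] > 50).mean() if in_bin.any() else 0.0
    print(f"bin {b} ({edges[b]:.0f} to {edges[b+1]:.0f}): "
          f"{in_bin.sum():3d} compounds, {100 * rate:.0f}% above cutoff")
```

Run that and you see the same qualitative picture the paper reports: strong bins at the good-score end, a long weak tail, and noise in between.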
There’s also a human-versus-machine comparison in evaluating the hits. The authors took the top 1,000 compounds and selected 124 of them by eyeballing the docking poses for what looked like good interactions (without looking at the scoring), and took another 114 molecules on the basis of docking scores alone. The hit rates for the two sets were almost identical (about 24%), but the human-selected ones were disproportionately potent: in both campaigns, the human-selected compounds were quite over-represented among the most potent actives. So we have that going for us. But again, note that three-quarters of the compounds selected, even after all this effort, were not active. That’s a huge enhancement over background, which is good news, but it’s not the magic that some outside the field think we can work, either.
One thing to note is that these two binding sites are very well characterized. There are plenty of known ligands for each, and there’s a lot of structural understanding of how they bind to the proteins. Trying this against a blue-sky binding site that you don’t know much about is going to be a much different undertaking – but that, of course, is what we’d like to do. Ideally, computational screening will eventually go further still, docking compounds that aren’t yet real against proteins that have never been physically screened at all. Getting solid, actionable protein structures, though, is far more difficult than running through orientations and conformers for small molecules. As it stands now, screening modeled compounds against real protein structures can (as this paper shows) give you good results, although keep in mind that this report is pretty much at the edge of what we can do with current technology. But screening modeled compounds against modeled proteins runs a substantial risk of giving you a lot of noise. We’ll get there, but we aren’t there yet.
One other sobering note: this paper, as so many virtual screening papers do, starts off by mentioning the estimate for small-molecule chemical space of perhaps 10^63 compounds. There’s room to wonder about that estimate, but since it’s cited here, let’s use it against the paper’s figure of 1.2 calendar days of straight processing time on 1,500 cores to get through the 138 million compounds in the second set. Extrapolating to the Big Set of Everything That Exists, that gives us about 10^55 days of processor time to screen the lot. Unfortunately, it’s only been about 5 × 10^12 calendar days, more or less, since the Big Bang. So even if we allow ourselves more time by turning each day since the universe began into another universe’s-current-age worth of time (setting off a Big Bang every morning and waiting another 13.8 billion years before counting the next day), and then take each day of that unimaginable stretch and turn every one of those into yet another universe’s-age worth of time, you would still only need around a hundred quadrillion of those extra-long intervals to get through the data set. Large numbers are indeed large.
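For those who want to check the arithmetic, here it is worked through explicitly, with every input taken from the text above:

```python
# The chemical-space extrapolation, step by step. Inputs are from the text.
days_observed = 1.2           # calendar days to screen the D4 set
library_size  = 138e6         # compounds in that set
space         = 1e63          # the oft-cited chemical-space estimate

days_needed = days_observed * space / library_size
print(f"days of processing at the same rate: {days_needed:.1e}")  # ~8.7e54, call it 1e55

universe_days = 13.8e9 * 365.25                       # days since the Big Bang
print(f"days since the Big Bang: {universe_days:.1e}")  # ~5.0e12

# Stretch time twice: every day becomes a universe-age of days, then again
stretched = universe_days ** 3                        # ~1.3e38 days
print(f"extra-long intervals required: {days_needed / stretched:.1e}")  # ~7e16
```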