Here’s a look from the D. E. Shaw research team at fragment binding, and even if you don’t do fragment-based drug discovery, it’s worth a read. That’s because the mechanisms by which fragments bind to proteins are most likely the fundamental ones by which larger molecules bind as well; this is the reductionist look at small molecule-protein interactions. So what kinds of interactions are they?
The group identified 489 fragment-bound protein structures in the PDB with good resolution, with manual inspection to remove the glycerols, etc., from the set. That process also cleared out covalently bound compounds, structures with more than one fragment interacting in the same protein binding pocket, and so on, and left 462 unique fragments and 21 that are bound to more than one protein. That’s a large enough set to draw some conclusions, but it should be noted that the proteins themselves (126 unique ones, with 168 unique binding sites) have hydrolases and transferases over-represented. That doesn’t mean that the interactions detected are any less valid, but you wouldn’t necessarily want to depend on a raw count of them (the authors normalized the data to deal with this problem). Most of the compound set has between 10 and 16 heavy atoms, most are uncharged, and most have cLogP values less than 2. Of all the ring assemblies in the structures, phenyl, pyridine, pyrazole, thiophene, and indole account for half the numerical total, but 60% of the 178 unique ring assemblies occur only once in the set.
What about binding? The large majority bury more than 80% of their solvent-accessible surface area when bound, which is what you’d expect from such small molecules that are able to display good potency for their size. The least-buried compound in the whole set is still at 50%. And if you consider polar surface area, a bit over 50% of the compounds bury over 90% of their polarity when bound, which makes sense, too – you’re not going to get noticeable binding at these molecular sizes just by random hydrophobic interactions alone. And indeed, 92% of the structures have at least one hydrogen bond to the protein, to a structural water molecule, or to a metal atom (such as a zinc in an active site, which is what you see with, in the classic example, sulfonamides bound to carbonic anhydrase). The record-holder is this structure, with 7 such interactions (!), followed by this one with six hydrogen bonds alone.
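For anyone who wants to run the same burial number on their own complexes, the arithmetic is simple once you have the surface areas. This is only a sketch of the bookkeeping, not the paper's actual procedure: the function name and the example areas are made up for illustration, and the two SASA values are assumed to come from whatever surface-area tool you already use.

```python
def buried_fraction(sasa_free: float, sasa_bound: float) -> float:
    """Fraction of a ligand's solvent-accessible surface area lost on binding.

    sasa_free  : ligand SASA computed alone, in the bound conformation (A^2)
    sasa_bound : ligand SASA computed in the presence of the protein (A^2)
    Both inputs are assumed to come from an external SASA calculation.
    """
    if sasa_free <= 0:
        raise ValueError("sasa_free must be positive")
    if sasa_bound < 0 or sasa_bound > sasa_free:
        raise ValueError("sasa_bound must lie between 0 and sasa_free")
    return 1.0 - sasa_bound / sasa_free


# Hypothetical fragment: 300 A^2 exposed when free, 45 A^2 still exposed when bound.
example = buried_fraction(300.0, 45.0)
print(f"buried fraction: {example:.2f}")
```

The same function works for the polar surface area figure if you feed it polar SASA values instead of total ones.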
Considering the amount of time some of us have spent trying to get one measly hydrogen bond into our complexes, that’s pretty impressive. As experienced medicinal chemists know, those things are wonderful for picking up enthalpic currency in binding, but they’re extremely finicky (profoundly directional, for one thing). And as has often been noted, an unsuccessful attempt at adding a hydrogen bond generally leaves you worse off than before, with a polar group that has had to shed its solvent interactions for no particular return. And that’s one of the principles behind a fragment-based approach – you try to start out with a core that already is displaying these features. The paper goes into detail on the varieties of nitrogen and oxygen atoms that participate in these hydrogen bonds from the fragment structures, and the sorts of groups on the protein on the other side of the transaction.
The biggest category of interaction outside of the hydrogen bonds is general arene-ring interactions, at 42% of the examples. That’s often called “pi-stacking” by Cro-Magnons like me, but it also encompasses edge-to-pi, arene-to-cation, and other categories. And the third most common category, found in 12% of the examples, is actually C-H hydrogen bonds, which don’t get nearly as much attention.
Overall, the paper recommends that if you want to generate new fragment libraries, you stick with about a quarter of the heavy atoms being polar ones capable of participating in hydrogen bonding – in practice, that pretty much means nitrogen and oxygen atoms. Amides and alcohols are particularly useful, since they can both accept and donate in this context. You should keep things simple and not try to decorate the fragments with too many pharmacophores, because the simpler compounds have more geometric freedom to find productive interactions without running into additional clashes. The paper proposes that there must be a number of lesser-used heterocycles that could participate both in hydrogen bonds and in arene pi interactions, and the authors also suggest that seven-membered rings are under-represented, especially considering their unique conformational properties. If you’re interested in having a rough first-pass set of fragments, though, to assess druggability and the like, you could do far worse than picking as many of the 462 compounds in this paper’s set as you can get. They surely have some structural biases from the earlier days of commercial fragment libraries, but they also have a proven record of binding to proteins.
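As a rough illustration of how that quarter-polar guideline might be turned into a first-pass library filter, here is a minimal sketch. Everything in it beyond the "about a quarter N and O" figure is my own assumption: the function names are hypothetical, the 10 to 16 heavy-atom window is borrowed from the description of the data set rather than from any recommendation, and the tolerance around 25% is arbitrary. Molecules are represented simply as lists of heavy-atom element symbols, so no cheminformatics toolkit is needed.

```python
POLAR_ELEMENTS = {"N", "O"}  # the post's working definition of "polar" heavy atoms


def polar_fraction(heavy_atoms: list[str]) -> float:
    """Fraction of heavy atoms that are nitrogen or oxygen."""
    if not heavy_atoms:
        raise ValueError("need at least one heavy atom")
    polar = sum(1 for symbol in heavy_atoms if symbol in POLAR_ELEMENTS)
    return polar / len(heavy_atoms)


def passes_guideline(
    heavy_atoms: list[str],
    min_heavy: int = 10,      # assumed window, taken from the data set's typical size
    max_heavy: int = 16,
    target: float = 0.25,     # "about a quarter" polar heavy atoms
    tolerance: float = 0.10,  # arbitrary slack around the target
) -> bool:
    """True if the fragment is in the size window and near the polar target."""
    if not min_heavy <= len(heavy_atoms) <= max_heavy:
        return False
    return abs(polar_fraction(heavy_atoms) - target) <= tolerance


# Acetanilide, C8H9NO: 8 carbons, 1 nitrogen, 1 oxygen (hydrogens ignored).
acetanilide = ["C"] * 8 + ["N", "O"]
# Naphthalene, C10H8: all carbon, so no polar heavy atoms at all.
naphthalene = ["C"] * 10

for name, atoms in [("acetanilide", acetanilide), ("naphthalene", naphthalene)]:
    print(name, round(polar_fraction(atoms), 2), passes_guideline(atoms))
```

A count like this says nothing about whether the polar atoms can actually reach a partner on the protein, of course, so it is a crude screen at best.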
The tricky part of putting new or unusual fragment structures into your library is that you have to be able to functionalize them later on, and ideally in several possible directions. I ran into that myself a few years back – a rather underexplored ring system came up (cinnolines) that really bound quite well to the protein target. But the number of known methods to make a variety of functionalized cinnolines is limited, so you have to decide if you want to embark on a discover-new-chemistry project in order to embark on the develop-a-fragment project in turn.
Interestingly, considering that this paper is coming from one of the most high-powered modeling groups in the business, the point is made that the number of water interactions in the studied set presents a problem. Current modeling software does not handle these things as well as it does some other categories of interaction, so virtual fragment screening efforts are going to have significant blind spots. This goes, of course, both for more static docking approaches and for molecular dynamics. Another computational issue is the ability to find and picture nonbonded interactions – as it stands, there seems to be too much handwork involved with querying the PDB for such things.
Overall, the paper is both a hymn to fragment-based drug discovery and (less directly) one to crystallography. It’s X-ray data that underpin the whole thing (and underpin the vast majority of fragment-based drug work in general). That really is the ground truth of this approach, for all the known limitations of crystallographic data – not modeling, not simulations. And the more of it we can get, the more we’ll understand what we’re doing.