You hear medicinal chemists talking about the “magic methyl”, the big effect that a single CH3 group can have on potency or selectivity. Here’s a new J. Med. Chem. paper that shows one in action. That structure looks like a kinase inhibitor if anything ever did, and so it is. But small changes to it can make a big difference. As you see from those assay numbers, adding in one (R) methyl group hurts MERTK activity by six-fold while making no real change against TYRO3. But AXL activity goes down eight-fold, and FLT3 activity by something like 80-fold.
The effect is seen with a number of different variations around this core, and it’s obviously because there’s something that FLT3, in particular, doesn’t like about having a methyl there. Specifically, it’s a methionine in MERTK that’s a leucine in FLT3, and the branched chain of the leucine just doesn’t leave enough room for a methyl in the ligand. As you’d expect, if you put in a methyl of the opposite stereochemistry, there’s a big effect as well – that (S) methyl compound is 22, 560, 13, and 543 nM on the four enzymes, so you can see that there’s an even more severe clash in all of them except TYRO3, which doesn’t care one bit.
I bring this up as an illustration of what med-chem is like, pretty much all the time. There will be parts of your molecule that are pretty insensitive – you can hang all sorts of crap off them, and if the project goes on long enough, someone will. Those, as you’d imagine, are generally the parts sticking out into solvent, which is where you’ll see people desperately sticking on methoxyethoxy chains or whatever to try to make the compound less of a brick in its solubility and pharmacokinetics. And there will be parts that you really can’t touch at all. I’m sure that if you start yanking those nitrogen atoms out of the structure above in favor of good ol’ carbons, it’ll start losing activity with great speed and thoroughness, for example.
It’s the in-between parts where we make our livings. That’s where you exploit differences in amino acid side chains in the protein binding sites, where you tickle bound water molecules or pick up pi-interactions. And these effects have always been a tricky thing to handle in modeling the activities that result, and they’re going to be a tricky thing for machine learning, too. “Activity cliffs” are a feature of most structure-activity relationships, and sometimes you don’t know that they’re there until you walk off of them (and sadly, we have no Wile E. Coyote physics to save us). Abrupt nonlinearity is not an easy thing to work into a model – if you don’t watch it closely, said model might lose its silicon mind a little bit trying to fit that stuff into a coherent picture. That’s because there may not be a very coherent picture: most projects end up with something like “This works, but not too much of it, and only if you have that thing over there, but if you have something in that other spot it kind of cancels that out, well, except if you have this other thing, but when you. . .”
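For anyone who wants to see what hunting for an activity cliff looks like in practice, here is a minimal sketch. It flags pairs of compounds that look nearly identical but differ sharply in potency. Everything in it is an assumption for illustration: the compound names, the feature sets standing in for real fingerprints, the IC50 values, and both thresholds are made up and are not taken from the paper. A real workflow would use a cheminformatics toolkit to generate the fingerprints.

```python
from itertools import combinations
from math import log10


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two feature sets (shared / total)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (-log10 of molar IC50)."""
    return 9.0 - log10(ic50_nm)


def find_cliffs(compounds, min_similarity=0.8, min_delta=1.0):
    """Return pairs that are structurally similar but differ in potency.

    compounds: list of (name, feature_set, ic50_nm) tuples.
    min_similarity and min_delta (in log units) are arbitrary cutoffs.
    """
    cliffs = []
    for (name_a, fp_a, ic_a), (name_b, fp_b, ic_b) in combinations(compounds, 2):
        sim = tanimoto(fp_a, fp_b)
        delta = abs(pic50(ic_a) - pic50(ic_b))
        if sim >= min_similarity and delta >= min_delta:
            cliffs.append((name_a, name_b, sim, delta))
    return cliffs


# Hypothetical toy data: feature labels stand in for fingerprint bits,
# and the IC50 values are invented for the example.
parent = frozenset({"core", "hinge_N", "piperazine", "aryl", "amide"})
compounds = [
    ("parent", parent, 10.0),
    ("methyl_analog", parent | {"methyl"}, 800.0),
    ("other_tail", frozenset({"core", "hinge_N", "aryl", "morpholine", "ether"}), 15.0),
]

cliffs = find_cliffs(compounds)
for name_a, name_b, sim, delta in cliffs:
    print(f"{name_a} vs {name_b}: similarity {sim:.2f}, potency gap {delta:.2f} log units")
```

Note what this does and doesn't do: it can only tell you about cliffs you have already walked off of, since both compounds have to be made and assayed first. Predicting where the next one is hiding is the hard part.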
And that’s just trying to get a handle on the primary potency numbers. When you add in cell membrane permeability, pharmacokinetics, metabolic stability, and all the rest of it, well, things can get pretty wooly. For instance, the authors of the paper above ended up switching that piperazine to a homopiperazine to get better half-life numbers in their rodent dosing. Some machine help in all this would be welcome, but I’m not sure when we’ll see that arrive. The data sets are not always large enough to be useful for machine learning, particularly when doing a multifactorial optimization like this, and they’re definitely not large enough earlier in a project. And we don’t have the knowledge to just model everything de novo, either. If you want to know how selective your kinase inhibitor is against the other kinases, you don’t sit down and model them all – you send the compound out to a screening panel to get the actual numbers. And if you’d like to know what other targets your compound might hit, well, good luck. There are broader screening panels, of course, that will run your compound through dozens of assays. But there are a lot more targets out there that we know little or nothing about, and the only way you’ll know if one of those is doing something off-kilter is to dose rats. Or, God help you, humans.
So what I’m saying is that medicinal chemists, for all the talk about machine learning and AI, are not going to be replaced in this part of their job – the main part, mind you – any time soon. None of us, machines or humans, are smart enough for that, or not yet.