
Levels of Data


Here’s a brief article in Science that a lot of us should keep a copy of. Plenty of journalists and investors should do the same. It’s a summary of what sort of questions get asked of data sets, and the differences between them. There are six broad data analysis categories:
1. Descriptive. This is the simplest case, where you’re just summarizing a data set and describing the totals in it.
2. Exploratory. The next step – you search through the descriptive analysis looking for trends or relationships, with which to develop new hypotheses. No guarantees, of course – you’ll have to confirm these with more work.
3. Inferential. This one looks at an exploratory treatment and tries to determine whether those trends are likely to hold up. As the authors say, this is probably the most common statistical workup in the literature – better than random chance, or not? But it can’t tell you why something is happening, of course. (A toy sketch after this list walks a synthetic data set through the first three levels.)
4. Predictive. An inferential study is necessarily done on a large sample (well, it had better be, at any rate, if you’re going to infer with much confidence). A predictive analysis uses some subset of the data to predict how individual cases will go. The example from drug development would be the use of biomarkers to predict whether a given patient in a trial will respond to some new investigational drug.
5. Causal. At this level, you’re trying to see how large the changes across the system are when you start changing things – what often gets called the “tone” of the system. What are the most important variables, and which ones have little effect on the outcome?
6. Mechanistic. With the information at the causal level available, now you can really get down to the nuts and bolts. Change A causes effect B, through this detailed mechanism. We don’t see this as much with anything involving biology – there always seem to be exceptions. This is more the realm of engineering and physics, although a lot of time and money is going into trying to change that.
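To make the first few of those levels concrete, here’s a minimal Python sketch of my own (not from the Science article) that walks a small synthetic data set through the descriptive, exploratory, and inferential steps. Everything in it is invented for illustration: the “biomarker-positive” and “biomarker-negative” groups, the numbers, and the permutation test standing in for the usual significance machinery.

    import random
    import statistics

    random.seed(0)

    # Toy data: some measured response for patients with and without a biomarker.
    # The numbers are entirely made up for illustration.
    with_marker = [random.gauss(12.0, 3.0) for _ in range(40)]
    without_marker = [random.gauss(10.0, 3.0) for _ in range(40)]

    # 1. Descriptive: just summarize what is in the data set.
    for name, group in (("with marker", with_marker), ("without marker", without_marker)):
        print(f"{name}: n={len(group)}, mean={statistics.mean(group):.1f}, "
              f"sd={statistics.stdev(group):.1f}")

    # 2. Exploratory: look at those summaries for a possible trend --
    #    the marker-positive group *looks* like it responds more strongly.
    observed = statistics.mean(with_marker) - statistics.mean(without_marker)
    print(f"observed difference in means: {observed:.2f}")

    # 3. Inferential: is that difference better than random chance?
    #    A simple permutation test: shuffle the group labels many times and
    #    count how often a gap at least this large shows up by accident.
    pooled = with_marker + without_marker
    n = len(with_marker)
    trials, extreme = 10_000, 0
    for _ in range(trials):
        random.shuffle(pooled)
        diff = statistics.mean(pooled[:n]) - statistics.mean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    print(f"permutation p-value: {extreme / trials:.3f}")

Even in a toy like this, the boundaries between the levels are visible: the descriptive and exploratory steps say nothing about whether the apparent difference is real, the inferential step says nothing about why it exists, and nothing here predicts how any individual patient will respond – that would take a separate predictive model, and causal or mechanistic claims would take actual experiments, not label-shuffling.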
It’s only at the causal and mechanistic levels that you can start doing detailed modeling with confidence. That’s where everyone would like to be with computational binding predictions, but we don’t understand them well enough yet. And think how far we have to go to get predictive toxicology to those levels! We can do that sort of thing on a small scale – for example, saying that a compound that inhibits (say) angiotensin-converting enzyme to a given degree, and with a given average half-life in vivo, will be expected to lower the blood pressure of X% of a random population’s members by at least Y%. That’s after decades of experience and data-gathering, keep in mind.
But that’s not aeronautical engineering. Those folks don’t tell you that wing design A will provide at least so much lift on a certain percentage of the airframes it gets bolted on to. Nope, those folks get to build their airframes to the same exact specifications, not just take whatever shows up at the factory needing wings, and those airframe/wing combinations had better perform within some very tight tolerances or something has gone seriously wrong. This is just another way of stating the “built by humans” difference I was talking about the other day.
So some of that data analysis hierarchy above is, well, aspirational for those of us doing drug research. The authors of the Science article are well aware of this themselves, saying that “Outside of engineering, mechanistic data analysis is extremely challenging and rarely achievable.” But that level is where many people expect science to be most of the time, which leads to a lot of frustration: “Look, is this pill going to help me or not?” We should remember where we are on the scale and keep trying to work our way up.

