This was first posted: here

One key to successfully progressing a drug discovery project is to make first-rate decisions, (hopefully) based on unambiguous data. This is not trivial, since our scientific problems are often very complex and data can be fuzzy. In drug design we try to approach this uncertainty by being rational. It is, however, sometimes forgotten that our rational approaches may not be that rational after all: decisions may well be based on personal preferences and intuitive biases, perhaps unconsciously made on biased data.

In their great paper “Judgment under Uncertainty”, the behavioral scientists Kahneman and Tversky elaborated on decision making and how people deal with uncertain events. They showed that we tend to push ahead with confidence even when we lack enough information to make informed decisions. There’s also the entertaining (?) and somewhat controversial notion that scientific facts can be constructed in a tribe-like fashion in laboratory settings. In drug discovery, a number of psychological biases that pose risks to good decision making were recently highlighted by Segall and Chadwick.

“The Halo Effect” is a specific type of confirmation bias that makes us perceive someone (or something) favorably because of one very positive quality in that person or thing. That is, one good feature lends its attractiveness to other properties of a person’s character: “that Hollywood actress is beautiful, so she must also be clever/happy/fill in the blank”. We tend to make attributions based on other data that we for some reason believe are reliable, and this can cloud our judgment and interfere with decision making.

David Beckham looks good, and few in the world can kick a ball like him. He’s likable and extremely popular. It is easy to think that he’s all good. Nonetheless, he has recently been accused of triggering a halo effect around unhealthy drinks by endorsing Pepsi. Is there also a halo(gen) effect in para-substituted phenyl rings? They are (on average) metabolically more stable than their ortho and meta regioisomeric partners and (perhaps therefore) the most popular regioisomer among medicinal chemists. Yet the para isomer is (on average) the worst of the three regioisomers with respect to hERG binding and aqueous solubility.

Dean Brown, a colleague of mine, recently discovered an unexpected bias in most (if not all) drug databases by performing an exhaustive population analysis of phenyl-ring substitutions. The conclusion was that para-substitution occurs significantly more often than meta- and ortho-substitution. In an attempt to gauge AstraZeneca medicinal chemists’ personal preferences regarding aromatic substitution patterns, we set up a survey. The result was clear: the primary choice was indeed para. The two main reasons given for this preference were: (a) para-substitution provides better protection against metabolism than ortho/meta; (b) the para-position was most likely to boost potency. The data confirmed the first reason but not the second.

There could be many reasons for this bias, such as the Topliss work that promoted para-substitutions, a range of possible DMPK (solubility, metabolism) and Safety (hERG) property differences, as well as ligand-binding effects (potency). Other possible factors are synthetic accessibility, cost differences for chemical reagents and historically different design strategies (classic pharmacology vs. target-based design). All of these were scrutinized, and it was concluded that the para bias could not be attributed to any single factor. What we do know, however, is that personal preferences and subjectivity still play a pretty big role when selecting reagents for syntheses. In fact, a range of possible preconceptions was recently highlighted when the Dean Brown article was inthepipelined (it’s a verb, right?). Not to mention that luck influences almost everything we do.

Why does this matter? Using skewed molecular databases can be risky if one is not aware of their uneven distributions. For example, if a database contains more para-substituted phenyls than ortho/meta ones, a screen will return more para hits out of sheer probability. This could in turn lead the inexperienced scientist to assume that the screened target favors para-substitution. Luckily there are remedies: statistical approaches combined with cheminformatics can be used to avoid these issues.
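To make the base-rate point concrete, here is a minimal sketch in Python. It is not the analysis from our article. The library composition, the hit counts and the `enrichment` helper are all made up for illustration. The idea is simply to compare each regioisomer's share of the hits with its hit rate normalized by how often it occurs in the screened library.

```python
from collections import Counter

def enrichment(library, hits):
    """Per-class hit rate and enrichment relative to the overall hit rate.

    `library` and `hits` are lists of class labels (hypothetical data);
    an enrichment near 1.0 means the class hits no more often than average.
    """
    lib_counts = Counter(library)
    hit_counts = Counter(hits)
    overall_rate = len(hits) / len(library)
    stats = {}
    for cls, n_lib in lib_counts.items():
        rate = hit_counts.get(cls, 0) / n_lib
        stats[cls] = (rate, rate / overall_rate)
    return stats

# Hypothetical, para-heavy screening library (assumed numbers).
library = ["para"] * 600 + ["meta"] * 240 + ["ortho"] * 160
# Hypothetical screen in which every regioisomer hits at the same 5% rate.
hits = ["para"] * 30 + ["meta"] * 12 + ["ortho"] * 8

# Naive view: share of all hits per regioisomer (para dominates).
hit_share = {cls: n / len(hits) for cls, n in Counter(hits).items()}

# Normalized view: hit rate within each class, relative to the overall rate.
stats = enrichment(library, hits)

for cls in ("para", "meta", "ortho"):
    rate, enrich = stats[cls]
    print(f"{cls}: share of hits={hit_share[cls]:.2f}, "
          f"hit rate={rate:.3f}, enrichment={enrich:.2f}")
```

In this toy case para accounts for 60% of the hits, yet its enrichment is no higher than that of meta or ortho, so the apparent preference comes entirely from the library composition. With real data one would also want a significance test on the counts rather than reading off the ratios alone.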

Relying on our intuition is often effective when making decisions in situations of uncertainty. However, failing to understand the underlying reasons can lead to systematic and predictable errors such as the one just described (ease of synthesis is not the reason for the para bias). We hope that our analysis will lead to a broader awareness of unevenly populated databases and a better understanding of how to deal with them, so that we can improve our judgments and decisions in medicinal chemistry. To learn more about this, a biased suggestion would be to read our article to see if any of your potential prejudices (regarding phenyl substituents) are supported by data.