and almost for free
Adrian Soto-Mota, Department of Physiology, Anatomy and Genetics, University of Oxford
Ox Pers Med J 2021; 1(1): 2-7
Practising Personalised Medicine is not only about tailoring clinical prescriptions to a patient’s genome. More frequently, it involves weighing individual benefits against their corresponding risks. It is almost a truism, but this is impossible if clinicians do not understand risks. There is a large body of evidence documenting clinicians’ struggle to comprehend risks and, arguably, the most immediate and direct repercussion of this problem is that it hinders individual risk assessment and, consequently, the personalisation of clinical interventions. More importantly, evidence also shows that clinicians’ statistical skills can rapidly improve with short courses focused on addressing common cognitive biases and the most frequent misconceptions. I summarise this body of evidence and outline the most frequent sources of error when individualising risks in current medical practice.
Personalised Medicine faces many obstacles. There is a reproducibility crisis (1), minorities are underrepresented in biomedical studies, and technological and economic barriers prevent the widespread use of genetic testing. To make things more difficult, our imperfect pieces of evidence are frequently misunderstood by those in charge of translating scientific progress into bedside benefits (2). This last obstacle is particularly relevant for two reasons: even if scientific studies were perfect, statistical illiteracy could prevent their insights from translating into clinical benefits; and it is much easier to solve than Science’s reproducibility crisis or the lack of low-cost genetic tests in developing countries.
Personalised Medicine is, in essence, preferring “tailor-made” risk-benefit assessments over population-based recommendations (3). Therefore, it is impossible to incorporate its principles into everyday medical practice if health care workers misunderstand risks or misjudge benefits.
Some of the most relevant consequences of statistical illiteracy among clinicians are ethical. Patients are exposed to unnecessary risks and frequently undergo treatments that their clinicians would refuse if they found themselves in a similar scenario. Moreover, statistical illiteracy exacerbates the already uneasy relationship between population-based health policy recommendations and individualised ones. What is more, statistical illiteracy not only makes doctors poorer providers of personalised care, it also makes them poorer scientists who will then generate sub-par evidence themselves, perpetuating a vicious cycle of bad medical practice.
However, despite its complexity and widespread implications, the most important fact about clinicians’ statistical illiteracy is that it can be solved with cheap pedagogical interventions that require only a few hours (4). This text summarises the evidence on this topic, exemplifies how clinicians’ statistical illiteracy impairs individualised risk assessments and briefly describes the different strategies that have been proposed to overcome this issue.
HOW WIDESPREAD IS THIS PROBLEM?
Alarmingly, a clinician assessing individual risks is more likely to misinterpret scientific data than to interpret them correctly. Studies on this topic report that, regardless of their level of experience, most medics struggle when translating scientific data into their practice (5, 6).
Of course, this is not an issue among health care workers only. Numbers are not intuitive to most human beings, and everyone from politicians (7) to rocket scientists (8) can make mistakes. The inability to incorporate numerical data into everyday decision making has even been proposed to be a form of illiteracy (sometimes named “innumeracy”) and is highly prevalent in all sorts of professions (9).
It is worth highlighting that this is not a problem of ignorance. Most medics are familiar with the concepts they misunderstand. Frequently, it is the framing of the data, and not a failure to understand a particular concept, that results in an error of judgment when analysing numerical data (10).
WHAT’S CAUSING THIS PROBLEM?
This problem is not the result of intellectual limitations or laziness among medical doctors. On the contrary, medics are usually hard-working and very smart people. As usually occurs with complex problems, there are multiple origins and a complex interplay between many contributing factors, such as bad practices in scientific reporting (11), insufficient time for and emphasis on statistics in most medical schools’ curricula (12), and overwhelming workloads that leave clinicians with very little time to thoroughly examine a potentially practice-changing scientific paper.
As a result, many healthcare professionals rely on evidence-summarising services or on the prestige of a given journal to judge the overall quality of a study. Besides being an incomplete and imperfect assessment of emerging evidence, this brings up problems for personalising medical prescriptions.
Headlines, abstracts and most summarising texts focus on describing the results of a study and their implications, and frequently lack a detailed description of the inclusion and exclusion criteria. Consequently, clinicians tend to assume the results are more generalisable than they are, and applying them to patients who fall outside the inclusion criteria can negatively influence the clinical outcome (13). It has been recognised that many different parties (universities, health care systems, scientific journals, and clinicians) need to act simultaneously to address this problem (14).
EXAMPLES OF STATISTICAL ILLITERACY IMPAIRING ADEQUATE INDIVIDUALISED RISK ASSESSMENTS
Absolute risk vs relative risk
Expressing a change in probability as a percentage of the baseline risk (a relative change) instead of as an absolute risk magnifies the apparent size of that change. As mentioned before, it has been observed that clinicians are influenced by the way scientific results are framed (10).
Sedrakyan and Shih (11) followed top medical journals for two years and reported that the depiction of changes in risk is asymmetric. In other words, positive changes are more likely to be described using percentages or relative risks (proportional changes from baseline risk which are usually represented or converted into percentages), and negative changes are more likely to be described using frequencies and absolute risks (the number of registered events over the times they could have happened). What is more, they found that in a third of the cases harms and benefits are expressed using different metrics in the same paper.
Risk evaluation is context-dependent. It is almost meaningless to find a genetic polymorphism that increases the risk of suffering a rare disease by 200% if the baseline risk is 0.0001%. In contrast, a lethality of 1/10,000 could be considered unacceptable in the context of a widespread intervention (a vaccine, for example).
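The arithmetic behind this contrast can be sketched in a few lines of Python. This is only an illustration: the function names and the population of 10 million people are assumptions made for the example, not figures from the cited studies.

```python
def risk_after_relative_increase(baseline_risk, relative_increase):
    """Return the new absolute risk after a relative increase.

    Both arguments are proportions, so a 200% increase is 2.0
    and a baseline risk of 0.0001% is 0.000001.
    """
    return baseline_risk * (1 + relative_increase)


def extra_cases(baseline_risk, relative_increase, population):
    """Expected additional cases in a population of the given size."""
    new_risk = risk_after_relative_increase(baseline_risk, relative_increase)
    return (new_risk - baseline_risk) * population


# Rare disease from the text: baseline risk of 0.0001%, increased by 200%.
baseline = 0.0001 / 100
new_risk = risk_after_relative_increase(baseline, 2.0)

# Hypothetical population size, chosen only for illustration.
population = 10_000_000

print(f"New absolute risk: {new_risk * 100:.4f}%")
print(f"Extra cases per {population:,} people: "
      f"{extra_cases(baseline, 2.0, population):.0f}")

# Widespread intervention from the text: lethality of 1 in 10,000.
print(f"Deaths per {population:,} people treated: {population / 10_000:.0f}")
```

Under these assumptions, the “200%” increase adds about 20 cases among 10 million people, whereas a lethality of 1 in 10,000 would mean about 1,000 deaths in a population of the same size.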
Perhaps not surprisingly, evidence suggests clinicians tend to overestimate the benefits of their interventions and to minimise their risks (15).
Sensitivity vs positive predictive value
The accuracy of diagnostic tests is evaluated by measuring their sensitivity and specificity. A frequently overlooked fact is that these numbers have very little to do with how clinically useful a test is, because clinicians do not know whether a given result is a true or a false one.
Clinicians evaluate people, not tests. They do not ask themselves “how accurate is this test?”; their question is usually “is my patient ill with this disease?”. The answer to the second question is known as the positive predictive value (the probability that a patient who tested positive is ill with the disease).
This subtle but extremely relevant difference gives rise to what has been called “the base rate fallacy” (16). A very common example of this fallacy goes as follows:
For women in their 40s, the prevalence of breast cancer is 1.4%, the sensitivity of a mammogram is 75% and its specificity is 90%. Thus, in a group of 1,000,000 women, 14,000 have breast cancer and 986,000 do not. Of the 14,000 women who have breast cancer, 75% (10,500) will be correctly detected by the mammogram. However, of the 986,000 women without breast cancer, 10% (98,600) will be told they have breast cancer when they do not. Therefore, after performing 1,000,000 tests, there will be 10,500 true-positive and 98,600 false-positive tests. In other words, only 10,500 of the 109,100 positive results (about 9.6%) belong to women who actually have breast cancer, even though the sensitivity of the test is 75%.
In summary, the more clinically useful parameter is the positive predictive value of a test. Nonetheless, it is frequently mistaken for the sensitivity of the test, and it is heavily affected by the prevalence of the disease in the tested population.
Expanding the previous example: in women in their 40s with a positive family history of breast cancer, the prevalence of breast cancer is ten times higher (14%), while the sensitivity of a mammogram (75%) and its specificity (90%) remain unaltered. In a group of 1,000,000 women with a positive family history of breast cancer, 140,000 have breast cancer and 860,000 do not. Of the 140,000 women who have breast cancer, 75% (105,000) will be detected by the mammogram. Of the 860,000 women without breast cancer, 10% (86,000) will be told they have breast cancer when they do not. Thus, after performing 1,000,000 tests, we now have 105,000 true positives and 86,000 false positives, so the positive predictive value is 105,000 out of 191,000, or about 55%. Raising the prevalence of the disease in the tested population has raised the positive predictive value of the test as well.
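The two mammogram scenarios can be checked with a short Python sketch. The function name is an illustrative choice, and the specificity is assumed to be 90% in both groups, as in the first scenario.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive has the disease.

    All arguments are proportions between 0 and 1.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Mammogram figures used in the text; specificity assumed to be 90% in both groups.
general = positive_predictive_value(
    prevalence=0.014, sensitivity=0.75, specificity=0.90)
family_history = positive_predictive_value(
    prevalence=0.14, sensitivity=0.75, specificity=0.90)

print(f"PPV, women in their 40s: {general:.1%}")              # 9.6%
print(f"PPV, positive family history: {family_history:.1%}")  # 55.0%
```

The test itself is identical in both calls; only the prevalence changes, and with it the meaning of a positive result.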
To complicate things even further, an even more frequently ignored fact is that the clinical utility of a diagnostic test is variable. Let us take a rapid influenza test as an example, which has a reported sensitivity of around 50-80% and a specificity of around 90-95%. In other words, 50-80% of patients with influenza will test positive and 90-95% of patients without influenza will test negative. Because influenza is a seasonal disease, its prevalence varies throughout the year. Therefore, the positive predictive value of rapid influenza tests varies too. Hence the recommendation to limit their use outside the yearly influenza season (17).
As mentioned before, evidence shows that clinicians at all levels of experience (15, 18) frequently mistake the sensitivity of a test for its positive predictive value, assume the latter is constant, and frequently misjudge risks and benefits for their patients (19). This becomes more problematic when we consider that the current recommendation for two of the most common screening tests for adults is to individualise their approach (20, 21).
HOW STATISTICAL ILLITERACY EXACERBATES THE CONFLICTING INTERESTS BETWEEN PUBLIC HEALTH AND PERSONALISED MEDICINE
Sometimes, the interests of the many are at odds with the interests of the one. To adequately manage its limited resources, the NHS frequently allocates its spending on individual treatments case by case (22). For example, the only approved treatment for paroxysmal nocturnal haemoglobinuria is a drug named eculizumab, which costs half a million dollars per year. Evidently, it is in the best interest of a patient with paroxysmal nocturnal haemoglobinuria to receive this drug, but it is perhaps in the best interest of the many to allocate these resources elsewhere.
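How much the positive predictive value of a rapid influenza test could swing over a year can be sketched as follows. The sensitivity and specificity bounds come from the ranges quoted above; the three prevalence values are hypothetical and chosen only for illustration, as are all names in the code.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who tests positive has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Lower and upper ends of the ranges quoted in the text.
WORST_CASE = {"sensitivity": 0.50, "specificity": 0.90}
BEST_CASE = {"sensitivity": 0.80, "specificity": 0.95}

# Hypothetical prevalences among tested patients (assumptions, not data).
SCENARIOS = {"off season": 0.01, "early season": 0.10, "peak season": 0.30}


def ppv_range(prevalence):
    """Return the PPV at the worst and best test characteristics."""
    low = positive_predictive_value(prevalence, **WORST_CASE)
    high = positive_predictive_value(prevalence, **BEST_CASE)
    return low, high


for label, prevalence in SCENARIOS.items():
    low, high = ppv_range(prevalence)
    print(f"{label}: PPV between {low:.0%} and {high:.0%}")
```

With the assumed off-season prevalence of 1%, a positive result is far more likely to be a false positive than a true one, even at the best reported sensitivity and specificity.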
These cost-benefit comparisons and discrepancies are better captured in a group of concepts that are frequently used as synonyms, even when they are not.
Efficacy vs Effectiveness vs Efficiency
The efficacy of an intervention is the effect observed during ideal conditions (highly controlled trials for example). Its effectiveness is the size of that effect in a “real-world” scenario. Its efficiency is its large-scale cost-benefit (23).
For example, the observed efficacy of the Pfizer vaccine in its Phase 3 trial was 90% (24), whereas its observed effectiveness in Israel (the country that has vaccinated the largest share of its population so far) is 60% (25, 33).
In contrast, a treatment may have an unfavourable cost-benefit ratio for a health system and still be the best available option for a patient. For example, continuous glucose monitors improve disease control and quality of life for people living with diabetes, but it would be impossible for any health system to fund these monitors for all its patients throughout their lives (26).
Clinicians often need to ponder individual and population benefits in their decisions. However, there is an inherent bias towards “population convenient” interventions because most health systems limit the available treatments to those which are efficient even if there are more effective ones.
This does not mean that epidemiological studies are misleading or unnecessary; they have different objectives and serve different purposes. Epidemiological studies gather the data required to measure a treatment’s effectiveness and efficiency, and they help to contextualise the relevance of the results observed in each clinical trial. For example, the price of a treatment can vary between countries or change over time, and differences between countries’ welfare systems can influence the safety or effectiveness of certain interventions (ambulatory chemotherapy, for example).
In summary, overlooking or misunderstanding these caveats impairs clinicians’ ability to provide personalised recommendations (27). In some settings, such recommendations may even be impossible to follow (for example, insurance-based systems may not cover all available treatments, or only some of them may be licensed in a country).
THE LACK OF PATIENT ORIENTED OUTCOMES IN CLINICAL RESEARCH STUDIES ALSO IMPAIRS PERSONALISED MEDICINE
It has been reported that clinicians often break the “golden rule” (treat others the way you would like to be treated). Treatment patterns among oncologists who suffered from cancer differ from the ones observed among cancer patients (28).
Additionally, many clinicians declare they would refuse most of the life-sustaining interventions that are prescribed every day (29).
In this case, clinicians cannot assess the risk/benefit ratio adequately due to the lack of patient-oriented evidence. When designing clinical trials, “hard” outcomes (mortality rate, days free of hospitalisation, etc.) are preferred as the primary outcome of a study over “softer” patient-oriented ones (patient satisfaction, self-reported quality of life, etc). An example of this problem is the regret rate of patients undergoing dialysis (30). Often, this life-sustaining therapy is recommended without mentioning the high proportion of patients who regret undergoing it.
Healthcare workers are treated differently because when choosing treatments for themselves, they can (at least subjectively) weigh in patient-oriented outcomes based on what they see in their practice (31). In contrast, when advising their patients clinicians cannot incorporate these factors into their assessments; they need to adhere to the available published evidence.
SOLVED IN A FEW HOURS AND ALMOST FOR FREE?
Of course, it is not reasonable to wait for all medical schools and scientific journals to adapt their syllabi and publishing standards to solve these problems. A first and likely sufficient step in the right direction simply requires showing medical professionals the most common types of mistakes and how to avoid them.
In their 2018 study, Jenny, Keller and Gigerenzer (4) demonstrated that a 90-minute training session in medical statistical literacy dramatically improved performance (from 50% to 90%) in 82% of the participants, as evaluated using a standardised statistics test. Similarly, Garcia-Retamero et al. (32) showed how easy-to-use and cheap-to-implement graphical aids improved informed consent and individualised risk assessments among surgeons.
However, these solutions are still far from enough and much more research is needed on this topic. We do not know how long these improvements last. Additionally, there is no consensus about which specific statistical skills are necessary for all physicians, and it is likely that different types of specialists would need to develop (and preserve) different skills. For example, clinical trials are more frequent in Internal Medicine journals than in Forensic Medicine ones.
In the same way that all of us need to know how to choose our food wisely even if we do not become chefs, all clinicians need to know how to interpret the scientific evidence they read even if they will not become clinical researchers. What we do know for sure is that medical boards and medical schools need to strengthen statistical training and to evaluate it formally and periodically.
Personalised Medicine is ultimately about making well-informed, individualised risk-benefit assessments. Statistical illiteracy among clinicians affects their ability to comprehend risks and estimate benefits, consequently impairing informed decision-making for their patients. Fortunately, cheap and quick interventions have shown promising results. It is necessary to acknowledge this problem as a direct obstacle to good-quality personalised healthcare in practice. Joint efforts from universities, scientific journals, and clinicians are needed to overcome these issues.
REFERENCES
1. Baker, M. and Penny, D. (2016) “Is there a reproducibility crisis?,” Nature. Nature Publishing Group, pp. 452–454. doi: 10.1038/533452A.
2. Gigerenzer, G., Gray, J. A. M., Wegwarth, O., et al. (2011) Statistical Illiteracy in Doctors, Better Doctors, Better Patients, Better Decisions. Edited by G. Gigerenzer and J. A. M. Gray. The MIT Press. doi: 10.7551/mitpress/9780262016032.001.0001.
3. Goetz, L. H. and Schork, N. J. (2018) “Personalized Medicine: Motivation, Challenges and Progress.” doi: 10.1016/j.fertnstert.2018.05.006.
4. Jenny, M. A., Keller, N. and Gigerenzer, G. (2018) “Assessing minimal medical statistical literacy using the Quick Risk Test: A prospective observational study in Germany,” BMJ Open, 8(8), p. e020847. doi: 10.1136/bmjopen-2017-020847.
5. Wegwarth, O. et al. (2012) “Do physicians understand cancer screening statistics? A national survey of primary care physicians in the United States,” Annals of Internal Medicine, 156(5), pp. 340–349. doi: 10.7326/0003-4819-156-5-201203060-00005.
6. Whiting, P. F. et al. (2015) “How well do health professionals interpret diagnostic information? A systematic review,” BMJ Open. BMJ Publishing Group. doi: 10.1136/bmjopen-2015-008155.
7. Giuliani’s Prostate Cancer Figure Is Disputed - The New York Times (2007). Available at: https://www.nytimes.com/2007/10/31/us/politics/31prostate.html (Accessed: January 27, 2021).
8. Metric Math Mistake Muffed Mars Meteorology Mission | WIRED (2010). Available at: https://www.wired.com/2010/11/1110mars-climate-observer-report/ (Accessed: January 27, 2021).
9. Innumeracy by John Allen Paulos - Penguin Books Australia (2014). Available at: https://www.penguin.com.au/books/innumeracy-9780141980133 (Accessed: January 27, 2021).
10. Naylor, C. D., Chen, E. and Strauss, B. (1992) “Measured enthusiasm: Does the method of reporting trial results alter perceptions of therapeutic effectiveness?,” Annals of Internal Medicine, 117(11), pp. 916–921. doi: 10.7326/0003-4819-117-11-916.
11. Sedrakyan, A. and Shih, C. (2007) “Improving depiction of benefits and harms: Analyses of studies of well-known therapeutics and review of high-impact medical journals,” Medical Care, 45(10 SUPPL. 2), pp. S23-8. doi: 10.1097/MLR.0b013e3180642f69.
12. Johnson, T. V. et al. (2014) “Numeracy among trainees: Are we preparing physicians for evidence-based medicine?” Journal of Surgical Education, 71(2), pp. 211–215. doi: 10.1016/j.jsurg.2013.07.013.
13. Nair, S. C. et al. (2014) “Generalization and Extrapolation of Treatment Effects From Clinical Studies in Rheumatoid Arthritis,” Arthritis Care & Research, 66(7), pp. 998–1007. doi: 10.1002/acr.22269.
14. Gigerenzer, G. and Gray, J. A. M. (2011) Medical Journals Can Be Less Biased, Better Doctors, Better Patients, Better Decisions. Edited by G. Gigerenzer and J. A. M. Gray. The MIT Press. doi: 10.7551/mitpress/9780262016032.001.0001.
15. Anderson, B. L. et al. (2014) “Statistical Literacy in Obstetricians and Gynecologists,” Journal For Healthcare Quality, 36(1), pp. 5–17. doi: 10.1111/j.1945-1474.2011.00194.x.
16. Bar-Hillel, M. (1980) “The base-rate fallacy in probability judgments,” Acta Psychologica, 44(3), pp. 211–233. doi: 10.1016/0001-6918(80)90046-3.
17. Balish, A. et al. (2013) “Analytical detection of influenza A(H3N2) and other A variant viruses from the USA by rapid influenza diagnostic tests,” Influenza and other Respiratory Viruses, 7(4), pp. 491–496. doi: 10.1111/irv.12017.
18. Anderson, B. L., Williams, S. and Schulkin, J. (2013) “Statistical Literacy of Obstetrics-Gynecology Residents,” Journal of Graduate Medical Education, 5(2), pp. 272–275. doi: 10.4300/jgme-d-12-00161.1.
19. Gigerenzer, G., Gray, J. A. M., Gaissmaier, W., et al. (2011) When Misinformed Patients Try to Make Informed Health Decisions, Better Doctors, Better Patients, Better Decisions. Edited by G. Gigerenzer and J. A. M. Gray. The MIT Press. doi: 10.7551/mitpress/9780262016032.001.0001.
20. Elmore, J. G. et al. (2005) “Screening for breast cancer,” Journal of the American Medical Association. NIH Public Access, pp. 1245–1256. doi: 10.1001/jama.293.10.1245.
21. Grossman, D. C. et al. (2018) “Screening for prostate cancer: US Preventive Services Task Force recommendation statement,” JAMA - Journal of the American Medical Association, 319(18), pp. 1901–1913. doi: 10.1001/jama.2018.3710.
22. Insight: How the NHS decides what drugs it will and won’t pay for | The Scotsman (2019). Available at: https://www.scotsman.com/news/politics/insight-how-nhs-decides-what-drugs-it-will-and-wont-pay-1422674 (Accessed: January 27, 2021).
23. Definitions: Efficacy, Effectiveness, Efficiency | Delivering Digital Drugs (D3) (2014). Available at: https://blogs.lse.ac.uk/ddd3/2014/12/11/definitions-effcacy-effectiveness-efficiency/ (Accessed: January 27, 2021).
24. Polack, F. P. et al. (2020) “Safety and Efficacy of the BNT162b2 mRNA Covid-19 Vaccine,” New England Journal of Medicine, 383(27), pp. 2603–2615. doi: 10.1056/nejmoa2034577.
25. Dagan, N. et al. (2021) “BNT162b2 mRNA Covid-19 Vaccine in a Nationwide Mass Vaccination Setting,” New England Journal of Medicine, p. NEJMoa2101765. doi: 10.1056/NEJMoa2101765.
26. Continuous glucose monitoring (CGM) | Diabetes UK (2020). Available at: https://www.diabetes.org.uk/guide-to-diabetes/managing-your-diabetes/testing/continuous-glucose-monitoring-cgm (Accessed: January 27, 2021).
27. Wegwarth, O. and Gigerenzer, G. (2018) “The barrier to informed choice in cancer screening: Statistical illiteracy in physicians and patients,” in Recent Results in Cancer Research. Springer New York LLC, pp. 207–221. doi: 10.1007/978-3-319-64310-6_13.
28. Smith, T. J. et al. (1998) “Would oncologists want chemotherapy if they had non-small-cell lung cancer?,” ONCOLOGY, pp. 360–365.
29. Gallo, J. J. et al. (2003) “Life-Sustaining Treatments: What Do Physicians Want and Do They Express Their Wishes to Others?” Journal of the American Geriatrics Society, 51(7), pp. 961–969. doi: 10.1046/j.1365-2389.2003.51309.x.
30. Davison, S. N. (2010) “End-of-life care preferences and needs: Perceptions of patients with chronic kidney disease,” Clinical Journal of the American Society of Nephrology, 5(2), pp. 195–204. doi: 10.2215/CJN.05960809.
31. Slevin, M. L. et al. (1990) “Attitudes to chemotherapy: Comparing views of patients with cancer with those of doctors, nurses, and general public,” British Medical Journal, 300(6737), pp. 1458–1460. doi: 10.1136/bmj.300.6737.1458.
32. Garcia-Retamero, R. et al. (2016) “Improving risk literacy in surgeons,” Patient Education and Counseling, 99(7), pp. 1156–1161. doi: 10.1016/j.pec.2016.01.013.
33. Israel Offers Glimpse of a World Vaccinated From Covid-19 - WSJ (2021). Available at: https://www.wsj.com/articles/israel-offers-glimpse-of-a-world-vaccinated-from-covid-19-11611662214 (Accessed: January 27, 2021).