When anyone, any institution, makes a claim, you ought to be in the habit of telling them, “I love evidence. Show me the evidence!”

There are so many claims bombarding us these days. There are so many reports of studies being contradicted by other studies. There are so many con artists out there using bits and pieces of studies that fit their snake-oil sales pitches. What’s a normal, intelligent person to do? What are we to believe? Especially, what should we do when our health and longevity are at stake?

ANSWER: Demand to see the evidence!  Then, know a little about what constitutes good versus bad evidence.

Below, I will teach you what makes “evidence” evidence. You will learn the most important aspects of what makes a study valid. Statistics is a very complex subject, often best left to mathematicians, but you and I both NEED TO KNOW some of the basics to avoid being conned.

How important is it that we laypeople learn enough to defend ourselves?  VERY important. One survey of research findings concluded that even with peer-group reviews, which ought to screen out the bad reports, a very large number of flawed and inappropriate studies are being published: approximately 57% of them arrived at FALSE conclusions. Let me repeat that: approximately 57% of studies have arrived at FALSE conclusions.

If scientists on peer committees are being fooled, it’s likely you and I will be, too. BUT, we can do a reasonability test of our own and at least screen out SOME of the more obviously misleading studies. Spend a few minutes studying this “Quick-Learn” course, then print off the list to use as your checklist.

Meanwhile, CHECK BACK OFTEN. There’ll be lots of updates and new hints to strengthen your false-evidence-busting skills.


The list below identifies only SOME of the more common issues that are often misunderstood within scientific research studies.

  1. “what our customers say”. Such a statement is called an anecdote. Another term for word-of-mouth statements is “testimonial”. The purpose of an anecdote is to appeal to your emotions, not your logic. Anecdotes make you emotionally receptive and give you that warm feeling of being safe. Instead, when you hear an anecdote being offered as proof, think, “WARNING!!!   WARNING! DANGER!!!”.  What worked for someone else may not work for you. For example, someone may thrill at feeling healthy, or even full, when snacking on peanuts. But you may have a peanut allergy, and eating that same product because someone else says it works for them just might kill you.  Anecdotes are NOT PROOF.  You are likely being conned.  “He said, she said” is not research, not evidence, not science.
  2. “P-value” number is too risky in and of itself.  The P-value lends an air of authenticity to reports. Six Sigma practitioners are sometimes taught that if a P-value is less than .05, the result is likely valid, or, as expressed in statistical jargon, “there is a statistically significant difference”. What this means is that there is less than a 5% chance the result happened by luck alone, roughly 20-to-1 odds. If your life is at stake, if you might unwittingly cause yourself cancer, or you might die, do those odds protect you sufficiently?  No. A P-value of .01 means a 1% chance, roughly 100-to-1 odds. Learn to ask yourself, “Is that SIGNIFICANT ENOUGH?” (See the Point 2 sketch after this list for where a P-value actually comes from.)
  3. “P-value” number with no “range”.  No two people are alike, so the people in a study respond to the same therapy in different ways. Outcomes vary across a wide range: some participants have no response, some die, but the majority will have broadly similar responses. This suggests a BELL CURVE distribution. If you look at a BELL CURVE you will see “tails” before and after the hump, and from the tip of one tail to the tip of the other is quite a distance. A study’s result ought to be stated with a range somewhere in that distance: look for something that says the result has a “confidence interval” of this value to that value. (See the Point 3 sketch after this list.) Here’s a diagram of a BELL CURVE, also called in statistics a “normal distribution”. Notice it has “tails” and a “hump”. [Diagram: BELL CURVE]
  4. statistical significance interpreted as clinically significant when it isn’t.  Statistical significance is what we covered above under “P-value”. The numbers may show importance statistically; however, there may be little to no clinical significance. Let’s assume a study proves statistically that people with a certain disease who take medicine “XYZ” have their symptoms relieved, but still die in the same amount of time as anyone with the disease who is not taking the medicine. So, number-wise, the medicine is a raging success; clinically, the medicine is useless.
  5. clinical effectiveness that doesn’t discuss side-effects. If we are talking medications or supplements, even if they “work”, that is, meet the objectives, what of the risks they may induce, the cost, the length of the therapy, or the difficulty of implementing and managing it?
  6. studies with murky objectives (hypotheses).  It is not clear what the study is trying to validate or prove.
  7. the use of the wrong data.
  8. lack of, or too few, follow-ups and check-ups.
  9. study duration was too short.
  10. too few people included, or animals, if it is a study done on animals; this count is called the “sample size”. (See the Points 10 and 17 sketch after this list.)
  11. risks are not addressed. Risks are important. An ABSOLUTE risk is your plain probability of an outcome: for example, a 1% chance of death during heart-bypass surgery, a 95% chance of death without the operation, or near-certain death if you are bitten by a poisonous snake and there is no intervention available. A RELATIVE risk compares two absolute risks: in the bypass example, skipping the surgery carries roughly 95 times the risk of having it. In the case of therapies, supplements and other medical studies, the study ought to state how much risk is reduced and outcomes improved in both absolute and relative terms.
  12. sample participants were too homogeneous, or not homogeneous enough, to be representative of the nation or other “universe”; their type and characteristics were not appropriate.  One calcium-supplement benefit study used very obese people who were not used to exercising. Weight loss occurred and was attributed to calcium supplementation, but it was more likely caused by the heavy exercise program the researchers imposed and monitored.
  13. dropouts are ignored.  When dealing with large groups of people, large sample sizes, it is going to be difficult to keep every participant motivated to keep participating. Does the study acknowledge the incidence of people dropping out, and does it adjust its computations accordingly?
  14. randomness is missing or compromised. How was the sample  (above)  selected?
  15. risk factors missing or miscalculated.
  16. using mean instead of median. Definitions first. MEAN is another word for average: add up all the numbers, then divide by the count. MEDIAN is the CENTER number after you arrange all the numbers from lowest to highest (or highest to lowest). Why is this difference important?  If the data is skewed, that is, does not follow a nice-looking BELL curve like the one in the illustration above, then the median may be a better choice than the average, because the average gets dragged toward the extreme values. On a normal distribution, which, as you know by now, is another name for a BELL curve, the mean sits right at that top-most point of the “hill”. (See the Point 16 sketch after this list.)
  17. no “control” group. THIS IS A COMMON ERROR. One group must be the test subjects receiving the new supplement, medicine or device. The other group, the control, must remain UNAFFECTED, NOT using whatever is being tested. And remember, both groups must be large; depending on how subtle the effect is, think in terms of hundreds or even 2,000 or more!  The larger each group is, the more likely it is to represent the rest of the population to which the conclusions will be applied. (The Points 10 and 17 sketch after this list shows how to estimate the group size needed.)
  18. when a sample and a “control” group are being used, the data goes awry if the two groups are not homogeneous on the majority of factors or variables. Let’s use animal testing for our example. The sample group ought to be all mice of about the same health status, age, size and type, not a mixture of mice and monkeys. Likewise for the control group. And it would not be appropriate to compare a sample group of mice to a control group of dogs.
  19. where the sample group is subsequently divided into smaller groups, called “blocks”, each block should be relatively the same in all factors, including block size.
  20. failure to adequately specify the influences of geography, genetic history, and other factors that help to define the sample selection and participation criteria.
  21. effects from external or other internal factors on the item or sample being tested. This may be confusing, so let me clarify with a common-cold example. The sample group is given 25,000 mg of vitamin C at the first sign of an approaching cold, yet is also allowed to rest 24 hours, drink plenty of water, take extra-strength aspirin, and eat plenty of freshly made chicken soup. The control group is prohibited from all of it: vitamin C, sleep, water, aspirin and chicken soup. Could you believe, in this instance, a claim that vitamin C cures the common cold faster? Nope.
  22. drug interactions may have significant effects and distort findings if not identified. A study of maximum heart rates in athletes may be thrown off if the researcher fails to identify, and exclude or compensate for, athletes taking beta-blockers, which limit maximum heart rate.
  23. failure to identify factors which may skew the data. Comparing how long people can sit in the sun before suffering burns will be skewed if the sample consists entirely of Inuit, who are far less adapted to sunbathing than people who live in equatorial regions. And if the two populations were combined, the outcome would not be representative of either group.
  24. studies funded by outside parties with vested interests in the outcomes are likely to arrive at conclusions that best suit the company funding the study.
  25. using the wrong devices to measure the data. An example might be taking temperatures of children using a thermometer that is not sensitive enough to measure tiny variations.
  26. measuring the wrong data.  One major obesity study got caught in the vice grip of the press when it was discovered to be measuring calories burned by exercise rather than calories consumed from sugar-laden drinks, then using that to “prove” that the blame for obesity rests entirely with insufficient calories burned through exercise.
  27. studying or measuring one variable or factor while ignoring other variables or factors that can moderate, destroy, or increase the one being studied. For example: measuring the degree of compliance of children on a school bus without identifying the influence of the cameras monitoring the children, the attitude and experience of the bus driver, the distance of the trip, the influence of the school on all its children, or the presence of one or more teachers on that same bus.
  28. data ignored or left out of the study is as important as data studied. Ask what’s NOT included and be sure it makes good sense for it to be removed or missing. The problem with unexplained, unaccounted-for missing data is that other researchers may not be able to replicate the study and, therefore, will not be able to verify it.
  29. weak or inappropriate survey questions can result in poor-quality data, or can fail to extract data that could influence the outcomes of the study.
  30. failure to screen out prejudice, bias, or selective perception. People generally see what they want to see. If respondents or study participants are not properly and thoroughly vetted to screen out those biases, or if researchers fail to account for them, the findings can be tainted. Likewise, researchers themselves may introduce biases and preconceptions by seeing things that aren’t there, or may interpret data in ways that other, more objective, observers or researchers would not.
  31. the study results may be reported in ways that confuse or mislead. Charts may be exaggerated in one dimension to make it appear that the data is more erratic or less erratic, etc.
  32. how the results are spread across the sample(s).  For this point, I’d have to tell you to take the discussion to a qualified statistician, since going deep requires some mathematics and some fundamental statistics. But I’d like to introduce you to the topic anyway. If you plot the sample data as a “histogram” and it forms a relatively smooth shape like a BELL resting comfortably on the ground, then you are looking at data that statisticians term a BELL CURVE distribution, data that is distributed, (spread across), in a “normal” manner, or simply, “normally distributed”. Such data is said to be “PARAMETRIC”. This is common when reviewing medical studies, but is not always the case. Now here is where it gets very technical. When examining normally distributed, “parametric” data, there are certain “TESTS” that can be used, and I’ll list the typical ones here just for the record and without getting into too much detail: t-tests, F-tests, analysis of variance, chi-squared tests, and regression. However, when data is not so nicely gathered around such a pretty Bell curve, it is called “skewed” data, also called in statistical circles “NON-parametric” data. For non-parametric data, other tests are used: non-parametric regression, Spearman, Mann-Whitney, Friedman’s two-way analysis by ranks, Kruskal-Wallis, Wilcoxon, etc. Best to run your data review by a qualified statistician who has no vested interest in the study. (See the Point 32 sketch after this list.)
  33. adjusting the data mathematically, but inappropriately. Sometimes data can be, needs to be, or shouldn’t be “adjusted” to fit a more normal distribution; this is termed being “transformed”. Data that is lopsided, to use a non-statistical term, may lean too much to one side. To bring it back into a more normal-looking Bell curve, researchers sometimes apply mathematical techniques such as logarithms, reciprocals, or square roots. This gets far too technical and too deep into the mathematics for any non-math major. So, if you suspect that is what is going on with the data, have it examined by a qualified statistician who has no vested interest in the outcomes. (See the Point 33 sketch after this list.)
  34. mathematics has many “numbers” and “variables”. Be wary of how the researchers have used them. If they justify their conclusions by using tests other than the ones mentioned above, you may want to call them on it. Either you are looking at a con, or you need a qualified statistician who can determine why the researchers are deviating from the typical sample-data tests. The study itself ought to be considerate enough to provide you, the reader, all those details, plus clearly cite the source of such formulae and the rationale.
  35. did the study say or promise one thing but go off on a tangent and do or study something slightly different?
  36. did the study methodology cheat, purposefully or unwittingly, by fixing a certain outcome early in the study that guarantees the desired conclusion? In other words, are you certain the study is random, fair, untainted, not influenced by anyone or any tools along the way?
  37. are multiple data points being intermingled or confused? Let’s assume data “A”, “B”, and “C” are being collected on each person. Is the study ensuring that those data points refer to the individual being tested and not to another individual? Person #19’s A value should not be “paired” with Person #2’s C value.
  38. when testing medications, and especially uncontaminated, genuine SUPPLEMENTS, researchers sometimes fail to look for opposite reactions. For example, though vitamin C is expected to lessen cold symptoms or shorten a cold’s duration, researchers should allow for the possibility that vitamin C makes symptoms worse and extends the cold. A test that allows for both directions is called a “TWO-TAILED” test, and the study should specify whether it is using one. (See the Point 38 sketch after this list.)
  39. information excluded because it is too far afield. How are the researchers determining that it ought to be excluded rather than considered? There is a statistical term for data points that appear to be way out in left field: “OUTLIERS”. Review the decision logic for handling outliers. Almost every study has them. Does the study even mention its outliers? Does it make sense to eliminate them? Was the sample too small to capture enough outliers for such decisions to mean anything? Everything about the study needs to be carefully explained, including data included and data excluded. Often it is the failure to mention excluded data, and the logic for excluding it, that prevents others from validating and replicating a study. (See the Point 39 sketch after this list.)
  40. correlation and regression confuse many researchers. To explain the difference: CORRELATION suggests that two or more variables, which could be factors, characteristics, actions, or events, are linked somehow, such that when one changes, the other tends to change with it. Note carefully: linked, not necessarily CAUSED. For a rough example, all other variables (factors) remaining unchanged, people who drink lots of sugary soda each day tend to get fatter, so we can ROUGHLY say that sugary sodas and obesity are LIKELY, PROBABLY, “correlated”. REGRESSION, on the other hand, refers to the use of correlated variables to PREDICT an outcome, with a qualification of the degree of likelihood, called probability in mathematics. So we could run an experiment on a very large sample of people drinking sugary sodas and look at the results. Recognizing that some people will become extremely fat, others will not gain one iota, and most will gain a certain amount, a nice-looking BELL CURVE, we could then conclude that “by drinking sugary sodas, there is an x% probability that you will get fatter.”  So we need two things to be present when looking at a regression analysis: a demonstrated connection between the variables, and a nice, smooth-looking BELL curve. (See the Point 40 sketch after this list.)
  41. wrong use of regression calculations. If the two sets of measurements are directly linked, for example a “before” and an “after” reading taken on the very same subjects, then the data is “paired”, and ordinary regression is not the proper tool to use. Instead, look above and re-read parametric and non-parametric data. For paired measurements that change together, some other test ought to be used, such as the “paired t-test”. (See the Point 41 sketch after this list.)
  42. failure to identify the PROBABILITY, the chances, of a result occurring when using a regression analysis.
  43. bad assumptions being made about what causes what. That link between cause and effect is not always obvious or logical.  Large sample experimentation is required along with appropriate statistical techniques. The cause-effect must be reproducible by other researchers. And, especially, the association must demonstrate sound reasoning.
  44. study terminated prematurely for whatever reasons.
  45. study continues far beyond its original design parameters, trying to dig up proof of significance because none had appeared by the designed end date.
  46. misused standard deviation.  Standard deviation, when appropriately used, is useful for describing the variability of the data, but only when the data follows a BELL curve, that is, a normal distribution. Applied to skewed data, it misleads. (See the Point 46 sketch after this list.)
  47. CHECK BACK SOON…MORE POINTS WILL BE ADDED IN THE NEAR FUTURE.
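
WORKED SKETCHES:

If you like to tinker, the short Python sketches below illustrate some of the points above using the freely available NumPy, SciPy and statsmodels libraries. Every number in them is hypothetical, invented purely for illustration; none comes from any real study.

Point 2: where a P-value actually comes from. A minimal sketch, assuming made-up blood-pressure data, using SciPy’s independent t-test:

```python
# Point 2 sketch: where a P-value comes from.
from scipy import stats

# Hypothetical blood-pressure drops (mm Hg) for ten people on a
# supplement and ten on a placebo; every number here is invented.
supplement = [8, 5, 7, 6, 9, 4, 7, 8, 6, 5]
placebo    = [3, 4, 2, 5, 3, 4, 1, 2, 3, 4]

t_stat, p_value = stats.ttest_ind(supplement, placebo)
print(f"P-value: {p_value:.4f}")
# A P-value below .05 only says the difference is unlikely to be luck
# alone. It does NOT say the difference is big enough to matter to you.
```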
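
Point 3: the “range” a bare number hides. A minimal sketch computing a 95% confidence interval around a mean, again on invented data:

```python
# Point 3 sketch: a 95% confidence interval (a "range") around a mean.
import numpy as np
from scipy import stats

responses = np.array([8, 5, 7, 6, 9, 4, 7, 8, 6, 5], dtype=float)
mean = responses.mean()
sem = stats.sem(responses)  # standard error of the mean
low, high = stats.t.interval(0.95, df=len(responses) - 1,
                             loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% confidence interval: {low:.1f} to {high:.1f}")
# A study reporting a bare number, with no range like this one,
# is hiding how uncertain that number really is.
```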
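
Points 10 and 17: how many participants are enough? A minimal sketch using the statsmodels power calculator; the 0.3 “effect size” is an assumed, modest benefit chosen purely for illustration:

```python
# Points 10 and 17 sketch: how big must each group be?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.3,  # a modest, subtle effect (an assumption)
    alpha=0.05,       # the usual significance threshold
    power=0.80,       # an 80% chance of actually detecting the effect
)
print(f"participants needed per group: {n_per_group:.0f}")
# Subtle effects demand big groups; a 20-person "study" of a
# modest benefit proves almost nothing either way.
```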
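
Point 16: mean versus median. A minimal sketch showing how a single extreme value drags the mean while the median stays put:

```python
# Point 16 sketch: mean vs. median on skewed data.
import numpy as np

# Hypothetical recovery times in days; one patient took far longer.
recovery_days = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7, 90])
print("mean:  ", recovery_days.mean())      # dragged up to 13.5 by the 90
print("median:", np.median(recovery_days))  # 5.0, the typical patient
# On skewed data, quoting only the mean makes the typical outcome
# look far worse (or better) than it really is.
```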
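
Point 32: parametric versus non-parametric tests. A minimal sketch running both a t-test and a Mann-Whitney test on deliberately skewed, randomly generated data:

```python
# Point 32 sketch: parametric vs. non-parametric tests on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.exponential(scale=2.0, size=40)  # skewed, NOT bell-shaped
group_b = rng.exponential(scale=3.0, size=40)

print("t-test p:      ", stats.ttest_ind(group_a, group_b).pvalue)
print("Mann-Whitney p:", stats.mannwhitneyu(group_a, group_b).pvalue)
# The t-test assumes a Bell curve it isn't getting; on skewed data
# like this, the non-parametric Mann-Whitney result is the safer one.
```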
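
Point 33: transforming lopsided data. A minimal sketch applying a logarithm to skewed data and checking the result with a Shapiro-Wilk normality test:

```python
# Point 33 sketch: a legitimate log "transform" of lopsided data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
lopsided = rng.lognormal(mean=1.0, sigma=0.8, size=200)

print("raw data looks normal? p =", stats.shapiro(lopsided).pvalue)
print("after log transform?   p =", stats.shapiro(np.log(lopsided)).pvalue)
# A low Shapiro-Wilk p-value says "not bell-shaped". Here the logarithm
# genuinely restores the Bell curve. The question for any study is
# whether its transform was equally justified, or merely convenient.
```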
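
Point 38: one tail or two? A minimal sketch testing the same invented cold-duration data both ways:

```python
# Point 38 sketch: two-tailed vs. one-tailed testing.
from scipy import stats

# Hypothetical cold durations in days; invented numbers.
vitamin_c = [5, 6, 7, 5, 8, 6, 7, 6, 5, 7]
placebo   = [7, 8, 6, 9, 7, 8, 7, 9, 8, 7]

two = stats.ttest_ind(vitamin_c, placebo, alternative='two-sided')
one = stats.ttest_ind(vitamin_c, placebo, alternative='less')
print("two-tailed p:", two.pvalue)  # allows for benefit OR harm
print("one-tailed p:", one.pvalue)  # assumes benefit only; about half as big
# A study quoting one-tailed p-values has quietly ruled out the
# possibility that the treatment made things worse.
```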
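
Point 39: flagging outliers. A minimal sketch of the common 1.5 x IQR rule of thumb:

```python
# Point 39 sketch: the common 1.5 x IQR rule for flagging OUTLIERS.
import numpy as np

values = np.array([12, 14, 13, 15, 14, 13, 16, 15, 14, 48])
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1                                # the interquartile range
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print("flagged as outliers:", values[(values < low) | (values > high)])
# Flagging is the easy part. A trustworthy study still EXPLAINS why
# each flagged point was kept or dropped; silent deletion is a red flag.
```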
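
Point 40: correlation versus regression. A minimal sketch computing both on invented soda-and-weight-gain numbers:

```python
# Point 40 sketch: correlation measures the link; regression predicts.
from scipy import stats

# Invented soda-vs-weight-gain numbers, for illustration only.
sodas_per_day = [0, 1, 1, 2, 2, 3, 3, 4, 5, 6]
kg_gained     = [0.1, 0.8, 1.1, 1.9, 2.2, 2.8, 3.3, 4.1, 4.8, 6.2]

r, p = stats.pearsonr(sodas_per_day, kg_gained)
print(f"correlation: r = {r:.2f} (p = {p:.4f})")

fit = stats.linregress(sodas_per_day, kg_gained)
print(f"prediction: about {fit.slope:.2f} kg gained per daily soda")
# Neither number, by itself, proves the sodas CAUSED the weight gain.
```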
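
Point 41: paired data needs a paired test. A minimal sketch contrasting the paired and unpaired t-tests on the same invented before-and-after readings:

```python
# Point 41 sketch: the paired t-test for linked, before/after data.
from scipy import stats

# Hypothetical cholesterol readings for the SAME ten people.
before = [210, 225, 198, 240, 215, 230, 205, 220, 235, 212]
after  = [205, 218, 195, 228, 210, 224, 200, 214, 226, 208]

print("paired p:  ", stats.ttest_rel(before, after).pvalue)
print("unpaired p:", stats.ttest_ind(before, after).pvalue)
# The unpaired test throws away the person-by-person pairing and can
# wash out a real effect; it is the wrong tool for linked data.
```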
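
Point 46: standard deviation on skewed data. A minimal sketch in which the familiar “mean plus-or-minus 2 SD” summary produces an impossible negative value:

```python
# Point 46 sketch: standard deviation misleads on skewed data.
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical recovery times in days; they can never be negative.
recovery = rng.exponential(scale=10.0, size=1000)   # skewed distribution

mean, sd = recovery.mean(), recovery.std()
print(f"mean +/- 2 SD: {mean - 2 * sd:.1f} to {mean + 2 * sd:.1f} days")
# The lower end comes out NEGATIVE, an impossible recovery time.
# On skewed data the familiar "mean +/- SD" summary simply does not
# describe where the data really lies; quote percentiles instead.
```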

END NOTE:

  1. Which type of study is better, a “reductionist” one or an “epidemiological” one?  It depends on the study hypothesis, that is, what the study is designed to prove. An example of an epidemiological study is published in the book THE CHINA STUDY, in which the sample sizes are so large as to be considered the entire population rather than a typical sample. Such a study encompasses many factors, not just one or two. Reductionist studies, by contrast, are typically used to substantiate the use of medications based on the results of studying one or a few factors while excluding many other potential influences. Which is better?  Sorry, no easy answer. It depends.
  2. There are a number of epidemiological study designs potentially available: intervention, ecological, case studies, and cross-sectional.

 

REFERENCES:

  • “Statistical Pitfalls in Medical Research” by Nyirongo, Mukaka, Kalilani-Phiri. Malawi Medical Journal, March 2008, pp. 15-18.
  • “Statistics for the non-statistician. I: Different types of data need different statistical tests” by Greenhalgh. The BMJ, August 9, 1997, pp. 364-366.
  • “Statistics for the non-statistician. II: ‘Significant’ relations and their pitfalls” by Greenhalgh. The BMJ, August 16, 1997, pp. 422-425.
  • “Biostatistics Primer: Part 1” by Overholser, Sowinski. Nutrition in Clinical Practice, December 2007, pp. 629-635.
  • “Biostatistics Primer: Part 2” by Overholser, Sowinski. Nutrition in Clinical Practice, February 2008, pp. 76-84.
  • “Critical Appraisal of Scientific Articles” by du Prel, Röhrig, Blettner. Deutsches Ärzteblatt International, 2009, pp. 100-105.
  • “Beliefs and evidence in changing clinical practice” by Grol. The BMJ, August 16, 1997, pp. 418-421.
  • “Understanding measures of treatment effect in clinical trials” by Akobeng. Archives of Disease in Childhood, 2005, pp. 54-56.