Posted by larryhoover on November 9, 2013, at 12:50:16
In reply to Re: Irving Kirsch, placebos and antidepressants » larryhoover, posted by doxogenic boy on November 6, 2013, at 17:57:31
Darn, I feel an essay coming on.
First, I'd like to examine the measurement of depression severity in clinical trials. The most commonly employed instrument is the Hamilton Rating Scale for Depression, abbreviated as the HAM-D, or HRSD.
It's administered in an interview format, where the interviewer asks the subject to assess their symptoms on seventeen different variables. A numerical value is assigned to each response, and the sum of the values is the HAM-D score.
http://healthnet.umassmed.edu/mhealth/HAMD.pdf
The result is always an integer. You might have a 14 or a 15, but never a 14.5. But what exactly does that number represent? Although it's used to approximate the severity of an individual's depression, are two people with equal scores equally depressed? And if they both change by two points, have they improved or declined by equal amounts? The fact is, we have no way of knowing the answer to those questions.
That's because the results of the HAM-D are ordinal data. We can rank the scores of individual results, but that's about all we can say.
I'll give you another example of ordinal data, the results of a marathon race. We can award first, second, third places, and so on, but we cannot determine what the time differential might have been between these ranked results. Only other measures can provide that insight.
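To make the marathon point concrete, here is a minimal Python sketch. The runner names and finishing times are invented purely for illustration. Two races produce identical placings even though the gaps between finishers are nothing alike, which is exactly the information an ordinal result throws away.

```python
# Hypothetical finishing times in minutes; names and numbers are made up.
race_a = {"Ann": 150.0, "Bob": 150.5, "Cy": 190.0}
race_b = {"Ann": 150.0, "Bob": 175.0, "Cy": 176.0}


def placings(times):
    """Return runner names ordered from fastest to slowest (the ordinal result)."""
    return sorted(times, key=times.get)


def gaps(times):
    """Return the time gaps between consecutive finishers (interval information)."""
    ordered = sorted(times.values())
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


print(placings(race_a), placings(race_b))  # the rankings match
print(gaps(race_a), gaps(race_b))          # the gaps do not
```

If all you were handed was the output of `placings`, you could not tell these two races apart.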
The problem for psychology/psychiatry is that there is no other measure available to us, to assess the validity of what we've measured. We may employ other ordinal scales, such as the Montgomery-Åsberg Depression Rating Scale (MADRS), or the Beck Depression Inventory (BDI) (there are many others out there, but these are the most common alternatives to HAM-D), but we still can't get around the fact that there is no external validation for any of them.
One of the desired outcomes of any double-blind clinical trial is to provide evidence for the effect of an active treatment (such as a drug) when compared to a placebo treatment. The problem is that ordinal data only permit the most basic of summary statistics.
From a position of statistical rigor, you're only permitted to determine the median value (the value right in the middle of the ranked measurements, e.g. the 51st value in a list of 101), and the mode value (the one most frequently measured). The only permitted graphical representation of the data is a histogram (bar graph, or frequency line plot).
You're not allowed to calculate means, standard deviations, confidence intervals, or statistical significance... and yet every clinical trial does exactly that. It's kind of like a gentlemen's agreement: "We've got to try and pull something meaningful out of these data, and we'll use these summary statistics to do it, even though we probably shouldn't."
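As a rough illustration of those permitted summaries, here is a short Python sketch using only the standard library. The nine scores are invented, not taken from any trial. I've used `median_low` rather than `median` so that the result is always a score somebody actually obtained, with no averaging involved.

```python
from collections import Counter
from statistics import median_low, multimode

# Hypothetical HAM-D totals for nine subjects; the values are invented.
scores = [9, 12, 14, 14, 15, 18, 21, 21, 27]

middle = median_low(scores)       # the middle-ranked observed score
most_common = multimode(scores)   # every score tied for most frequent

print("median:", middle)
print("mode(s):", most_common)

# A crude text histogram: one '#' per subject at each score.
for score, count in sorted(Counter(scores).items()):
    print(f"{score:>2} {'#' * count}")
```

Note that nothing here adds or subtracts scores. Ranking and counting is all that's going on.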
For more on the types of data, and the permitted statistics, here's a nice summary on wiki: http://en.wikipedia.org/wiki/Level_of_measurement
Probably the most meaningful derived statistic for a clinical trial is an individual's difference score, i.e. the change in HAM-D score between the time a subject began the clinical trial and its completion. I'll get to other concerns about confounding variables later, but at least the same subject is being compared to himself, over time.
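For what it's worth, a difference score is nothing fancier than this. The Python below is only a sketch, with invented subject IDs and invented scores, and the function name `change_scores` is my own.

```python
# Hypothetical HAM-D totals at baseline and at end of trial; all values invented.
baseline = {"s01": 24, "s02": 19, "s03": 27}
endpoint = {"s01": 15, "s02": 18, "s03": 27}


def change_scores(before, after):
    """Baseline minus endpoint for each subject; positive means the score dropped."""
    return {subject: before[subject] - after[subject] for subject in before}


changes = change_scores(baseline, endpoint)
print(changes)
```

Each subject serves as his own reference point, but keep in mind that the subtraction itself already treats the scale as if the points were equally spaced.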
I want to talk about what Kirsch did, now. Here's a link to the PLOS article he wrote: http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0050045
I will quote from his work, which will always appear in quotation marks. Everything else is my commentary. One of the big issues for me is methodology. If you don't assess how the author(s) define their approach, you can't properly assess the results. Here we go:
Kirsch used only drug pre-approval data, from the period between 1987 and 1999. He remarks, "This strategy omits trials conducted after approval was granted." Why would he want to do that, when his stated premise was to determine 'clinical efficacy', which is more or less defined as results expected in the general population?
"In addition, the FDA independently reviews the clinical trial methods, statistical procedures, and results. The FDA dataset includes analyses of data from all patients who attended at least one evaluation visit, even if they subsequently dropped out of the trial prematurely. Results are reported from all well-controlled efficacy trials of the use of these medications for the treatment of depression. FDA medical and statistical reviewers had access to the raw data and evaluated the trials independently." I'm at a loss here. Is he impugning the FDA? I'll come back to this point later.
"The use of other psychoactive medication was reported in 25 trials" Out of a total of 47, by the way. So, for more than half of the data, placebo participants may have been exposed to active medication, after all. Despite this obvious potential confound, it does not enter into any of his statistical analyses, and I don't recall it ever being mentioned again.
The next concept under consideration involves standardization. You have to do that when different measurements are used in different studies. I don't know all of the parameters that Kirsch standardized, but he certainly collapsed 4-, 5-, 6- and 8-week study durations together, collapsed across patient groups (inpatient, outpatient, elderly), and probably collapsed the HAM-D scores themselves (Hamilton himself has published 5 versions of his 17-question test, while others have modified it to include up to 29 questions). In any case, when you standardize data, you reduce the variability. It's inherent in what you're doing, as you're doing nothing more than deciding to ignore some differences in the data. And yet, Kirsch assumes that standardization had no effect on the variability.
The capital S in the next quote represents 'Standardized'.
"In total, SDcs were known for 28 groups, could be calculated from other inferential statistics in nine comparisons (18 groups), and were imputed in 12 comparisons (24 groups) (47.38%) [13,14]." It's further described in the actual article, but when he says "imputed", what he means is he made up the data. He declares quite plainly that if data were "outliers" (not clustered with the others), he treated them as if they were missing data, and he made up values for those deleted ones. I don't know where you stand on this, but deleting data and making up data are both problems for me.
This next one simply makes me wonder. It's not about Kirsch, per se. But look at the difference in group sizes. "The dataset comprised 35 clinical trials (five of fluoxetine, six of venlafaxine, eight of nefazodone, and 16 of paroxetine) involving 5,133 patients, 3,292 of whom had been randomized to medication and 1,841 of whom had been randomized to placebo."
But no matter what he did with the data, his main outcome measures, presented in Table 2, were that drug was significantly superior to placebo, p <.001. All the rest of the paper is an attempt to provide an intellectual construct which dismisses his own statistical results, IMHO.
Kirsch applied statistical measures which were completely inappropriate, based on ordinal data, but his own evidence was that drugs were significantly better than placebo. His conclusions are not supported by the data.
We do not know that, as an example, the difference between a HAM-D score of 14 and 15 is the same magnitude as the difference between 24 and 25. We do not know that the subjects were assessed under identical circumstances (even time of day could make a difference). We do not know that expectancy played no part in the measures. What if some patients (placebo or drug) reported improvements because they didn't want to disappoint the doctor running the trial?
To do what Kirsch did here was completely inappropriate. He didn't even reference similar work undertaken by other researchers. Kirsch is neither the first, nor the only person, to attempt to interpret the clinical trial data on antidepressants collectively. A balanced report would certainly have included references to others raising similar questions.
One such researcher is Arif Khan. I like this quote from Dr. Khan, from the April 2000 volume of Psychiatric Times:
"The less-than-impressive results in these and other studies also calls to mind the fact that patients assigned to placebo treatment in clinical trials are not "getting nothing." The capsule they receive is pharmacologically inert but hardly inert with respect to its symbolic value and its power as a conditioned stimulus. In addition, placebo-treated patients receive all of the commonly employed treatment techniques: a thorough evaluation; an explanation for their distress; an expert healer; a plausible treatment; expectation of improvement; a healer's commitment, enthusiasm and positive regard; and an opportunity to verbalize their distress. Jerome Frank, Ph.D., in his book Persuasion and Healing: A Comparative Study of Psychotherapy made a compelling case that these parts of treatment are the active ingredients of all the psychotherapies (1993)....
A cautionary note is indicated about the generalization of these data to the clinical management of depressed patients. The less-than-impressive difference between drug and placebo in this and other studies of clinical trials does not speak directly to the effectiveness of antidepressants in clinical practice. Participants in antidepressant clinical trials are a highly select group and are not representative of the general population of depressed patients. They are not actively suicidal, they are almost always outpatients who are moderately rather than severely or mildly depressed, and they are free of comorbid physical or psychiatric illness. They are likely to have a higher placebo response rate than more severely ill depressed patients.
Furthermore, the primary aim of these studies is not to assess the optimal effect of antidepressants, but rather to rapidly assess efficacy of new drugs so they can be brought to the market. Therefore, dose, duration and diagnosis in clinical trials are not necessarily ideally suited to identify the optimal effects of antidepressants. Accordingly, clinical trials may identify the lower bound of the effect size compared to placebo."
My conclusion is that Kirsch's work is unreliable and unscientific. In totality, I believe that Kirsch simply used this paper to express his personal beliefs, in the guise of scientific inquiry. It was intellectually dishonest, misleading, and ultimately, disgraceful.
Now, I'm going to step aside from Kirsch, and speak of my own experience. I have participated in a clinical trial. I have helped design a clinical trial (based on my experience). And I have read numerous clinical trial submissions to the regulators (which include all raw data, often measuring over 300 pages).
I only wish that real life medical practise even fractionally approximated the level of care that goes into a clinical trial. My own physicians gave me great care during my difficult times, but that care never came close to the degree of personal interaction that is required to gather the data for a clinical trial. Comparing double-blind efficacy trial environments to those experienced by patients in real life is simply inappropriate... and yet we do so, unthinkingly.
I would suggest that less than 10% of depressed patients would even qualify for a clinical trial. The exclusion criteria are quite comprehensive, because they must define the study population precisely, if there is to be any hope that the measurements are valid. Most of us don't fit nicely into those intellectual cubby holes.
And I suspect that the population here also is unrepresentative of the general population. This is a self-selected population of people questioning, struggling, seeking to expand the scope of their existing care.
I mentioned a moment ago that I have read the raw data for some clinical trials. It truly is remarkable just how much information is collected. And how often it is misinterpreted, based on summary statistics.
It's been a while since I wrote about it, but in this link you'll see a lengthy post by me that also has a couple of tinyurl hyperlinks. Lots of info in those.
http://www.dr-bob.org/cgi-bin/pb/mget.pl?post=/babble/20050504/msgs/494929.html
URL: http://www.dr-bob.org/babble/20131025/msgs/1054073.html