

OPHTHALMOLOGY PRACTICE 

Year : 1997  Volume : 45  Issue : 2  Page : 119-123

Confidence with confidence intervals
R Thomas, A Braganza, LM Oommen, J Muliyil
Schell Eye Hospital, Department of Ophthalmology, Christian Medical College, Vellore, India
Correspondence Address: R Thomas, Schell Eye Hospital, Department of Ophthalmology, Christian Medical College, Vellore, India
Source of Support: None, Conflict of Interest: None
PMID: 9475032
When considering the results of a study that reports one treatment to be better than another, what the practicing ophthalmologist really wants to know is the magnitude of the difference between treatment groups. If this difference is large enough, we may wish to offer the new treatment to our own patients. Even in well-executed studies, differences between the groups (the sample) may be due to chance alone. The "p" value is the probability that the difference observed between the groups could have occurred purely due to chance. For many ophthalmologists, assessing this difference means a simple look at this "p" value to convince ourselves that a statistically significant result has indeed been obtained. Unfortunately, traditional interpretation of a study based on the "p" value at an arbitrary cutoff (of 0.05 or any other value) limits our ability to fully appreciate the clinical implications of the results. In this article we use simple examples to illustrate the use of "confidence intervals" in examining the precision and applicability of study results (means, proportions and their comparisons). An attempt is made to demonstrate that the use of "confidence intervals" enables a more complete evaluation of study results than the "p" value alone.
Keywords: Confidence Interval, p value
How to cite this article: Thomas R, Braganza A, Oommen L M, Muliyil J. Confidence with confidence intervals. Indian J Ophthalmol 1997;45:119-23
How to cite this URL: Thomas R, Braganza A, Oommen L M, Muliyil J. Confidence with confidence intervals. Indian J Ophthalmol [serial online] 1997 [cited 2020 Apr 7];45:119-23. Available from: http://www.ijo.in/text.asp?1997/45/2/119/15004
Most ophthalmology journals rightly insist on appropriate statistical analysis of data submitted for publication. It would be nice to just read (and apply) the conclusions section, but as clinicians we are sometimes forced to interpret the statistics ourselves. The easiest way is to abdicate and look at the all-important "p" value indicating statistical significance, and assume that it must also indicate clinical significance. The "p" value, however, is only a statistical measure indicating the probability that the observed result could have occurred due to chance alone. Thus a "p" value < 0.05 merely indicates that the result in question has a less than 5% probability of having occurred due to chance. Traditionally we decide that a result is significant if this probability is less than 5%. However, when such a decision may change the way we treat our patients, the more responsible way is to ask how sure we can be about this "p" value. Life usually has no simple "Yes" or "No" answers and is unfortunately full of imprecision; statistical tests (including "p" values) are no exception. Before believing the "p" value in an article, a responsible ophthalmologist would want to know the imprecision associated with data collection. This could alter our interpretation.
In life we make decisions based on past experiences. As we cannot possibly experience everything, we extrapolate and apply our limited experience to other situations. We follow a similar approach to study diseases and treatment. As we cannot possibly examine and make observations on all the cases of the disease in question, we study a representative sample of cases. We assume that what we find in the sample represents the truth for the larger population. Let us first look at this concept of estimation from a single sample with a simple example. We wish to determine the mean intraocular pressure (IOP) in our population. As we cannot measure the IOP for the entire population in our state, in order to determine the mean intraocular pressure, we measure the IOP on a sample of the total population. Based on these "observations", we infer the mean IOP of the entire population from which the sample was drawn.
Alternatively, we may want to compare the mean IOP in two groups. For example, if we want to determine the IOP lowering effects of two treatment regimes in glaucoma, we may test the regimes on two study groups. These study groups are the "samples" from all the glaucoma patients who exist in the wide world out there, the "glaucoma population" being investigated in this study. The obvious problem with this approach, however, is that any sample represents only a small part of the population and chance variations can occur between many such samples. As we can't possibly include the entire population of glaucoma patients in the study, we can obtain only an imprecise idea of the IOP.
How do we solve this problem? Fortunately we don't have to give up hope. Incredible as it may seem, there is some useful purpose for statistics; we can actually use some statistical methods to measure how imprecise our estimates of the truth are. Even though we may not know the absolute truth, we can have a fairly good idea of how close to it (or how wrong) we are. The confidence interval is one such measure and is increasingly being used and reported in the scientific literature.
Some definitions   
Before we explain the use of confidence intervals with examples, let us define a few statistical terms involved in the calculation of the confidence interval in a language understandable to the average ophthalmologist. We will use the IOP example introduced earlier throughout.
Mean: In this article, mean is used synonymously with "average", a term we are all familiar with. For example, the mean IOP is the arithmetic average IOP of a sample of 100 normal subjects.
Standard deviation: The standard deviation is a measure of the variability or spread between individuals in the factor under investigation. In our example it is a measure of the variability of individual values of IOP in the sample around the mean.
Standard Error: The standard error is a measure of the uncertainty in using a sample to determine the population mean. In our sample of 100 normals we obtained a mean IOP of 17 mmHg. If we determine the mean IOP from several such samples (each consisting of 100 normals), all are likely to provide slightly differing results. The standard error of the mean indicates the extent of this uncertainty about the true mean IOP of the "entire" normal population.
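To make the definition concrete, here is a minimal sketch of how the standard error and an approximate 95% interval around a sample mean could be computed. The mean (17 mmHg) and the sample size (100) come from our example; the standard deviation of 3 mmHg is a purely hypothetical value chosen for illustration, and the function names are our own.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: the SD divided by the square root of n."""
    return sd / math.sqrt(n)


def mean_ci(mean, sd, n, z=1.96):
    """Approximate confidence interval for a mean (z = 1.96 gives about 95%)."""
    half_width = z * standard_error(sd, n)
    return mean - half_width, mean + half_width


# Mean and n are from the text; the SD of 3 mmHg is a hypothetical value.
se = standard_error(sd=3.0, n=100)
low, high = mean_ci(mean=17.0, sd=3.0, n=100)
print(f"SE = {se:.2f} mmHg; 95% CI {low:.2f} to {high:.2f} mmHg")
```

Note how the standard error shrinks as the sample grows: quadrupling the sample size halves it.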
"p" values: The usual way to look at results involves the familiar p values, (p stands for probability). The "p" value represents the chance of a false positive conclusion, that is a conclusion that a treatment is effective when in truth it is not. It is simply the probability that the result could have occurred due to chance alone and is derived by statistical methods not directly relevant to our present discussion. However it is important to know that "p" values are highly dependent on sample size. Large samples could result in small (clinically insignificant) differences becoming statistically significant (low "p" value). Conversely with a small sample, a clinically important difference may have a large "p" value, rendering it statistically insignificant.
For example, the mean IOP measured in 3000 ophthalmologists working in North India (mean 16.3 mmHg; standard deviation 4 mmHg) was found to be statistically significantly different (p<0.05) from the mean IOP of 3000 ophthalmologists working in South India (mean 16.1 mmHg; standard deviation 4 mmHg). In this fictional study the northern borders of Andhra and Karnataka determined north and south India. This statistically significant result could prompt us to postulate that food habits have an ocular hypotensive effect that merits further investigation. Or we could recognize that large numbers are showing up differences that are too small clinically to merit further study.
Small sample sizes have the opposite effect. We measured IOP in our patients with corneal ulcers. Thirty-one of 41 ulcers (75.6%) with presenting IOP ≤22 mmHg healed, compared to 12 of 21 ulcers (57.1%) with IOP >22 mmHg. This result was not statistically significant (p = 0.14). However, when we consider the 18.5% difference between the groups (for every 5 patients with an IOP ≤22 mmHg we would have one more ulcer heal, compared to an IOP >22 mmHg), this difference would certainly seem to be clinically significant. Therefore, the association of IOP with ulcer healing would merit further investigation.
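For readers who wish to check the corneal ulcer figures, the sketch below computes a "p" value for the two healing rates (31 of 41 versus 12 of 21). The text does not say which test was used; we assume here a two-sided z-test for two proportions with a pooled standard error (equivalent to a chi-square test without continuity correction), which gives a value that rounds to the quoted 0.14. The function name is our own.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value


# Healed ulcers in each group, as quoted in the text.
diff, z, p_value = two_proportion_p_value(31, 41, 12, 21)
print(f"difference = {diff:.1%}, z = {z:.2f}, p = {p_value:.2f}")
```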
Conventionally "p" < 0.05 is considered statistically significant. This means that the probability of the result such as obtained in the study occurring purely due to chance is less than 5 out of 100 (1 out of 20). This is based on the mean and standard errors. (If a value lies more than two standard errors away from a mean, the probability of it occurring due to chance is less than 5%, and we would like to consider this a significant cutoff). Sounds reasonable, doesn't it? But what happens when p=0.06? Not significant of course. And p=0.04? Significant, naturally. In reality, how different are these values? The difference of 0.02 between these values is crucial only because we treat them differently depending on where they lie in relation to our arbitrary (but conventional) cutoff value of 0.05. But where does the truth lie in relation to the "p" value?
Confidence interval: Confidence intervals quantify the amount of imprecision in the study and are the preferred measure. Thus, instead of telling us that some degree of benefit is likely to occur by chance less than 5% (p<0.05) or 1% (p<0.01) of the time, the authors can tell us the range of benefit that is likely to occur 90% or 95% of the time. This is the confidence interval approach. Specifically, the 95% confidence interval of a benefit indicates a range within which, 95 out of 100 times, its true value will lie; you can be 95% certain that the truth is somewhere inside the 95% confidence interval.[1] The confidence limits are the numerical borders (upper and lower) of the confidence interval.
Why 95%? We could as easily calculate a 90% interval or, if we want to be really sure, a 99% interval. While 95 is arbitrary, we'll see that it is related to our conventional p<0.05.
Uses of the confidence interval   
There are a number of ways an ophthalmologist could exploit the confidence interval.
Let us take a quick look. Remember how we attempted to estimate the mean IOP from a sample of 100 normals? Let us look at a similar estimation of a complication from a single sample. We attempted to determine the risk of visual field "wipe out" in patients with advanced glaucoma undergoing trabeculectomy. In a prospective series of 60 patients with advanced glaucoma and split macula fixation, as determined by the macular program of the Humphrey visual field analyzer using a size V target, one patient experienced a "wipe out". We could present this data with a conclusion that the risk of this complication is very low (1.7%), probably no different from a series of operations on other advanced glaucomas without split macula fixation (p>0.05, if you insist), and hence not worth considering in decision making. The responsible ophthalmologist would immediately calculate the 95% confidence interval (0 to 5%) and decide that a possible 5% risk of total visual loss could not be ignored. (For an explanation of the simple calculation involved see Appendix 1). He might object and communicate his reservations to the rest of his colleagues.
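The appendix is not reproduced here, so the following is only a sketch of one way the interval for the "wipe out" risk could be obtained. We assume the usual normal approximation for a single proportion, with the lower limit cut off at zero because a risk cannot be negative; this reproduces the quoted range of roughly 0 to 5%. With only one event the approximation is crude, and exact binomial methods are generally preferred for such small counts. The function name is our own.

```python
import math


def proportion_ci(events, n, z=1.96):
    """Normal-approximation CI for a proportion, clipped to the range 0..1."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p, low, high


# One "wipe out" in 60 operations, as in the text.
p, low, high = proportion_ci(events=1, n=60)
print(f"observed risk {p:.1%}; 95% CI {low:.0%} to {high:.0%}")
```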
Such a quick look at the data allows us to make our own practical decisions independent of the p value. If the actual complication rate of a procedure is 1 in 10, we could easily do 10 cases without encountering a single complication. Indeed, if we are lucky and keep our fingers crossed, we could possibly do 20 cases with no complications. If no complications have been encountered (for example, no "wipe outs" in a series of 10 cases), the usual formula doesn't work. In this case a general rule of thumb provided by Sackett[2] can be applied: If we did 10 cases without encountering a complication, the true complication rate could be as high as 26%. If the number of cases is 25 without complications, the true rate could be as high as 11%. With 50, 75 and 100 cases this becomes 6, 4 and 3% respectively.[2]
The value of this ready reckoner is obvious. A recent paper presented the results of six Molteno implants; none developed flat chambers. Using our guide we realize that the true rate of flat chambers could be higher than 26% (actually around 39%). A recent article suggested that the retinal redetachment rate following phacoemulsification in eyes that had undergone retinal detachment surgery was less than that following standard extracapsular surgery.[3] The authors experienced no redetachment in a series of 47 operations. Using our ready reckoner, we realize that the true redetachment rate could be as high as 6%. This is well within the rate reported with standard extracapsular cataract surgery. Had we not performed this calculation, we might have felt obliged to do phacoemulsification for all such patients.
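The ready reckoner is quoted above without a formula. A short sketch, under the assumption that the figures are the one-sided 95% upper limit for a series with zero events (the largest rate p for which seeing no events in n cases still has a 5% probability, that is, (1 - p)^n = 0.05), reproduces every value mentioned, including the 39% for six implants. The function name is our own.

```python
def max_true_rate(n_cases, confidence=0.95):
    """Largest true event rate still compatible with 0 events in n_cases."""
    return 1 - (1 - confidence) ** (1 / n_cases)


# Series sizes mentioned in the text, all with no complications observed.
for n in (6, 10, 25, 47, 50, 75, 100):
    print(f"{n:>3} cases, no events: true rate could be up to {max_true_rate(n):.0%}")
```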
While on ready reckoners, here's another useful one. When considering a rare event, we must look at three times the number in the denominator to be 95% sure we'll come across at least one of those events.[4] For example, we may wish to study whether phacoemulsification actually eliminates expulsive hemorrhage in cataract surgery (compared to standard extracapsular cataract surgery). The incidence of expulsive hemorrhage with extracapsular surgery is about one per thousand (thousand is the denominator for this rare complication). To be 95% sure we'll see at least one case of expulsive hemorrhage, we'll need to do at least 3000 cases. To show an actual difference between groups we'll need a lot more cases in each group (which is quite difficult, isn't it?). This rule of thumb helps in judging the value of a study of expulsive hemorrhage that has used only 100 cases in each group.
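This second rule of thumb can be checked with a few lines of arithmetic. The sketch below assumes that each operation carries the same independent risk, so the chance of seeing at least one event in n cases is 1 - (1 - rate)^n. The function name is our own.

```python
def chance_of_at_least_one(rate, n_cases):
    """Probability of seeing one or more events in n_cases independent cases."""
    return 1 - (1 - rate) ** n_cases


# Expulsive hemorrhage at about 1 per 1000, as quoted in the text.
for n in (100, 1000, 3000):
    prob = chance_of_at_least_one(1 / 1000, n)
    print(f"{n:>4} cases: {prob:.0%} chance of seeing at least one event")
```

With 3000 cases the chance is about 95%, as the rule states; with 100 cases it is under 10%, so a study of that size would most likely see no expulsive hemorrhage in either group.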
Determining magnitude of the difference   
A study was performed to compare the pressure lowering effects of trabeculectomy versus medical treatment. The study groups were similar in all respects and consisted of 100 patients (one eye of each patient) in each group. At one year the mean IOP was 21 mmHg in the medical treatment group (standard deviation = 3.5 mmHg) and 18 mmHg in the operated group (standard deviation = 4 mmHg). This difference of mean IOP between the groups (3 mmHg) is statistically significant (p<0.05). We then calculate the confidence interval as shown in Appendix 2. The 95% CI for the difference in means ranges from 1.96 to 4.04. This is traditionally presented in the following manner: difference in means 3 mmHg (p<0.05); 95% CI 1.96 to 4.04 mmHg. (We provide the standard format of presentation here in order to ensure that when encountered in a journal the average ophthalmologist doesn't throw a fit).
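As the appendix is not reproduced here, the following sketch shows one calculation that arrives at the quoted interval. We assume the standard error of the difference is the square root of the sum of the two squared standard errors, and that the 95% limits lie 1.96 standard errors on either side of the observed difference. The function name is our own.

```python
import math


def mean_difference_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Approximate CI for the difference between two independent means."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff, diff - z * se, diff + z * se


# Medical treatment group versus operated group, figures from the text.
diff, low, high = mean_difference_ci(21, 3.5, 100, 18, 4, 100)
print(f"difference {diff:.0f} mmHg; 95% CI {low:.2f} to {high:.2f} mmHg")
```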
If the two treatments were similar, the difference of mean IOP between the groups would have been zero. If one treatment was truly more effective the range of values (95% CI) would lie entirely on one side of zero. Our range of difference (95% CI) does not overlap 0. (In the case of a ratio like relative risk or odds ratio, no difference would be indicated by 1; to see if there was no difference in this situation, we'd check if the CI overlaps 1). In this case the range not overlapping 0 indicates that we can be fairly sure of the significance of this p value. On the other hand the lower value is not too far away from zero; hence we don't want to be cocksure.
What if we had used only 10 subjects in each group and obtained the same mean IOP and standard deviation in each group? We get the same mean difference and a CI ranging from -0.52 to 6.52. The range has enlarged and the lower limit has overlapped 0. In this situation we would not accept the difference quite so readily (see the next section on negative tests).
As this example demonstrates, a larger sample leads to a narrower CI and we are more likely to get a significant result. If the clinical situation demands a greater degree of certainty, we'd calculate a 99% interval; if all else remains the same, this would widen the CI. If, on the other hand, we were happy to be less sure (90%), we'd get a narrower CI. An increase in the variation (standard deviation) of the values also widens the interval.
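These three influences on the width of the interval (sample size, confidence level and standard deviation) can be seen in a short sketch. It uses the trabeculectomy figures and, as a simplifying assumption, a normal multiplier throughout; for a sample as small as 10 per group a multiplier from the t distribution would normally be used and would widen the interval a little further, so the small-sample widths printed here are slightly narrower than a t-based calculation would give. The function name is our own.

```python
import math
from statistics import NormalDist


def half_width(sd1, sd2, n_per_group, confidence=0.95):
    """Half-width of the CI for a difference in means (normal multiplier)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt((sd1**2 + sd2**2) / n_per_group)


# Standard deviations of 3.5 and 4 mmHg are from the text.
for n in (10, 100):
    for confidence in (0.90, 0.95, 0.99):
        width = half_width(3.5, 4.0, n, confidence)
        print(f"n = {n:>3} per group, {confidence:.0%} CI: 3 +/- {width:.2f} mmHg")
```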
It is important not to overinterpret the CI. In the example above, the CI we obtained does not mean that a difference in means of 0.45 is just as likely as the observed difference of 3. The value we actually got (3 mmHg) is the most likely to be correct. Values towards the limits are possible, but less likely to be true.
A narrow confidence interval is not necessarily good news. In a significant study a narrow interval indicates that the estimates are precise. However this may not imply clinical significance. For instance, a study shows that treatment A with a cure rate of 52% (CI 51-53%) is significantly better than the established treatment B, which has a cure rate of only 50%. By now one could guess this statistically significant result ("p"<0.05) was probably obtained by using 2000 patients in each group. While this confidence interval is small and very precise, it is not exciting, at least clinically. On the other hand, treatment C with a cure rate of 64% (CI 52-76%), despite the wide interval, sounds more clinically promising than treatment A and merits our consideration.
The confidence interval we select depends on the clinical situation. For the sake of an example let's assume that medical treatment has a success rate of 85% and is very cheap; surgery increases success to 90% but costs more initially. We'd want a 99% confidence interval coming nowhere near 0 before we even consider switching to surgery. On the other hand, if medical treatment had a success rate of 40% and was quite expensive while surgery had a 90% success, we might be happy with a 90% CI in our decision to switch. Either way, especially in the first instance, we'd look at other parameters like the number needed to treat (NNT).[5, 6] Incidentally, even in the case of NNT, it is possible (and desirable) to build a confidence interval around the NNT.[5]
Relationship of confidence intervals to "p" values   
The confidence interval is closely related to the "p" value. In fact it is an algebraic rearrangement of the equation used to calculate "p" values. The confidence interval allows calculation of the "p" value at the associated level of significance. The zero difference between means (or a ratio of 1 for relative risk and odds ratios) is what the "p" value is checking for. If the confidence interval overlaps zero (difference) or 1 (ratio), the "p" value will be insignificant. If it does not overlap these values, the "p" value will be significant. This will occur at p < 0.05 for the 95% CI and p < 0.01 for the 99% CI.
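This correspondence can be demonstrated with the trabeculectomy example. The sketch below assumes a simple z-test, so that the "p" value and the interval are built from the same difference and the same standard error; the 95% CI then excludes zero exactly when p < 0.05. The groups of 10 are again treated with a normal multiplier for simplicity, which is an assumption rather than the method of the appendix. The function name is our own.

```python
import math
from statistics import NormalDist


def p_value_and_ci(diff, se, confidence=0.95):
    """Two-sided p value and CI derived from the same difference and SE."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(0.5 + confidence / 2)
    p_value = 2 * (1 - norm.cdf(abs(diff) / se))
    return p_value, diff - z_crit * se, diff + z_crit * se


# Difference of 3 mmHg with SDs of 3.5 and 4 mmHg, for two sample sizes.
results = {}
for n in (100, 10):
    se = math.sqrt(3.5**2 / n + 4.0**2 / n)
    p_value, low, high = p_value_and_ci(3.0, se)
    excludes_zero = low > 0 or high < 0
    results[n] = (p_value, excludes_zero)
    print(f"n = {n:>3}: p = {p_value:.3g}, CI {low:.2f} to {high:.2f}, "
          f"excludes zero: {excludes_zero}")
```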
Interpreting a negative study   
The confidence interval is of use in the interpretation of a negative study. Let's look at the example we quoted earlier. We studied the influence of IOP on the healing of corneal ulcers. Of the 41 ulcers with presenting IOP ≤22 mmHg, 31 healed. Of the 21 ulcers with presenting IOP >22 mmHg, 12 healed. The difference of 18.5% in the healing rate between the two groups was statistically insignificant (p>0.05). However, when we look at the confidence interval around the difference in healing rate, further possibilities emerge (please see Appendix 3). The confidence interval around the proportion difference of 18.5% is -6.5% to +43.5%. That is, raised IOP may actually be associated with better healing (by 6.5%), or may adversely affect healing (by 43.5%). The upper limit of the confidence interval indicates that the association between a lower IOP and healing could be fairly high and perhaps should not be ignored despite the "p" value. Also, this end of the confidence interval seems more biologically plausible. A word of warning: confidence intervals only quantify the imprecision inherent in our data; they neither indicate nor absolve us from looking for the other methodological errors that can occur in a study. In our corneal ulcer example the reported CI should not prevent questions regarding the accuracy of the method used to measure the IOP, etc. Incidentally, the Tonopen was used for the above study.
In summary, the naive way to determine significance is to abdicate to the ready-made "p" value. We, however, are more interested in the actual magnitude of the difference (represented by the confidence interval); this is what leads us to clinical significance en route to our goal of improved care of patients.[6]
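The interval around the difference in healing rates can be approximated as follows. This is a sketch under assumptions, since the appendix is not reproduced here: we take the counts as 31 of 41 and 12 of 21 (the figures that give the quoted rates of 75.6% and 57.1%), use the standard error of a difference between two independent proportions, and place the limits 1.96 standard errors either side. The result lands within rounding of the limits quoted above. The function name is our own.

```python
import math


def proportion_difference_ci(x1, n1, x2, n2, z=1.96):
    """Approximate CI for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Healed ulcers: lower-IOP group versus raised-IOP group.
diff, low, high = proportion_difference_ci(31, 41, 12, 21)
print(f"difference {diff:.1%}; 95% CI {low:.1%} to {high:.1%}")
```

Because the lower limit is below zero, the interval includes "no difference", which agrees with the insignificant "p" value; the upper limit shows how large the benefit could plausibly be.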
References   
1.  Gardner MJ, Altman DG. Statistics with Confidence. London: British Medical Journal, 1989:6-19.
2.  Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Toronto: Little, Brown and Co. 1991:175-176.
3.  Kerrison JB, Marsh M, Stark WJ, Haller JA. Phacoemulsification after retinal detachment surgery. Ophthalmology 1996;103:216-219.
4.  Riegelman RK, Hirsch RP. Studying a study and testing a test. 2nd ed. Little, Brown and Co. 1989:114.
5.  Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Toronto: Little, Brown and Co. 1991:189.
6.  Thomas R, Braganza A, Muliyil JP. Assessment of clinical significance: The number needed to treat. Ind J Ophthalmol 1996;44:113-115.
[Table  1], [Table  2], [Table  3]
