Year: 2000 | Volume: 48 | Issue: 3 | Pages: 245-50
Sample size for ophthalmology studies
TJ Naduvilath, RK John, L Dandona
Public Health Ophthalmology Service, L.V. Prasad Eye Institute, L.V. Prasad Marg, Banjara Hills, Hyderabad 500034, India
Correspondence address: T J Naduvilath, Public Health Ophthalmology Service, L.V. Prasad Eye Institute, L.V. Prasad Marg, Banjara Hills, Hyderabad 500034, India
Source of Support: None, Conflict of Interest: None
Knowledge and correct use of sample size formulae are essential, because the validity of the inferences drawn from a research study often depends on them. This paper explains how sample sizes are calculated. The concept of sampling variation is explained to emphasize the need for a properly calculated sample size. Sample size formulae are illustrated with examples, using ophthalmic data, to give researchers a means of calculating sample sizes for commonly used study designs. This is expected to improve the quality of inferences drawn from ophthalmic research studies.
Keywords: Humans, Ophthalmology/methods, Reproducibility of Results, Research/methods, Research/statistics & numerical data, Sample Size
How to cite this article:
Naduvilath T J, John R K, Dandona L. Sample size for ophthalmology studies. Indian J Ophthalmol 2000;48:245-50
Failure to estimate the sample size before the start of a research study often undermines the validity of the inferences made from the collected data. Surprisingly, sample size calculations are often overlooked even in ethically sensitive areas of medical research such as clinical trials. Halpern reported that several published reports based on clinical trials of nephrotoxicity had considerably lower power than their authors believed; their lack of statistical significance reflected only this mistaken assumption and could not automatically be taken to imply a small treatment effect.
In ophthalmologic studies the fundamental unit for statistical analysis is often the eye rather than the person. When both eyes of an individual are studied, the observations are generally highly correlated, yet such correlations are often not accounted for when calculating sample sizes for ophthalmologic studies. This paper aims to provide a basic understanding of how to calculate sample sizes for various ophthalmologic study designs.
The Principles
Need for an appropriate sample size
When medical researchers collect a set of data, it usually comes from a small number of subjects, referred to as a sample. The sample is expected to represent all the possible subjects defined by specific criteria, called the population. Once the data are collected, describing the results of the study is only the first step; the more important step is assessing whether these results are applicable and generalizable to the population.
The desire to generalize usually prompts the medical researcher to extend the results beyond the actual persons or items studied. It must be noted, however, that such generalization rests only on the observations made in the sample. When a doctor wants to determine the blood sugar level of a particular patient, a small amount of blood is sufficient, because blood in the body is so well mixed that the few drops drawn tell the same story as any other drops would. Here there is hardly any variation between samples (drops of blood) drawn from the population (all the blood in the body). One seldom comes across such a population. Most populations show endless variation among individuals, as well as variation in their environment, so any sample of human subjects is subject to considerable variation. The variation observed in a sample consists of the true variation between individuals and a random variation that arises from the process of sampling. The latter is called sampling or random variation, otherwise known as chance. This concept was discussed in our previous paper on statistical analysis.
Let us suppose a researcher wants to test whether two drugs, A and B, differ in reducing intraocular pressure (IOP) in glaucoma patients. The initial (null) hypothesis is that the two drugs do not differ in the extent to which they reduce IOP. A sample of 10 glaucoma patients is selected, and each patient receives drug A in one eye and drug B in the fellow eye. The reduction in IOP with each drug is recorded, and the mean difference between the reductions achieved by drugs A and B in the pairs of eyes is found to be 5.1 mm Hg. Although this difference is clinically significant, its statistical significance has to be tested, as the result could have occurred by chance: because humans vary, a different sample of 10 glaucoma patients may give a different picture, so the result cannot be generalized straightaway to the population. Significance can be checked with the 95% confidence interval of the mean difference, computed from the mean, the standard deviation and the sample size. For the given sample this interval is -0.8 mm Hg to 10.4 mm Hg. Its interpretation is that if a very large number of samples of size 10 were studied and such an interval computed from each, about 95% of these intervals would contain the true population mean difference. If the mean differences from all these samples are plotted, we obtain what is called the "sampling distribution" of the mean, which resembles the bell-shaped curve of the normal distribution ([Figure - 1]). The centre of this distribution is the population mean, here 5 mm Hg. Although the sample mean of 5.1 mm Hg obtained with a sample size of 10 was close to the population mean of 5 mm Hg, it had to be dismissed as a possible chance finding, because the 95% confidence interval around 5.1 mm Hg contained 0. In this case, the small sample size, and hence the wide 95% confidence interval, leaves one with an imprecise result and the erroneous conclusion that there is no significant difference between drugs A and B.
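The repeated-sampling interpretation of the confidence interval can be checked with a small simulation. This is a minimal sketch under stated assumptions: the paired IOP differences are taken to be normally distributed with mean 5 mm Hg and a hypothetical SD of 7.5 mm Hg (the paper does not report the SD), and 2.262 is the two-sided 95% critical value of Student's t with 9 degrees of freedom. The names `simulate`, `ci95` and `TRUE_SD` are illustrative only. Besides coverage, the sketch counts how often a 10-patient study would produce an interval containing 0, that is, how often a study of this size would miss a real 5 mm Hg difference.

```python
import math
import random
import statistics

TRUE_MEAN = 5.0   # population mean IOP difference between drugs A and B (mm Hg)
TRUE_SD = 7.5     # hypothetical population SD of the differences (mm Hg)
N = 10            # patients (pairs of eyes) per study
T_CRIT = 2.262    # two-sided 95% critical value of t with N - 1 = 9 df


def ci95(sample):
    """95% confidence interval for the mean, based on the t distribution."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - T_CRIT * se, mean + T_CRIT * se


def simulate(n_studies=10000, seed=1):
    """Repeat the 10-patient study many times and summarize the results."""
    rng = random.Random(seed)
    covered = includes_zero = 0
    sample_means = []
    for _ in range(n_studies):
        sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
        sample_means.append(statistics.fmean(sample))
        low, high = ci95(sample)
        covered += low <= TRUE_MEAN <= high   # interval contains the true mean
        includes_zero += low <= 0 <= high     # study would find "no significant difference"
    return (covered / n_studies,
            statistics.fmean(sample_means),
            includes_zero / n_studies)


coverage, centre, nonsignificant = simulate()
print(f"Intervals containing the true mean: {coverage:.3f}")
print(f"Centre of the sampling distribution: {centre:.2f} mm Hg")
print(f"Studies whose interval includes 0:   {nonsignificant:.2f}")
```

With these assumed values, roughly 95% of the intervals contain the true mean, the sample means centre on 5 mm Hg, and about half of the simulated 10-patient studies would wrongly suggest no difference.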
Conversely, suppose a clinically insignificant difference of 1 mm Hg between drugs C and D in reducing IOP is obtained from a very large sample of 1000 patients, with a narrow 95% confidence interval of 0.5 mm Hg to 1.5 mm Hg. This difference is statistically significant, suggesting that it is likely to exist in the population, yet one needs to question the clinical relevance of such a finding. A sample size larger than necessary wastes resources and exposes additional subjects to the possible disadvantages of research, which is ethically unacceptable. Hence an optimal sample size, rather than an arbitrary number, is paramount.
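The link between sample size and interval width can be sketched with the normal approximation, in which the half-width of a 95% interval is $Z_{1-\alpha/2}$ times SD/√n. As an assumption (the paper does not state the SD or how the interval was computed), the sketch back-calculates the SD implied by the 0.5-1.5 mm Hg interval at n = 1000 and shows how the half-width would change with n. The helper names are hypothetical, and for small samples the t distribution gives somewhat wider intervals than shown.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # two-sided 95% standard normal deviate (about 1.96)


def implied_sd(half_width, n):
    """SD implied by a normal-approximation 95% CI half-width at sample size n."""
    return half_width * math.sqrt(n) / Z


def half_width(sd, n):
    """Normal-approximation 95% CI half-width for a mean based on n subjects."""
    return Z * sd / math.sqrt(n)


sd = implied_sd(0.5, 1000)   # 0.5 to 1.5 mm Hg around a 1 mm Hg difference
for n in (10, 100, 1000, 4000):
    print(f"n = {n:5d}: 95% CI = 1.0 +/- {half_width(sd, n):.2f} mm Hg")
```

Because the half-width shrinks with √n, quadrupling the sample size only halves the interval, which is why very large samples can make trivial differences statistically significant.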
Terminology
Level of significance
Any study based on a sample carries the risk of concluding that a difference exists when in fact there is none. This is called a type I error, and the probability of committing it that the investigator is prepared to accept, denoted α, is called the level of significance. It is usually set at 0.05, and a p value below α is declared statistically significant. Sample size formulae require the corresponding value of $Z_{1-\alpha/2}$, the standard normal deviate that cuts off an area of 1-α/2 in the distribution under the null hypothesis (the hypothesis that there is no difference between groups in the population). This is a standard normal distribution, with mean 0 and SD 1. Commonly used values of $Z_{1-\alpha/2}$ for different levels of significance are given in [Table - 1].
Power
Similarly, a study based on a sample carries the risk of concluding that no difference exists when in fact there is one. This is called a type II error, and its probability is denoted β. The complement of this probability, 1-β, is called the power of the study: the probability that the study will detect a difference between two groups that truly exists in the population. Power is usually set at 0.80 or 0.90. Sample size formulae require the corresponding value of $Z_{1-\beta}$, the standard normal deviate that cuts off an area of 1-β in the distribution under the alternative hypothesis (the hypothesis that there is a difference between groups in the population). Commonly used values of $Z_{1-\beta}$ for different powers are given in [Table - 1].
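The standard normal deviates tabulated in [Table - 1] can also be computed directly. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later); which α and power values the table actually lists is not known, so the values looped over here are illustrative, and `z_alpha` and `z_beta` are hypothetical helper names.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # mean 0, SD 1


def z_alpha(alpha):
    """Z_{1-alpha/2}: two-sided critical value for significance level alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def z_beta(power):
    """Z_{1-beta}: standard normal deviate for the desired power (1 - beta)."""
    return STANDARD_NORMAL.inv_cdf(power)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: Z(1-alpha/2) = {z_alpha(alpha):.3f}")
for power in (0.80, 0.90, 0.95):
    print(f"power = {power:.2f}: Z(1-beta) = {z_beta(power):.3f}")
```

The familiar values 1.96 (α = 0.05) and 0.84 (power 0.80) used in the examples below are these deviates rounded to two decimals.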
Effect size
Let us assume that a researcher wants to determine whether postoperative visual acuity following extracapsular cataract extraction (ECCE) with intraocular lens (IOL) implantation is significantly better than that following intracapsular cataract extraction (ICCE). The study end point is defined as a visual acuity of at least 6/18. To calculate the sample size for this question, the researcher must decide on a clinically important difference in the success rates of the two groups; that is, some idea is needed of how large a treatment effect can be expected, based either on previous reports or on what would be of scientific or clinical interest. This is known as the effect size. In this example, if the researcher wishes to detect a difference in success rates of 10 percentage points or more, then 10% is the effect size. The effect size is referred to as "d" in the text.
Precision
Comparisons of groups are not the only situations that require a sample size calculation. Often one wants to estimate the prevalence of a particular disease, or more generally the magnitude of a health problem, in the population. In such a situation, called estimation, the researcher must decide on the level of precision required when estimating the population value, in other words the amount of error that can be accepted. This error can either be relative to the estimated value (relative error) or independent of it (absolute error). In this paper we consider only absolute precision, since relative precision can be converted to absolute precision; for example, a relative error of 25% around an expected prevalence of 2% corresponds to an absolute error of 0.5%. The absolute error is also denoted "d" in the paper.
Design effect
When estimating a population parameter, simple random sampling is unlikely to be the sampling method of choice in an actual field survey. Cluster sampling is the most frequent choice of surveyors, as it is logistically more feasible than simple random sampling. Because observations within a cluster tend to be correlated, the sample size and the precision of the population estimate must then be adjusted using the design effect. The design effect is defined as the ratio of the variance of the estimate under cluster sampling to the variance that would be obtained from a simple random sample of the same size. Thus, if the design effect of a sampling design is 2, twice as many individuals must be studied as with simple random sampling to obtain the same precision. When calculating the sample size, the design effect for a cluster sampling strategy is often taken as 1.5-2. When constructing confidence intervals for the estimate, however, the actual design effect for the variable must be calculated and adjusted for.
Types of variables
Statistical analysis and the sample size formula vary with the type of variable studied. In statistical terms, a variable is a characteristic that may vary from person to person, and the values taken by a variable are referred to as data. For the purpose of calculating sample size, we restrict the classification of variables to two types, namely continuous and categorical. A variable that can take any value within a given range, so that an intermediate value exists between any two given values, is called a continuous variable; examples are age and intraocular pressure. On the other hand, variables such as gender, with the categories male and female, are called categorical variables because they indicate categories rather than measured values.
Method of data collection
Statistical analysis and the sample size formula also depend on the method of data collection. Data can be collected either in an independent or in a matched (paired) fashion. In simple terms, independent samples are those in which the selection of individuals for one group is not affected by the selection of individuals for the other group. At the other extreme, a paired sample arises when every individual or observation in one group has a unique match or pair in the other group, for example the fellow eyes of the same patient, or the same eye measured before and after surgery.
Sample size calculation
Sample size formulae are explained below using examples, both for testing a hypothesis and for estimating a population parameter. [Table - 2] lists the formulae for various situations in which a hypothesis is tested, and [Table - 3] lists those for estimating a population parameter.
Testing a hypothesis
For testing a hypothesis, we assume that two treatments, a control (c) and a new treatment (t), are to be compared, and that an equal number of subjects will be allocated to each. Only one randomly selected eye, or the worse affected eye, of each subject is used in the study.
Suppose that endothelial cell loss after cataract surgery is to be compared between children aged 0-5 years and those aged 6-10 years. The clinician wishes to detect a difference greater than 1000 cells between the groups. Previous reports indicate that the standard deviation (SD) of endothelial cell loss in the older children is around 1500 cells.
The study uses independent groups and the outcome variable is continuous. The effect size is 1000 cells and the SD, taken from the older children, is 1500 cells. Fixing the level of significance at 5% and the power at 80%, the sample size in each group is calculated from [Table - 2] as
$$N = \frac{2 \times (1.96 + 0.84)^2 \times 1500^2}{1000^2} \approx 35.3,$$
which is rounded up to 36 children per group.
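The calculation can be wrapped in a small function. This is a sketch: [Table - 2] is not reproduced here, so the general formula in the docstring is inferred from the worked example, and `n_two_means` is a hypothetical helper name. Exact Z values are used instead of the rounded 1.96 and 0.84, and the result is rounded up to the next whole subject, as is conventional, giving 36 per group.

```python
import math
from statistics import NormalDist


def n_two_means(d, sd, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference d between two independent means.

    Assumed formula (consistent with the worked example in the text):
        N = 2 * (Z_{1-alpha/2} + Z_{1-beta})^2 * sd^2 / d^2, rounded up.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / d ** 2)


# Endothelial cell loss example: d = 1000 cells, SD = 1500 cells
print(n_two_means(d=1000, sd=1500))
print(n_two_means(d=1000, sd=1500, power=0.90))  # higher power needs more subjects
```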
In a similar study, if the researcher also wanted to know the change in endothelial cell count after cataract surgery among children aged 0-10 years, the endothelial cells would be counted before and after the operation. In adults, a drop of about 500 cells after cataract surgery has been reported, with an SD of this loss of around 700 cells.
Here the study design becomes matched and the outcome variable is continuous. The mean difference the researcher is looking for is 500 cells and the SD of this difference is 700 cells. Fixing the level of significance at 5% and the power at 80%, the sample size is calculated from [Table - 2] as
$$N = \frac{(1.96 + 0.84)^2 \times 700^2}{500^2} \approx 15.4,$$
which is rounded up to 16 subjects.
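The matched version differs from the independent-groups case in two ways: there is no factor of 2, and the SD is that of the within-pair differences. A minimal sketch with a hypothetical helper `n_paired_means`; the general formula is inferred from the worked example rather than copied from [Table - 2].

```python
import math
from statistics import NormalDist


def n_paired_means(d, sd_diff, alpha=0.05, power=0.80):
    """Number of subjects (pairs) to detect a mean within-pair difference d.

    Assumed formula, inferred from the worked example:
        N = (Z_{1-alpha/2} + Z_{1-beta})^2 * sd_diff^2 / d^2, rounded up.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil((z_a + z_b) ** 2 * sd_diff ** 2 / d ** 2)


# Before/after endothelial cell counts: d = 500 cells, SD of the difference = 700 cells
print(n_paired_means(d=500, sd_diff=700))
```

Because each subject serves as his or her own control, between-subject variation drops out of the SD of the differences, which is why a matched design often needs far fewer subjects.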
In another situation the researcher wants to compare the success rates of medically and surgically treated primary open angle glaucoma (POAG). Success is defined as maintaining IOP below 22 mm Hg during the first year of treatment. The success rate for medically treated eyes is quoted as 50%. The researcher would change the treatment modality if surgery improved the success rate by as little as 15 percentage points.
Here the study uses independent groups and the outcome variable is categorical. The effect size of the treatment (surgery) is 15% and the standard treatment has a success rate of 50%, so the anticipated success rate with surgery is 65%.
The pooled success rate is
$$\bar{p} = \frac{p_c + p_t}{2} = \frac{50 + 65}{2} = 57.5\%$$
Fixing the level of significance at 5% and the power at 80%, the sample size in each group is calculated from [Table - 2] as
$$N = \frac{(1.96 + 0.84)^2 \times 2 \times 57.5 \times (100 - 57.5)}{15^2} \approx 170.3,$$
which is rounded up to 171 subjects per group.
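The same calculation for two independent proportions, as a sketch with a hypothetical helper `n_two_proportions`. The pooled-proportion formula in the docstring is inferred from the worked example; other textbook variants exist (for instance, ones that use the separate variances of the two groups) and give slightly different answers. Proportions are passed as fractions rather than percentages.

```python
import math
from statistics import NormalDist


def n_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Subjects per group to detect a difference between two independent proportions.

    Assumed formula (pooled-proportion form matching the worked example):
        N = (Z_{1-alpha/2} + Z_{1-beta})^2 * 2 * p_bar * (1 - p_bar) / d^2,
    where p_bar = (p_control + p_treatment) / 2 and d = p_treatment - p_control.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    d = p_treatment - p_control
    return math.ceil((z_a + z_b) ** 2 * 2 * p_bar * (1 - p_bar) / d ** 2)


# POAG example: 50% success with medical treatment, 65% anticipated with surgery
print(n_two_proportions(0.50, 0.65))
```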
In another study, a researcher wants to compare the one-year incidence of acute red eye with two types of contact lens, etafilcon and polymacon A, worn on an extended-wear basis. The design involves fitting etafilcon on one randomly chosen eye and polymacon A on the other eye. The incidence of acute red eye with extended-wear contact lenses has been found to be around 8%. The researcher wishes to detect any difference greater than 5%.
Here the design is a matched study and the outcome variable is categorical. The effect size of the lens type is 5% and the incidence of acute red eye with standard extended-wear lenses is 8%. Because this is a matched study, we also need the proportion of pairs (f) in which the outcome differs between the two eyes, that is, in which treatment in one eye succeeds while in the other eye it fails. Although f is generally not known, a formula is available for estimating it on the basis of clinical observations. If f is set at 6% in this situation, fixing the level of significance at 5% and the power at 80%, the number of pairs of eyes is calculated from [Table - 2] as
$$N = \frac{\left[1.96\sqrt{0.06} + 0.84\sqrt{0.06 - 0.05^2}\right]^2}{0.05^2} \approx 185.8,$$
which is rounded up to 186 pairs of eyes.
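A sketch of the paired-proportions calculation, with a hypothetical helper `n_paired_proportions`. The formula is inferred from the worked example; f is the anticipated proportion of discordant pairs, which can never be smaller than the absolute difference d, so the function rejects such inputs.

```python
import math
from statistics import NormalDist


def n_paired_proportions(f, d, alpha=0.05, power=0.80):
    """Pairs needed to detect a difference d between two paired proportions.

    Assumed formula, matching the worked example:
        N = [Z_{1-alpha/2} * sqrt(f) + Z_{1-beta} * sqrt(f - d^2)]^2 / d^2,
    where f is the anticipated proportion of discordant pairs.
    """
    if f < abs(d):
        raise ValueError("f (discordant pairs) cannot be smaller than |d|")
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    root = z_a * math.sqrt(f) + z_b * math.sqrt(f - d ** 2)
    return math.ceil(root ** 2 / d ** 2)


# Contact lens example: f = 6% discordant pairs, difference d = 5%
print(n_paired_proportions(f=0.06, d=0.05))
```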
Two eyes per subject
In ophthalmologic studies it is quite common for each subject to contribute both eyes to the analysis, and the outcomes in the two eyes of the same subject are likely to be correlated. A study of the effect of timolol in reducing IOP, for instance, might include both eyes. If this within-subject correlation is not accounted for, the two eyes are effectively counted as independent observations, the standard error is underestimated, and the results appear more significant than they really are. A measure of the correlation between eyes is the intraclass correlation coefficient r, which ranges from -1 to 1. Usually r is positive, indicating, for example, that IOP tends to be high (or low) in both eyes simultaneously. This correlation between eyes must be adjusted for when comparing two groups. If the variable under study is continuous, a valid t test may be obtained by simply dividing the standard t statistic by the square root of (1+r), and the required number of eyes per treatment group is obtained by calculating the sample size from the formula in [Table - 2] and multiplying it by (1+r). Similarly, if the variable under study is categorical, the standard chi-square test may be used after dividing the standard χ² statistic by (1+r), and the required number of eyes is obtained by calculating the sample size with the categorical-data formula in [Table - 2] and multiplying it by (1+r).
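The (1+r) adjustments described above are simple enough to sketch directly. The helper names are hypothetical, and the intraclass correlation r = 0.5 and the one-eye sample size of 36 in the usage lines are illustrative values, not figures from the paper.

```python
import math


def _check_r(r):
    if not -1 < r <= 1:
        raise ValueError("intraclass correlation r must lie in (-1, 1]")


def eyes_per_group(n_single_eye, r):
    """Eyes needed per group when both eyes of each subject are used.

    n_single_eye is the sample size from the one-eye-per-subject formula.
    """
    _check_r(r)
    return math.ceil(n_single_eye * (1 + r))


def adjusted_t(t_standard, r):
    """Standard t statistic divided by sqrt(1 + r)."""
    _check_r(r)
    return t_standard / math.sqrt(1 + r)


def adjusted_chi_square(chi2_standard, r):
    """Standard chi-square statistic divided by (1 + r)."""
    _check_r(r)
    return chi2_standard / (1 + r)


# Hypothetical values: one-eye sample size 36, intraclass correlation 0.5
print(eyes_per_group(36, 0.5))
print(round(adjusted_t(2.4, 0.44), 3))
print(round(adjusted_chi_square(5.0, 0.25), 3))
```

With a strongly positive r, the second eye adds much less information than a new subject would, so the number of eyes rises toward twice the one-eye figure as r approaches 1.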
Loss to follow-up
All prospective clinical studies need to account for subjects who are likely to leave the study for reasons other than the outcome under study. If the sample size does not allow for this loss to follow-up, the statistical power will be lower than planned. If l is the proportion of subjects expected to leave the study for such reasons, the loss can be accounted for by simply dividing the sample size calculated as described above by (1-l). The expected proportion of dropouts should be estimated from prior experience in a similar clinical setting.
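A minimal sketch of the loss-to-follow-up adjustment; `n_allowing_for_dropout` is a hypothetical helper, and the 20% loss rate applied to 171 subjects per group is an illustrative assumption.

```python
import math


def n_allowing_for_dropout(n, loss_rate):
    """Inflate a calculated sample size n for an expected loss-to-follow-up rate l."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(n / (1 - loss_rate))


# Hypothetical: 171 subjects per group, 20% expected to be lost to follow-up
print(n_allowing_for_dropout(171, 0.20))
```

Dividing by (1-l) gives 214 per group, of whom about 171 are expected to complete the study. Multiplying by (1+l) instead would give only 206, and after a 20% loss that leaves about 165, fewer than required.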
Estimation
Sample sizes are required not only for testing a hypothesis; they are equally important for accurately estimating a population parameter. The importance of an adequate sample size and an appropriate sampling strategy in sample surveys cannot be overemphasized. The sample size formulae used for estimation are given in [Table - 3]. In the next two examples it is assumed that only one eye per subject (a random eye, the worse eye or the better eye) is used in the analysis, and that simple random sampling is used.
An epidemiologist wishes to estimate the proportion of individuals older than 30 years in the population who are pseudophakic. The true rate is known to be unlikely to exceed 2%, and the epidemiologist wants to estimate the proportion to within 0.5% of the true value with 95% confidence.
Here the anticipated population prevalence of pseudophakia is 2%, and absolute precision is used because the epidemiologist wants the estimate to lie within 0.5% of the true prevalence. Setting the level of significance at 5% (power is not used here, because it is not a parameter in estimation), the sample size from [Table - 3] is
$$N = \frac{1.96^2 \times 0.02 \times (1 - 0.02)}{0.005^2} \approx 3011.8,$$
which is rounded up to 3012 subjects.
On the other hand, if the expected prevalence remains at 2%, but the epidemiologist is willing to accept a wider confidence interval and settles for a precision of 1% instead of 0.5%, then the sample size will reduce to
$$N = \frac{1.96^2 \times 0.02 \times (1 - 0.02)}{0.01^2} \approx 752.95,$$
which is rounded up to 753 subjects.
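Thus the required precision strongly affects the sample size: because N is inversely proportional to the square of d, halving the allowable error quadruples the required sample size.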
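The prevalence calculation, as a sketch with a hypothetical helper `n_prevalence`; the formula is inferred from the worked examples, and exact Z values are used before rounding up.

```python
import math
from statistics import NormalDist


def n_prevalence(p, d, confidence=0.95):
    """Subjects needed to estimate a proportion p to within +/- d (absolute error).

    Assumed formula, matching the worked examples:
        N = Z_{1-alpha/2}^2 * p * (1 - p) / d^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Pseudophakia example: expected prevalence 2%, precision 0.5% and then 1%
for d in (0.005, 0.01):
    print(f"d = {d}: N = {n_prevalence(0.02, d)}")
```

The two results differ by a factor of four, reflecting the inverse-square dependence on d.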
In another situation, the researcher wants to estimate the mean IOP among individuals older than 30 years in the community, to within 2 mm Hg on either side of the mean. A rough estimate of the standard deviation of IOP is around 6 mm Hg.
Here the variable to be estimated is continuous, the desired precision is 2 mm Hg and the expected SD is around 6 mm Hg. Substituting these values in the formula from [Table - 3] gives
$$N = \frac{1.96^2 \times 6^2}{2^2} \approx 34.6,$$
which is rounded up to 35 subjects.
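The corresponding sketch for estimating a mean; `n_mean` is a hypothetical helper, and the formula is inferred from the worked example.

```python
import math
from statistics import NormalDist


def n_mean(sd, d, confidence=0.95):
    """Subjects needed to estimate a mean to within +/- d, given an expected SD.

    Assumed formula, matching the worked example:
        N = Z_{1-alpha/2}^2 * sd^2 / d^2, rounded up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * sd ** 2 / d ** 2)


# IOP example: expected SD 6 mm Hg, precision 2 mm Hg
print(n_mean(sd=6, d=2))
```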
In the above examples for estimation, the sample sizes were calculated assuming simple random sampling. This is often not feasible. The sample size for any other sampling strategy can be obtained by multiplying the calculated sample size by the design effect of that strategy. If, in the pseudophakia example, cluster sampling is used and the design effect is assumed to be 2, the required sample size becomes 6024 subjects instead of 3012. The actual design effect can later be calculated from the data and adjusted for in the confidence intervals.
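A minimal sketch of the design-effect adjustment; `n_with_design_effect` is a hypothetical helper, and the design effects of 2 and 1.5 are the assumed planning values mentioned in the text, not measured ones.

```python
import math


def n_with_design_effect(n_srs, deff):
    """Scale a simple-random-sampling sample size by the design effect."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return math.ceil(n_srs * deff)


# Pseudophakia example: 3012 subjects under simple random sampling
print(n_with_design_effect(3012, 2.0))
print(n_with_design_effect(3012, 1.5))
```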
Two eyes per subject
In ophthalmologic sample surveys, the prevalence of a categorical outcome is usually reported per person, based on a random eye, the worse eye or the better eye. Both eyes of each subject are generally used in surveys only for continuous outcomes such as IOP. In such situations the sample size is calculated as for estimation above and then multiplied by (1+r), where r is the intraclass correlation between the eyes. This correlation between the eyes must also be adjusted for in the analysis; Rosner has discussed methods of adjusting for this intraclass correlation in detail.
In conclusion, this paper discusses the importance of calculating an appropriate sample size using the correct formulae. Since a detailed discussion of sample size calculation is beyond the scope of this article, only the basic essentials are presented. These may suffice for many common research situations. However, there are many circumstances in which the formulae given here are inadequate or inappropriate.[9,10] Randomized controlled trials often compare treatments using survival curves, case-control studies use odds ratios, and cohort studies use relative risks. The formulae for calculating sample size differ with the design of the study and the type of analysis; the references provide further reading on these topics. Consulting an epidemiologist or a statistician early in the planning of a research study is likely to increase the yield from the research effort and is therefore recommended.
References
1. Halpern EF. Inadequacy of sample sizes in clinical trials of laboratory parameters attributable to invalid statistical assumptions. Clin Pharmacol Ther.
2. Naduvilath TJ, Dandona L. Statistical analysis: the need, the concept, and the usage. Indian J Ophthalmol.
3. Bennett S, Woods T, Liyanage WM, Smith DL. A simplified general method for cluster-sample surveys of health in developing countries. World Health Stat Quart.
4. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and other Multivariate Methods. Boston: PWS-KENT Publishing Company; 1988.
5. Connor RJ. Sample size for testing differences in proportions for the paired-sample design. Biometrics.
6. Rosner B. Statistical methods in ophthalmology: an adjustment for the intraclass correlation between eyes. Biometrics.
7. Rosner B, Milton RC. Significance testing for correlated binary outcome data. Biometrics.
8. Donner A. Statistical methods in ophthalmology: an adjusted chi-square approach. Biometrics.
9. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials.
10. Lwanga SK, Lemeshow S. Sample Size Determination in Health Studies: A Practical Manual. Geneva: World Health Organization; 1991.
[Figure - 1]
[Table - 1], [Table - 2], [Table - 3]