RESEARCH METHODOLOGY
Year: 1998  Volume: 46  Issue: 1  Page: 51-58

Statistical analysis: the need, the concept, and the usage

TJ Naduvilath, L Dandona
Public Health Ophthalmology Service, L.V. Prasad Eye Institute, Hyderabad, India

In general, a better understanding of the need for and usage of statistics would benefit the medical community in India. This paper explains why statistical analysis is needed and what its conceptual basis is, using ophthalmic data as examples. The concept of sampling variation is explained to further show why statistical analysis is needed in medical research. Statistical estimation and hypothesis testing, which form the major components of statistical inference, are explained. Commonly reported univariate and multivariate statistical tests are described in order to equip the ophthalmologist with a basic knowledge of statistics for better understanding of research data. It is felt that this understanding would facilitate well-designed investigations, ultimately leading to a higher quality of ophthalmic practice in our country.
The Need for Statistical Analysis

When medical researchers collect a set of data, it is usually from a small number of subjects. This group of subjects is referred to as a sample. The sample represents all the possible subjects (defined by specific criteria), otherwise called the population. Once the data are collected, describing the results of the study is only the first step; the most important step is establishing whether the results are applicable and generalizable to the population. Because of this desire to generalize, the medical researcher's interest usually goes beyond the actual persons or items studied. It must be noted, however, that any generalization can rest only on the observations made in the sample. To study the hemoglobin content of a particular person's blood, a little blood from a finger prick is sufficient. This is because the blood in the body is so well mixed that a few drops tell the same story as any others. Here there is hardly any variation between samples (drops of blood) drawn from the population (all the blood in the body). One seldom comes across such a population. Most other populations show endless variation among individuals as well as in their environments. A sample of human subjects is therefore subject to large variation. Let us consider that a researcher intends to compare the efficacy of surgical and medical treatment for primary open angle glaucoma by comparing the post-treatment intraocular pressure (IOP). Fifty subjects were randomly allocated to the two treatment groups, and the baseline factors in the two groups are comparable. The study shows a difference in pressure reduction of 40% between the two groups. This difference of 40% is valid for this sample of 50 subjects, which is only one of many possible samples. If another 50 subjects were studied, the study would very likely yield an IOP difference other than 40%.
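The idea of sampling variation can be illustrated with a small simulation. The Python sketch below rests entirely on assumed values: a hypothetical population in which surgery lowers IOP by 8 mm Hg and medical treatment by 5 mm Hg on average, each with an SD of 3 mm Hg, is sampled repeatedly with 25 subjects per group (50 in all). It is not a model of any real trial; it only shows that each repetition of the same study yields a different observed difference.

```python
import random
import statistics

# Hypothetical population values (assumptions for illustration, not study data):
# mean IOP reduction of 8 mm Hg with surgery and 5 mm Hg with medical therapy,
# both with an SD of 3 mm Hg between patients.
SURGICAL_MEAN, MEDICAL_MEAN, SD = 8.0, 5.0, 3.0


def simulate_trial(n_per_group, rng):
    """Return the observed difference in mean IOP reduction for one simulated trial."""
    surgical = [rng.gauss(SURGICAL_MEAN, SD) for _ in range(n_per_group)]
    medical = [rng.gauss(MEDICAL_MEAN, SD) for _ in range(n_per_group)]
    return statistics.mean(surgical) - statistics.mean(medical)


rng = random.Random(1998)  # fixed seed so the run is reproducible
differences = [simulate_trial(25, rng) for _ in range(1000)]

# Every trial of 50 subjects gives a different answer; their spread is the sampling variation.
print("first five trials:", [round(d, 2) for d in differences[:5]])
print("mean of all trials:", round(statistics.mean(differences), 2))
print("SD of the differences:", round(statistics.stdev(differences), 2))
```

Under these assumptions the SD of the simulated differences is roughly 0.85 mm Hg; that spread is exactly the sample-to-sample variation the text goes on to describe.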
If yet another 50 subjects were studied, there would be yet another value. The IOP difference therefore depends on the sample of subjects: if the sample varies, the IOP difference also varies. This variation in the mean IOP difference from sample to sample is called the sampling variation of the mean. In simple terms, sampling variation is regarded as chance or error in the observed results. If the sampling variation is large, there is a large probability of making an incorrect inference about the population. This probability of making an incorrect inference is the measure of chance. The primary objective of research is to make inferences about the population based on observations of a sample that represents the population. Whenever an inference is made, there is always a chance of arriving at a wrong one. It is therefore necessary for the researcher to quantify the chance taken of drawing an incorrect inference about the population, that is, to quantify the confidence with which he or she can infer something about the population from the study of one sample. This process of drawing conclusions about the population from sample values is carried out through statistical analysis. Essentially, statistical analysis is a methodology by which something about a large population is discovered on the basis of observing a subgroup or sample from that population.[2] In technical terms, a statistical inference is an attempt to reach a conclusion about a large number of items or events on the basis of observations made on only a portion of them. This is valuable because, for obvious reasons, it is not possible to study the whole population in most circumstances.

The Concept of Statistical Analysis

There are two major aspects of statistical inference, namely, estimation of parameters and testing of hypotheses. Both procedures are often applied to a given problem, since they complement one another.
Estimation of population characteristics

Statistical estimation is done to make a statement about the value of a population parameter, a certain quantity in the population; for example, one may wish to state a best guess for the value of the population average. The most common method of estimation is to determine an interval from the sample, which is used to make a statement about our confidence that the interval includes the population parameter. Such intervals are called confidence intervals, and the most commonly reported is the 95% confidence interval for a population parameter. The correct interpretation of a 95% confidence interval is that, on repeated sampling, 95% of the intervals computed in the same way would include the population parameter,[1] which implies that 5% would not. In a given study, the researcher has only the one interval that has been computed and will not know whether this interval actually includes the parameter. However, since 95% of intervals computed from random samples would include the population parameter, the researcher can be "95% confident" that the parameter lies in the interval. This 95% is the specified probability that an interval computed in this way contains the population parameter. The specified probability is called the confidence level, and the endpoints of the confidence interval are the confidence limits. To calculate the confidence limits around an estimated population parameter (for example, mean IOP, prevalence of blindness, or the regression coefficient for the rate of change in postoperative endothelial cell count over time), it is necessary to know (i) the sample estimate, such as the average IOP calculated from the given sample, (ii) the standard deviation (SD), which measures the variation of the studied variable among the sample values, (iii) the sample size, and (iv) the specified probability of including the true population value.
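The repeated-sampling interpretation can be checked directly by simulation. The Python sketch below assumes a hypothetical, normally distributed population of IOP values (mean 15 mm Hg, SD 3 mm Hg), draws 2,000 samples of 100 subjects, computes a 95% interval (mean ± 1.96 SE) from each, and counts how many intervals include the true mean. The population values and the normal-approximation interval are illustrative assumptions.

```python
import random
import statistics
from statistics import NormalDist

TRUE_MEAN, TRUE_SD = 15.0, 3.0  # hypothetical population of IOP values (mm Hg)
N, REPEATS = 100, 2000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

rng = random.Random(46)  # fixed seed for reproducibility
covered = 0
for _ in range(REPEATS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5  # estimated standard error of the mean
    lower, upper = mean - z * se, mean + z * se
    if lower <= TRUE_MEAN <= upper:
        covered += 1

coverage = covered / REPEATS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The fraction printed should be close to, though not exactly, 95%, since the set of simulated intervals is itself subject to sampling variation.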
It is also necessary to assume that the sample estimate is normally distributed. Before proceeding further, the concept of normality is discussed here. The normality of a distribution always refers to the distribution of a variable in the population, described by its mean and SD.[2] The normal distribution has certain definite features: it is unimodal, symmetrical, and bell shaped. As the normal distribution is entirely described by its mean and SD, two normal distributions with the same mean and SD are identical. The distinguishing feature of the normal distribution is its area property, more precisely the specific relationship between its percentile values and its mean and standard deviation. A normal distribution has 50% of its area lying above the mean and 50% below. Further, 68.27% of the area lies between the values obtained by subtracting and adding one SD to the mean, that is, within mean ± SD.[2] Also, 95% of the area of the normal curve lies within mean ± 1.96 SD (Figure). The figures 68.27%, 95% and 1.96 arise from the normality of the distribution and are specific to such distributions.[2] Many biological characteristics are approximately normally distributed. One of the most important rules of statistics (the central limit theorem) is that even if the underlying population is not normally distributed, the means of samples will themselves be approximately normally distributed if the sample sizes are sufficiently large.[3] Therefore, in most instances with a sufficient sample size, at least the sample means are normally distributed. As discussed earlier, the sample mean or sample proportion varies with the particular sample chosen, and this is referred to as sampling variation. A measure of this variation is the SD of all the possible sample means or sample proportions, which is called the standard error (SE). The formulae for the SE of the mean and of the proportion are given in [Table:1].
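The area figures quoted above, and the usual textbook formulae for the SE of a mean (SD/√n) and of a proportion (√(p(1 - p)/n)), can be reproduced with Python's standard library. This is a sketch: the function names are hypothetical, and the formulae are the standard ones rather than a transcription of [Table:1].

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area within mean ± 1 SD and within mean ± 1.96 SD of any normal distribution
within_1sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
within_196sd = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(f"within ±1 SD:    {within_1sd:.4f}")    # 0.6827
print(f"within ±1.96 SD: {within_196sd:.4f}")  # 0.9500


def se_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)
```

For example, se_mean(3.1, 100) gives approximately 0.31 mm Hg, the SE used in the IOP example in the next paragraph.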
The SE does not summarize the variability of all the observations in the sample; it summarizes the variability of all the possible sample means, proportions, or any other sample estimate under study. While calculating the confidence limits, we assume that the sample estimate is normally distributed and that its variation is measured by its SE. The confidence level (the specified probability of including the true population value) also needs to be decided. A 95% confidence level means that there is a 5% probability that an interval computed in this way does not contain the population value. It is the researcher who must decide how much error can be accommodated while making the inference about the population parameter; an error probability of 5% or less is conventionally considered acceptable in medical research. The formulae for calculating the 95% confidence interval around the mean and the proportion are given in [Table:1]. Let us consider that the IOP of 100 normal subjects aged >30 years gave a mean of 12.4 mm Hg with an SD of 3.1 mm Hg. To estimate with 95% confidence the mean IOP in the population of all normal people, the sampling variation (SE) of the mean is first calculated as 3.1/√100, giving 0.31 mm Hg. The lower and upper 95% confidence limits are then 12.4 ± (1.96 × 0.31), that is, 11.8 mm Hg and 13.0 mm Hg. We can therefore be 95% confident that the average IOP of the population of normal people aged >30 years lies between 11.8 and 13.0 mm Hg. A sample estimate is usually presented along with its confidence interval. It is important to realize that the width of the confidence interval is related to the size of the sample: the larger the sample, the narrower the confidence interval for a given confidence level. The width of the confidence interval is also related to the confidence level specified: for a given data set, the higher the confidence level, the wider the confidence interval.
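The worked example can be reproduced in a few lines. The Python sketch below uses the normal multiplier (1.96 for 95%), as the text does; the function name normal_ci is hypothetical, and for small samples a multiplier from the t distribution is often used instead. It also shows how the interval widens with a higher confidence level and narrows with a larger, hypothetical sample of 400 subjects.

```python
import math
from statistics import NormalDist


def normal_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


# Worked example from the text: 100 normal subjects, mean IOP 12.4 mm Hg, SD 3.1 mm Hg
for level in (0.90, 0.95, 0.99):
    low, high = normal_ci(12.4, 3.1, 100, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mm Hg")

# A larger (hypothetical) sample with the same mean and SD gives a narrower interval
low, high = normal_ci(12.4, 3.1, 400)
print(f"n = 400, 95% CI: {low:.1f} to {high:.1f} mm Hg")
```

With n = 100, the 95% limits round to 11.8 and 13.0 mm Hg, matching the hand calculation.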
A confidence level of 95% need not always be used; levels of 99%, 90% or others may be chosen depending on the nature of the question being asked.

Hypothesis testing

Hypothesis testing refers to the body of statistical techniques that can be used to arrive at a yes-or-no decision regarding a particular hypothesis.[2] It provides a framework for making decisions on an objective basis, by weighing the relative probabilities of different hypotheses, rather than on a subjective basis by simply looking at the data. People can form different opinions by looking at data, but a hypothesis test, otherwise called a test of significance, provides a uniform decision-making criterion that will be consistent for all people.[2] The setting of hypothesis testing is the same as that of estimation, namely discovering something about the population on the basis of sampling, but the approach is quite different. A test of hypothesis determines how likely it is that the observed differences in the data are entirely due to sampling variation rather than to underlying population differences. As the prime objective of a research study is to make inferences about the population based on the study of samples, the test of hypothesis provides a method of determining whether the observations from the sample can be expected to hold in the population. The test calculates the probability of obtaining a result at least as extreme as the one observed if only sampling variation, that is, chance, were operating. If the observations could readily be due to sampling variation, the results cannot be generalized to the population. Conversely, if the observed results are unlikely to be due to sampling variation, they may truly reflect the population. As an example, suppose cataract subjects were randomized into two treatment groups, namely, extracapsular cataract extraction (ECCE) with a posterior chamber intraocular lens (IOL), and intracapsular cataract extraction (ICCE) with aphakic spectacles.
Postoperatively, their visual acuity (VA) was recorded at 6 weeks, 3 months, and 6 months. Success was defined as best corrected VA better than or equal to 6/18 at the last follow-up visit. At the end of the study, ECCE with IOL had a success rate of 74% while ICCE with aphakic spectacles had 69%. What can be concluded about the two techniques in terms of best corrected VA? Does ECCE with IOL result in a higher success rate than ICCE with aphakic spectacles? The data of the study show that ECCE with IOL has a 5 percentage point advantage over ICCE with aphakic spectacles. But can the researcher state that this result holds for all cataract patients needing surgery? The researcher could make such a statement only if it can be shown that the observed difference of 5 percentage points is unlikely to have occurred through sampling variation, or chance. Statistical tests of hypothesis provide a number of methods of determining whether the observed sample results could have occurred by chance. The first step in performing a statistical hypothesis test is to reformulate the medical hypothesis. The medical hypothesis which the researcher wishes to test in this example is that the success rate of ECCE with IOL is higher than that of ICCE with aphakic spectacles. In many situations, it is far easier to disprove a proposition than to prove it. For instance, to prove that all cows are black would require an examination of every cow in the world, while one brown cow disproves the statement immediately. It is therefore more practical to test a hypothesis that can be disproved. Statistics uses a similar approach. Statistical proofs begin with the supposition that the required result is not true. If the data then turn out to be improbable under this supposition, the supposition is rejected and the required result is thereby supported. Rather than trying to prove the medical hypothesis (that the success rate of ECCE with IOL is higher than that of ICCE with aphakic spectacles), a hypothesis test attempts to disprove the hypothesis that the success rates are the same.
This reformulated hypothesis is called the null hypothesis. It states that any observed differences are entirely due to sampling error (that is, chance) and that in reality there is no difference between the groups. Corresponding to the null hypothesis there is always an alternative hypothesis, which includes all the possible realities not included in the null hypothesis. The test of hypothesis can be compared with the judicial process, in which an individual is assumed innocent until proved guilty. The assumption of innocence corresponds to the null hypothesis, while being proved guilty corresponds to rejection of the null hypothesis. It must be noted here that "proved guilty" does not refer to the absolute truth but to the decision of the jury (corresponding to the statistical test) on the basis of the evidence presented (corresponding to the sample observations), a decision which could possibly be wrong (corresponding to the probability of chance). Similarly, the test of hypothesis does not give the absolute truth but arrives at a decision based on the observations from the sample. Thus the test of hypothesis will or will not reject the null hypothesis based on the results of the particular study, with a margin of error in whatever conclusion is reached. Next, an appropriate statistical formula (chosen on the basis of the assumptions about the distribution of the data in the underlying population, the type of variable studied, the purpose of the analysis, and the method of data collection) is used to calculate the probability that a difference at least as large as that found in the observed data would have occurred by chance if the null hypothesis were true. The formulae used for tests of hypothesis compare the observed difference with the SE (the measure of sampling variation) of the observed difference. The resulting probability from the statistical test is called the "p-value".
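For the cataract example, a common way to compare two proportions is a two-sample z-test based on the pooled proportion, which for a 2 × 2 table is equivalent to the chi-square test without continuity correction. The Python sketch below is illustrative only: the text reports success rates of 74% and 69% but not the group sizes, so 200 subjects per group are assumed purely for the purpose of the example, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: the two population proportions are equal."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # common rate estimated under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se  # observed difference expressed in units of its SE
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical group sizes: 148/200 = 74% (ECCE with IOL), 138/200 = 69% (ICCE with spectacles)
z, p = two_proportion_z_test(148, 200, 138, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumed group sizes the observed difference is only about 1.1 times its SE and the two-sided p-value is roughly 0.27, so the null hypothesis of equal success rates would not be rejected at the 5% level.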
If the p-value is large (often arbitrarily taken as 5% or greater), it is accepted that the results could be spurious and due to chance, and therefore the null hypothesis cannot be rejected. If, on the other hand, this probability is small, there is evidence to reject the null hypothesis. Rejecting the null hypothesis could of course be wrong, but the smaller the calculated probability, the smaller the chance of making a wrong decision. The rule of thumb that a p-value less than 5% (p<0.05) leads to rejection of the null hypothesis is fairly arbitrary but very widely used. Cut-off points other than 5% can be used, and whichever is chosen is called the significance level of the test. Sometimes the significance level is 1%, in which case the p-value must be less than 1% before the null hypothesis can be rejected. The smaller the p-value, the more reliance can be placed on the sample results as reflecting reality. In statistical analysis, the rejection of a null hypothesis is referred to as a significant result. Thus a significant result is one that is unlikely to have occurred by chance alone and more likely to reflect a real effect. It is important to note that the null hypothesis is never proved right or wrong but is only accepted or rejected at a given level of significance.[2] The p-value is influenced by both the strength of the association and the sample size. A small p-value may be obtained for a weak association when the sample is large, while a real difference between two groups may not achieve statistical significance if the sample size is not large enough. It is therefore imperative to consider clinical significance alongside statistical significance. While statistical significance relates to a low probability of a result occurring by chance, clinical significance relates to the relevance of the findings to clinical or public health practice.
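The dependence of the p-value on sample size can be made concrete. The Python sketch below applies a pooled two-proportion z-test to two hypothetical scenarios, chosen only for illustration: a clinically trivial 1 percentage point difference in a very large study, and a 10 percentage point difference in a small study. The function name and all numbers are assumptions.

```python
import math
from statistics import NormalDist


def p_value_two_proportions(p_a, p_b, n_per_group):
    """Two-sided p-value from a pooled two-proportion z-test with equal group sizes."""
    pooled = (p_a + p_b) / 2  # equal group sizes, so the pooled rate is the simple average
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical scenarios: a small difference in a large study, a large one in a small study
tiny_effect_big_study = p_value_two_proportions(0.71, 0.70, 20_000)
large_effect_small_study = p_value_two_proportions(0.70, 0.60, 30)
print(f"71% vs 70%, n = 20000 per group: p = {tiny_effect_big_study:.3f}")
print(f"70% vs 60%, n = 30 per group:    p = {large_effect_small_study:.3f}")
```

Under these assumptions the trivial difference is statistically significant (p ≈ 0.03) while the larger difference is not (p ≈ 0.42), which is why statistical and clinical significance have to be judged separately.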
Because statistical significance depends in part on sample size, small and clinically unimportant differences may reach statistical significance. On the other hand, a result that is important from a public health perspective may not achieve statistical significance because the study sample was not large enough to detect that difference, which means that the study was too small to draw safe conclusions. Therefore, when interpreting results, their significance must be judged from both the clinical and the statistical standpoints.[3]

Choice of Statistical Analysis

The choice of statistical analysis depends on (i) the purpose of the investigation, (ii) the mathematical characteristics of the variables involved, (iii) the method of data collection, and (iv) the statistical assumptions made about these variables. All these factors must be considered when deciding on the appropriate analysis for the data; when reading the literature, it is sufficient to consider the first three to critically judge the appropriateness of the statistical analysis reported.[4]

Types of variables

Statistical analysis varies with the type of variable studied. A variable, in statistical terms, is a characteristic which may vary from person to person, and the values taken by variables are referred to as data.[4] Variables can be classified in a number of ways, and the data analysis is determined by these classifications. Variables can be classified into two broad categories: qualitative and quantitative. Qualitative data are not numerical, and the values taken by a qualitative variable are usually names. A variable like sex, with the categories male and female, is a qualitative variable. Such variables are also called categorical variables, as they indicate categories rather than values.
Some qualitative variables that have an intrinsic order or rank (for example, socioeconomic group I is, in some sense, higher than or above socioeconomic group II) are referred to as ordinal variables. Quantitative variables, on the other hand, take numerical values. If the values of a quantitative variable vary in finite, specific steps, it is referred to as discrete. The variable "number of brothers/sisters" takes only integer values and is therefore a discrete variable. Quantitative variables that can take any value within a range are called continuous, because an intermediate value is possible between any two given values. Examples of continuous variables are age and intraocular pressure. It is important to note here that discrete variables can sometimes be treated as continuous variables. This is possible when the consecutive values of a discrete variable are not very far apart and the sample is sufficiently large. Similarly, variables that are fundamentally continuous may be grouped into categories. A second scheme for classifying variables is based on whether a variable is used to describe other variables or is to be described by them. Such a classification depends on the study objectives rather than on the inherent mathematical structure of the variable.[4] If a variable under investigation is to be described in terms of other variables, it is called a response or dependent variable, while an independent variable is one which is used to describe or predict a dependent variable. For example, if one intends to compare the effect of medical and surgical treatment of glaucoma, the dependent variable would be the post-treatment IOP while the independent variable would be the type of treatment (medical or surgical).

Method of data collection

Statistical analysis is affected not only by the type of variables and the design of the study but also by the method of data collection. Data can be collected in either an independent or a matched (paired) fashion.
In simple terms, independent samples are those where the selection of individuals for one sample group is not affected by the selection of individuals for the other groups. At the other extreme, a paired sample arises when every individual or observation in one group has a unique match or pair in the other group. For example, a comparison of preoperative and postoperative astigmatism after cataract surgery would involve paired observations, while a comparison of postoperative astigmatism between two different incision sizes for cataract surgery would involve independent observations.

Types of Statistical Analyses

The statistical analyses commonly used in the ophthalmic literature include both univariate and multivariate analyses. In univariate analysis, the effect of a single independent variable on a dependent variable is studied, while in multivariate analysis the effect of more than one concurrent independent variable on a dependent variable is studied. In the following section, the statistical tests of significance that are often used in the ophthalmic literature are discussed in terms of the appropriateness of their usage.

Univariate statistical tests

[Table:2] gives broad guidelines for understanding the purpose of the commonly used univariate statistical tests. These tests are all based on statistical assumptions, and tests that make distributional or statistical assumptions about the variable being analyzed are called parametric tests. However, in many situations research is done on a small sample of subjects. When the sample size is small (fewer than 30-40), there is often doubt about whether a particular test is valid given its assumptions. In such situations the variable studied is analyzed as an ordinal variable (defined previously). Most statistical tests for ordinal data make few or no distributional assumptions; these tests are called non-parametric tests or sometimes small-sample tests.
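As one simple illustration of a non-parametric test for paired data, the Python sketch below implements the exact sign test, which uses only the direction of each paired difference. It is offered as an example of the class of tests, not as the specific test the tables recommend; the function name is hypothetical, and the preoperative and postoperative astigmatism values (in diopters) for 10 eyes are invented numbers, not study data.

```python
from math import comb


def sign_test(before, after):
    """Exact two-sided sign test for paired observations; zero differences are dropped."""
    diffs = [pre - post for pre, post in zip(before, after) if pre != post]
    n = len(diffs)
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Under H0 each non-zero difference is equally likely to be positive or negative,
    # so the number of positive signs follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n, positives, min(1.0, 2 * tail)


# Hypothetical paired data: astigmatism (D) in the same 10 eyes before and after surgery
pre_op = [1.5, 2.0, 1.25, 0.75, 1.0, 2.5, 1.75, 1.0, 0.5, 1.25]
post_op = [1.0, 1.5, 1.0, 0.75, 0.5, 1.75, 1.5, 0.5, 0.75, 1.0]

n_used, n_positive, p_value = sign_test(pre_op, post_op)
print(f"{n_positive} of {n_used} non-zero differences positive, p = {p_value:.4f}")
```

Here one eye with no change is dropped, 8 of the remaining 9 differences are positive, and the exact two-sided p-value is 20/512, about 0.039. Because each eye is compared with itself, the pairing is built into the test rather than ignored.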
[Table:3] lists the non-parametric tests equivalent to the parametric univariate tests.

Multivariate statistical tests

As mentioned previously, multivariate analysis is done when a variable under study is affected by more than one independent variable. In such situations it is important to study the effect of each independent variable on the dependent variable in the presence of all the other independent variables. Let us consider a study which intends to determine the effect of independent factors like age, socioeconomic status, sunlight exposure, diet, diabetes, and smoking on the occurrence of cataract. Here, if only univariate analyses are carried out, such as the effect of age on the occurrence of cataract or the effect of diet on the occurrence of cataract, the effect of each of these factors would be incorrectly estimated. This is because cataract in reality results from many additive factors, which are often related to one another, so studying the effect of each factor in isolation would lead to incorrect estimates and conclusions. In this situation, the true effect of each factor on the occurrence of cataract can be estimated only after adjusting for the effects of all the other factors which also contribute in some measure to the occurrence of cataract. Therefore, if more than one variable affects an outcome variable such as the occurrence of cataract, univariate analysis can yield biased results, as the estimated effect of each variable is not adjusted or controlled for the effects of the other variables. Estimating the effect of each independent variable after adjusting for the effects of all the other independent variables is done using multivariate analysis. [Table:4] shows some of the multivariate tests commonly used in the ophthalmic literature.
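The bias from univariate analysis can be shown with a small simulation of confounding. In the Python sketch below all data are hypothetical: sunlight exposure is made to increase with age, and a continuous lens opacity score is made to depend on age only. Regressing the score on sunlight alone then suggests a sunlight effect, while a two-variable linear regression that adjusts for age recovers an effect close to zero. A linear model on a continuous score is used only to keep the arithmetic visible; a binary outcome such as the occurrence of cataract would usually be analyzed with a model such as logistic regression.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for reproducibility
N = 2000

# Hypothetical data-generating assumptions (illustration only):
# sunlight exposure rises with age; the opacity score depends on age alone.
age = [rng.uniform(40, 80) for _ in range(N)]
sunlight = [0.5 * a + rng.gauss(0, 5) for a in age]
opacity = [0.05 * a + rng.gauss(0, 0.5) for a in age]


def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Univariate (crude) slope: opacity regressed on sunlight alone
crude = cov(sunlight, opacity) / cov(sunlight, sunlight)

# Multivariate slopes: solve the two-predictor normal equations by hand
s11, s22, s12 = cov(sunlight, sunlight), cov(age, age), cov(sunlight, age)
s1y, s2y = cov(sunlight, opacity), cov(age, opacity)
det = s11 * s22 - s12 ** 2
adjusted_sunlight = (s22 * s1y - s12 * s2y) / det
adjusted_age = (s11 * s2y - s12 * s1y) / det

print(f"crude sunlight effect:    {crude:.4f}")
print(f"adjusted sunlight effect: {adjusted_sunlight:.4f}")
print(f"adjusted age effect:      {adjusted_age:.4f}")
```

Under these assumptions the crude slope is clearly positive (roughly 0.06 per unit of exposure) while the adjusted sunlight slope is near zero and the adjusted age slope is close to the assumed 0.05, illustrating why each factor's effect must be estimated in the presence of the others.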
Conclusion

Basic knowledge of the need for and concept of statistical analysis, and of the statistical tests commonly used in the medical literature, can enable better interpretation of the data presented. The classification of statistical tests in this article is directed towards understanding the purpose of the statistical methods commonly used in the ophthalmic literature. It is crucial to note that when deciding on a particular statistical method for a given data set, one must carefully check the statistical assumptions being made, since the tests and formulae vary with the assumptions made. It must also be noted that there are several other statistical methods used for medical data; the ones mentioned here are only some of the common methods. While this paper attempts to briefly elucidate the concepts and usage of statistical analysis, readers who wish to study the science of medical statistics or intend to carry out an epidemiological study are advised to undertake further, more extensive reading.

References


