|Year : 2020 | Volume
| Issue : 1 | Page : 19-24
Probability and inferential statistics
Rakesh Garg1, S Bala Bhaskar2, Sabyasachi Das3, SS Harsoor4
1 Department of Onco-Anaesthesia and Palliative Medicine, Dr. BRAIRCH, AIIMS, New Delhi, India
2 Department of Anaesthesiology, Vijayanagar Institute of Medical Sciences, Ballari, Karnataka, India
3 Department of Anaesthesiology, Medical College, Kolkata, West Bengal, India
4 Department of Anaesthesiology, Ambedkar Medical College, Bengaluru, Karnataka, India
|Date of Submission||18-Apr-2020|
|Date of Acceptance||26-Apr-2020|
|Date of Web Publication||30-May-2020|
Dr. S Bala Bhaskar
Vijayanagar Institute of Medical Sciences, Ballari, Karnataka
Source of Support: None, Conflict of Interest: None
Application of statistical tools is essential for appropriate understanding of the collected data in clinical trials. The types of variables for the study are also decided in advance and the relevant statistical tools identified in the planning stage of the study. The tests applied also depend on the distribution of data and hence this assessment needs to be done before application of a particular statistical tool. The variables are summarised to better understand the large pool of collected data, done using descriptive statistics. To compare this summarised data across different study groups, inferential statistics are required. Inferential statistics provide the significance of differences (e.g., P value and confidence intervals) helping the researchers and readers to confirm the differences among the groups. Strong methodology and clinical knowledge of the chosen research question should form the background for statistical analysis. We provide a brief review of the basic statistical tools, including probability and inferential statistics.
Keywords: Confidence intervals, non-parametric statistics, parametric statistics, probability, regression
|How to cite this article:|
Garg R, Bhaskar S B, Das S, Harsoor S S. Probability and inferential statistics. Airway 2020;3:19-24
| Introduction|| |
Outcomes and observations during research are referred to as variables, in that they can have different values (i.e. they can vary). There are two types of variables, qualitative (categorical) and quantitative (numerical). Qualitative variables are expressed as frequency, proportions and percentages and quantitative variables as mean/median, standard deviation (SD), range (maximum − minimum) and interquartile range (IQR).
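As a minimal sketch of the quantitative summaries listed above, the following Python snippet computes them with the standard library. The function name `summarise` and the six insertion times are illustrative assumptions, not data from a real study.

```python
import statistics

def summarise(values):
    """Return common descriptive statistics for a numerical variable."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),       # sample SD (n - 1 denominator)
        "range": max(values) - min(values),   # maximum minus minimum
        "iqr": q3 - q1,                       # interquartile range
    }

# Hypothetical insertion times in seconds (illustrative values only)
times = [44.7, 52.7, 38.5, 48.6, 50.7, 43.5]
summary = summarise(times)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Different software packages use slightly different conventions for quartiles, so the IQR reported elsewhere for the same data may differ a little.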
Observed variables are first assessed for the nature of their distribution and then analysed further to estimate certain characteristics of a population, within a group and also between groups (2 or more groups), helping to decide if one group differs significantly from the other.
The summary of variables is defined using descriptive statistics and comparisons of variables are defined mainly by inferential statistics. It is important to note that inferential statistics extrapolate sample data to generalisations, usually with calculated degrees of certainty.
| Concepts Associated With the Application of Inferential Statistics|| |
Type of variable
The primary requirement is to identify the variables and classify them as categorical or numerical, and then to summarise them using measures of ‘central tendency’ and ‘dispersion’.
‘Univariate’ refers to the analysis of one (‘uni’) variable at a time. It does not deal with causes or relationships; its major purpose is to describe. It summarises data and finds patterns in the data. The variable in univariate analysis can be thought of as a single condition or ‘category’ that the data fall into. For example, a univariate analysis might look at ‘age’, ‘height’ or ‘weight’ alone. ‘Bivariate’ refers to the analysis of exactly two variables at a time, e.g., the weight of the patient and the size of the supraglottic airway device used. ‘Multivariate’ is the analysis of more than two variables at a time (e.g., weight of the patient, size of the supraglottic airway device and the relation of both these factors to the ease of insertion of the supraglottic airway device in that patient).
Distribution of data
The distribution of data can be symmetrical around a central value (Gaussian, also called normal) or asymmetrical (non-Gaussian). With Gaussian distribution, a symmetrical bell-shaped curve is obtained, defined by its mean (μ) and SD (sigma, σ) [Figure 1]a. When the values are standardised so that the mean is 0 and the SD is 1, the curve is known as the standard normal or z distribution. The data representation and statistical tests applied vary depending on the nature of distribution of data, i.e., Gaussian or non-Gaussian data. Parametric tests, which are generally more powerful, assume that the data are Gaussian.
|Figure 1: (a) Gaussian distribution of data, (b) right skewed distribution of data and (c) left skewed distribution of data|
When the distribution is Gaussian, one SD, two SD and three SD on either side of the mean correspond to 68.3%, 95.4% and 99.7% of the total area, respectively, while 95% of the population lies within 1.96 SD of the mean [Figure 2]. When the distribution is asymmetrical/non-Gaussian, the data tend to be skewed either to the right or to the left of the mean [Figure 1]b and [Figure 1]c.
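The areas quoted above can be checked numerically. The sketch below uses the error function from Python's `math` module; the helper name `coverage_within` is an assumption made for illustration.

```python
import math

def coverage_within(k):
    """Proportion of a Gaussian distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

# Print the area within 1, 1.96, 2 and 3 SD of the mean
for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {coverage_within(k) * 100:.1f}%")
```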
|Figure 2: Symmetrical distribution - mean (μ), standard deviation (SD/σ)|
| Measures of Variability and Precision|| |
SD is a measure of variability and should be quoted when describing the distribution of sample data. In contrast, standard error (SE) is the SD of the sampling distribution of the mean, estimated as the sample SD divided by the square root of the sample size. It is used to calculate 95% confidence intervals (CIs), and so is a measure of precision (of how well sample data can be used to estimate a population parameter). SE is numerically smaller than SD whenever there is more than one observation and is often presented (wrongly) in place of SD for this reason.
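The distinction between SD and SE can be illustrated with a short sketch. It assumes the usual estimate SE = SD/√n; the function name `sd_and_se` and the sample values are hypothetical.

```python
import math
import statistics

def sd_and_se(values):
    """Return (SD, SE) for a sample, taking SE = SD / sqrt(n)."""
    sd = statistics.stdev(values)          # variability of the observations
    se = sd / math.sqrt(len(values))       # precision of the sample mean
    return sd, se

# Hypothetical insertion times in seconds (illustrative values only)
sample = [44.7, 52.7, 38.5, 48.6, 50.7, 43.5]
sd, se = sd_and_se(sample)
print(f"SD = {sd:.2f} s, SE = {se:.2f} s")
```

Because the SE shrinks as the sample grows while the SD does not, quoting the SE in place of the SD makes the data look less variable than they are.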
The important inferential statistics that are discussed in this article are those that test confidence – P value and CI, those that test differences – parametric (t-test) and non-parametric tests (Chi-square test) and those that analyse relationships – correlation and regression. The last part is discussed in detail in a previous issue of this journal and hence only brief mention is made in the relevant section.
| Statistics That Test Confidence|| |
Once data are gathered from a study, statistical testing is performed; the test looks at the likelihood that a certain result would have occurred under some assumption (hypothesis) about the underlying population and outcomes being studied. However, a certain proportion of results favouring the hypothesis could occur merely by chance despite the best methodology. The P value quantifies this: it is the probability of obtaining a result at least as extreme as the one observed if there were truly no difference (the null hypothesis). It is not the probability that the result of the study is true or correct. This ‘chance’ occurrence is universal, but the aim is to keep it to a minimum; a threshold of 0.05 is conventionally accepted. That is, a result this extreme would occur by chance alone in 5 out of 100 instances. On a Gaussian curve, results with P < 0.05 lie in the tails beyond approximately 2 SD (more precisely, 1.96 SD) on either side of the mean [Figure 3].
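The link between the tails of the Gaussian curve and the P value can be sketched as follows. This assumes a test statistic that follows the standard normal (z) distribution; the function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided P value for a z statistic under the standard normal curve."""
    # erfc gives the combined area of both tails beyond |z|
    return math.erfc(abs(z) / math.sqrt(2))

# A statistic 1.96 SD from the mean sits at the conventional 0.05 threshold
for z in (1.0, 1.96, 2.58):
    print(f"z = {z}: P = {two_sided_p(z):.3f}")
```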
|Figure 3: P value is outside two standard deviations (outside 95%) on either side of the Gaussian curve and confidence interval lies within 95% with 2 standard deviation|
The CI is the range of likely values for a population parameter, such as the population mean. It is an estimate that provides a range likely to include the true value; the boundaries of a CI (‘the confidence limits’) give values within which the true population value is expected to lie with a high degree of confidence (95% by convention, corresponding to a P value threshold of 5%). The calculation of a CI considers the SD of the data and the number of observations. Thus, a CI narrows as the number of observations increases or as the variance (dispersion) decreases. The 95% CI for a mean, by convention, extends approximately 2 SE (more precisely, 1.96 SE) on either side of the sample mean [Figure 3] and [Figure 4]. CI can also be constructed around proportions.
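A minimal sketch of these calculations is given below, assuming the normal approximation (multiplier 1.96) for both a mean and a proportion. The function names and the input figures are illustrative assumptions; for small samples a t-distribution multiplier, or an exact method for proportions, would usually be preferred.

```python
import math

Z_95 = 1.96  # multiplier for a 95% CI under the normal approximation

def ci_mean(mean, sd, n, z=Z_95):
    """Approximate CI for a mean: mean +/- z * SD / sqrt(n)."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def ci_proportion(successes, n, z=Z_95):
    """Approximate (Wald) CI for a proportion, clipped to the range 0 to 1."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Same hypothetical mean and SD, increasing sample size: the interval narrows
for n in (25, 100, 400):
    low, high = ci_mean(50.0, 10.0, n)
    print(f"n = {n}: 95% CI {low:.2f} to {high:.2f}")

# Hypothetical proportion: 40 successes in 50 attempts
print(ci_proportion(40, 50))
```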
|Figure 4: A 95% confidence interval indicates that 19 (blue lines, 95%) out of 20 samples (95%) from the same population will produce confidence intervals that contain the population parameter (red line is outside of the confidence interval, horizontal line represents confidence interval)|
| Statistical Significance – P Value and Confidence Interval|| |
The choice of a specific cut-off point for a P value or degree of confidence for CI, as mentioned already, is arbitrary but conventional (P value of 0.05; 95% CI); there is no particular reason why a different P value, e.g., 0.02 with a corresponding 98% CI, could not be the standard for calling a result ‘statistically significant’. Different values may be targeted based on the importance of the hypothesised outcome; for example, a smaller P value threshold and a higher confidence level may need to be targeted in mortality-related studies. It is important not to draw too many conclusions from the P value; values just on either side of the chosen cut-off may not differ in their practical implications. A P value of 0.001 (rather than 0.05) does not by itself show that the study was more rigorously protected against chance findings, and a marginally higher value such as 0.06 does not mean that chance played a much larger role. Nevertheless, it is common practice to take very small P values as stronger evidence in support of a hypothesis than P values close to 0.05.
| Statistics Which Test Differences|| |
Differences in the observed parameters (primary and secondary), within a group and between groups, can be tested for statistical significance (the P value, discussed above) by applying certain statistical tests. Those in common use in anaesthesia practice can be grouped as follows:
For numerical data [Figure 5] and [Figure 6]:
|Figure 6: Comparing numerical data from groups (non-uniform distribution)|
- Tests for data with symmetrical (Gaussian) distribution, within the same group, e.g., paired t-test and repeated measures analysis of variance (ANOVA)
- Tests for data with asymmetrical (non-Gaussian) distribution, within the same group, e.g., Wilcoxon signed-rank test and Friedman's test
- Tests for data with symmetrical (Gaussian) distribution, between groups, e.g., Student's t-test and one-way ANOVA
- Tests for data with asymmetrical (non-Gaussian) distribution, between groups, e.g., Wilcoxon rank-sum test and Kruskal–Wallis test.
For categorical data [Figure 7]:
- Tests for comparison of data within same group, e.g., McNemar's test (McNemar's Chi-square test)
- Tests for comparison of data between groups, e.g., Chi-square test and Fisher's exact test. Traditionally, Fisher's exact test is used for small samples of <30 (or when any expected cell count is below 5), even though it is valid for larger numbers as well.
| Choosing the Suitable Statistical Test|| |
It is advisable to have a basic knowledge of the factors that come into play when choosing the test for analysing each of the outcomes, and to discuss this with a qualified biostatistician, so that the most relevant tests are applied. Factors to be considered when narrowing down to the most suitable test (by no means a complete list) are discussed as follows:
- Type of variable considered and how they are measured
- Distribution of data: Gaussian or non-Gaussian
Gaussian distribution is demonstrated either (a) graphically or (b) by comparing the values of mean and median/mean and SD.
- The data can be plotted on a graph and distribution assessed
- The difference between the mean and median values of the variable is checked: if the difference is more than 1.5%, or the SD is more than 40%, it suggests a non-Gaussian distribution. The interval limits are calculated mathematically as mean − 1.96 SD and mean + 1.96 SD. About 95% of the data should lie within these limits; if more than 5% of the data fall outside them, the distribution is possibly non-normal.
- If the distribution is Gaussian, data are more amenable to statistical analysis and parametric tests are applied. If the distribution is non-Gaussian (non-normal or skewed), data can be transformed so that they approximate a normal distribution, commonly by log transformation, whereby the natural logarithms of the raw data are analysed to calculate a mean and SD (data transformation). The anti-logarithm of the mean of these transformed data is known as the geometric mean. One of the ‘goodness of fit’ tests, Kolmogorov–Smirnov or Shapiro–Wilk, is then applied to see if the transformed data approximate a normal distribution. If they do, they can be analysed with parametric tests. If the non-Gaussian characteristic persists, non-parametric tests are to be applied.
- Check what type of variables are to be compared:
- Numerical data
- Categorical data.
- Check whether the data are related or unrelated:
Related means comparison of a variable within the same group, before and after an event or intervention; this is also called comparison of ‘dependent’ (paired) samples. Unrelated means comparison of variables between groups; this is also called comparison of ‘independent’ samples.
- Define X – the number of groups and Y – normal or non-normal distribution and choose one of the statistical tests as per the figures below [Figure 5], [Figure 6], [Figure 7].
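The selection steps above can be sketched as a small lookup. This is an illustrative simplification rather than a substitute for consulting a biostatistician: the function name `choose_test` and its parameters are assumptions, the test names are those mentioned in this article, and the within-group non-parametric test for two measurements is taken to be the Wilcoxon signed-rank test, as in the worked example.

```python
def choose_test(data_type, related, n_groups, gaussian=True):
    """Suggest a test from the type of data, relatedness and number of groups.

    data_type: "numerical" or "categorical"
    related:   True for within-group (paired/repeated) comparisons
    n_groups:  number of groups (or repeated measurements) compared, >= 2
    gaussian:  whether numerical data follow a Gaussian distribution
    """
    if n_groups < 2:
        raise ValueError("at least two groups or measurements are needed")
    if data_type == "categorical":
        return "McNemar's test" if related else "Chi-square test or Fisher's exact test"
    if data_type != "numerical":
        raise ValueError("data_type must be 'numerical' or 'categorical'")

    # (gaussian, related) -> (test for 2 groups, test for more than 2 groups)
    table = {
        (True, True): ("paired t-test", "repeated measures ANOVA"),
        (True, False): ("Student's t-test", "one-way ANOVA"),
        (False, True): ("Wilcoxon signed-rank test", "Friedman's test"),
        (False, False): ("Wilcoxon rank-sum test", "Kruskal-Wallis test"),
    }
    two_groups, more_groups = table[(gaussian, related)]
    return two_groups if n_groups == 2 else more_groups

print(choose_test("numerical", related=False, n_groups=2))
print(choose_test("numerical", related=False, n_groups=3, gaussian=False))
print(choose_test("categorical", related=True, n_groups=2))
```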
Example: In a hypothetical study comparing the insertion of ProSeal laryngeal mask airway (LMA) and the i-gel in 85 adult patients, the insertion times in ProSeal group were 44.7 s, 52.7 s, 38.5 s, 48.6 s, 50.7 s,…… 43.5 s and in i-gel group, 34.6 s, 42.1 s, 40.3 s, 28.8 s, 31.4 s,……30.6 s. The average insertion times (mean with SD) were 46.45 ± 15.25 s and 34.63 ± 14.18 s. In the same study, the first attempt successful insertion rate was 83.7% and 95.1%, respectively, for the ProSeal and i-gel.
Consider addition of a 3rd study group using LMA Supreme with similar sample size and the insertion times as 53.7 s, 51.5 s, 48.3 s, 48.6 s, 51.1 s…… 59.5 s. If the average insertion time (mean with SD) was 52.10 ± 20.23 s and the first attempt successful insertion rate was 93.9%, what tests can be applied within each group and among groups for these parameters?
Let us take the distribution of data as symmetrical for discussion purposes (with a bell-shaped curve obtained by plotting the data). Between the two groups in the 2-group study, the insertion time differences can be assessed by Student's t-test. Where repeated measurements are made in the same patients, the paired t-test is used to compare them within each group. In the 3-group scenario, the insertion time differences among the groups can be assessed by one-way ANOVA and within each of the groups by repeated measures ANOVA [Figure 5].
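As a rough illustration of the between-group comparison, the sketch below computes the Student's t statistic (equal variances assumed) from the means and SDs quoted in the hypothetical example. The group sizes of 43 and 42 are an assumption made here, since the example gives only the total of 85 patients; the function name `student_t` is also hypothetical. Converting t to a P value needs the t distribution, which is left to a statistics package.

```python
import math

def student_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    # Pooled variance weights each group's variance by its degrees of freedom
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, n1 + n2 - 2

# ProSeal versus i-gel insertion times; group sizes 43 and 42 are ASSUMED
t, df = student_t(46.45, 15.25, 43, 34.63, 14.18, 42)
print(f"t = {t:.2f} with {df} degrees of freedom")
```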
If the distribution of data is asymmetrical (with skewing to the left or right, also evidenced by a wide SD), and logarithmic transformation does not yield a normal distribution on ‘goodness of fit’ testing, non-parametric tests are applied [Figure 6]. For 2 groups (ProSeal LMA and i-gel), comparison of insertion times between groups is made using the Wilcoxon rank-sum test and within each group by the Wilcoxon signed-rank test. For 3 groups (adding the LMA Supreme group), comparison of insertion times among the groups is made using the Kruskal–Wallis test and within each group by Friedman's test.
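The logarithmic transformation referred to above, and the geometric mean obtained from it, can be sketched as follows. The function name `geometric_mean_via_logs` and the right-skewed values are illustrative assumptions; a goodness of fit test on the transformed values would still be needed and is not shown here.

```python
import math
import statistics

def geometric_mean_via_logs(values):
    """Geometric mean: anti-logarithm of the mean of the natural logarithms."""
    logs = [math.log(v) for v in values]   # requires strictly positive values
    return math.exp(statistics.mean(logs))

# Hypothetical right-skewed data (illustrative values only)
skewed = [1, 2, 2, 3, 4, 8, 32]
gm = geometric_mean_via_logs(skewed)
print(f"arithmetic mean = {statistics.mean(skewed):.2f}")
print(f"median          = {statistics.median(skewed):.2f}")
print(f"geometric mean  = {gm:.2f}")
```

For right-skewed data such as these, the geometric mean lies well below the arithmetic mean, which is pulled upwards by the largest value.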
With regard to first attempt success rates, which are categorical data, comparison between 2 groups is made using the Chi-square test or Fisher's exact test, and within each group using McNemar's test. If the 3rd group is also considered, the Chi-square test and Fisher's exact test are again used between groups and McNemar's test within each group [Figure 7].
| Statistics Which Analyse Relationships - Correlation and Regression Analysis|| |
To find the relationship within given data or the association between two quantitative variables, two statistical approaches are used: regression and correlation. Regression analysis predicts the value of the dependent variable from the known value of the independent variable, assuming that an average mathematical relationship exists between two or more variables. Correlation analysis indicates the presence or absence of an association between two variables ‘X’ and ‘Y’; the correlation coefficient ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation). Pearson's correlation coefficient is the commonly applied measure when data distribution is normal and Spearman's rank correlation is applied when the distribution is asymmetrical.
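The two correlation coefficients can be sketched in plain Python as below. The function names are assumptions, Spearman's coefficient is computed here as Pearson's coefficient applied to average ranks, and the data are hypothetical values chosen so that the relationship is monotonic but not linear.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1      # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y rises with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```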
Among the approaches to regression analysis are linear regression, Cox regression and logistic regression. Readers may refer to the details of these in a previous issue of this journal. Sometimes, when access to raw data is not available, statistical methods are available to convert measures from one form to another.
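Of these, simple linear regression is the easiest to illustrate. The sketch below fits a straight line by ordinary least squares and uses it to predict the dependent variable; the function name `fit_line` and the data are hypothetical, and Cox and logistic regression are not covered by this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical independent (x) and dependent (y) values
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
slope, intercept = fit_line(x, y)
predicted = intercept + slope * 5      # predicted y for a new x of 5
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, prediction = {predicted:.2f}")
```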
| Practice Pearls|| |
- Variables used in a study must be identified as categorical and numerical
- Distribution of numerical data should be checked for uniformity
- Measures of central tendency and dispersion used are identified
- The Gaussian curve has clearly defined divisions of SDs around mean, useful in describing the distribution of data, the CI and the P value
- For symmetrical (Gaussian) distribution, parametric tests are applied and for asymmetrical (non-Gaussian) distribution, non-parametric tests are applied
- For asymmetrical distribution, logarithmic transformation can be attempted and the result checked using ‘goodness of fit’ tests
- Specific tests are applied based on comparisons of variables within groups and among 2 or more groups
- Relationships and correlations with respect to data can be addressed by tests of regression and correlation.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Goneppanavar U, Ali Z, Bhaskar SB, Divatia JV. Types of data, methods of collection, handling and distribution. Airway 2019;2:36-40.
Ali Z, Bhaskar SB, Sudheesh K. Descriptive statistics: Measures of central tendency, dispersion, correlation and regression. Airway 2019;2:120-5.
Ali Z, Bhaskar SB. Basic statistical tools in research and data analysis. Indian J Anaesth 2016;60:662-9.
Comparing Groups. Numerical data. In: Myles PS, Gin T, editors. Statistical Methods for Anaesthesia and Intensive Care. Oxford: Butterworth-Heinemann; 2000. p. 51-66.
Comparing Groups. Categorical data. In: Myles PS, Gin T, editors. Statistical Methods for Anaesthesia and Intensive Care. Oxford: Butterworth-Heinemann; 2000. p. 69-73.