

SPECIAL ARTICLE 

Year: 2020 | Volume: 3 | Issue: 1 | Page: 19-24

Probability and inferential statistics
Rakesh Garg^{1}, S Bala Bhaskar^{2}, Sabyasachi Das^{3}, SS Harsoor^{4}
^{1} Department of Onco-Anaesthesia and Palliative Medicine, Dr. BRAIRCH, AIIMS, New Delhi, India; ^{2} Department of Anaesthesiology, Vijayanagar Institute of Medical Sciences, Ballari, Karnataka, India; ^{3} Department of Anaesthesiology, Medical College, Kolkata, West Bengal, India; ^{4} Department of Anaesthesiology, Ambedkar Medical College, Bengaluru, Karnataka, India
Date of Submission: 18-Apr-2020
Date of Acceptance: 26-Apr-2020
Date of Web Publication: 30-May-2020
Correspondence Address: Dr. S Bala Bhaskar, Vijayanagar Institute of Medical Sciences, Ballari, Karnataka, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/ARWY.ARWY_13_20
Application of statistical tools is essential for appropriate understanding of the data collected in clinical trials. The types of variables for the study are decided in advance and the relevant statistical tools identified in the planning stage of the study. The tests applied also depend on the distribution of the data, and hence this assessment needs to be done before a particular statistical tool is applied. The variables are summarised using descriptive statistics to make the large pool of collected data easier to understand. To compare these summarised data across different study groups, inferential statistics are required. Inferential statistics provide measures of the significance of differences (e.g., P value and confidence intervals), helping researchers and readers to confirm the differences among the groups. Sound methodology and clinical knowledge of the chosen research question should form the background for statistical analysis. We provide a brief review of the basic statistical tools, including probability and inferential statistics.
Keywords: Confidence intervals, nonparametric statistics, parametric statistics, probability, regression
How to cite this article: Garg R, Bhaskar S B, Das S, Harsoor S S. Probability and inferential statistics. Airway 2020;3:19-24
Introduction   
Outcomes and observations during research are referred to as variables, in that they can have different values (i.e. they can vary). There are two types of variables, qualitative (categorical) and quantitative (numerical). Qualitative variables are expressed as frequency, proportions and percentages and quantitative variables as mean/median, standard deviation (SD), range (maximum − minimum) and interquartile range (IQR).^{[1]}
Observed variables are measured for uniformity of distribution and then analysed further to estimate certain characteristics of a population, within a group and also between groups (2 or more groups), helping to decide if one group differs significantly from the other.
The summary of variables is defined using descriptive statistics^{[2]} and comparisons of variables are defined mainly by inferential statistics. It is important to note that inferential statistics extrapolate sample data to generalisations, usually with calculated degrees of certainty.
Concepts Associated With the Application of Inferential Statistics   
Type of variable
The primary requirement is identifying the variables and differentiating them as categorical or numerical, followed by their measurement on the basis of measures of 'central tendency' and 'dispersion'.
Variable sets
'Univariate’ refers to the analysis of one ('uni') variable at a time. It does not deal with causes or relationships and its major purpose is to describe. It summarises data and finds patterns in the data. The variable in univariate analysis is just a condition or subset that the data fall into. It can be taken as a ‘category'. For example, the analysis might look at a variable of ‘age’ or it might look at ‘height’ or ‘weight’ for univariate analysis. ‘Bivariate’ refers to the analysis of exactly two variables at a time, e.g., the weight of the patient and size of the supraglottic airway device used. ‘Multivariate’ is the analysis of more than two variables at a time (e.g., weight of the patient, size of the supraglottic airway device and the relation of both these factors to the ease of insertion of the supraglottic airway device in that patient).
Distribution of data
The distribution of data can be uniform around a central value (symmetrical/Gaussian) or non-uniform (asymmetrical/non-Gaussian).^{[3]} With a Gaussian distribution, a symmetrical bell-shaped curve is obtained; when standardised, it has a mean (μ) of 0 and an SD (sigma, σ) of 1 [Figure 1]a. This is also known as the z distribution. The data representation and statistical tests applied vary depending on the nature of the distribution, i.e., Gaussian or non-Gaussian. Gaussian data have better validity with respect to statistical tests.
Figure 1: (a) Gaussian distribution of data, (b) right-skewed distribution of data and (c) left-skewed distribution of data
When the distribution is Gaussian, one SD, two SD and three SD on either side of the mean correspond to 68.3%, 95.4% and 99.7% of the total area, respectively, while close to 95% of the population lies within 1.96 SD [Figure 2]. When the distribution is asymmetrical/non-Gaussian, the data tend to be skewed either to the right or to the left of the mean [Figure 1]b and [Figure 1]c.
Figure 2: Symmetrical distribution: mean (μ), standard deviation (SD/σ)
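The coverage figures quoted above can be reproduced numerically from the standard normal distribution. A minimal sketch using SciPy (the code is illustrative; the percentages themselves come from the text):

```python
from scipy.stats import norm

# Area under the standard normal curve within k SDs of the mean
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
# coverage[1] ≈ 0.683, coverage[2] ≈ 0.954, coverage[3] ≈ 0.997

# Close to 95% of the population lies within ±1.96 SD
coverage_95 = norm.cdf(1.96) - norm.cdf(-1.96)
```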
Measures of Variability and Precision   
SD is a measure of variability and should be quoted when describing the distribution of sample data. In contrast, the standard error (SE) is the SD of sample means (from 2 or more samples); it is used to calculate 95% confidence intervals (CIs) and so is a measure of precision (of how well sample data can be used to predict a population parameter). SE is numerically a much smaller value than SD and is often presented (wrongly) for this reason.
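The SD/SE distinction can be sketched in a few lines: the SE of the mean equals the SD divided by √n, so it is always smaller than the SD and shrinks as the sample grows. The insertion times below are invented for illustration:

```python
import math
import statistics

# Hypothetical insertion times in seconds (illustrative values only)
sample = [44.7, 52.7, 38.5, 48.6, 50.7, 43.5]

mean = statistics.mean(sample)
sd = statistics.stdev(sample)        # variability of individual observations
se = sd / math.sqrt(len(sample))     # precision of the sample mean
```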
The important inferential statistics discussed in this article are those that test confidence (P value and CI), those that test differences – parametric (e.g., t-test) and nonparametric tests (e.g., Chi-square test) – and those that analyse relationships (correlation and regression). The last group is discussed in detail in a previous issue of this journal and hence only brief mention is made in the relevant section.^{[2]}
Statistics That Test Confidence   
P value
Once data are gathered from a study, statistical testing is performed; the statistical test looks at the likelihood that a certain result would have occurred based on some assumptions/hypothesis about the underlying population and the outcomes being studied. However, a certain proportion of results favouring the hypothesis could occur merely by chance despite the best methodology, and this is the purpose of the P value, a measure of the effect of chance within a study. It is not the probability that the result of the study is true or correct. This 'chance' occurrence is universal, but the aim is to keep it to a minimum; a threshold value of 0.05 is universally accepted. That is, the result could occur by chance in 5 out of 100 instances. A significant P value lies outside 2 SD [Figure 3].
Figure 3: The P value is outside two standard deviations (outside 95%) on either side of the Gaussian curve, and the confidence interval lies within 95% (2 standard deviations)
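The 'chance in 5 out of 100 instances' interpretation can be illustrated by simulation: if two samples are repeatedly drawn from the same population, a t-test at the 0.05 threshold will flag a 'significant' difference in roughly 5% of trials. A sketch (all numbers are invented for illustration):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
trials, false_positives = 2000, 0
for _ in range(trials):
    # Both groups come from the SAME population: any 'difference' is chance
    a = rng.normal(50, 10, 30)
    b = rng.normal(50, 10, 30)
    if ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

rate = false_positives / trials   # expected to be close to 0.05
```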
Confidence interval
The CI is the range of likely values for a population parameter, such as the population mean.^{[2]} It is an estimate that provides a range likely to include the true value; the boundaries of a CI ('the confidence limits') give values within which there is a high probability (95% by convention, corresponding to a P value of 5%) that the true population value can be found. The calculation of a CI considers the SD of the data and the number of observations. Thus, a CI narrows as the number of observations increases or as the variance (dispersion) decreases. The 95% CI, by convention, corresponds to 2 SD in a normal distribution [Figure 3] and [Figure 4]. CIs can also be constructed around proportions.
Figure 4: A 95% confidence interval indicates that 19 out of 20 samples (blue lines) from the same population will produce confidence intervals that contain the population parameter; each horizontal line represents a confidence interval, and the red line marks the one interval that misses the parameter
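Under the normal approximation described above, a 95% CI for a mean is simply mean ± 1.96 × SE. The sketch below (with invented summary numbers) also shows the narrowing of the interval with a larger sample:

```python
import math

def ci95(mean, sd, n):
    """95% CI for a mean using the normal approximation: mean +/- 1.96 * SE."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

low_40, high_40 = ci95(46.5, 15.0, 40)
low_160, high_160 = ci95(46.5, 15.0, 160)   # quadrupling n halves the CI width
```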
Statistical Significance – P Value and Confidence Interval   
The choice of a specific cut-off for a P value or degree of confidence for a CI, as mentioned already, is arbitrary but conventional (P value of 0.05; 95% CI); there is no particular reason why a different P value, e.g., 0.02 with a corresponding 98% CI, could not be the standard for calling a result 'statistically significant'. Different values may be targeted based on the importance of the hypothesised outcome; for example, a more stringent P value threshold and a higher confidence level may be targeted in mortality-related studies. It is important not to draw too many conclusions from the P value; P values on either side of the chosen cut-off may not have correspondingly different practical implications. A value of 0.001 (rather than 0.05) need not clearly indicate that the result is tightly protected against occurring by 'chance'. Furthermore, a marginally higher value such as 0.06 need not mean that errors were higher, leading to higher rates of events occurring by chance. Despite these caveats, it is common practice to take very small P values as stronger evidence in support of a hypothesis than P values close to 0.05.
Statistics Which Test Differences   
The differences between observations of the parameters (primary and secondary) used in research, in terms of the 'statistics of significance' – the actual values of P within a group and between groups (discussed above) – can be assessed by applying certain statistical tests. Those in common use in anaesthesia practice can be grouped as follows:
For numerical data [Figure 5] and [Figure 6]:
Figure 6: Comparing numerical data from groups (non-uniform distribution)
 Tests with uniform distribution of data within the same group, e.g., paired t-test and repeated-measures analysis of variance (ANOVA)
 Tests with non-uniform distribution of data within the same group, e.g., Wilcoxon signed-rank test and Friedman's test.
 Tests with uniform distribution of data between groups, e.g., Student's t-test and one-way ANOVA
 Tests with non-uniform distribution of data between groups, e.g., Wilcoxon rank-sum test and Kruskal–Wallis test.
For categorical data [Figure 7]:
 Tests for comparison of data within the same group, e.g., McNemar's test (McNemar's Chi-square test)
 Tests for comparison of data between groups, e.g., Chi-square test and Fisher's exact test. Traditionally, Fisher's exact test is used for small samples (<30), even though it is valid for larger numbers as well.
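Both categorical tests are available in standard software. A sketch using SciPy on an invented 2×2 table of first-attempt success versus failure in two groups:

```python
from scipy.stats import chi2_contingency, fisher_exact

# Invented 2x2 table: rows = groups, columns = (success, failure)
table = [[36, 7],
         [40, 2]]

chi2, p_chi, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)   # preferred when expected counts are small
```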
Choosing the Suitable Statistical Test   
It is advisable to have a basic knowledge of the factors that come into play when choosing the test for analysing each of the outcomes, and to discuss this with a qualified biostatistician, so that the most relevant tests are applied. Factors to be considered when narrowing down to the most suitable test (by no means a complete list) are discussed as follows:^{[3],[4],[5]}
 Type of variable considered and how they are measured
 Distribution of data: Gaussian or non-Gaussian
A Gaussian distribution is demonstrated either (a) graphically or (b) by comparing the values of the mean and median, or of the mean and SD.
 The data can be plotted on a graph and the distribution assessed
The difference between the mean and median values of a variable is checked: if the difference is more than 1.5%, or the SD is more than 40% of the mean, it suggests a non-Gaussian distribution. The interval limits are calculated mathematically as mean − 1.96 SD and mean + 1.96 SD. At least 95% of the data should lie within these limits; if more than 5% of observations fall outside, the distribution is possibly non-normal.^{[6]}
 If the distribution is Gaussian, the data are more amenable to statistical analysis and parametric tests are applied. If the distribution is non-Gaussian (non-normal or skewed), the data can be transformed so that they approximate a normal distribution, commonly by log transformation, whereby the natural logarithms of the raw data are analysed to calculate a mean and SD (data transformation). The antilogarithm of the mean of these transformed data is known as the geometric mean. A 'goodness-of-fit' test (Kolmogorov–Smirnov or Shapiro–Wilk) is then applied to see whether the transformed data approximate a normal distribution. If they do, they can be analysed with parametric tests. If the non-Gaussian character persists, nonparametric tests are to be applied
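This workflow can be sketched as follows. The simulated right-skewed data, the choice of the Shapiro–Wilk test and all parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=3.5, sigma=0.5, size=80)   # right-skewed raw data

_, p_raw = shapiro(raw)      # a low p value suggests non-normality
logged = np.log(raw)         # natural-log transformation
_, p_log = shapiro(logged)   # transformed data should now look Gaussian

geometric_mean = np.exp(logged.mean())   # antilog of the mean of logged data
```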
 Check what type of variables are to be compared:
 Numerical data
 Categorical data.
 Check whether the data are related or unrelated:
Related means comparison within the same group of a variable, before and after an event or intervention – also called 'dependent variable' comparison. Unrelated means comparison of variables between groups – also called 'independent variable' comparison
 Define X – the number of groups – and Y – normal or non-normal distribution – and choose one of the statistical tests as per [Figure 5], [Figure 6] and [Figure 7].
Example: In a hypothetical study comparing the insertion of the ProSeal laryngeal mask airway (LMA) and the i-gel in 85 adult patients, the insertion times in the ProSeal group were 44.7 s, 52.7 s, 38.5 s, 48.6 s, 50.7 s, …, 43.5 s and in the i-gel group, 34.6 s, 42.1 s, 40.3 s, 28.8 s, 31.4 s, …, 30.6 s. The average insertion times (mean with SD) were 46.45 ± 15.25 s and 34.63 ± 14.18 s, respectively. In the same study, the first-attempt successful insertion rates were 83.7% and 95.1%, respectively, for the ProSeal and the i-gel.
Consider the addition of a 3^{rd} study group using the LMA Supreme with a similar sample size and insertion times of 53.7 s, 51.5 s, 48.3 s, 48.6 s, 51.1 s, …, 59.5 s. If the average insertion time (mean with SD) was 52.10 ± 20.23 s and the first-attempt successful insertion rate was 93.9%, what tests can be applied within each group and among groups for these parameters?
Let us take the distribution of data as symmetrical for discussion purposes (with a bell-shaped curve obtained by plotting the data). In the 2-group study, the insertion time difference between the groups can be assessed by Student's t-test. The paired t-test is used to compare insertion times within each group. In the 3-group scenario, the insertion time differences among the groups can be assessed by one-way ANOVA and within each of the groups by repeated-measures ANOVA [Figure 5].
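Under the Gaussian assumption, the between-group comparisons above can be sketched with SciPy. The data are simulated from the worked example's means and SDs; the group sizes and random seed are invented:

```python
import numpy as np
from scipy.stats import ttest_ind, f_oneway

rng = np.random.default_rng(1)
proseal = rng.normal(46.45, 15.25, 42)   # simulated insertion times (s)
igel    = rng.normal(34.63, 14.18, 43)
supreme = rng.normal(52.10, 20.23, 42)

t_stat, p_two = ttest_ind(proseal, igel)            # Student's t-test, 2 groups
f_stat, p_three = f_oneway(proseal, igel, supreme)  # one-way ANOVA, 3 groups
```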
If the distribution of data is asymmetrical (skewed to the left or right), as also evidenced by a wide SD, and logarithmic transformation does not yield a normal distribution on 'goodness-of-fit' testing, nonparametric tests are applied [Figure 6]. For 2 groups (ProSeal LMA and i-gel), comparison of insertion times between groups is made using the Wilcoxon rank-sum test and, within each group, the Wilcoxon signed-rank test. For 3 groups (adding the LMA Supreme group), comparison of insertion times among the groups is made using the Kruskal–Wallis test and within each group by Friedman's test.
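The nonparametric counterparts have direct SciPy equivalents. A sketch with simulated skewed data (the distributions, sizes and the before/after pairing are invented for illustration):

```python
import numpy as np
from scipy.stats import ranksums, kruskal, wilcoxon

rng = np.random.default_rng(2)
a = rng.exponential(40.0, 30)   # skewed 'insertion times', group A
b = rng.exponential(32.0, 30)
c = rng.exponential(50.0, 30)

_, p_ranksum = ranksums(a, b)     # Wilcoxon rank-sum, 2 independent groups
_, p_kruskal = kruskal(a, b, c)   # Kruskal-Wallis, 3 independent groups

# Within-group (paired) comparison, e.g. before vs after an intervention
after = a * rng.uniform(0.7, 1.1, 30)
_, p_signedrank = wilcoxon(a, after)   # Wilcoxon signed-rank
```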
With regard to first-attempt success rates, comparison between 2 groups uses the Chi-square test, Fisher's exact test or the Mann–Whitney U test. If the 3^{rd} group is also considered, the Chi-square test and Fisher's exact test are used between groups. In either scenario, McNemar's test is used within each group [Figure 7].
Statistics Which Analyse Relationships – Correlation and Regression Analysis   
To find relationships within a given dataset, or the association between two quantitative variables, two statistical approaches are used: regression and correlation. Regression analysis predicts the value of the dependent variable from the known value of the independent variable, assuming that an average mathematical relationship exists between two or more variables. Correlation quantifies the presence or absence of an association between two variables 'X' and 'Y'; the correlation coefficient ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation). Pearson's correlation coefficient is the commonly applied measure when the data distribution is normal, and Spearman's rank correlation is applied when the distribution is asymmetrical.
Some of the regression approaches used are linear regression, Cox regression and logistic regression. Readers may refer to the details of these in a previous issue of this journal.^{[2]} Sometimes, when access to raw data is not available, statistical methods are available to convert measures from one form to another.
Practice Pearls   
 Variables used in a study must be identified as categorical or numerical
 Distribution of numerical data should be checked for uniformity
 Measures of central tendency and dispersion used are identified
 The Gaussian curve has clearly defined divisions of SDs around mean, useful in describing the distribution of data, the CI and the P value
 For uniform distribution, parametric tests are applied and for asymmetric (non-Gaussian) distribution, nonparametric tests are applied
 For asymmetric distribution, logarithmic transformation can be attempted and checked with 'goodness-of-fit' tests
 Specific tests are applied based on comparisons of variables within groups and among 2 or more groups
 Relationships and correlations with respect to data can be addressed by tests of regression and correlation.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1. Goneppanavar U, Ali Z, Bhaskar SB, Divatia JV. Types of data, methods of collection, handling and distribution. Airway 2019;2:36-40.
2. Ali Z, Bhaskar SB, Sudheesh K. Descriptive statistics: Measures of central tendency, dispersion, correlation and regression. Airway 2019;2:120-5.
3. Ali Z, Bhaskar SB. Basic statistical tools in research and data analysis. Indian J Anaesth 2016;60:662-9.
4. Comparing groups: Numerical data. In: Myles PS, Gin T, editors. Statistical Methods for Anaesthesia and Intensive Care. Oxford: Butterworth-Heinemann; 2000. p. 51-66.
5. Comparing groups: Categorical data. In: Myles PS, Gin T, editors. Statistical Methods for Anaesthesia and Intensive Care. Oxford: Butterworth-Heinemann; 2000. p. 69-73.
6.  
