Year: 2019 | Volume: 2 | Issue: 2 | Page: 64-70
Describing and displaying numerical and categorical data
Sudheesh Kannan1, Pradeep A Dongare2, Rakesh Garg3, SS Harsoor4
1 Department of Anaesthesiology, Bangalore Medical College and Research Institute, Bengaluru, Karnataka, India
2 Department of Anaesthesiology, ESIC Medical College-PGIMSR, Bengaluru, Karnataka, India
3 Department of Anaesthesiology, Dr. BRAIRCH, AIIMS, New Delhi, India
4 Department of Anaesthesiology, Dr. B. R. Ambedkar Medical College, Bengaluru, Karnataka, India
Date of Web Publication: 28-Aug-2019
Dr. Pradeep A Dongare
Department of Anaesthesiology, ESIC Medical College-PGIMSR, Rajajinagar, Bengaluru, Karnataka
Source of Support: None, Conflict of Interest: None
The set of observations recorded during research work is termed data. Data can be described as numerical or categorical. While numerical data are further divided into discrete or continuous, categorical data are further divided into nominal or ordinal data. These data may be represented in a textual manner or with the help of illustrations (tables or graphs). The selection of a proper mode of representation of data helps in the optimal understanding of results. The level of importance of each parameter determines the mode of representation. The present article attempts to introduce the various methods of data presentation and throw some light on the benefits and limitations of each mode of data presentation.
Keywords: Box and whisker plot, graph, line, results, table
How to cite this article:
Kannan S, Dongare PA, Garg R, Harsoor S S. Describing and displaying numerical and categorical data. Airway 2019;2:64-70
How to cite this URL:
Kannan S, Dongare PA, Garg R, Harsoor S S. Describing and displaying numerical and categorical data. Airway [serial online] 2019 [cited 2019 Sep 20];2:64-70. Available from: http://www.arwy.org/text.asp?2019/2/2/64/265618
Introduction
Data are individual pieces of factual information (a set of parameters) recorded during research and are based on the primary and secondary objectives of the study. The objective(s) must be clear and precise so that the data collected will also precisely represent the research question. Both the collection of data and their effective representation are of paramount importance while reporting a study. Effective representation involves describing data in an appropriate format so that the reader can easily understand the observations at a glance. There are various methods of representing data, depending on the type of data and how they were collected. It is essential to select the most suitable method so that the data are easily understood.
Describing Data
Data collection is largely dependent on the objectives/research question. The more focused the research question, the more precise the data collected will be. It is important to keep the primary objective in mind while representing the data. The use of a proper tool for presenting the primary outcome measure is essential for a better understanding of that parameter. A good research question is framed using the Population, Intervention, Comparator, Outcome and Time frame criteria; these are beyond the scope of this article and are not discussed further.
Data can be classified as numerical (quantitative) or categorical (qualitative). Numerical data are classified as (i) discrete, where the outcomes can take only distinct, countable values, usually whole numbers (e.g., number of attempts at intubation) or (ii) continuous, where the outcomes can assume any value within a range (e.g., height in centimetres). Categorical data are further classified as (i) nominal, when the categories have no inherent order (e.g., male or female) or (ii) ordinal, when the categories can be arranged in a meaningful order (e.g., Mallampati classification of airway). Further details regarding the description of data can be obtained from an earlier published article in this special series. The data collected should undergo a process of 'cleansing', as is also described in the same article.
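As a rough illustration of this classification, the sketch below tags a few variables with their data type and chooses a summary for each. The variable names, the values and the choice of summary per type (mean and standard deviation for numerical data, median for ordinal data, counts for nominal data) are illustrative assumptions and are not taken from any study.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical observations, invented purely for illustration.
observations = {
    "intubation_attempts": ("discrete", [1, 1, 2, 1, 3, 1, 2]),
    "height_cm": ("continuous", [162.5, 170.0, 158.2, 175.4, 168.9]),
    "sex": ("nominal", ["male", "female", "female", "male", "female"]),
    "mallampati_class": ("ordinal", [1, 2, 2, 3, 1, 4, 2]),
}


def summarise(kind, values):
    """Return a short text summary suited to the data type (assumed conventions)."""
    if kind in ("discrete", "continuous"):
        return f"mean {mean(values):.1f} (SD {stdev(values):.1f})"
    if kind == "ordinal":
        return f"median class {median(values)}"
    if kind == "nominal":
        counts = Counter(values)
        return ", ".join(f"{k}: {n}" for k, n in sorted(counts.items()))
    raise ValueError(f"unknown data type: {kind}")


for name, (kind, values) in observations.items():
    print(f"{name} [{kind}]: {summarise(kind, values)}")
```

Keeping the data type alongside each variable makes it harder to apply an unsuitable summary, such as a mean, to nominal categories.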
Data can be represented in the results section as (i) text, (ii) tables and (iii) graphs. Each of these representations is unique and suited for representing particular types of data. There is no single rule which says a particular type of representation has to be used for a particular type of data. For example, the success rate of the use of an airway device can be represented either as a table or as a bar graph. Methods of presentation must be determined according to the data format, the method of analysis to be used and the information to be emphasised. Inappropriately presented data fail to convey information clearly to readers and reviewers. The choice of representation also depends on how central the parameter is to answering the research question.
In research where a variable representing a primary outcome measure needs to be expressed in detail, or if a parameter has more values to be shown, then a table may be more suitable. For instance, a study assessing haemodynamic responses to the placement of an airway device where parameters are recorded at multiple time intervals would be best represented by a table.
By general convention, the results section begins with a description of demographics (unless demographics are themselves the primary research question), followed by the primary and then the secondary outcome parameters. This can be followed by untoward effects observed during the study. Every journal has its own way of presenting illustrations (which include tables and graphs), and journals provide clear instructions regarding the presentation of the content and labelling of tables and graphs. The following text gives a brief description of each method of data representation; the examples (tables and graphs) are generated from imaginary values purely for the purpose of illustration.
Text can be the main method of conveying information as it is used to explain results and trends and provides contextual information. Parameters expressed as tables or graphs can also be elaborated in the text to draw out their meaning. Text is a good mode of representation if the data contain only one or two numbers, as tables or graphs for such data may occupy far too much space without conveying additional information. For example, in a study comparing the success of intubation with the Glidescope® and the conventional laryngoscope, the comparison of overall success rates can be expressed in the text as 'the overall success rate with direct laryngoscopy was 99%, whereas it was 96% with the Glidescope®.' In addition, some of the important details in tables and graphs can be emphasised well in text. However, when more information needs to be provided, text occupies more space, and the reader not only takes longer to read it but may also be unable to assimilate the full information. It is also important to remember that data presented in the text should not be duplicated in the tables or graphs.
Tables help us to express a large amount of information or trends in a compact way which can be read and understood at a glance. Tables are useful when exact figures including decimals need to be expressed, especially when data pertaining to primary outcome measure need to be presented in detail. A table can also be used when multiple parameters need to be presented in a concise manner [Table 1]. Both qualitative and quantitative data can be represented using tables.
Tables generally have five components [Figure 1]: the title (heading or caption), the row headings (also called stubs), the column headings, the data fields and the footnote. Sometimes, the table contains a spanner, which is a common heading for some or all the column headings. The title of the table should be informative and precise and give the reader a clue regarding its contents. The row headings (or stubs) are listed in the far left column of the table. Here, one or more variables are listed, and generally these are used to name independent variables. The columns generally represent dependent variables, and the units of measurement of the variables should be mentioned in the column headings. The footnote is an important part of the table. As the name suggests, it is positioned at the bottom of the table and includes explanatory matter. It can be used to explain non-standard abbreviations, the type of data representation or the statistical test used. By general convention, the title is placed above a table but below a figure or graph.
There are some limitations for the use of tables. They are not useful for conveying ideas. In addition, they are not good at depicting changes occurring over time (trends). Graphs convey such information better.
Basic rules for the preparation of tables
Ideally, every table should:
- Be self-explanatory
- Present values with the same number of decimal places in all its cells (standardisation)
- Include a title informing what is being described and where, as well as the number of observations (N) and when data were collected
- Have a structure formed by three horizontal lines, defining table heading and the end of the table at its lower border
- Not have vertical lines at its lateral borders
- Provide additional information in table footer (when needed)
- Be inserted into a document only after being mentioned in the text
- Be numbered by Arabic numerals.
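Several of these rules can be applied mechanically. The following is a minimal sketch, assuming a plain-text rendering; the function name `format_table`, the table title and all values are hypothetical. It standardises the number of decimal places, uses exactly three horizontal rules, draws no vertical lines and appends a footnote.

```python
def format_table(title, headers, rows, decimals=1, footnote=None):
    """Render a plain-text table with three horizontal rules and no vertical lines."""
    # Standardise every numeric (float) cell to the same number of decimal places.
    text_rows = [
        [f"{cell:.{decimals}f}" if isinstance(cell, float) else str(cell) for cell in row]
        for row in rows
    ]
    # Each column is as wide as its longest entry, heading included.
    widths = [
        max([len(h)] + [len(r[i]) for r in text_rows])
        for i, h in enumerate(headers)
    ]
    rule = "-" * (sum(widths) + 2 * (len(widths) - 1))

    def line(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    # Rules: above the heading, below the heading and at the lower border.
    out = [title, rule, line(headers), rule]
    out += [line(r) for r in text_rows]
    out.append(rule)
    if footnote:
        out.append(footnote)
    return "\n".join(out)


# Hypothetical values, invented for illustration only.
table = format_table(
    "Table X: Heart rate (beats/min) after intubation (N = 40)",
    ["Time point", "Group A", "Group B"],
    [["Baseline", 78.0, 80.27], ["1 min", 92.46, 95.1], ["5 min", 81.3, 83.0]],
    footnote="Values are means; hypothetical data.",
)
print(table)
```

A journal's own instructions to authors take precedence over any such template.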
Graphs attract readers' attention better, and the data they depict remain in their memory. The type of graph used is dependent on the nature of data that are to be shown. They are used for depicting outcomes, relationships and trends. The graphs can be dot graphs, line graphs or bar graphs.
A good graph should have easily readable data points and connecting lines, visually balanced and clearly decipherable axes and legible legends. These qualities ensure that readers understand the data shown in the graph without having to refer to the text. Use of colour coding or various patterns to differentiate the groups or parameters will make the graph more readable. Some types of graphs have provision to represent standard deviation or standard error of means.
Similar to tables, graphs should include a title providing all relevant information below the figure and also be referred to as figures in the text.
A dot plot, also called a dot chart, is used for relatively small data sets of a continuous variable. These plots give a visual comparison of the centre of the observations as well as providing some idea about how the observations vary. The plot groups the data as little as possible, and the identity of an individual observation is not lost. The dots are plotted against their actual data values that are on the horizontal scale. If there are identical data values, the dots are 'piled' on the top of each other. As an example, a graph depicting systolic blood pressure readings in a set of patients is shown in [Figure 2].
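The 'piling' of identical values can be sketched in a few lines of text output. This is only an approximation under stated assumptions: the readings are hypothetical, and distinct values are placed at equal spacing rather than on a true numerical scale as a real dot plot would do.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: values along the horizontal axis, identical values piled up."""
    counts = Counter(values)
    levels = sorted(counts)
    width = max(len(str(v)) for v in levels)
    rows = []
    # Build from the top of the tallest pile down to the axis.
    for height in range(max(counts.values()), 0, -1):
        cells = [("o" if counts[v] >= height else " ").center(width) for v in levels]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).center(width) for v in levels))
    return "\n".join(rows)


# Hypothetical systolic blood pressure readings (mmHg).
sbp = [110, 120, 120, 124, 130, 130, 130, 138, 140]
print(dot_plot(sbp))
```

Every observation appears as its own dot, so the identity of individual readings is preserved.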
A scatter plot is a type of data display that shows the relationship between two numerical variables, especially when the amount of data is large. For example, in a study assessing the relationship between height and neck circumference to predict a difficult airway, a scatter plot is drawn to find out the correlation [Figure 3]. Data are displayed on the X (independent variable) and Y (dependent variable) axes. Each point represents an individual or object, and an association between the two variables can be studied by analysing patterns across multiple points. A regression line may be added to the graph to show whether the association between the two variables follows a pattern. The slope of the regression line indicates whether the correlation is positive or negative, or whether there is no correlation at all. A scatter plot is easy to draw and interpret. Furthermore, outliers remain visible as isolated points in the graph and do not obscure the overall pattern. However, the graph by itself does not give a quantitative measure of the degree of correlation.
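Because the plot itself does not quantify the association, the correlation coefficient and the regression line are usually computed alongside it. The sketch below assumes Pearson's correlation and an ordinary least-squares line; the function name and the height and neck circumference values are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for paired numerical observations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    slope = sxy / sxx           # least-squares slope of y on x
    intercept = my - slope * mx
    return r, slope, intercept


# Hypothetical heights (cm) and neck circumferences (cm).
height = [150, 155, 160, 165, 170, 175, 180]
neck = [32.0, 33.5, 34.0, 35.5, 36.0, 38.0, 38.5]
r, slope, intercept = pearson_and_line(height, neck)
print(f"r = {r:.2f}; neck = {intercept:.1f} + {slope:.3f} x height")
```

The sign of the slope matches the sign of r, which is what the direction of the regression line conveys visually.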
Figure 3: Scatter plot showing relationship between height and neck circumference
A pie chart, which is used to represent categorical data, visually represents a distribution of categories. It is generally the most appropriate format for representing information grouped into a small number of categories. For example, in a research studying the pattern of distribution of Mallampati classification in a population, the pie chart describes the percentage distribution of patients with various modified Mallampati airway classes in a study population [Figure 4].
Figure 4: Pie chart showing the distribution of modified Mallampati class
Pie charts do not show the actual values, but only percentages. Hence, categories with small percentages may not be visualised properly. Pie charts are recommended for data sets with relatively few (5-10) categories. To represent data in more than ten categories, a bar chart of percentages with category labels on the horizontal axis is easier to read and interpret.
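The percentages behind a pie chart are simple to derive, and reporting the counts with them restores the information the chart leaves out. A minimal sketch with hypothetical class frequencies follows; the function name is an assumption.

```python
from collections import Counter


def category_percentages(labels):
    """Return {category: (count, percentage)} for a list of categorical labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: (n, 100 * n / total) for k, n in sorted(counts.items())}


# Hypothetical modified Mallampati classes for 50 patients.
classes = ["I"] * 22 + ["II"] * 18 + ["III"] * 9 + ["IV"] * 1
for cls, (n, pct) in category_percentages(classes).items():
    print(f"Class {cls}: n = {n} ({pct:.0f}%)")
```

In this invented data set, class IV would occupy a very thin slice, which illustrates why small categories are hard to see in a pie chart.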
Bar graph and histogram
Bar graphs can be used to represent both quantitative and categorical data. A bar graph is used to display and compare, across discrete categories or groups, a frequency or another summary measure (e.g., mean) [Figure 5]a.
Figure 5: (a) Bar graph showing gender distribution in a study. (b) Stacked bar graph showing the comparison of modified Mallampati class between groups. (c) Bar graph showing the comparison of time for intubation between two groups. (d) Comparison of the success of intubation at first attempt between intubating laryngeal mask airway and Ambu AuraGain™. (e) Comparison of the success of intubation at first attempt between intubating laryngeal mask airway and Ambu AuraGain™ with changed scale on Y-axis
Depending on the number of categories, and the size or complexity of each category, bars may be created vertically or horizontally. The height (or length) of a bar represents the amount of information in a category. Bar graphs are flexible and can be used in a grouped or subdivided bar format in cases of two or more data sets in each category. For example, in a study comparing the effect of Mallampati class on intubation success with intubating laryngeal mask airway (ILMA) and Airtraq®, the comparison of the distribution of Mallampati grades between the two study groups has been depicted by a stacked bar graph [Figure 5]b. They also have the advantage of the actual number of observations being displayed in each category unlike pie charts where only percentages are displayed. The use of horizontal or vertical bar graphs depends on the data to be presented and compared. Vertical bar graphs are commonly used for all comparisons, and they have the advantage of representing negative data also, whereas a horizontal bar graph would serve better when a large number of data sets have to be plotted in a bar graph. Bar graphs have been found to have better visual impact and reproducibility of information compared to tables.
A histogram is a specialised bar plot that lets you discover and show the underlying frequency distribution (shape) of a set of continuous data. The frequency is represented on the Y-axis, and the continuous data are split into equal class intervals and represented on the X-axis. This graph also helps to assess the distribution of data (normal or skewed) and outliers.
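The splitting of continuous data into equal class intervals can be sketched as follows. The intubation times, the starting value and the interval width are hypothetical choices made only for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal class intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the covered range are ignored
            counts[i] += 1
    return counts


# Hypothetical intubation times (s).
times = [12, 14, 15, 17, 18, 18, 19, 21, 22, 22, 23, 24, 26, 27, 31, 45]
counts = histogram_counts(times, start=10, width=5, n_bins=8)
for i, c in enumerate(counts):
    lower = 10 + 5 * i
    print(f"{lower:>2}-{lower + 5:<2} | {'#' * c}")
```

The isolated count in the last interval, separated from the rest by empty intervals, is how an outlier shows up in a histogram.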
Use of whiskers or error bars while plotting data would give additional information regarding the degree of dispersion (e.g., standard deviation) of the observed values around a central value (e.g., mean). In a study comparing the ease of intubation between ILMA and Airtraq, the comparison of time for intubation (mean and standard deviation) is represented in [Figure 5]c.
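The ends of an error bar are obtained from the central value and the chosen measure of dispersion. The sketch below assumes bars of one sample standard deviation (SD) or one standard error of the mean (SEM) on either side of the mean; the group names and times are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def error_bar(values, kind="sd"):
    """Return (mean, lower, upper) for an error bar of one SD or one SEM around the mean."""
    if kind not in ("sd", "sem"):
        raise ValueError("kind must be 'sd' or 'sem'")
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    half = sd if kind == "sd" else sd / sqrt(len(values))
    return m, m - half, m + half


# Hypothetical times for intubation (s) in two groups.
groups = {
    "Group A": [18, 22, 25, 19, 24, 21],
    "Group B": [27, 31, 24, 29, 33, 28],
}
for name, values in groups.items():
    m, low, high = error_bar(values, "sd")
    print(f"{name}: mean {m:.1f} s, error bar {low:.1f} to {high:.1f} s")
```

Whichever measure is used, the figure legend should state it, since SEM bars are always narrower than SD bars for the same data.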
The scale of a bar graph can easily be manipulated, and this may influence interpretation unless the reader examines the graph carefully. For example, [Figure 5]d shows the comparison of the success of intubation at first attempt between ILMA (Group A) and Ambu AuraGain™ (Group B) in seventy patients; the graph gives the impression that the groups are comparable, which is indeed the case (P = 0.452). However, by changing the scale of the Y-axis, a visual impression of a non-existent difference is created [Figure 5]e. A bar graph may also not be useful when trends need to be presented, and clustering of too many parameters may make the graph look crowded.
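The effect of a changed Y-axis can be put into numbers by comparing the drawn heights of the bars. The following sketch uses hypothetical success rates, not the values behind the figure, and the function name is an assumption.

```python
def apparent_ratio(a, b, y_axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the Y-axis starts at y_axis_start."""
    if min(a, b) <= y_axis_start:
        raise ValueError("both values must lie above the start of the Y-axis")
    return (a - y_axis_start) / (b - y_axis_start)


# Hypothetical first-attempt success rates (%).
group_a, group_b = 94.0, 91.0
print(f"Axis from 0:  bar A is {apparent_ratio(group_a, group_b):.2f} times bar B")
print(f"Axis from 90: bar A is {apparent_ratio(group_a, group_b, 90):.2f} times bar B")
```

With the axis starting at zero the bars are almost equal in height, whereas starting the axis at 90 makes one bar four times as tall as the other although the data are unchanged.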
Line plot with whiskers
A line graph is commonly used to display change over time as a series of data points connected by straight line segments on two axes. A line plot is useful for representing time series data, preferably when trends are to be presented. In a line graph, the X-axis represents the continuous variable, whereas the Y-axis represents the scale and measurement values. It is also useful to represent multiple data sets on a single line graph to compare and analyse patterns across different data sets. Line graphs depict the results better compared to bar graphs when the difference between the two groups is small and can show both positive and negative values. Line graphs are not very useful for depicting categorical data. Data values and labels can be displayed along the data points on a line graph. However, it is better to avoid them when a large number of data are being represented by the graph so that the graph does not look crowded.
Use of whiskers or error bars helps to show the degree of dispersion around each data point. [Figure 6] shows a line graph depicting the comparison of systolic and diastolic blood pressure trends in patients undergoing intubation with flexible fibreoptic bronchoscope (Group A) and Ambu A3 flexible bronchoscope™ (Group B). The biggest disadvantage of a line graph is that its appearance may imply more information than is actually available: a sloping line appears to represent all the values between the two points it connects, although only the end points were measured.
Figure 6: Line graph showing the comparison of pressor response between two groups. Note that the groups are represented by different types of lines
Box and whisker plot
This is a graphical representation in which the entire data set is summarised as a box with whiskers extending on either side of it. Box plots can be drawn either horizontally or vertically. The various parts of a box and whisker plot are detailed in [Figure 7]. This figure shows the distribution of thyromental distance in a population screened for difficult intubation. The box represents the median value, along with the quartiles. The lines extending from the box (the whiskers) represent the range of the remaining data. Some graphs also display the mean. Values that lie beyond the range of the whiskers are outliers and are marked individually as stars or points. A box and whisker plot displays variation in the samples of a statistical population without making any assumptions about the underlying statistical distribution and is the best way of representing non-parametric data. The spacing between the different parts of the box indicates the degree of dispersion (spread) and the degree of skew in the data.
Figure 7: Box and whisker plot showing thyromental distance in a group of patients
These graphs can display and help in comparing multiple data sets from independent sources which are related to each other (e.g., comparison of thyromental distance between males and females in a set of population to study its influence on the success of intubation). In addition, they handle large amount of data easily and give a visual summary of distribution of the same. However, they do not reflect the exact values and details of the distribution of results. They give a summary of the distribution of the data set. They are not very useful in normally distributed data because mean, mode and standard deviation cannot be obtained from the graph. A histogram is a more useful graph in such cases.
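The values that a box and whisker plot summarises can be computed directly. The sketch below assumes one common convention, in which whiskers extend to the most extreme observations within 1.5 times the interquartile range of the box and anything beyond is flagged as an outlier; other conventions exist, and the thyromental distances are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 x IQR convention."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical thyromental distances (cm).
tmd = [5.5, 6.0, 6.2, 6.5, 6.8, 7.0, 7.1, 7.4, 7.8, 11.0]
for key, value in box_plot_summary(tmd).items():
    print(f"{key}: {value}")
```

Note that the summary contains no mean or standard deviation, which is the limitation described above for normally distributed data.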
Receiver operating characteristic curve
Receiver operating characteristic (ROC) curves [Figure 8] were initially developed by the British Royal Air Force in World War II as a method of radar signal detection. It is often assumed that diagnostic tests (airway assessment tests such as thyromental distance and modified Mallampati class) yield either positive or negative results. In clinical practice, however, they yield a range of possible values, such as a score or a measurement, with a reference cut-off for normality. The ROC curve allows us to measure the sensitivity and specificity and compare them at various cut-off points. ROC curves are usually plotted with sensitivity on the Y-axis and 1 - specificity (false-positive rate) on the X-axis. Each point along the curve corresponds to a cut-off, allowing us to read off the sensitivity and specificity at that point. The area under the curve (AUC) provides a summary measure of how well a variable may be able to predict an outcome. When two or more curves are constructed from data on the same group of individuals, they may be used to compare the relative efficacies of each in predicting outcomes.
Figure 8: A receiver operating characteristic curve plotted from hypothetical data of modified Mallampati class predicting a difficult airway
The optimum cut-off point, and hence the sensitivity and specificity of a test at that cut-off, can be estimated by plotting the minimum distance line (from coordinates X = 0 and Y = 1 to X = 1 and Y = 0). The Youden index (J = sensitivity + specificity - 1) is a measure of accuracy, represented graphically by the maximum vertical distance from the line of random chance to the curve; it is used to rate tests on their ability to discriminate between the presence or absence of the outcome (e.g., the presence or absence of a difficult airway) [Figure 8]. The AUC is better at quantifying the ability of a test to discriminate between two outcomes.
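The quantities described for the ROC curve can be computed from raw data in a few steps. The following is a minimal sketch, assuming that a score at or above the cut-off counts as test-positive, that the AUC is estimated by the trapezoidal rule and that both outcomes occur in the data; the function names and the patient data are hypothetical.

```python
def roc_points(scores, outcomes):
    """Return [(cutoff, sensitivity, specificity)] for the rule 'score >= cutoff is positive'."""
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= cutoff and o)
        tn = sum(1 for s, o in zip(scores, outcomes) if s < cutoff and not o)
        points.append((cutoff, tp / pos, tn / neg))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    # ROC coordinates are (1 - specificity, sensitivity); add the (0, 0) and (1, 1) corners.
    xy = sorted({(0.0, 0.0), (1.0, 1.0)} | {(1 - sp, se) for _, se, sp in points})
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(xy, xy[1:]))


def youden(points):
    """Return (J, cutoff) maximising J = sensitivity + specificity - 1 (ties: higher cutoff)."""
    return max((se + sp - 1, cutoff) for cutoff, se, sp in points)


# Hypothetical modified Mallampati classes and outcomes (1 = difficult airway).
mmp = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
difficult = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
pts = roc_points(mmp, difficult)
j, best_cutoff = youden(pts)
print(f"AUC = {auc(pts):.2f}; Youden J = {j:.2f} at class >= {best_cutoff}")
```

Each tuple in `pts` corresponds to one point on the curve, so the same list can be used both to draw the curve and to select a cut-off.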
Selection of a proper mode of data representation goes a long way in facilitating the understanding of results in an effective way. [Table 2] provides a brief summary of the various types of variables and their modes of representation.
Table 2: Type of variable and choice of a suitable mode for appropriate representation
Conclusion
The choice of an ideal tool for representing the data depends on the research question, the primary outcome measure and the order of emphasis to be given to the parameters. Tables can accommodate more parameters in a concise manner compared to textual description. Graphs provide a quick visual interpretation of the data and help in making the presentation more appealing.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Goneppanavar U, Ali Z, Bhaskar SB, Divatia JV. Types of data, methods of collection, handling and distribution. Airway 2019;2:36-40.
In J, Lee S. Statistical data presentation. Korean J Anesthesiol 2017;70:267-76.
Annesley TM. Bring your best to the table. Clin Chem 2010;56:1528-34.
Bavdekar SB. Using tables and graphs for reporting data. J Assoc Physicians India 2015;63:59-63.
Duquia RP, Bastos JL, Bonamigo RR, González-Chica DA, Martínez-Mesa J. Presenting data in tables and charts. An Bras Dermatol 2014;89:280-5.
Annesley TM. Put your best figure forward: Line graphs and scatter grams. Clin Chem 2010;56:1229-33.
Ibe OC. Introduction to descriptive statistics. In: Fundamentals of Applied Probability and Random Processes. 2nd ed. San Diego: Elsevier Publications; 2014. p. 253-74.
Brewer NT, Gilkey MB, Lillie SE, Hesse BW, Sheridan SL. Tables or bar graphs? Presenting test results in electronic medical records. Med Decis Making 2012;32:545-53.
Carter JV, Pan J, Rai SN, Galandiuk S. ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves. Surgery 2016;159:1638-45.
Peacock JL, Peacock PJ. Oxford Handbook of Medical Statistics. New York: Oxford University Press; 2011. p. 348-9.