CC BY-NC-ND 4.0 · Indian J Radiol Imaging 2025; 35(S 01): S58-S73
DOI: 10.1055/s-0044-1796644
Research and Publications: The Process
Review Article

Basic Statistics for Radiologists: Part 1—Basic Data Interpretation and Inferential Statistics

Adarsh Anil Kumar
1   Department of Imaging Sciences and Interventional Radiology, Sree Chitra Institute of Medical Sciences, Trivandrum, Kerala, India
Santhosh Kannath
1   Department of Imaging Sciences and Interventional Radiology, Sree Chitra Institute of Medical Sciences, Trivandrum, Kerala, India
Funding None.
 

Abstract

A systematic approach to statistical analysis is essential for accurate data interpretation and informed decision-making in the rapidly evolving field of radiology. This review provides a comprehensive overview of the fundamental statistical concepts for radiologists and clinicians. The first part of this series introduces foundational elements such as data types, distributions, descriptive and inferential statistics, hypothesis testing, and sampling methods. These are crucial for understanding the underlying structure of research data. The second part of this series delves deeper into advanced topics, including correlation and causality, regression analysis, survival curves, and the analysis of diagnostic tests using contingency tables and receiver operator characteristic (ROC) curves. These tools are vital for evaluating the efficacy of imaging techniques and drawing valid conclusions from clinical studies. As radiology continues to push the boundaries of technology and therapeutic interventions, mastering these statistical principles will empower radiologists to critically assess literature, conduct rigorous research, and contribute to evidence-based practices. Despite the pivotal role of statistics in radiology, formal training in these methodologies is still limited to a certain extent. This primer aims to bridge that gap, providing radiologists with the necessary tools to enhance diagnostic accuracy, optimize patient outcomes, and advance the field through robust research.


#

Introduction

Adopting a systematic approach to statistical analysis is essential for ensuring the accurate interpretation of data and drawing valid conclusions from research studies. In the field of radiology, statistics play a crucial role in enhancing diagnostic precision, improving patient outcomes, and driving advancements in research. This primer offers a thorough and condensed overview of key statistical concepts that are pertinent to both radiologists and clinicians. The first part is dedicated to discussing types of data, data distribution, descriptive and inferential statistics, hypothesis testing, and sampling. The second part delves into advanced statistical concepts such as correlation and causality, regression analysis, survival curves, and the analysis of diagnostic tests, encompassing contingency tables and receiver operating characteristic (ROC) curves. This primer not only serves as a foundational resource for grasping basic statistical concepts but also aids in the interpretation of various methodologies relevant to daily research endeavors.

Radiology has been at the forefront of technological innovations and various advancements, focusing not only on disease diagnosis but also on therapeutic interventions. The conduct of research assessing the utility of imaging techniques and their applications are crucial for shaping clinical recommendations and establishing practice guidelines, both now and in the future.[1] Understanding fundamental statistical principles will enable radiologists as well as clinicians to critically assess existing literature and make well-informed clinical decisions, which are the foundations of evidence-based medicine.[2] Similarly, the proper application and interpretation of statistical methods are crucial for carrying out scientifically rigorous studies. Nonetheless, training in research methodology, particularly in statistics, is generally limited throughout postgraduate medical training.[3] Our objective is to provide an overview of the most frequently used data analysis methods found in radiology literature.


#

Types of Data

Statistical data can be broadly classified into two types: quantitative and qualitative. Understanding the type of data is crucial for selecting the appropriate statistical method for analysis.[4] Quantitative data refers to numerical information that can be measured or counted. It can be further subdivided into two types ([Fig. 1])[5] [6]:

  • Continuous data can take any value within a specified range, allowing for the calculation of statistical measures such as means and variances. For instance, in a study measuring the size of tumors in breast cancer patients before and after treatment, the tumor sizes are considered continuous data because they can assume any value within the range of possible measurements, such as 1.2, 2.5, 3.7 cm, and so on.

  • On the other hand, discrete data consist of distinct and separate values, often arising from counting processes. For example, the number of renal cysts present on ultrasound images of different patients represents discrete data. If one patient has three cysts and another has five, these values are discrete data.

Fig. 1 Flowchart demonstrating the classification of types of data.

Qualitative data describe characteristics or categories that cannot be quantified. They are also known as categorical data and can be subdivided into two types[5] [6]:

  • Nominal data: These represent categories that do not have an inherent order. This type of data is often used to classify observations into distinct groups. For example, in a study evaluating the choice of different imaging modalities for a particular suspected pathology among various radiologists, the modalities (magnetic resonance imaging [MRI], computed tomography [CT], ultrasound) are nominal data.

  • Ordinal data: This type of data represents categories with a meaningful order but no consistent difference among them. It is useful for ranking observations but does not provide information about the relative distance between ranks. For example, when evaluating patient satisfaction with imaging services, responses might be categorized as “poor,” “fair,” “good,” or “excellent.” These categories have a natural order, but the intervals between them are not necessarily equal.

Consider a study that examines the efficiency of different radiology workflows. The study can collect both quantitative and qualitative data. Quantitative data can be measured as the time taken (in minutes) to complete a set of imaging examinations, while qualitative data can be formulated as the type of workflow (manual vs. automated). Statistical tests are more robust for quantitative data than for qualitative data. By analyzing both types of data, the researcher can determine not only which workflow is faster but also how the type of workflow affects overall efficiency as well as user satisfaction.

When gathering data for research, it is advisable to collect the data as continuous variables rather than nominal variables when there is flexibility in organizing the data. For instance, when recording the hypertensive status of multiple patients, it is more advantageous to gather individual blood pressure measurements rather than categorizing patients as hypertensive or nonhypertensive. This approach offers benefits such as greater statistical power, reduced information loss, and increased flexibility in data transformation.


#

Distribution of Data

Understanding the distribution of data is essential for selecting appropriate statistical methods. Distribution describes how the data values are spread across their range, thereby providing insight into underlying patterns as well as trends within the dataset.[7]

The normal distribution (also known as the Gaussian distribution) links the frequency distribution to a probability distribution, representing how near or how far the distribution of the observed sample is from the ideal distribution in the population. It is a symmetrical, bell-shaped curve in which most of the data points cluster around the mean, which occupies the central position of the distribution. Many biological measurements, such as blood pressure or body temperature, follow a normal distribution. The standard deviation (SD) indicates how the data are dispersed around the mean: the larger the SD, the wider and flatter the curve. Approximately 68% of observations fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. These properties of the normal distribution allow for the application of various statistical techniques, including parametric tests.[7] [8]
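The coverage figures above can be verified numerically. The short sketch below (Python with SciPy; the liver-attenuation mean of 55 HU and SD of 5 HU are hypothetical values, in the spirit of [Fig. 2]) computes the proportion of a normal distribution lying within 1, 2, and 3 SDs of the mean.

```python
from scipy.stats import norm

mean, sd = 55.0, 5.0  # hypothetical liver attenuation values (HU): mean and SD

for k in (1, 2, 3):
    # P(mean - k*SD < X < mean + k*SD) for a normal distribution
    coverage = norm.cdf(mean + k * sd, mean, sd) - norm.cdf(mean - k * sd, mean, sd)
    print(f"within {k} SD: {coverage:.1%}")
# within 1 SD: 68.3%, within 2 SD: 95.4%, within 3 SD: 99.7%
```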

Skewness is a measure of asymmetry and deviation from a normal distribution. Data can be skewed if they are not symmetrically distributed. Skewness can be positive (right skewed) or negative (left skewed; [Fig. 2]).[9]

Fig. 2 Bar charts demonstrating types of data distribution. Normal distribution of data is represented by the typical symmetrical bell-shaped curve, e.g., in a typical healthy population, liver attenuation values (in HU) usually center around a mean of 50 to 60 HU, with most people falling close to this value. There are few individuals with extremely high or low attenuation values, leading to the characteristic bell-shaped, symmetrical curve of a normal distribution. In a positively (right-) skewed distribution, the peak of the curve shifts toward the left and a long tail extends to the right, e.g., in a dataset measuring duration of hospital stays for patients undergoing different interventional radiology procedures, a right-skewed distribution might indicate that while most patients are discharged within a few days, a smaller number of patients have significantly longer stays due to complications. In a negatively (left-) skewed distribution, the peak shifts toward the right and a long tail extends to the left, e.g., if age at diagnosis for a particular disease shows a left-skewed distribution, it might indicate that most diagnoses occur later in life, with a few cases occurring at younger ages. A bimodal distribution has two peaks, for example, the distribution of heights in a mixed-gender sample.

Right-skewed distribution: Most data points are concentrated on the left with a long tail to the right. For example, in a dataset measuring the duration of hospital stays for patients undergoing different interventional radiology procedures, a right-skewed distribution might indicate that while most patients are discharged within a few days, a smaller number of patients have significantly longer stays due to complications.

Left-skewed distribution: Most data points are concentrated on the right with a long tail to the left, such as in the case of age at diagnosis for a particular disease. For example, if age at diagnosis for a particular disease shows a left-skewed distribution, it might indicate that most diagnoses occur later in life, with a few cases occurring at younger ages.

A bimodal distribution has two peaks. This can occur when data are collected from two different populations. For example, the distribution of heights in a mixed-gender sample.


#

Presentation of Data

Data can be presented in three ways: as text, in tabular form, or in graphical form ([Fig. 3])[4] [10]:

  • Text: This is the main method of conveying information to explain results and trends, as well as to provide contextual information.

  • Table: It helps in the representation of larger amounts of data in an engaging, easy-to-read and coordinated manner. The data are arranged in rows and columns.

  • Graphical form: It is a powerful tool to communicate research results and to gain information from data. It may be in the form of a bar chart, pie chart, line diagram, scatter plot, or histogram.

Fig. 3 Examples of different forms of data presentation. (A) Bar chart, which is used to compare the frequency or values of different categories, for example, comparing the number of patients with different types of brain tumors [gliomas, meningiomas, metastases] diagnosed over a year. (B) Pie chart, which is used to show proportions or percentages of a whole, for example, showing the percentage distribution of different imaging modalities (magnetic resonance imaging [MRI], computed tomography [CT], ultrasound, X-ray) used in a hospital's radiology department. (C) Line diagram, which is used to track changes or trends over time, for example, tracking the trend of average radiation dose per CT scan in a radiology department over time (across months or years). (D) Scatter plot, which is used to explore relationships or correlations between two continuous variables, for example, plotting the relationship between tumor size (in cm) and patient survival time (in months) after diagnosis of a malignant tumor. (E) Histogram, which is used to display the distribution of a continuous variable by grouping data into bins, for example, displaying the distribution of radiodensity values (in Hounsfield units) for liver tissue on CT in a group of patients to assess for fatty liver disease. (F) Box and whisker plot, which is used to show the spread, central tendency, and outliers in a dataset, for example, comparing the distribution of radiologists' interpretation times (in minutes) for reading brain MRI across different experience levels (junior, senior, expert).

#

Descriptive and Inferential Statistics

Once you have gathered data and organized it according to its type and distribution, the next step is to analyze the data. One important aspect of statistics involves making assertions about a population. Since it is often impractical to obtain data from an entire population, a sample is typically taken instead. Descriptive statistics are then used to characterize this sample, including measures such as the mean value and the degree of dispersion. However, characterizing the sample alone does not provide insight into the population as a whole; this is the domain of inferential statistics. In this case, a sample is drawn from the population with the aim of drawing broader conclusions about the population based on this sample. Thus, inferential statistics seek to deduce the unknown parameters of the population from the known parameters of a sample, going beyond the immediate data unlike descriptive statistics. To accomplish this, inferential statistics utilize hypothesis tests such as the t-test or analysis of variance (ANOVA). Both are crucial for analyzing data and drawing meaningful conclusions from them ([Fig. 4]).[11]

Fig. 4 Pictorial representation of descriptive versus inferential statistics. Sampling is the process of selecting a subset of individuals or data points from a population to make inferences about the entire population. Inferential statistics are used to make predictions or generalizations about a population based on sample data, often involving hypothesis testing and confidence intervals. Descriptive statistics are used to summarize and describe the main features of a dataset, such as measures of central tendency and variability.

#

Descriptive Statistics

Descriptive statistics summarize and describe features of a particular dataset using statistical characteristics, graphics, charts, or tables. They provide simple summaries about the sample and its measures, thereby offering critical insights into the central tendency, dispersion, and shape of the data distribution. It is important to understand that descriptive statistics evaluate only the properties of the sample; we do not draw conclusions about other points in time or about the population. Descriptive statistics are broadly divided into two subtypes: location parameters (i.e., measures of central tendency) and dispersion parameters (i.e., measures of variability). A parameter represents a measurable characteristic of the population.

Measures of Central Tendency

Measures of central tendency basically describe where the center of a sample is or where most of the sample is.[12] [13] [14]

Mean: it represents the average of all data points, which is calculated by summing all the values and dividing by the number of observations. The mean can be calculated only for metric variables and is sensitive to outliers. For example, if a radiologist measures the mean size of the liver in a sample of five patients with glycogen storage disorders as 15, 16, 17, 18, and 19 cm, the mean liver size is (15 + 16 + 17 + 18 + 19)/5 = 17 cm.

Median: when data points are ordered from smallest to largest, the middle value is termed as median. The variables must have an ordinal or metric scale level for calculating median. The median is less affected by outliers and skewed data. For the aforementioned example of liver size in a sample of five patients with glycogen storage disorders, the median is 17. For an even number of observations, the median is the average of the two middle values.

Mode: the most frequently occurring value in the dataset is defined as mode. There can be more than one mode if multiple values have the same frequency. It can be used for metric, nominal, or ordinal variables. For example, if the liver sizes are 15, 16, 17, 17, and 18 cm, the mode is 17 cm because it appears most frequently. The advantages and disadvantages of measures of central tendency are given in [Table 1].
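For readers who prefer to verify such calculations programmatically, the following minimal Python sketch (using only the standard library's statistics module; the liver-size values are the hypothetical ones from the examples above) computes the three measures.

```python
import statistics

sizes_a = [15, 16, 17, 18, 19]  # cm; liver sizes from the mean/median example
sizes_b = [15, 16, 17, 17, 18]  # cm; liver sizes from the mode example

print(statistics.mean(sizes_a))    # 17
print(statistics.median(sizes_a))  # 17
print(statistics.mode(sizes_b))    # 17 (the most frequent value)
```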

Table 1

Table demonstrating the advantages and disadvantages of measures of central tendency

| Measure of central tendency | Advantages | Disadvantages |
|---|---|---|
| Mean | Takes all data points into account, providing a comprehensive summary; most commonly used and understood | Sensitive to outliers, which can skew the result; not suitable for skewed distributions |
| Median | Not affected by outliers or skewed data; represents the 50th percentile, providing a central location | Does not consider all data points, only the middle value; less informative in symmetric distributions with no outliers |
| Mode | Useful for categorical data where we wish to know the most common category; not affected by outliers | May not be unique or may not exist in a continuous dataset; less informative when the distribution is fairly uniform |


#

Measures of Variability

Measures of variability describe how much the values of variables in a sample differ from each other. In other words, they describe how much the values of the variable deviate from the mean value ([Fig. 5]).[15] [16] [17] [18]

Fig. 5 Graphical representation of measures of central tendency and measures of dispersion. Measures of central tendency are statistical metrics (mean, median, mode) that represent the central point or typical value in a dataset; for example, if a radiologist measures the liver size in a sample of five patients with glycogen storage disorders as 12, 15, 15, 16, and 14 cm, the mean liver size is (12 + 15 + 15 + 16 + 14)/5 = 14.4 cm, the median is 15 cm, and the mode is 15 cm. Measures of dispersion, on the other hand, are metrics (range, variance, standard deviation) that quantify the spread or variability of data around the central tendency; for example, if the measured liver sizes are 10, 13, 14, 16, and 19 cm, the range is 9 cm, the sample variance is 11.3, and the sample standard deviation is approximately 3.36 cm.

Range: it is the difference between the highest and lowest values in the dataset. It gives a sense of the spread but is affected by outliers. Let us consider the previous example of a radiologist measuring the mean size of the liver in a sample of five patients with glycogen storage disorders as 15, 16, 17, 18, and 19 cm. Range is 19–15 = 4.

Variance: the average of the squared differences from the mean. Variance provides a measure of how much the values in the dataset deviate from the mean.

For a population, the formula is the following:

σ² = ∑(xᵢ − μ)²/N

where N is the size of the population; xi are the values in the population, μ is the population mean.

For a sample, the formula is the following:

s² = ∑(xᵢ − x̄)²/(n − 1)

where n is the size of the sample, xᵢ are the values in the sample, and x̄ is the sample mean.

For the example mentioned above (liver sizes of 15, 16, 17, 18, and 19 cm), the variance is calculated as the following:

  • Calculate the mean: x̄ = (15 + 16 + 17 + 18 + 19)/5 = 17.

  • Calculate the squared differences from the mean: (xᵢ − x̄)²

    • (15 − 17)² = (−2)² = 4.

    • (16 − 17)² = (−1)² = 1.

    • (17 − 17)² = 0² = 0.

    • (18 − 17)² = 1² = 1.

    • (19 − 17)² = 2² = 4.

  • Sum the squared differences: ∑(xᵢ − x̄)² = 4 + 1 + 0 + 1 + 4 = 10.

  • Calculate the variance: s² = 10/(5 − 1) = 10/4 = 2.5.

SD: it is the square root of the variance and indicates the average distance of data points from the mean. Thus, the SD is the root-mean-square deviation of all measured values from the mean. It is expressed in the same units as the data.

For a population, the formula is the following:

σ = √[∑(xᵢ − μ)²/N]

where N is the size of the population, xi are the values in the population, and μ is the population mean.

For a sample, the formula is the following:

s = √[∑(xᵢ − x̄)²/(n − 1)]

where n is the size of the sample, xᵢ are the values in the sample, and x̄ is the sample mean.

For the example mentioned above (liver sizes of 15, 16, 17, 18, and 19 cm), SD is calculated as the following:

  • Calculate the variance: s² = 2.5.

  • Calculate the SD: s = √s² = √2.5 ≈ 1.58.

Quartile: it divides data into four parts as equal as possible. For this, the data must be arranged from the smallest to the largest.

  • First quartile (Q1): Middle value between the smallest value and the median.

  • Second quartile (Q2): Median of the data, that is, 50% of the values are smaller and 50% of the values are larger.

  • Third quartile (Q3): Middle value between the median and the largest value.

Interquartile range: to find out the range in which the middle 50% of all values lie, one can use the scattering parameter known as interquartile range.
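The dispersion measures for the worked liver-size example can likewise be computed directly; the sketch below (Python with NumPy, an assumption of ours rather than part of the original text) reproduces the range, sample variance, sample SD, and interquartile range.

```python
import numpy as np

sizes = np.array([15, 16, 17, 18, 19], dtype=float)  # liver sizes (cm) from the worked example

data_range = sizes.max() - sizes.min()   # 4.0
sample_var = sizes.var(ddof=1)           # 2.5 (divides by n - 1, as in the text)
sample_sd = sizes.std(ddof=1)            # ~1.58
q1, q2, q3 = np.percentile(sizes, [25, 50, 75])
iqr = q3 - q1                            # spread of the middle 50% of the data (here 2.0)

print(data_range, sample_var, round(sample_sd, 2), iqr)
```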

The advantages and disadvantages of measures of variability are given in [Table 2].

Table 2

Table demonstrating the advantages and disadvantages of measures of variability

| Measure of variability | Advantages | Disadvantages |
|---|---|---|
| Range | Simple and easy to calculate | Highly sensitive to outliers; ignores the distribution of data points within the range |
| Variance | Takes into account all data points, providing a comprehensive measure; useful in statistical calculations and inferential statistics | Not in the same units as the original data (squared units); sensitive to outliers |
| Standard deviation | Provides a clear measure of spread in the same units as the original data; widely used and understood in statistical analysis | Sensitive to outliers; can be less intuitive to interpret compared to the range |
| Interquartile range | Not affected by outliers, as it focuses on the middle 50% of data; useful in skewed distributions | Ignores the data outside the 1st and 3rd quartiles; less informative for distributions that are not skewed and have no outliers |


#
#

Inferential Statistics

Inferential statistics allow us to make predictions or inferences about a specific population based on the sample data. This includes estimating population parameters as well as testing hypotheses. It therein provides a way to generalize findings beyond the observed data.[19]

Inferential statistics are broadly of four types:

  • Difference between two groups of variables.

  • Correlation between two groups of variables.

  • Predicting the outcome variable.

  • Relation of variables in time distribution.

In this section, we shall be dealing with the difference between two groups of variables. The rest will be dealt with in part 2 of the series.

Estimation

Estimation refers to the use of sample data to estimate population parameters, such as the mean or proportion. The accuracy of these estimates can be assessed using confidence intervals.[20]

Confidence intervals: range of values within which the true population parameter is expected to lie with a certain level of confidence (e.g., 95% confidence interval). A wider interval indicates greater uncertainty about the parameter estimate. Let us consider the example of a study measuring the average radiation dose patients receive during a whole body 18-FDG positron emission tomography (PET)/CT, where a 95% confidence interval might be 13 to 15 mSv. The confidence level of 95% means that if we were to repeat this study multiple times, approximately 95% of the calculated confidence intervals from those studies would contain the true population mean radiation dose.
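As an illustration, a 95% confidence interval for a mean can be obtained from sample data using the t distribution. The sketch below uses SciPy and entirely hypothetical per-patient dose values; it is a minimal example, not the method used in any particular study.

```python
import numpy as np
from scipy import stats

# Hypothetical per-patient effective doses (mSv) from a whole-body PET/CT study
doses = np.array([13.2, 14.1, 15.0, 13.8, 14.6, 14.9, 13.5, 14.2, 15.3, 13.9])

mean = doses.mean()
sem = stats.sem(doses)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(doses) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f} mSv, 95% CI = ({ci_low:.2f}, {ci_high:.2f}) mSv")
```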


#

Hypothesis Testing: Fundamentals

A hypothesis is defined as an assumption that is neither proved nor disproved. Hypothesis testing is a research process that involves testing assumptions or claims about a population parameter. Usually, hypotheses are formulated starting from a literature review and framing a research question based on this review. Hypothesis testing of the collected data then provides a formal framework for making decisions based on sample data. The final target is to either reject or retain the hypothesis.[21] [22]


#

Null and Alternative Hypothesis

Null hypothesis (H0): it is the default assumption that there is no statistically significant difference between two or more groups with respect to a particular characteristic (like no statistically significant difference between variables or no effect of an intervention). In a study comparing two imaging techniques, the null hypothesis might state that there is no statistically significant difference in the diagnostic accuracy between these two techniques.

Alternative hypothesis (H1): alternate hypothesis assumes that there is a difference between two or more groups. It represents the opposite of the null hypothesis. Alternative hypothesis might state that there is a difference in diagnostic accuracy between the two imaging techniques.


#

Difference and Correlation Hypothesis

Difference hypothesis: it tests whether there is a difference between two or more groups. Difference hypothesis might state that there is a difference in diagnostic accuracy between two imaging techniques.

Correlation hypothesis: it tests whether there is a correlation between two or more variables. Correlation hypothesis might state that there is a correlation between the size of a tumor measured by ultrasound and its volume measured by MRI.

Directional and nondirectional hypothesis: with a nondirectional hypothesis, the focus of interest is simply whether there is a difference in a value between the groups under consideration. A directional hypothesis, on the other hand, focuses on whether one group has a higher or lower value than the other.

The fundamental concept of hypothesis testing is that a hypothesis can be rejected or retained only with a certain probability of error. The reason for this probability of error is that each time you draw a sample, you get a different sample, which means that the results differ slightly every time.[23]

Type I error: it refers to rejecting the null hypothesis when it is true (false positive). The significance level (α) represents the probability of making a type I error. Usually, a significance level of 5% or 1% is set.

For example, if α is set at 0.05, there is a 5% chance of incorrectly rejecting the null hypothesis when it is actually true.

p-Value: it is the probability of obtaining the observed results if the null hypothesis is true. If the p-value is less than the significance level, the null hypothesis is to be rejected (otherwise not). A p-value less than 0.05 is typically considered statistically significant, indicating that the observed results are unlikely to have occurred by chance. For example, if the p-value is 0.03 in a study comparing imaging techniques, it suggests that there is a statistically significant difference in diagnostic accuracy.

Type II error: it is failing to reject the null hypothesis when it is false (false negative). The probability of making a type II error is denoted by β, and power is defined as 1–β. For example, if a study has low power, there is a higher chance of failing to detect a true difference between imaging techniques, resulting in a type II error.

It is important to keep in mind that just because an effect is statistically significant it does not mean that the effect is relevant. If a very large sample is taken and it has a very small spread, even a minute difference between two groups may be significant, but it may not be practically relevant.


#

Sample Size Determination

Determining the appropriate sample size is crucial for ensuring the reliability and validity of study results. Too small a sample size will not give valid results or will not adequately represent the realities of the population being analyzed. On the other hand, larger sample sizes give smaller margins of error and are more representative. However, a sample size that is too large may significantly increase the cost and time taken to conduct the research.[24] [25] [26] [27] [28] The factors that influence sample size include the following:

  • Population size: larger populations generally require larger samples.

  • Effect size: smaller effect sizes require larger samples to detect differences.

  • SD: the greater the variability (spread) of the data, and hence the larger the SD, the larger the sample needed to detect a given difference.

  • Significance level (α): lower significance levels require larger samples.

  • Power (1–β): higher power (typically 0.80) requires larger samples to reduce the risk of type II errors.


#

Case Study: Sample Size in Radiological Research

A study aims to evaluate the diagnostic accuracy of a new MRI sequence in neuroimaging. Researchers need to determine an appropriate sample size to ensure the study's findings are statistically significant and reliable.

  • Population size: the population includes all patients eligible for brain MRI at the hospital.

  • Effect size: based on preliminary data, the researchers estimate a moderate effect size.

  • Significance level (α): they choose a significance level of 0.05.

  • Power (1–β): they aim for a power of 0.80, meaning they want an 80% chance of detecting a true difference if one exists.

Using sample size calculation formulas, they determine that a sample size of 200 patients is needed to achieve the desired power and significance level. This ensures that the study results will be robust and reliable, providing valuable insights into the new MRI technique's diagnostic accuracy.

But which formula should we use to calculate the sample size ([Fig. 6], [Table 3])?

Fig. 6 Formulae for sample size. In Eq. 1, n: required sample size for an unlimited population; z: Z-score, corresponding to the desired confidence level (e.g., 1.96 for 95% confidence); : estimated proportion of the population (i.e., the proportion you expect to observe a certain characteristic in the population); ϵ: margin of error (the maximum acceptable difference between the true population parameter and the sample estimate). In Eq. 2, n': adjusted sample size for a finite population; n: sample size calculated for an unlimited population (from the first formula); N: size of the finite population. In Eq. 3, n: required sample size for a finite population; (N): total population size; (Z): Z-score corresponding to the desired confidence level (e.g., 1.96 for 95% confidence); p: estimated proportion of the population (the probability of the characteristic being studied); 1–p: complementary proportion (the probability of not having the characteristic being studied); e: margin of error (acceptable level of precision in the results). In Formula 4, (N): required sample size; σ 2: population variance (or an estimate of the variance of the outcome); Z 1 − α : Z-score corresponding to the desired level of statistical significance (e.g., 1.96 for a 95% confidence level), which accounts for type I error (false positives); Z 1 − β : Z-score corresponding to the desired statistical power, representing type II error (false negatives); typically, 1 − β is set at 0.80 or 0.90, and the corresponding Z-score is looked up (e.g., 0.842 for 80% power); d min: minimum detectable difference or effect size, representing the smallest difference that is practically significant and you wish to detect in your study.
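As a rough illustration of the power-based approach summarized in Eq. 4, the sketch below implements a commonly used textbook formula for the per-group sample size when comparing two means (assuming a two-sided α and a common SD; this generic sketch may differ in detail from the exact expression shown in [Fig. 6]). The σ and d_min values are hypothetical.

```python
from scipy.stats import norm

def n_per_group(sigma, d_min, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a difference d_min between
    two means, assuming a common SD sigma and a two-sided significance level."""
    z_alpha = norm.ppf(1 - alpha / 2)  # e.g., 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)           # e.g., 0.842 for 80% power
    return 2 * (sigma ** 2) * (z_alpha + z_beta) ** 2 / d_min ** 2

# Hypothetical: SD of 10 HU and a minimum clinically relevant difference of 5 HU
print(n_per_group(sigma=10, d_min=5))  # ~62.8 -> round up to 63 patients per group
```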
Table 3

Table showing minimum sample size calculation of different statistical tests and examples with radiology literature citations

Test type

Formula

Variables needed

Example in radiology

Study

Unpaired t-test

• Significance level (α)

Z α/2 is the Z-value corresponding to the desired significance level

• Power (1–β)

Z 1–β is the Z-value corresponding to the desired power

• Standard deviation (σ)

• Effect size (difference in means; M1–M2)

Comparison of 320-detector volumetric and 64-detector helical computed tomography (CT) images of the pancreas for size measurement of various anatomical structures

Goshima et al[48]

Paired t-test

• Significance level (α)

• Power (1–β)

• Effect size (mean difference d)

• Standard deviation of differences (σd)

Comparison of tumor size on microscopy, CT, and MRI assessments vs. pathologic gross specimen analysis of pancreatic neuroendocrine tumors

Bian et al[49]

Chi-squared test

• Significance level (α)

• Proportion (p)

• Difference in proportions (Δ)

Comparison of enhancement patterns between benign and malignant solid renal lesions

Millet et al[50]

ANOVA

• Significance level (α)

• Power (1–β)

• Effect size (η 2)

• Variance between groups (σ 2)

Population-stratified analysis of bone mineral density distribution in cervical and lumbar vertebrae of Chinese from quantitative computed tomography

Zhang et al[51]

| Confidence level | Z-score |
|---|---|
| 80% | 1.28 |
| 85% | 1.44 |
| 90% | 1.65 |
| 95% | 1.96 |
| 99% | 2.58 |

Steps in using the formula for sample size calculation:

  1. Determine the population size (if known).

  2. Determine the confidence interval.

  3. Determine the confidence level.

  4. Determine the SD (basically representing the population proportion, which is assumed to be 50% = 0.5).[29]

  5. Convert the confidence level into a Z-score.

  6. Put these figures into the sample size formula to get your sample size.

Necessary sample size = (Z-score)² × SD × (1 − SD)/(margin of error)².

Say you choose to work with a 95% confidence level, an SD of 0.5, and a confidence interval (margin of error) of ± 5%.

Necessary sample size = {(1.96)² × 0.5 × 0.5}/(0.05)² = (3.8416 × 0.25)/0.0025 = 384.16.

Hence, the sample size should be 385.
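The same arithmetic can be scripted so that the margin of error, confidence level, or assumed proportion can be varied easily; the following sketch simply reimplements the formula worked out above (the function name is ours, not a standard library routine).

```python
import math

def required_sample_size(z=1.96, p=0.5, margin=0.05):
    """Sample size for estimating a proportion with a given margin of error."""
    return (z ** 2) * p * (1 - p) / margin ** 2

n = required_sample_size()  # 384.16 for 95% confidence, p = 0.5, margin of +/-5%
print(math.ceil(n))         # 385 -> always round up to the next whole subject
```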


#
#

Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions about the population based on sample data. It is used to assess whether a particular viewpoint is likely to be true.[30] It involves several steps ([Fig. 7]):

  1. Formulate hypotheses: define the null hypothesis (H0) and alternative hypothesis (H1).

  2. Selection of study design and sample size: select ones that are appropriate to the hypothesis being tested.

  3. Select significance level (α): commonly set at 0.05.

  4. Collect data: gather sample data relevant to the hypothesis.

  5. Calculate test statistic: use an appropriate test (e.g., t-test, chi-squared test) to calculate the test statistic for each outcome variable of interest.

  6. Determine p-value: compare the p-value to the significance level.

  7. Make a decision: reject H0 if p-value < α; otherwise, fail to reject H0.

Fig. 7 Pictorial representation of hypothesis testing process. Steps involved in the hypothesis testing process are the following: (1) Formulation of a hypothesis (question mark at the top center). This step involves defining a research question or hypothesis. Typically, there are two hypotheses: (a) null hypothesis (H 0)—assumes no effect or no statistically significant difference and (b) alternative hypothesis (H 1)—assumes there is an effect or difference. (2) Selecting a sample (right panel showing population and sample). From the larger population, a sample is selected. (The sample should be representative of the population to generalize the findings back to the population.) (3) Hypothesis testing (bottom-right panel showing hypothesis testing). Statistical tests are performed on the sample data to test the hypothesis. (The aim is to determine whether the data provide enough evidence to reject the null hypothesis in favor of the alternative hypothesis). (4) Significance and p-value (bottom left with p-value). The result of the hypothesis test is evaluated using the p-value. (If the p-value is less than 0.05 [commonly used significance level], it suggests that the results are statistically significant, meaning there is sufficient evidence to reject the null hypothesis.) (5) Conclusion (Arrow back to the top indicating significance). Based on the p-value and test results, conclusions are drawn about the hypothesis, indicating whether the evidence supports rejecting the null hypothesis.

Hypothesis testing is just like the concept of “An accused is presumed to be innocent until proved guilty.”

Common Hypothesis Tests in Radiology

Hypothesis tests are broadly divided into two groups: those performed on numerical data and those performed on categorical data. Basically, these tests are used to find the difference between two groups of variables.

Datasets will have to be treated as paired if they are related. Thus, if we compare the systolic blood pressure values of two independent sets of subjects, it is an example of unpaired data. However, if, for example, all the individuals in one dataset are siblings of the individuals represented in the other dataset, then corresponding values in the two datasets may be related in some manner (due to genetic or familial reasons) and the datasets are no longer independent.

Parametric data are normally distributed numerical data that follow the parameters of a normal distribution curve. If the distribution is skewed, does not follow any particular distribution, or is unknown, the data should be treated as nonparametric. But practically, how do we determine whether numeric data are normally distributed? One gross method is to look at the measures of central tendency, the mean and the median. If the mean and median are the same or very close to one another (relative to the total data spread), then we can assume that we are dealing with parametric data. However, the proper method to test the fit of data to a normal distribution is to use "goodness-of-fit" tests such as the Kolmogorov–Smirnov test and the Shapiro–Wilk test. The null hypothesis in these tests is that the frequency distribution of the data is normal. If such a test returns a p-value less than 0.05, the assumption of a normal distribution has to be rejected and the data should be treated as nonparametric.[31] [32] [33] [34]

Statistical tests for normal distribution:

  • Kolmogorov–Smirnov test.

  • Shapiro–Wilk test.

  • Anderson–Darling test.

  • D'Agostino–Pearson omnibus test.

The major disadvantage of these tests is that the calculated p-value is affected by the sample size. Therefore, with a very small sample, the p-value may remain well above 0.05 even when the data deviate from normality, whereas with a very large sample from the same population, even trivial deviations may produce a p-value smaller than 0.05.

To overcome this disadvantage, graphical tests for normal distribution are used ([Fig. 8]):

  • Histogram data: Compare the histogram curve with the normal distribution curve.

  • Quantile–quantile plot: Compare the theoretical quantiles of normally distributed data with quantiles of the measured values. If data were perfectly normally distributed, all the points would be on a straight line. The further the points deviate from the line, the less normally distributed the data are.

Fig. 8 Histogram curve and Q-Q plot for graphical representation of normality of distribution. Histogram shows the data's shape, and the Q-Q plot compares the data's quantiles to a theoretical normal distribution to identify deviations from normality.
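In practice, both the formal and the graphical checks are easy to run; the sketch below (Python with SciPy and Matplotlib, using simulated liver-attenuation values as stand-in data) applies the Shapiro–Wilk test and draws a Q-Q plot against a normal distribution.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
hu_values = rng.normal(loc=55, scale=5, size=40)  # simulated liver attenuation (HU)

stat, p = stats.shapiro(hu_values)   # Shapiro-Wilk test of normality
print(f"Shapiro-Wilk p = {p:.3f}")   # p > 0.05 -> no evidence against normality

stats.probplot(hu_values, dist="norm", plot=plt)  # Q-Q plot against a normal distribution
plt.show()
```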

#

Hypothesis Tests Done on Continuous Data

Parametric Data

Simple (one-sample) t-test: this is a test used to determine whether the mean calculated from sample data collected from a single group differs from a known or hypothesized population mean ([Fig. 9]).[35] [36]

Fig. 9 Approach to select appropriate parametric tests.

Let us consider a study where the researchers want to assess whether the hippocampal volume on MRI in temporal lobe epilepsy patients is significantly lower as compared with all epilepsy patients imaged during a specific time period. The t-test would then be used to show if the hippocampal volume is statistically lower in temporal lobe epilepsy patients.

Unpaired sample t-test (for two independent samples): it compares the means of two independent groups. There is no relationship between the subjects in one group and those in the other.[36] For example, an unpaired t-test could be used to compare the average radiation dose received by patients undergoing neurointervention on a monoplane and biplane angio-suite, assuming patients are randomly assigned to one of the techniques.

Student's paired t-test (for two dependent samples): it compares the means of two related groups or conditions. Each subject or sample is measured twice, resulting in paired observations.[36] A paired t-test might be used to compare the average size of hepatocellular carcinoma nodules before and after treatment with a new intra-arterial chemotherapy drug. If the test shows a significant difference in mean sizes, it suggests that the drug is effective in reducing tumor size.

A t-test may be one-tailed or two-tailed: a one-tailed test is used when the direction of the effect is specified in advance, whereas a two-tailed test assesses whether there is any significant difference without specifying the direction.

One-tailed t-test: it tests for the possibility of an effect in one specific direction (e.g., greater than or less than). For example, when the research hypothesis predicts the direction of the difference (e.g., drug A increases recovery rate more than drug B). Basically, it tests if the mean is greater than a certain value.

Two-tailed t-test: it tests for the possibility of an effect in both directions (e.g., not equal to). For example, when the research hypothesis does not predict the direction of the difference (e.g., drug A has a different recovery rate than drug B, without specifying higher or lower). Basically, it tests if the mean is different from a certain value, either higher or lower.
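The t-test variants described above are available in SciPy; the sketch below runs an unpaired test (two-tailed, and one-tailed via the alternative argument) and a paired test on purely hypothetical data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical radiation doses (mGy) for neurointerventions in two angio-suites
dose_monoplane = rng.normal(60, 8, size=30)
dose_biplane = rng.normal(55, 8, size=30)

# Unpaired (independent-samples) t-test, two-tailed
t_two, p_two = stats.ttest_ind(dose_monoplane, dose_biplane)

# One-tailed version: is the biplane dose lower than the monoplane dose?
t_one, p_one = stats.ttest_ind(dose_biplane, dose_monoplane, alternative="less")

# Paired t-test: hypothetical tumor sizes (cm) before and after chemotherapy
before = np.array([3.2, 4.1, 2.8, 5.0, 3.6, 4.4])
after = np.array([2.9, 3.5, 2.6, 4.2, 3.4, 3.8])
t_paired, p_paired = stats.ttest_rel(before, after)

print(p_two, p_one, p_paired)
```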

One factorial ANOVA (for more than two independent samples): it determines whether there are any statistically significant differences between the means of three or more independent groups (or levels) on a continuous dependent variable. It tests the null hypothesis that all group means are equal.[37] [38] For example, a one-way ANOVA could be used to compare the average reading times of radiologists interpreting images from three different types of imaging modalities (X-ray, MRI, and CT scan).

Repeated measures ANOVA (for more than two dependent samples): it determines whether there are any statistically significant differences between the means of three or more related groups (or levels) on a continuous dependent variable measured at multiple time points or under different conditions. It accounts for the correlation between measurements taken from the same subject across different conditions or at different time points.[38] [39] Repeated measures ANOVA could be used to assess the effectiveness of a new contrast agent in enhancing detection of small cerebral metastatic lesions across multiple time points during an MRI scan session (comparing the detection before contrast administration, immediately after contrast administration and 30 minutes postcontrast administration).
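A one-way ANOVA on independent groups can be run with scipy.stats.f_oneway, as in the sketch below (the reading times are hypothetical); for the repeated-measures design, a dedicated routine such as AnovaRM in the statsmodels package is typically used instead.

```python
from scipy import stats

# Hypothetical reading times (minutes) for three imaging modalities (independent groups)
xray = [4.2, 3.8, 5.1, 4.6, 4.0]
ct = [7.9, 8.4, 7.2, 8.8, 7.5]
mri = [11.3, 10.8, 12.1, 11.7, 10.9]

f_stat, p = stats.f_oneway(xray, ct, mri)  # one-way ANOVA across the three groups
print(f"F = {f_stat:.1f}, p = {p:.4f}")
```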


#

Nonparametric Data

For One Sample

Wilcoxon's test (Wilcoxon signed-rank test): it compares the median of a single sample (or of paired differences) against a specified value (typically zero, assuming no difference; [Fig. 10] and [Table 4]). It is typically used when the data do not meet the assumptions required for a parametric test like the t-test, such as when the data are not normally distributed or when the measurement scale is ordinal.[40] The Wilcoxon signed-rank test could be used to assess whether a new MRI sequence results in significantly improved lesion detection as compared with an established sequence.

Fig. 10 Approach to select appropriate nonparametric tests.
Table 4

Table showing various nonparametric tests used depending on the type of variables

| Variable type | Test | Description |
|---|---|---|
| Continuous | Mann–Whitney U test | Compares differences between two independent groups |
| Continuous | Kruskal–Wallis test | Extension of the Mann–Whitney U test for three or more groups |
| Nominal | Chi-squared test | Assesses whether there is a significant association between two categorical variables |
| Nominal | Fisher's exact test | Used for small sample sizes (expected count <5 in a cell) to determine if there are nonrandom associations between two categorical variables |
| Nominal | McNemar's test | Used for paired nominal data to determine if there is a difference in proportions |
| Ordinal | Wilcoxon's test | Compares two related (paired) groups with ordinal data |
| Ordinal | Friedman's test | Compares differences between three or more dependent groups (repeated measures) on an ordinal scale |


#

Between Two Groups

Mann–Whitney U test (for two independent samples; also known as Wilcoxon rank sum test): it assesses whether two independent groups differ significantly in terms of their medians. It does not assume that the data follow a normal distribution.[41] The Mann–Whitney U test could be used to compare the interpretation times between two groups of radiologists interpreting the same set of MRI scans.

Wilcoxon's test (for two dependent samples): it compares the medians of two related groups or conditions. It assesses whether there is a statistically significant difference between paired observations from the same subjects under different conditions.[42] Wilcoxon signed-rank test for two dependent samples could be used to evaluate the effectiveness of a new image enhancement AI algorithm compared with the current conventional MRI images.


#

More than Two Groups

Kruskal–Wallis test (for more than two independent samples): it determines whether there are statistically significant differences between three or more independent groups in terms of their medians. It is an extension of the Mann–Whitney U test for more than two groups.[43] For example, the Kruskal–Wallis test could be used to compare the hepatic lesion size (measured as a continuous variable) among three different types of imaging modalities (ultrasound, MRI, and CT scan).

Friedman's test (for more than two dependent samples): it determines whether there are statistically significant differences between three or more dependent groups (repeated measures) in terms of their medians. It is analogous to the Kruskal–Wallis test but is used for within-subject designs.[44] Friedman's test could be used to compare the ratings of definition of margins of a cerebral lesion (ordinal scale) from the same set of radiologists across three different MRI sequences.
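All four nonparametric tests discussed above have SciPy counterparts; the sketch below runs each on small, entirely hypothetical datasets mirroring the examples in the text.

```python
from scipy import stats

# Interpretation times (minutes) for two independent groups of radiologists
group_a = [8.1, 9.4, 7.6, 10.2, 8.8, 9.9]
group_b = [11.3, 10.1, 12.4, 9.8, 11.9, 12.8]
print(stats.mannwhitneyu(group_a, group_b))       # two independent samples

# Paired measurements: lesion conspicuity on conventional vs. AI-enhanced images
conventional = [2.1, 3.4, 2.6, 4.2, 3.1, 2.8, 3.5, 4.0]
enhanced = [2.4, 3.9, 3.3, 4.6, 4.0, 3.0, 4.1, 5.1]
print(stats.wilcoxon(conventional, enhanced))     # two dependent samples

# Hepatic lesion sizes (cm) measured with three modalities (independent groups)
us, ct, mri = [1.9, 2.4, 2.1, 2.7], [2.2, 2.8, 2.5, 3.0], [2.0, 2.6, 2.3, 2.9]
print(stats.kruskal(us, ct, mri))                 # Kruskal-Wallis test

# Margin-definition ratings from the same radiologists across three MRI sequences
seq1, seq2, seq3 = [3, 2, 4, 3, 3], [4, 3, 4, 4, 3], [5, 4, 5, 4, 4]
print(stats.friedmanchisquare(seq1, seq2, seq3))  # Friedman's test
```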


#
#
#

Hypothesis Tests Done on Categorical Data

  • If two groups are to be compared ([Fig. 11])[35]:

    • Chi-squared (χ2 ) test: it determines whether there is a significant association between categorical variables. It is typically used when both variables are categorical and the data are frequencies (counts).[45] For example, the chi-squared test could be used to assess the association between the presence of a certain radiological sign and the presence or absence of a specific pathology.

    • Fisher's exact test: it determines whether there is a significant association between categorical variables, especially when sample sizes are small or when expected cell counts in a contingency table are less than 5.[46] For example, Fisher's exact test could be used to compare the diagnostic performance of two imaging techniques in detecting a rare pathology.

    • McNemar's test: it is a nonparametric test used to analyze paired nominal data. It is particularly useful when you have two related samples or repeated measurements on the same subjects, and you want to determine if there is a significant change in responses between two conditions or time points.[47]

  • If more than two groups:

    • Chi-squared (χ2 ) test: it determines whether there is a significant association between two or more categorical variables. It is an extension of the chi-squared test for two groups but applied to contingency tables with more than two rows or columns.[45] For example, the chi-squared test of independence could be used to assess whether there is an association between the types of lung disease (categorized into four types: pneumonia, tuberculosis, asthma, and bronchitis) and smoking status (smoker vs. nonsmoker) among a group of patients.

Fig. 11 Approach to select appropriate statistical tests for categorical data.
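The chi-squared and Fisher's exact tests are available in SciPy, as sketched below on hypothetical 2 × 2 tables; the statsmodels package additionally provides McNemar's test for paired nominal data (not shown).

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: radiological sign (present/absent) vs. pathology (+/-)
table = np.array([[30, 10],   # sign present: 30 with pathology, 10 without
                  [12, 28]])  # sign absent:  12 with pathology, 28 without

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-squared: chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# Fisher's exact test is preferred when expected cell counts are small
odds_ratio, p_fisher = stats.fisher_exact([[3, 1], [1, 5]])
print(f"Fisher's exact: p = {p_fisher:.3f}")
```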

The tests to be done based on the type of data are summarized in [Tables 4] and [5].

Table 5

Table showing various parametric and nonparametric tests used depending on the nature of the sample being analyzed

| Nature of sample | Parametric test | Nonparametric test |
|---|---|---|
| One sample | Simple t-test | Wilcoxon's test for one sample |
| Two dependent samples | Paired sample t-test | Wilcoxon's test |
| Two independent samples | Unpaired sample t-test | Mann–Whitney U test |
| More than two independent samples | One factorial ANOVA | Kruskal–Wallis test |
| More than two dependent samples | Repeated measures ANOVA | Friedman's test |
| Correlation between two variables | Pearson's correlation | Spearman's correlation |

Abbreviation: ANOVA, analysis of variance.



#

Reporting Statistical Tests

Reporting statistical tests in radiology is important to clearly and concisely convey the results of analyses performed to evaluate the significance of findings and robustness of conclusions drawn. Key points to consider when reporting statistical tests are the following:

  • Specify the statistical test used: clearly mention which statistical test was employed (e.g., t-test, ANOVA, chi-squared test, Mann–Whitney U test). Justification for the choice of test also has to be provided, including the nature of the data (parametric vs. nonparametric, nominal vs. continuous).

  • Include relevant parameters: degrees of freedom (if applicable; e.g., for t-tests and ANOVA), effect size (include measures such as Cohen's d for t-tests or eta-squared for ANOVA) to indicate the magnitude of the difference, and confidence intervals (present confidence intervals for mean differences or proportions to give context to the results).

  • Present p-values: clearly state the p-value obtained from the statistical test (use the conventional threshold for significance, e.g., p < 0.05; if the p-value is above this threshold, avoid stating it as “not significant”; instead, indicate the p-value explicitly). For very small p-values, it is common to report them as p < 0.001.

  • Interpret results: provide a clear interpretation of what the statistical results mean in the context of the study. Clinical significance of the findings should also be discussed, not just statistical significance.

  • Contextualize with clinical implications: discuss how the statistical findings relate to clinical practice, patient outcomes, or the diagnostic performance of imaging modalities. Consider including sensitivity, specificity, positive predictive value, and negative predictive value if applicable.

  • Follow reporting guidelines: adhere to relevant reporting guidelines (e.g., Standards for Reporting Diagnostic Accuracy [STARD] for diagnostic accuracy studies, Consolidated Standards of Reporting Trials [CONSORT] for randomized controlled trials) to ensure clarity and transparency in the reporting of statistical analyses.

Here is an example of how statistical results might be reported in a radiology study.

Let us consider a study to compare the average tumor volume measured by MRI in patients with type A and B tumors. A total of 60 patients were included in the analysis, with 30 patients in the type A group and 30 patients in the type B group. The mean tumor volume for patients with type A tumors was 15.2 cm³ (±3.1 cm³), while the mean tumor volume for patients with type B tumors was 22.8 cm³ (±4.5 cm³). An independent sample t-test was performed to assess whether the difference in mean tumor volumes between the two groups was statistically significant (after testing the normality of distribution).

The results indicated a significant difference in tumor volume between the two groups (t(58) = −7.62, p < 0.001; "t" signifies the result is derived from a t-test; the number in brackets is the degrees of freedom {N1 + N2 − 2 = 30 + 30 − 2 = 58}; −7.62 is the t statistic value, with the negative sign indicating that the mean of the first group is less than that of the second group; p < 0.001 is the p-value, which is statistically significant). Patients in the type B group exhibited larger tumor volumes than those in the type A group. The effect size, calculated using Cohen's d, was 1.97, indicating a large effect. Additionally, a 95% confidence interval for the difference in means was calculated, resulting in an interval of (−9.60 cm³, −5.60 cm³). This interval suggests that the mean tumor volume for type B tumors is significantly higher than that for type A tumors, with a clinically relevant difference. In conclusion, these findings demonstrate that patients with type B tumors have significantly larger tumor volumes compared with those with type A tumors, which may have implications for treatment planning and prognosis.
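The quantities reported above (t statistic, p-value, Cohen's d, and the confidence interval) can be reproduced from the summary statistics alone; the sketch below does so for this hypothetical example, assuming an equal-variance (pooled) t-test.

```python
import math
from scipy import stats

# Summary statistics from the hypothetical example above
n1, mean1, sd1 = 30, 15.2, 3.1  # type A tumors
n2, mean2, sd2 = 30, 22.8, 4.5  # type B tumors

df = n1 + n2 - 2
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
se = sd_pooled * math.sqrt(1 / n1 + 1 / n2)

t = (mean1 - mean2) / se             # t statistic (negative: group A mean is smaller)
p = 2 * stats.t.sf(abs(t), df)       # two-tailed p-value
cohens_d = (mean2 - mean1) / sd_pooled
ci = stats.t.interval(0.95, df, loc=mean1 - mean2, scale=se)

print(f"t({df}) = {t:.2f}, p = {p:.1e}, d = {cohens_d:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```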


#
#

Conclusion

To conclude, statistics play a crucial role in radiology, aiding in accurate data interpretation, improving diagnostic accuracy, and advancing research. Proper understanding and application of statistical principles such as data types, their distribution, descriptive and inferential statistics, hypothesis testing, correlation, and sampling are essential for research in radiology. This primer provides the foundational knowledge needed to leverage statistics effectively, ultimately enhancing clinical decision-making and patient outcomes.


#
#

Conflict of Interest

None declared.

Authors' Contributions

All the authors were involved in the procedure, data collection, and manuscript revision.



Address for correspondence

Jineesh Valakkada, MD
Sree Chitra Tirunal Institute for Medical Sciences and Technology
Thiruvananthapuram, Kerala 695001
India   

Publication History

Article published online:
09 January 2025

© 2025. Indian Radiological Association. This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Thieme Medical and Scientific Publishers Pvt. Ltd.
A-12, 2nd Floor, Sector 2, Noida-201301 UP, India

