CC BY-NC-ND 4.0 · Ultraschall Med
DOI: 10.1055/a-2243-9767
Original Article

Technical assessment of resolution of handheld ultrasound devices and clinical implications

Untersuchung zur technischen Bestimmung der Auflösung in tragbaren Ultraschallgeräten und ihrer klinischen Bedeutung
Moritz Herzog (1), Maia Arsova (2), Katja Matthes (2), Julia Husman (2), David Toppe (1), Julian Kober (1), Tönnis Trittler (1), Daniel Swist (3), Edgar Manfred Gustav Dorausch (3), Antje Urbig (1), Gerhard Paul Fettweis (3), Franz Brinkmann (1), Nora Martens (1), Renate Schmelz (2), Nicole Kampfrath (2), Jochen Hampe (1)

Author Affiliations
1   Else Kröner Fresenius Center for Digital Health, TU Dresden Faculty of Medicine Carl Gustav Carus, Dresden, Germany (Ringgold ID: RIN59988)
2   Medical Department 1, University Hospital Dresden, TU Dresden Faculty of Medicine Carl Gustav Carus, Dresden, Germany (Ringgold ID: RIN59988)
3   Vodafone Chair for Mobile Communications, TU Dresden Faculty of Electrical Engineering and Information Technology, Dresden, Germany (Ringgold ID: RIN123108)
Supported by: Sächsisches Staatsministerium für Wissenschaft und Kunst COMEDUS Spitzenförderung
 

Abstract

Purpose Since handheld ultrasound devices are becoming increasingly ubiquitous, objective criteria to determine image quality are needed. We therefore conducted a comparison of objective quality measures and clinical performance.

Material and Methods A comparison of handheld devices (Butterfly IQ+, Clarius HD, Clarius HD3, Philips Lumify, GE VScan Air) and workstations (GE Logiq E10, Toshiba Aplio 500) was performed using a phantom. As a comparison, clinical investigations were performed by two experienced ultrasonographers by measuring the resolution of anatomical structures in the liver, pancreas, and intestine in ten subjects.

Results Axial full width at half maximum (FWHM) resolution of 100 µm phantom pins at depths between 1 and 12 cm ranged from 0.6 to 1.9 mm without correlation to pin depth. Lateral FWHM resolution ranged from 1.3 to 8.7 mm and was positively correlated with depth (r=0.6). Axial and lateral resolution differed between devices (p<0.001), with the poorest median lateral resolution (i.e., the largest FWHM) observed in the E10 (5.4 mm) and the poorest axial resolution (1.6 mm) for the IQ+ device. Although devices showed no significant differences in most clinical applications, ultrasonographers were able to differentiate a median of two additional layers in the wall of the sigmoid colon and one additional structure in segmental portal fields (p<0.05) using cartwheel devices.

Conclusion While handheld devices showed superior or similar performance in the phantom and routine measurements, workstations still provided superior clinical imaging and resolution of anatomical substructures, indicating a lack of objective measurements to evaluate clinical ultrasound devices.



Zusammenfassung

Ziel Da tragbare Ultraschallgeräte zunehmend allgegenwärtig sind, sind objektive Kriterien zur Bestimmung der Bildqualität nötig. Wir haben daher einen Vergleich objektiver Qualitätsmessungen und klinischer Leistungen durchgeführt.

Material und Methoden Es wurde ein Vergleich von Handgeräten (Butterfly IQ+, Clarius HD, Clarius HD3, Philips Lumify, GE VScan Air) und Workstations (GE Logiq E10, Toshiba Aplio 500) unter Verwendung eines Phantoms durchgeführt. Zum Vergleich wurden klinische Untersuchungen durchgeführt, bei denen die Auflösung anatomischer Strukturen in Leber, Pankreas und Darm von 2 erfahrenen Ultraschalldiagnostikern bei 10 Probanden gemessen wurde.

Ergebnis Die axiale volle Breite bei halber maximaler Auflösung (FWHM) von 100-µm-Phantomfäden in Tiefen zwischen 1–12cm lag zwischen 0,6 und 1,9 mm, ohne dass eine Korrelation zur Fadenhöhe bestand. Die laterale FWHM-Auflösung reichte von 1,3–8,7mm und zeigte eine positive Korrelation mit der Tiefe (r=0,6). Die axiale und laterale Auflösung unterschied sich zwischen den Geräten (p<0,001), wobei die niedrigste mediane laterale Auflösung beim Gerät E10 (5,4mm) und die niedrigste axiale Auflösung (1,6mm) beim Gerät IQ+ beobachtet wurde. Obwohl die Geräte bei den meisten klinischen Anwendungen keine signifikanten Unterschiede aufwiesen, konnten die Ultraschalldiagnostiker mit den Cartwheel-Geräten im Median 2 zusätzliche Schichten in der Wand des Colon sigmoideum sowie eine zusätzliche Struktur in segmentalen Portalfeldern unterscheiden (p<0,05).

Schlussfolgerung Während Handgeräte bei Phantom- und Routinemessungen eine überlegene oder ähnliche Leistung zeigten, lieferten Workstations immer noch eine überlegene klinische Bildgebung und Auflösung anatomischer Substrukturen, was auf den Mangel an objektiven Messungen zur Bewertung klinischer Ultraschallgeräte hinweist.



Introduction

Advances in ultrasound technology have enabled the development of handheld ultrasound systems (HUS). The improved bedside availability of ultrasound makes these devices ideal for point-of-care ultrasound (POCUS) [1]. In combination with their affordable pricing, HUS devices could also increase the availability of ultrasound in gastroenterology, as suggested by early studies [2]. However, due to performance concerns, physicians are currently hesitant to rely on HUS devices [3]. To evaluate their performance, several studies have compared HUS devices with established cartwheel workstations in specific clinical use cases. So far, these studies have shown nearly equivalent clinical performance in different applications of gastroenterological ultrasound, with a slight inferiority of HUS devices [4] [5] [6] [7] [8] [9] [10]. These studies nevertheless have several limitations. Firstly, the GI application studies each included only a single handheld device, which in some cases was not of the newest generation. Secondly, comparative studies of different HUS devices have only used subjective parameters [11].

To overcome the problems associated with subjective assessments, the American Institute of Ultrasound in Medicine (AIUM) [12], the American Association of Physicists in Medicine [13] and the European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB) [14] published methods for the objective measurement of ultrasound device performance using phantom models. The primary intention in these guidelines is the detection of device malfunctions after a certain time of usage. Due to the objective nature and ease of application, these guidelines have been proposed as a basis for the comparison of devices [15]. For HUS devices, this could offer the opportunity to evaluate the performance of new devices and compare it to values for existing devices.

Here, because the availability of HUS devices provides both a chance for broader ultrasound adoption in clinical practice and the challenge to select the most appropriate device, we compared ultrasound devices using the objective phantom measurements according to the EFSUMB scheme and simultaneously evaluated their clinical performance in typical gastroenterology and hepatology imaging tasks.



Methods

Five HUS devices were selected to cover different features and technologies of handheld ultrasound devices, namely both piezo-based and capacitive micromachined ultrasound transducers (see Supplementary Table 1). For comparison, two cartwheel devices, the General Electric (GE) Logiq E10 (GE E10) and the Toshiba Aplio 500 (T500), were selected to represent two different workstation generations.

Laboratory phantom measurements

To objectively measure the image quality, a general-purpose phantom (Ultra IQ General Purpose Phantom, Cablon Medical B.V., Netherlands), shown in Supplementary Figure 1, and the corresponding software (UltraiQ Desktop) were used as recommended by the EFSUMB [14]. The phantom consists of two horizontal rows of pins and one vertical column of pins (red dots in Supplementary Figure 1). Each pin is 100 µm in diameter, and neighboring pins are spaced 1 cm apart. For each device, six B-mode images of the phantom were taken using the abdominal preset and a curved array or a curved preset on a linear transducer (for the Butterfly device). This approach was reported to produce high inter- and intraoperator repeatability [16]. The full width at half maximum (FWHM) of the embedded pins was calculated automatically using the UltraiQ Desktop software. The FWHM quantifies how far the image of a pin is blurred beyond its true size and thus serves as a surrogate for resolution.
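The internal algorithm of the UltraiQ Desktop software is not described here, so the following is only a minimal sketch of how an FWHM value could be derived from a one-dimensional intensity profile through a pin image. The function name `fwhm`, the linear interpolation at the half-maximum crossings, and the assumption of a baseline-subtracted, single-peaked profile with uniform sample spacing are all illustrative choices, not the vendor's method.

```python
def fwhm(profile, spacing_mm):
    """Full width at half maximum of a single-peaked intensity profile.

    profile    -- intensity samples along one axis through the pin image
                  (assumed baseline-subtracted, i.e. background near zero)
    spacing_mm -- distance between neighbouring samples (assumed uniform)
    """
    peak_idx = max(range(len(profile)), key=lambda i: profile[i])
    peak = profile[peak_idx]
    if peak <= 0:
        raise ValueError("profile needs a positive peak")
    half = peak / 2.0

    # Walk outwards from the peak until the intensity drops to half maximum.
    left = peak_idx
    while left > 0 and profile[left] > half:
        left -= 1
    right = peak_idx
    while right < len(profile) - 1 and profile[right] > half:
        right += 1

    def crossing(i_out, i_in):
        # Linear interpolation between the sample at or below half maximum
        # (i_out) and its inner neighbour above half maximum (i_in).
        y_out, y_in = profile[i_out], profile[i_in]
        return i_out + (half - y_out) * (i_in - i_out) / (y_in - y_out)

    # If the profile never falls to half maximum, fall back to the array edge.
    left_pos = crossing(left, left + 1) if profile[left] <= half else float(left)
    right_pos = crossing(right, right - 1) if profile[right] <= half else float(right)
    return (right_pos - left_pos) * spacing_mm


# Hypothetical profile sampled every 0.5 mm.
width_mm = fwhm([0, 0, 2, 6, 10, 6, 2, 0, 0], spacing_mm=0.5)
```

For the hypothetical profile above, the half-maximum crossings lie 2.5 samples apart, which corresponds to 1.25 mm. Applied to a 100 µm pin, any width well above 0.1 mm reflects blurring by the imaging system rather than the size of the target.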



Proband study

To evaluate clinical performance, imaging tasks that lend themselves to quantification and qualitative differentiation were performed by two experienced gastroenterologists with more than three years of daily ultrasound experience. For the acquisition, all devices were set to the “abdominal preset” defined by the distributor. This limits the general comparability between devices but improves the comparability of their abdominal settings. A detailed list of the devices and their technical settings is given in Supplementary Table 1. For the intestinal system, ultrasonographers were tasked to identify the section of the sigmoid colon with the best ultrasound visibility, to measure the proximal wall thickness, and to count the number of ultrasound layers as follows: up to a maximum of four layers (muscular layer, submucosa, mucosa, intestinal content) if all were visible on the screen, and down to a value of zero to denote failure to identify any section of the sigmoid colon. For liver imaging, the deepest visible portal field segment was identified and the discernible structures in the portal field were counted as follows: one was coded if the portal field was visualized, and two or three were recorded if more tubular structures (corresponding to the portal vein, artery, and bile duct) could be seen. In addition, the diameter of the pancreatic duct in the corpus and the thickness of the ventral and dorsal wall of the gallbladder were recorded. To reduce errors due to unfamiliarity compared with regularly used devices, all handheld devices were provided to the two physicians one month before the commencement of measurements to be used in their daily routines. To avoid differences in usage between the handheld devices, both physicians used the devices during this month in a repeating, rotating regime, testing one after the other, thereby ensuring the same use time for all devices.
Each proband was imaged by one ultrasonographer, who performed all imaging tasks with one device before moving on to the next device. The order of the handheld devices was changed between probands by the physicians drawing lots. The study was approved by the local ethics committee. Probands were recruited via postings at the university between January 2022 and February 2022. The inclusion criteria were age >18 years, no major intervention involving the liver, and the ability to consent. Imaging procedures were performed at the interdisciplinary ultrasound department of the local university hospital.
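The coding scheme and the randomized device order described above can be sketched as follows. This is an illustration only: the function names, the capping of counts at the stated maxima, and the use of a pseudo-random generator in place of physically drawn lots are assumptions, and the observations are hypothetical rather than study data.

```python
import random
from statistics import median

# Handheld devices as named in the study; the list order carries no meaning.
HANDHELD_DEVICES = [
    "Butterfly IQ+", "Clarius HD", "Clarius HD3", "Philips Lumify", "GE VScan Air",
]


def colon_layer_score(colon_found, layers_seen):
    """0 if no sigmoid section was identified, else the layer count capped at 4."""
    if not colon_found:
        return 0
    return min(layers_seen, 4)


def portal_field_score(field_found, tubular_structures_seen):
    """0 if the portal field was not found, 1 if it was visualized,
    2 or 3 if that many tubular structures were discernible."""
    if not field_found:
        return 0
    return max(1, min(tubular_structures_seen, 3))


def draw_device_order(rng):
    """Return the handheld devices in random order (stand-in for drawing lots)."""
    return rng.sample(HANDHELD_DEVICES, k=len(HANDHELD_DEVICES))


# Hypothetical observations for one device: (structure found?, count seen).
observations = [(False, 0), (True, 2), (True, 4), (False, 0), (True, 1)]
scores = [colon_layer_score(found, n) for found, n in observations]
median_score = median(scores)  # structures that were not found enter as zeros
```

Coding a structure that could not be found as zero keeps every proband in the paired comparison, at the cost of mixing "not found" and "poorly resolved" on one ordinal scale.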



Statistical analysis

Each device was compared to the GE E10 as the standard reference device, resulting in six comparisons. To perform these comparisons, a Friedman test was used with a Wilcoxon test as the post-hoc test. In the case of substructural identification, if the structure could not be found, the missing value was set to zero. To compensate for the multiple comparisons, the calculated p-value was adjusted using the Bonferroni-Holm method with six comparisons. After adjustment, a p-value of p < 0.05 was used as the significance criterion. The sample size was estimated assuming a power of 0.8, a significance level of 0.05 for the Friedman test, and an effect size of 1. All authors had access to the study data and reviewed and approved the final manuscript.
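The Bonferroni-Holm adjustment for six comparisons can be illustrated with a short sketch. The function name `holm_adjust` and the raw p-values below are hypothetical. The code shows the standard step-down procedure and is not taken from the authors' analysis scripts.

```python
def holm_adjust(p_values):
    """Return Bonferroni-Holm adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and each value is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for six device-versus-reference comparisons.
raw = [0.001, 0.04, 0.03, 0.2, 0.0005, 0.6]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

With these hypothetical inputs, only the first and fifth comparisons remain below 0.05 after adjustment, although four of the six raw p-values were below 0.05.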



Results

Phantom performance

Examples of the phantom images for all devices are provided in Supplementary Figure 2. The axial and lateral full width at half maximum (FWHM) dimensions are plotted for the devices at depths between 1 and 12 cm in [Fig. 1]. Pins were imaged much wider (multiple mm) than their actual diameter of 100 µm. The GE E10 as the reference device achieved a lateral FWHM of 5.33 mm (IQR: 4.01–6.72 mm) and an axial FWHM of 1.21 mm (IQR: 1.1–1.33 mm). Both the axial and lateral FWHM pin dimensions were significantly different between devices (Friedman test p<0.0001). In comparison to the GE E10 device, the Lumify (p=0.006) and the T500 (p=0.003) showed a significantly lower axial FWHM. The Butterfly device showed the highest axial FWHM (i.e., lowest resolution) with a significant difference compared to the GE E10 (p=0.003, IQR 1.47–1.66 mm). The other devices showed no difference compared to the GE E10. In the lateral dimension, the highest FWHM was measured with the GE E10 (IQR of 4.01–6.71 mm). The lowest values in the lateral dimension were seen with the VScan (p=0.003) and the Clarius HD (p=0.003), with a significant difference compared to the GE E10 in the post-hoc test after adjustment of the p-value. The other devices showed no difference when compared to the GE E10 in the lateral dimension.

Fig. 1 FWHM measurement results of the phantom. [Fig. 1] shows the mean axial (panel A) and lateral (panel B) full width at half maximum (FWHM) dimensions of the 100µm phantom pins. Lower values represent better resolution in the respective dimensions. Vertical bars indicate the standard deviation of each measurement. Lateral resolution is in general roughly five-fold lower than axial resolution. The measured dimension differed for some devices from the GE E10 as indicated by the red stars (* for p < 0.05, ** for p < 0.01, *** for p < 0.001).

Overall, the lateral FWHM was positively correlated with the depth of the measurement (r=0.6, p=1.3×10⁻⁹), while the axial FWHM showed no correlation with depth (r=0.09, Supplementary Figure 3).
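The text does not state which correlation coefficient was used. Assuming a Pearson coefficient, the relationship between pin depth and FWHM could be computed as in the sketch below. The function name `pearson_r` and the example values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: pin depth in cm and lateral FWHM in mm.
depth_cm = [1, 2, 4, 6, 8, 10, 12]
lateral_fwhm_mm = [1.5, 2.0, 2.4, 3.9, 3.5, 5.2, 6.0]
r_lateral = pearson_r(depth_cm, lateral_fwhm_mm)
```

A coefficient near zero, as reported for the axial FWHM, indicates no linear dependence on depth, whereas a positive value indicates that lateral blurring grows with depth.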



Identification of anatomical structures

Ten probands were included in the study, and measurements with all devices were performed on each of them (Supplementary Table 2). Clinicians were tasked with identifying anatomical structures with GI relevance (see Methods). Sample images of the liver for all devices are provided in Supplementary Figure 4. Additional images of the colon can be seen in Supplementary Figure 5. When comparing the performance of the devices, there were significant differences in their ability to image certain anatomical structures, most notably the sigmoid colon (Friedman test p<0.0001). For instance, with the Butterfly device, the sigmoid colon could not be identified in five out of ten probands (50%, [Fig. 2]) and only a median of one layer (IQR of 0–1.75 layers, p < 0.05) was visible, whereas using the reference GE E10 workstation, a median of four layers (IQR of 4–4 layers) was visible. In further pairwise comparisons with the GE E10, the Lumify (median=2.0, p=0.01), VScan (median=1.0, p=0.01), and the Clarius HD (median=3.0, p=0.03) also showed a significantly lower number of layers ([Fig. 2]). The Clarius HD3 (median = 3.0, IQR of 2.0–3.0) and the T500 (median = 3.5, IQR of 3.0–4.0) showed no significant difference from the GE E10 in their ability to distinguish layers of the sigmoid colon. For the differentiation of anatomical structures in the portal field, the GE E10 workstation allowed the visualization of significantly more substructures (median = 2) in comparison to all handheld devices (median=1, p<0.05). Only the T500 (p = 0.27; median 2; IQR 1–2) showed no significant difference in the adjusted post-hoc test when compared to the GE E10.

Fig. 2 Differentiation of clinical structures. In [Fig. 2], the number of discernible layers and structures is plotted on the left y-axes. Additionally, the number of subjects in which the structure could not be found is plotted as gray bars ranging from zero (structure identified in all subjects) to ten (structure found in no subject), as indicated on the right y-axes. The dotted line separates the two groups, HUS devices and standard devices. Significant differences after adjustment of the p-value for single devices in the comparison to the GE E10 are indicated by red stars (* for p < 0.05, ** for p < 0.01, *** for p < 0.001).


Measurements of anatomical structures

Aside from the principal ability to image certain anatomical structures (see above), there were no differences in the measured dimensions of anatomical structures ([Fig. 3]) between devices. The structures had a mean dimension of 2.21 mm with an IQR of 1.64–2.68 mm. The depth ranged from a median of 17.8 mm (IQR 13.6–22.4 mm) for the proximal wall of the sigmoid colon to a median of 105 mm (IQR 97–113 mm) for the portal field, thereby covering a range comparable to that of the ultrasound phantom (see Supplementary Figure 6). No differences in the measured depth of the anatomical structures were found between the devices and the GE E10 after adjustment of the p-value.

Fig. 3 Clinical measurements of anatomical structures. [Fig. 3] provides measurements of the dimensions of selected anatomical structures. Measurements are provided as box plots with the range, interquartile range (as a box), and the median as an orange line in millimeters which is reflected on the left y-axis of each panel. The number of subjects in whom the respective anatomical structure could not be visualized is provided as gray bars and referenced in the right y-axis. There were no significant differences between the devices and the GE E10 for the measured values after adjustment of the p-value.


Discussion

The availability, affordability and increasing diversity of handheld ultrasound devices open new opportunities for point-of-care ultrasound with potentially groundbreaking changes in clinical practice. In order to assess the potential utility of these devices more systematically, we report a comparative study of five handheld ultrasound devices and two workstation generations with respect to both their performance on an ultrasound phantom and their clinical utility in the GI setting.

The handheld devices and workstations differed significantly with respect to phantom performance. However, no consistent pattern of superior resolution in both dimensions emerged. Specifically, ultrasound workstations and handheld devices did not differ systematically in their performance. Interestingly, some of the handheld devices significantly outperformed the ultrasound workstations, especially with respect to the lateral resolution.

The performance regarding the visualization of anatomical structures is, however, clearly much more relevant for the clinical utility of the devices. We selected a range of typical applications in the GI and hepatology fields, where point-of-care ultrasound might be especially helpful. Ultrasound is increasingly used for the diagnosis and long-term monitoring of disease activity in inflammatory bowel disease [17] [18], and HUS has even proven useful in this setting [19]. For this application, the differentiation of bowel layers and measurement of their dimensions are needed for the assessment of disease activity. Thus, we included the ability to image the sigmoid colon and structure differentiation as one of the clinical tasks. Ultrasound is also a primary method for the diagnosis of cholecystitis [20] and is used as a screening method for pancreatic pathologies. We thus included the assessment of the gallbladder wall and the pancreatic duct. Furthermore, ultrasound-guided puncture of intrahepatic bile ducts is an effective technique in PTCD [21]. This requires the differentiation of the bile duct within the portal triads in order to avoid vascular puncture. Thus, we included the resolution of structures in the portal field as an example of interventional liver imaging.

In the clinical evaluation, devices differed substantially both in their ability to image certain organs and to show their anatomical structures. For instance, one handheld device did not allow the identification of the sigmoid colon in 50% of the probands, which underlines the limitations of the chosen abdominal preset. Better performance might have been achieved if switching between presets and probes had been permitted. Additionally, clinical performance also depends strongly on the setting and surroundings, with handheld devices showing higher variability due to their mobility and the possibility of connecting them to different monitors [22]. To provide a comparison of the complete performance of the devices, an extended investigation would be required, as has been provided by Bachmann et al. [23]. As an interesting general pattern, the ultrasound workstations still outperformed the handheld devices in the clinical imaging tasks. This pattern was most pronounced for the portal substructures, while in sigmoid colon imaging, the Clarius scanners performed as well as the workstations. Thus, although previous studies established the principal utility of handheld ultrasound for GI imaging [4] [5] [6] [7] [8] [9], we provide a broader and updated assessment of device performance across different domains. However, our study was limited to healthy volunteers with normal BMIs (see Supplementary Table 2). Therefore, the observed differences might be more pronounced in more demanding ultrasound investigations, such as in obese patients [24].

In addition, the phantom performance according to EFSUMB for the evaluation of image quality did not translate directly into clinical imaging utility. Because the clinicians in the study were trained on the workstations, observer bias might have contributed to the results, although some handheld devices showed similar clinical performance in selected imaging tasks. In addition, clinicians were not permitted to change the settings of the scanners, potentially limiting the ability of the devices to cope with the requirements of the investigations. Butterfly, for example, integrates different classic transducer forms (convex, linear, phased array) into one transducer through software. It might, therefore, perform better if clinicians were allowed to adapt the settings to the requirements of the investigation. On the other hand, fixing the abdominal setting while using the device in challenging examinations allows for a better comparison of this specific modality between the different devices. Besides the FWHM, there are other objective parameters such as the signal-to-noise ratio and grayscale resolution, which have not been included in our study [15]. Further studies are needed to investigate the relationship between objective parameters and clinical performance in order to establish a parameter set for the prediction of clinical performance that can guide users and innovators in the development of new ultrasound devices.



Conclusion

By using the official procedure of the EFSUMB to compare multiple handheld devices and workstations, significant differences in their performance could be shown, with handheld devices partly outperforming the workstations. However, these phantom measurements did not translate into clinical performance in typical GI ultrasound imaging tasks. Workstations mostly showed a superior ability to image certain organs, especially in intestinal imaging, and allowed for better differentiation of substructures. Further parameter sets for the reliable prediction of clinical performance are needed.



Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary Material


Correspondence

Dr. Moritz Herzog
Else Kröner Fresenius Center for Digital Health, TU Dresden Faculty of Medicine Carl Gustav Carus
Fetscherstraße 74
01307 Dresden
Germany   

Publication History

Received: 18 August 2023

Accepted after revision: 08 December 2023

Article published online:
01 March 2024

© 2024. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial-License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

