Endoscopy 2019; 51(12): 1113-1114
DOI: 10.1055/a-0999-5476
Editorial
© Georg Thieme Verlag KG Stuttgart · New York

Time for second-generation artificial intelligence in medical imaging

Referring to Cho BJ et al. p. 1121–1129
Fons van der Sommen
Electrical Engineering, Eindhoven University of Technology, Eindhoven, Netherlands

Publication date (online): 27 November 2019

After decades of modest successes and setbacks, it is safe to say that artificial intelligence (AI) is here to stay. Although most of the basic concepts of modern AI technology date back to the second half of the previous century, the enormous increase in data and computational power over the past decade has allowed it to demonstrate its full potential. A branch of AI known as machine learning has been responsible for most of this progress, with remarkable results: from generating hyper-realistic fake images [1] to detecting lymph node metastases in women with breast cancer [2].

“The arrival of deep learning has irreversibly changed the field of AI by showing the first convincing success of machine learning algorithms.”

The key technology driving this progress is a sub-field of machine learning known as deep learning, in which deep artificial neural networks are exploited to learn informative patterns from data. Before the era of deep learning, engineers devised features to capture the information in the data that they deemed relevant for the task at hand: for example, edges and color values for classifying images of cars and buildings or, in the field of endoscopy, for discriminating between nondysplastic and early neoplastic tissue in Barrett’s esophagus [3]. In contrast, deep learning aims to learn these features automatically from the data.
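
To make this distinction concrete, the fragment below is a minimal sketch, in Python with scikit-image and PyTorch (both illustrative choices, not tied to [3]), contrasting a small hand-engineered feature vector with a toy convolutional network whose feature detectors are learned from data; the input image is a random stand-in for an endoscopic frame.

```python
# Minimal, illustrative sketch: hand-engineered features vs. learned features.
# The image is a random stand-in; the network is a toy, not the system of [3].
import numpy as np
import torch
import torch.nn as nn
from skimage import color, feature


def handcrafted_features(image_rgb: np.ndarray) -> np.ndarray:
    """Features chosen by the engineer: edge density and mean color values."""
    gray = color.rgb2gray(image_rgb)
    edge_density = feature.canny(gray, sigma=2.0).mean()
    mean_rgb = image_rgb.reshape(-1, 3).mean(axis=0) / 255.0
    return np.concatenate([[edge_density], mean_rgb])


class SmallCNN(nn.Module):
    """The convolutional layers learn their own feature detectors from data."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


frame = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in frame
print(handcrafted_features(frame))              # four fixed, engineer-chosen numbers
print(SmallCNN()(torch.randn(1, 3, 224, 224)))  # logits computed from learned features
```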

In this issue of Endoscopy, Cho et al. describe a method for the classification of gastric neoplasms in endoscopic images using deep learning [4]. In their work, the authors evaluated three existing neural network architectures, which they fine-tuned by means of transfer learning using a large, retrospectively collected data set. After achieving promising results in an internal validation, they collected a prospective data set for external validation. Although the scores on the prospective set were considerably lower (Table 3 vs. supplementary Table 3 in the manuscript), the authors deserve much credit for including these results, as such an external validation is a much better indicator of clinical performance. In addition, the comparison with two expert endoscopists helps to put the algorithm’s performance into perspective: straightforward application of deep learning holds promise for gastric lesion classification, but it still falls considerably short of the expert endoscopists.
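
For readers less familiar with this approach, the sketch below shows what fine-tuning a pretrained network by transfer learning typically looks like in Python with PyTorch/torchvision; the architecture, the five assumed lesion categories, and the training details are illustrative choices and do not reproduce the authors’ exact pipeline.

```python
# Minimal sketch of transfer learning; illustrative only, not the authors' code.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # assumed number of lesion categories, for illustration only

# Start from a network pretrained on ImageNet (the "transfer" in transfer learning).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor ...
for param in model.parameters():
    param.requires_grad = False

# ... and replace the final layer so that it predicts the endoscopic classes.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Fine-tune the new head only (deeper layers can be unfrozen later if needed).
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()


def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step on a batch of labeled endoscopic images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```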

Although this study takes a crucial first step toward the application of AI for gastric cancer screening, it has several limitations that curtail its value. Among these are the considerable difference between internal and external validation (e.g., 85.5 % vs. 73.5 % accuracy) and a strongly skewed per-class performance (e.g., high grade dysplasia is not detected at all in either validation). However, these limitations can be observed in many similar studies on the application of AI and are symptoms of a much broader problem that hinders the field from moving forward.
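
How a respectable overall accuracy can coexist with a class that is never detected is easy to see with a small, made-up example (Python with scikit-learn; the numbers below are invented for illustration and are not taken from the study):

```python
# Invented labels for illustration only: overall accuracy looks acceptable,
# yet high grade dysplasia (HGD) is never predicted, so its recall is zero.
from sklearn.metrics import accuracy_score, recall_score

y_true = ["cancer"] * 40 + ["HGD"] * 5 + ["non-neoplasm"] * 55
y_pred = (["cancer"] * 38 + ["non-neoplasm"] * 2      # cancer cases
          + ["non-neoplasm"] * 5                      # all HGD cases missed
          + ["non-neoplasm"] * 53 + ["cancer"] * 2)   # non-neoplasm cases

print(accuracy_score(y_true, y_pred))                              # 0.91 overall
print(recall_score(y_true, y_pred, labels=["HGD"], average=None))  # [0.] for HGD
```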

The wide availability of software tools and online educational platforms for deep learning has enabled researchers from various backgrounds to investigate the applicability of AI in their field. Although this broad accessibility strongly fuels the progress of AI, it also carries some serious risks. Because data sets are the primary drivers of deep learning algorithms, the data collection and the experimental set-up used for validation are of crucial importance. With the move from hand-engineered features toward a data-driven approach, the ability to exercise some control over what the system actually learns has diminished considerably. As deep learning will pick up any pattern in the data, bias lurks around every corner. Moreover, these software tools are mostly plug-and-play, automatically filling in a lot of the blanks. While this is great for ease of use, it poses a risk of overfitting when the user is not aware of these blanks. These aspects make a strong collaboration between medical doctors and engineers necessary: creating a common understanding of the potential and limitations of AI, setting standards for development, evaluation, and reporting, and jointly defining the AI tools that would help clinicians in their daily work, as also pointed out by Michael Byrne earlier this year [5].
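
One concrete example of such a blank, sketched below under assumed data (Python with scikit-learn; the frames and patient identifiers are placeholders), is splitting the data at the patient level so that frames from the same patient can never end up in both the training and the validation set; no plug-and-play tool will enforce this by itself.

```python
# Illustrative patient-level split; the frames and patient IDs are placeholders.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

frames = np.arange(1000)                      # stand-ins for endoscopic frames
patient_ids = np.repeat(np.arange(100), 10)   # assume 10 frames per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, val_idx = next(splitter.split(frames, groups=patient_ids))

# No patient contributes frames to both sets, which avoids the optimistic
# bias that a naive random split over individual frames would introduce.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[val_idx])
```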

The translation of AI methods to clinical practice is also far from trivial. Studies have shown that the application of supportive AI systems can even lead to reduced performance when the system has no notion of uncertainty [6]. Furthermore, most published algorithms have been developed and evaluated in a controlled environment on carefully selected, high quality data sets. In clinical practice, the quality of the data is generally much lower and a considerable class imbalance is often to be expected, leading to lower scores than originally reported. In an editorial earlier this year, Yuichi Mori addressed this problem in the field of colonoscopy and called for high quality prospective studies [7]. Moreover, disturbing results have been reported concerning the stability of modern AI approaches, in which imperceptible changes to an image can lead to completely incorrect predictions [8]; for example, an image of a school bus is classified as an ostrich after a very small distortion is applied. Given the above considerations, aspects such as uncertainty, robustness, and interpretability of deep learning algorithms will become increasingly important for the further progression of AI, especially in a clinical setting, where errors can have severe consequences.
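
The mechanism behind this instability can be sketched in a few lines (Python with PyTorch; the input is a random stand-in for a photograph, and the method shown is the later fast gradient sign method rather than the exact attack of [8]): a perturbation far too small to be seen is pointed in the direction that increases the loss, and the prediction frequently changes.

```python
# Illustrative adversarial perturbation (fast gradient sign method, not the
# exact attack of [8]); the input image is a random placeholder.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

image = torch.rand(1, 3, 224, 224)            # stand-in for a real photograph
image.requires_grad_(True)

logits = model(image)
predicted_class = logits.argmax(dim=1)

# One gradient step that increases the loss of the currently predicted class,
# scaled so that the change is visually imperceptible.
loss = F.cross_entropy(logits, predicted_class)
loss.backward()
epsilon = 2.0 / 255.0
adversarial = (image + epsilon * image.grad.sign()).clamp(0.0, 1.0)

print(predicted_class.item(), model(adversarial).argmax(dim=1).item())
# The two predictions frequently differ, although the two images are
# near-identical to the human eye.
```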

The arrival of deep learning has irreversibly changed the field of AI by showing the first convincing success of machine learning algorithms. However, the promising results reported so far have mostly been acquired in a controlled environment, demonstrating that these systems can excel at a wide range of specific tasks under perfect conditions. We are now past the point at which the potential of AI can be questioned, and we should move beyond feasibility studies. The next generation of AI in medical imaging will be developed in clinical practice, in a less controlled environment where new challenges will surface and new opportunities will arise. Compare this to the construction of a ship, which, after many years of development in the shipyard, finally proves to stay afloat in the harbor. Now it is time to leave the controlled environment of the harbor, face the challenges of the open seas, and discover the unexplored land beyond the horizon.

 
  • References

  • 1 Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019; 4401-4410
  • 2 Bejnordi BE, Veta M, van Diest PJ. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 2017; 318: 2199-2210
  • 3 de Groof J, van der Sommen F, van der Putten J. et al. The Argos project: the development of a computer-aided detection system to improve detection of Barrett’s neoplasia on white light endoscopy. United European Gastroenterol J 2019; 7: 538-547
  • 4 Cho BJ, Bang CS, Park SW. et al. Automated classification of gastric neoplasms in endoscopic images using a convolutional neural network. Endoscopy 2019; 51: 1121-1129
  • 5 Byrne MF. Artificial intelligence and the future of endoscopy: should we be quietly excited? Endoscopy 2019; 51: 511-512
  • 6 Jorritsma W, Cnossen F, van Ooijen PMA. Improving the radiologist – CAD interaction: designing for appropriate trust. Clin Radiol 2015; 70: 115-122
  • 7 Mori Y. Artificial intelligence and colonoscopy: the time is ripe to begin clinical trials. Endoscopy 2019; 51: 219-220
  • 8 Szegedy C, Zaremba W, Sutskever I. et al. Intriguing properties of neural networks. arXiv:1312.6199. 2013. Available at: https://arxiv.org/abs/1312.6199