Appl Clin Inform 2024; 15(05): 1121-1129
DOI: 10.1055/a-2414-7790
Special Topic on Teaching and Training Future Health Informaticians

Exploring the Impact of GitHub Copilot on Health Informatics Education

Authors

  • Sanja Avramovic

    1   Department of Health Administration and Policy, College of Public Health, George Mason University, Fairfax, Virginia, United States
  • Ivan Avramovic

    2   Department of Computer Science, College of Engineering and Computing, George Mason University, Fairfax, Virginia, United States
  • Janusz Wojtusiak

    1   Department of Health Administration and Policy, College of Public Health, George Mason University, Fairfax, Virginia, United States

Funding None.
 

Abstract

Background The use of artificial intelligence-driven code completion tools, particularly the integration of GitHub Copilot with Visual Studio, has potential implications for Health Informatics education, especially for students learning SQL and Python.

Objectives This study aims to evaluate the effectiveness of these tools in solving or assisting with the solution of problems found in Health Informatics coursework, ranging from simple to complex.

Methods The study assesses the performance of GitHub Copilot in generating code to solve programming problems normally given to students in introductory Health Informatics programming courses. Problem statements are provided to the tool; the response is assessed on correctness. The focus is on the impact of detailed explanations on the tool's effectiveness.

Results Findings reveal that GitHub Copilot can generate correct code for straightforward problems. The correctness and effectiveness of solutions decrease with problem complexity, and the tool struggles with the most challenging problems, although performance on complex problems improves with more detailed explanations.

Conclusion The study not only underscores the relevance of these tools to programming in Health Informatics education but also highlights the need for critical evaluation by students. It concludes with a call for educators to adapt swiftly to this rapidly evolving technology.


Background and Significance

Software development, a fundamental skill set for Health Informatics majors, is a highly dynamic domain. Modern students can make use of powerful tools that assist in auto-generating code, and these tools are expected to evolve even further. Those training for programming roles are likely to find them readily available. This shift suggests a change in pedagogical approach, with more emphasis on reading and evaluating code rather than generating it.[1] The authors of a study from 2023 believe that the use of artificial intelligence (AI)-generated code, coupled with industry demands, will necessitate confronting ethical issues in computing education from the start of the curriculum.[2] Without swift and coordinated efforts, educators risk losing the opportunity to shape the future of this rapidly changing landscape, both in terms of potential opportunities and enduring challenges.

In the field of Health Informatics education, mastering programming languages like SQL or Python is crucial for students. These languages enable them to manage and analyze health data effectively and to develop predictive models, clinical decision support tools, and similar applications. The development process used by students can be influenced significantly by AI tools that anticipate and auto-complete their code. One prominent example of such a tool is GitHub Copilot, which can be integrated into the Visual Studio integrated development environment (IDE). This particular tool is the focus of this study because it is free to use for students and professors.

GitHub Copilot, a collaboration between GitHub and OpenAI, is a tool that assists in writing code. It employs machine learning and natural language processing techniques to suggest code snippets, including entire functions, to developers as they write code. This comes in the form of auto-completion suggestions as the code is being written; these suggestions can be accepted or ignored. This code completion capability can be integrated into the Visual Studio IDE, a development platform created by Microsoft; at the time of this writing, Copilot is not usable with other IDEs, but it is reasonable to expect a greater number of similar AI tools to integrate with other coding platforms in the future. For the purposes of this study, Visual Studio and GitHub Copilot support SQL and Python code, both of which are relevant to Health Informatics education.

In terms of impact, opinion is divided as to whether a tool like GitHub Copilot benefits software development education. Code generation tools such as Copilot will enhance students' ability to produce effective code in SQL or Python. However, as we integrate these tools into Health Informatics education, it is important to consider several implications. For instance, while using AI-generated code, students should still be able to understand the code they are implementing and to evaluate its correctness.[3] They should also be aware of issues related to academic integrity and code reuse.[4] A study from 2023 described Copilot as a “double-edged sword,” which can enhance the ability to produce code when used properly but can be detrimental when one is unaware of its limitations.[5] To properly assess how an AI code generation tool impacts education, one must first be informed about what such a tool is capable of, especially the breadth and accuracy of its responses.

An earlier example of an AI tool that could generate computer code was the chatbot ChatGPT-3. A study of the success of ChatGPT-3 in solving problems from Health Informatics classes is described elsewhere.[6] This paper will focus on the more modern GitHub Copilot AI and evaluate its performance on assignments and assessments from two Health Informatics courses: Healthcare Databases, which teaches students to use T-SQL database programming in the context of health data; and Computational Tools in Health Informatics, which teaches students to synergize computing technologies to solve practical Health Informatics problems, primarily focusing on the Python programming language and computational tools. Both classes are graduate-level classes.

Related work has studied the effectiveness of Copilot as a software development tool. Studies have claimed that it shows promising capabilities in terms of useful code generation,[1] [3] including faster code development and improved code quality.[5] However, even as a state-of-the-art tool, it still makes mistakes when asked to generate solutions to complex problems and can produce results that are inferior to code written by a human.[7] One study suggests that some software developers who use GitHub Copilot may spend more time scrutinizing its results than they would have taken to write the same code from scratch.[8] Some of the greatest concerns about Copilot as a development tool involve limitations in integration and accessibility, although concerns about code quality and privacy are also prevalent.[5]


Objectives

The primary objective of this study is to evaluate the effectiveness of GitHub Copilot and Visual Studio in assisting with Health Informatics problems, for both introductory-level and graduate-level coursework. The tool is assessed on a range of problems, from those that test fundamental syntax and basic operations in SQL and Python to those that require domain-specific knowledge, use of advanced libraries, or an understanding of the limitations of the language. The study applies variations of actual problems taken from Health Informatics coursework. It seeks to determine points of failure of the existing GitHub Copilot tool and, when it is straightforward to do so, to demonstrate workarounds when the tool has difficulty producing a correct result.

When studying the accuracy of a code generating tool, we consider the relationship between the prompt that the tool receives and the resulting output that it produces. The prompt is the information provided to the tool from which it infers the problem that it is trying to solve. A prompt can either be explicit, where the tool is directly asked to solve a problem or produce a particular output, or it can be implicit, where the tool infers the nature of the expected result from the current context. When ChatGPT or Copilot is asked to produce code to solve a computing problem, this would be an example of an explicit prompt. When GitHub Copilot is provided with a programming objective described within code comments and/or an example of a partial solution, this serves as an implicit prompt.

In this study, we focus on implicit prompts provided in the form of a problem statement embedded in code comments. This form of implicit prompt gives a student a direct path from the problem statement on an assignment to a solution proposed by the AI tool. Furthermore, the entire interaction takes place within the programming environment, representing a natural workflow for the student. It is for these two reasons that this form of implicit prompt is used. Our objective is to determine the relationship between the implicit prompt and the accuracy of the suggested code produced by GitHub Copilot when solving Health Informatics problems.


Methods

This study utilized GitHub Copilot integrated into Visual Studio to evaluate its effectiveness in assisting students with their classwork, homework, and exam problems. The problems were sourced from two courses: Healthcare Databases, where SQL was used, and Computational Tools in Health Informatics, where Python was employed. Variants of the actual assignments in these courses were used. Problems were selected from several different weeks of each course so as to create a range from simple to challenging, encompassing the breadth of issues that students are expected to solve in their coursework. Both courses begin with basic problems meant to introduce Health Informatics students to programming concepts before transitioning to domain-specific problems.

For each problem, we initiated the process by providing a problem statement in the comments of the code and starting to write a reference solution, thereby prompting GitHub Copilot to suggest the remainder of the code. If necessary, code up to the first operator, assignment, or punctuation was typed (e.g., “variable =”) to prime the tool to produce an output, although in many cases—especially with SQL code—this was not necessary.
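As an illustration, this workflow for a Python problem might look like the following sketch. The problem statement, the values, and the completed line are hypothetical examples chosen here, not outputs recorded in the study:

```python
# Problem statement pasted into a comment, serving as the implicit prompt:
# "Write a program that calculates BMI from height and weight."

height = 1.75   # the evaluator types up to the "=" to prime a suggestion
weight = 70.0
bmi = weight / (height ** 2)   # a completion of the kind the tool might offer
```

In practice, the suggestion appears as ghost text after the primer and can be accepted or ignored.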

The tool's performance was then evaluated based on the correctness of the suggested code. Depending on the complexity of the problem and the assumptions made by the code generation tool, it was sometimes capable of generating correct code and sometimes not. Correctness is reported as a simple “correct” or “incorrect” depending on whether the code successfully solved the problem as stated, even though the efficiency of correct solutions and the severity of error in incorrect solutions varied. In cases where the tool provided an incorrect answer, the problem statement used as a prompt was revised in an attempt to produce a more accurate answer.


Results

Problem Samples from the Health Database Class (SQL)

In all SQL problem examples, the problem statement was provided in the comments. No additional prompting was needed for Copilot to produce a suggestion. The questions varied in complexity and nuance. Results are shown in [Table 1] in order of increasing complexity, including the Initial Prompt, the Refined Prompt (where applicable), Copilot's Response, and the Correctness of the Response.

Table 1

Prompts and responses for SQL

Case 1 (Correct)

Initial Prompt:

 Find the total number of rows in the table dbo.PATIENTS

Copilot's Response:

 select count(*) from DBO.PATIENTS

Case 2 (Correct)

Initial Prompt:

 Find the minimum, maximum, average AGE, and standard deviation from the table dbo.PATIENTS

Copilot's Response:

 select min(AGE), max(AGE), avg(AGE), stdev(AGE) from DBO.PATIENTS

Case 3 (Correct)

Initial Prompt:

 Find the minimum, maximum, average Age, and standard deviation per patient from the table dbo.PATIENTS

Copilot's Response:

 select PATIENT_ID, min(AGE), max(AGE), avg(AGE), stdev(AGE) from DBO.PATIENTS group by PATIENT_ID

Case 4 (Incorrect)

Initial Prompt:

 Find the product of the column LR from the table #lr

Copilot's Response:

 select product (LR) from #lr

Explanation:

 T-SQL lacks a Product function

Case 4a (Correct)

Revised Prompt:

 Find the product of the column LR from the table #lr using T-SQL which doesn't have a built-in function for PRODUCT

Copilot's Response:

 select EXP(SUM(LOG(LR))) from #LR

Case 5 (Incorrect)

Initial Prompt:

 Calculate the MEDIAN of the column AGE from the table dbo.Patients

Copilot's Response:

 select median(AGE) from DBO.PATIENTS

Explanation:

 T-SQL lacks a Median function

Case 5a (Correct)

Revised Prompt:

 Calculate the MEDIAN of the column AGE from the table dbo.Patients using T-SQL which doesn't have a built-in function for MEDIAN

Copilot's Response:

 select AVG(AGE) from (select AGE, row_number() over (order by AGE) as rownum from DBO.PATIENTS) as temp

 where rownum in ((select count(*) from DBO.PATIENTS)/2, (select count(*) from DBO.PATIENTS)/2 + 1)

Notes:

 Students in the class are exposed to several different ways to calculate the median. If they try this approach using a very large table, it will produce an accurate result but is not as fast as other approaches taught in class

Case 6 (Incorrect)

Initial Prompt:

 Find the total number of patients. Use the table dbo.admission

Copilot's Response:

 select count(PATIENT_ID) from dbo.admission

Explanation:

 Incorrect results would occur when several rows (admissions to the hospital) for the same patient can be found in the Admission table

Case 6a (Correct)

Revised Prompt:

 Find the total number of patients. Use the table dbo.admission. We may have a patient with several admissions to the hospital over the time period observing

Copilot's Response:

 select count(distinct PATIENT_ID) from dbo.admission

Case 7 (Incorrect)

Initial Prompt:

 Find the patients from the table dbo.Admission who don't have DIABETES MELLITUS, To find diabetes use the column icd and the value '250.00'

Copilot's Response:

 select * from DBO.ADMISSION where icd != '250.00'

Explanation:

 We can expect multiple rows per patient; this code will select all admissions where the primary diagnosis was not diabetes

Case 7a (Correct)

Revised Prompt:

 Find the patients from the table dbo.Admission who don't have DIABETES MELLITUS, To find diabetes use the column icd and the value '250.00'. We may have the same patient with several admissions and different diagnoses

Copilot's Response:

 select distinct PATIENT_ID from DBO.ADMISSION where PATIENT_ID not in (select distinct PATIENT_ID from DBO.ADMISSION where icd = '250.00')

Notes:

 Not optimized. Left Join would have resulted in better performance

Case 8 (Incorrect)

Initial Prompt:

 Using the table dbo.probs (Patient_ID, Actual, Predicted) calculate Sensitivity and 1-Specificity and AUC

Copilot's Response:

 select (select count(*) from DBO.PROBS where ACTUAL = 1 and PREDICTED > (select max(PREDICTED) from DBO.PROBS where ACTUAL = 0)) as TP,

 (select count(*) from DBO.PROBS where ACTUAL = 0 and PREDICTED > (select max(PREDICTED) from DBO.PROBS where ACTUAL = 0)) as FP,

 (select count(*) from DBO.PROBS where ACTUAL = 1 and PREDICTED <= (select max(PREDICTED) from DBO.PROBS where ACTUAL = 0)) as FN,

 (select count(*) from DBO.PROBS where ACTUAL = 0 and PREDICTED <= (select max(PREDICTED) from DBO.PROBS where ACTUAL = 0)) as TN

Explanation:

 The tool did not answer the question. The tool calculated TP (true positive), FP (false positive), TN (true negative), and FN (false negative), but it did not provide the code for computing sensitivity and specificity; these two values could be calculated using the provided result. Furthermore, it did not calculate AUC (area under the curve)

Case 8a (Incorrect)

Revised Prompt:

 Multiple variations of the prompt were attempted

Notes:

 Variations on the prompt did not lead to a more correct answer

In the most basic examples of SQL usage (Cases 1–3 from the table), Copilot was easily able to produce correct code. Each of these examples involved a basic selection of data or the application of basic functions to the data, for example finding minimum, maximum, average, etc., or finding the number of distinct rows.

In Cases 4 and 5, Copilot is asked to produce a result for which the needed operation is not directly available. In both cases, Copilot assumed that a built-in function (product or median, respectively) was available to complete the task. These results were incorrect because such functions do not exist: T-SQL does not include functions that directly compute the product or median of a set of data. These two examples are commonly used in homework assignments and exams in the class since they require students to apply critical thinking to recognize that the named functions do not exist and that an alternate solution is necessary.
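The workaround that ultimately resolved Case 4a relies on the identity that a product equals the exponential of the sum of logarithms. A small Python check with made-up values (valid only for positive numbers, as is LOG in T-SQL) confirms the equivalence:

```python
import math

lr = [1.2, 0.5, 3.0, 2.5]   # hypothetical likelihood ratios
direct = math.prod(lr)      # the product T-SQL cannot compute directly
via_logs = math.exp(sum(math.log(x) for x in lr))   # mirrors EXP(SUM(LOG(LR)))
assert abs(direct - via_logs) < 1e-9
```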

In both cases where the required operation was not directly provided by a built-in function, the prompt was revised to include the hint that the desired operation was not provided by T-SQL. With the amended prompt, Copilot was able to generate a correct result in both cases. In Case 5a (producing a median), Copilot produced a correct answer to the amended prompt, but the response was not the most efficient approach to solve the problem.
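The logic of the Case 5a response can be mirrored in Python. This sketch, using hypothetical ages and an even-sized set, averages the two middle rows just as the query does:

```python
import statistics

ages = [34, 28, 45, 51, 39, 62]   # hypothetical patient ages (even count)
ordered = sorted(ages)            # stands in for row_number() over (order by AGE)
n = len(ordered)
middle = [ordered[n // 2 - 1], ordered[n // 2]]   # 1-based rows n/2 and n/2 + 1
median = sum(middle) / len(middle)
assert median == statistics.median(ages)
```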

Cases 5a and 7a both represent examples of responses that are correct but not the most efficient method possible.

Cases 6 and 7 are both problems that are made complex due to subtlety in the solution. In both examples, the problem requires the student to consider the possibility of multiple entries for the same patient. In both solutions, Copilot fails to take the possibility of multiple entries into account. If Copilot is provided the additional hint that multiple entries are a possibility, then it will produce a correct answer in both cases.
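The distinction at the heart of Case 6 can be reproduced with SQLite (via Python's standard library) and toy data; the table contents here are illustrative only:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE admission (PATIENT_ID INTEGER, icd TEXT)")
# Patient 1 has two admissions, so counting rows overstates the patient count
con.executemany("INSERT INTO admission VALUES (?, ?)",
                [(1, "250.00"), (1, "401.9"), (2, "401.9"), (3, "250.00")])
admissions = con.execute("SELECT COUNT(PATIENT_ID) FROM admission").fetchone()[0]
patients = con.execute("SELECT COUNT(DISTINCT PATIENT_ID) FROM admission").fetchone()[0]
assert admissions == 4 and patients == 3
```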

In all of Cases 4 to 7, the addition of details and clarity to the prompt resulted in a correct answer.

Case 8 represents the most complex problem that was presented. Copilot was challenged to produce the specificity, sensitivity, and area under the curve of a set of data. The problem itself is clear and well-specified; however, the expected answer is nontrivial. Copilot was able to produce results that were relevant to the desired solution—involving calculating true and false positives and negatives—but which did not provide the desired solution. Attempts to modify the prompt did not lead to a clear improvement in the answer.
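For reference, a complete answer to Case 8 would need to go beyond the confusion-matrix counts. This Python sketch, with invented labels and probabilities, shows the remaining steps: sensitivity and 1-specificity at a fixed threshold, and AUC computed as the probability that a random positive case outranks a random negative one:

```python
actual    = [1, 1, 1, 0, 0, 0]               # invented labels
predicted = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]   # invented probabilities
threshold = 0.5

tp = sum(a == 1 and p > threshold for a, p in zip(actual, predicted))
fn = sum(a == 1 and p <= threshold for a, p in zip(actual, predicted))
fp = sum(a == 0 and p > threshold for a, p in zip(actual, predicted))
tn = sum(a == 0 and p <= threshold for a, p in zip(actual, predicted))

sensitivity = tp / (tp + fn)             # true positive rate
one_minus_specificity = fp / (fp + tn)   # false positive rate

# AUC: probability that a random positive scores above a random negative
pos = [p for a, p in zip(actual, predicted) if a == 1]
neg = [p for a, p in zip(actual, predicted) if a == 0]
auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))
```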


Problems from Computational Tools in Health Informatics (Python)

When Python assignments were tested, most Copilot responses were at least partially correct. However, in some of the examples, it was necessary to begin typing the first word of code before an output was suggested. Compared with the SQL assignments, which could be described as practical and applied, many of the Python assignments could be more generally described as introductory programming tasks, although some are more sophisticated. In this part of the study, no attempt was made to revise prompts because the suggested solutions deviated from a correct solution only in their improper use of indentation.

Cases 1, 2, and 5 in [Table 2] produced correct answers. Case 3 demonstrated essentially correct logic, but improper indentation on two of the lines would have led to a syntax error. The Python programming language is sensitive to indentation because it uses indentation to group code. In Cases 4, 6, and 7, improper indentation subtly altered the logic of the code. For example, in Case 4, the problem asks for the largest of several numbers; however, after the initial comparisons are performed, subsequent comparisons are conditionally performed only if all numbers are in increasing order. Cases 6 and 7 also have indentation-related flaws, in that the desired output is potentially printed at every step instead of only once at the end. Indentation-related problems were common in Copilot-suggested solutions to Python programming tasks.
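The effect of the indentation flaw described for Cases 6 and 7 can be demonstrated directly. In this sketch (collecting values in lists rather than printing), moving one line into or out of the if block changes how often the output is produced:

```python
numbers = [1, 2, 5, 3, 9, -1, 0, 2, -10, 8]

# Mis-indented logic (as in Case 6): output produced at every update
updates = []
smallest = numbers[0]
for number in numbers:
    if number < smallest:
        smallest = number
        updates.append(smallest)   # fires each time a new minimum is found

# Correct indentation: output produced once, after the loop completes
smallest = numbers[0]
for number in numbers:
    if number < smallest:
        smallest = number
final = [smallest]                 # a single value
```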

Table 2

Prompts and responses for Python

Case 1 (Correct)

Initial Prompt:

 Write a program that displays three lines of text:
 Hello there,
 this is a Python program
 and it is fun to use!

Copilot's Response:

 print("Hello there,")
 print("this is a Python program")
 print("and it is fun to use!")

Case 2 (Correct)

Initial Prompt:

 Write a program in Python that asks for height and weight and calculates BMI

Copilot's Response:

 height = float(input("Enter your height in meters: "))
 weight = float(input("Enter your weight in kilograms: "))
 bmi = weight / (height ** 2)
 print("Your BMI is:", bmi)

Case 3 (Incorrect)

Initial Prompt:

 Write a program that asks for a name. If the name is “person,” the program responds “Hello, person” and otherwise says “I don't know you but nice to meet you”

Copilot's Response:

 name = input("Enter your name: ")
 if name == "person":
     print("Hello, person")
     else:
         print("I don't know you but nice to meet you!")

Explanation:

 The improper indentation on the final two lines would produce a syntax error, but logically the code is correct.

Case 4 (Incorrect)

Initial Prompt:

 Write a program that asks for five numbers and returns the largest of them. Do not use the max function built into Python

Copilot's Response:

 n1 = float(input("Enter the first number: "))
 n2 = float(input("Enter the second number: "))
 n3 = float(input("Enter the third number: "))
 n4 = float(input("Enter the fourth number: "))
 n5 = float(input("Enter the fifth number: "))
 largest = n1
 if n2 > largest:
     largest = n2
     if n3 > largest:
         largest = n3
         if n4 > largest:
             largest = n4
             if n5 > largest:
                 largest = n5
 print("The largest number is:", largest)

Explanation:

 The follow-up condition checks would only be executed if the preceding checks succeeded.

Case 5 (Correct)

Initial Prompt:

 Write a program that prints all numbers between 27 and 78, one number per line

Copilot's Response:

 for i in range(27, 79):
     print(i)

Case 6 (Incorrect)

Initial Prompt:

 Using a for loop, write a program that finds the smallest number in the set {1, 2, 5, 3, 9, −1, 0, 2, −10, 8}

Copilot's Response:

 numbers = [1, 2, 5, 3, 9, -1, 0, 2, -10, 8]
 smallest = numbers[0]
 for number in numbers:
     if number < smallest:
         smallest = number
         print("The smallest number is:", smallest)

Explanation:

 The smallest number is printed each time it is updated rather than once at the end

Case 7 (Incorrect)

Initial Prompt:

 Calculate the arithmetic mean of all numbers in the previous set. Do not use any Python built-in functions

Copilot's Response:

 sum = 0
 for number in numbers:
     sum += number
     mean = sum / len(numbers)
     print("The mean is:", mean)

Explanation:

 The mean is recomputed and reprinted after each number in the list rather than once at the end

Case 8 (Correct)

Initial Prompt:

 Write a cgi script into which data from a form containing first name, last name, email address, and comments is processed. The script should display a thank you note for submitting the data, and store the data into a text file on the server

Copilot's Response:

 import cgi

 form = cgi.FieldStorage()
 first_name = form.getvalue("first_name")
 last_name = form.getvalue("last_name")
 email = form.getvalue("email")
 comments = form.getvalue("comments")

 print("Content-type: text/html\n")
 print("<html>")
 print("<head>")
 print("<title>Thank you!</title>")
 print("</head>")
 print("<body>")
 print("<h1>Thank you for submitting the form!</h1>")
 print("<p>Your data has been saved.</p>")
 print("</body>")
 print("</html>")

 filename = "data.txt"
 with open(filename, "a") as file:
     file.write(first_name + " " + last_name + " " + email + " " + comments + "\n")

Note:

 Although the provided solution is correct, it makes use of the deprecated cgi.FieldStorage code, which is no longer available as part of the most recent releases of Python

Case 9 (Correct)

Initial Prompt:

 We are given a pandas dataframe called data which includes a column called “Age.” Create a field that indicates if patient age is above average in the data by first grouping age in ranges of 10, then using the ranges to produce an age estimate, then finding the mean of the estimates. Create a column called “AboveAverage” which is true or false depending on whether the age is above the estimated average

Copilot's Response:

 import pandas as pd

 data = pd.read_csv("data.csv")
 data["AgeRange"] = pd.cut(data["Age"], range(0, 101, 10))
 data["AgeEstimate"] = data["AgeRange"].apply(lambda x: x.mid)
 mean_age = data["AgeEstimate"].mean()
 data["AboveAverage"] = data["AgeEstimate"] > mean_age

Cases 8 and 9 represent more sophisticated, practical problems. In Case 8, the student is asked to program an interaction for a Web site. Copilot was able to produce proper code for the described task, including separating the tasks of retrieving input, producing a web page, and saving the results. Case 9 involves a sequence of data manipulation operations. Both cases were implemented correctly; however, the solution to Case 8 had the shortcoming that it relied on deprecated code: the most recent version of Python no longer includes cgi.FieldStorage, which is integral to this particular solution.
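The grouping logic of Case 9 can also be sketched without pandas. This pure-Python approximation (hypothetical ages, decade bins with midpoints, and slightly different boundary handling than pd.cut) captures the same idea:

```python
ages = [23, 37, 41, 58, 62]   # hypothetical patient ages

def decade_midpoint(age):
    """Estimate an age by the midpoint of its 10-year range (e.g., 37 -> 35)."""
    return (age // 10) * 10 + 5

estimates = [decade_midpoint(a) for a in ages]
mean_estimate = sum(estimates) / len(estimates)
above_average = [e > mean_estimate for e in estimates]
```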

In some instances, repeating the prompt multiple times would produce different solutions. For example, when asked to calculate body mass index (BMI) in Case 2, sometimes Copilot would suggest a solution that would only compute the BMI value, but at other times, it would also deduce and announce the associated BMI classification. In Case 4, when asked to find the largest number, an alternate solution that was produced involved testing each of the five numbers independently to see if it was the largest—a correct but inefficient solution.



Limitations

The study focused on problems from only two Health Informatics courses. While the courses were chosen as very technical courses within the program, a broader range of courses might reveal further specific shortcomings of the AI tool. Additionally, the problems used in the study are derived from student coursework. While this may reflect tasks that students may be responsible for in the industry, it might not be reflective of challenges faced when performing cutting-edge research.

The study focused on a specific AI tool and development environment. The existence of effective free tools will evolve with time, and future work would benefit from incorporating the abilities and limitations of additional tools.

Finally, the study was performed without student input. Student involvement could help illuminate the means by which the tools are most commonly used.


Discussion

Our study provides insights into the application of GitHub Copilot in the context of Health Informatics education, particularly for students learning SQL and Python programming. The findings reveal both capabilities and limitations to using AI-based code generation tools for producing computer code in an educational setting.

The ability of GitHub Copilot to generate correct code for straightforward questions and those requiring built-in functions demonstrates that it does have potential—whether in the present or later when the technology matures—as a learning aid. It can provide immediate feedback and suggestions, enabling students to learn from the tool's responses or verify their own work. This can be particularly beneficial for novice programmers who are still grasping the fundamentals of SQL and Python.

However, to be a truly reliable tool, a student should be able to trust it to provide accurate answers of reasonable quality. This study identified several scenarios in which Copilot provided incorrect or imperfect answers. For example, although it did show the ability to generalize while seeking a solution, it lacked the context-awareness to recognize that certain functions could not be used because they were not built into the language of choice. In other examples, although it was able to deduce a straightforward approach to a problem, it was unable to apply critical thinking to recognize situations that would cause the solution to fail. In the Python examples, Copilot frequently misapplied indentation in ways that altered the logic of the code.

These examples all underscore the importance of human oversight and intervention when using such tools. While GitHub Copilot can generate code based on prompts, there is a risk that students might use the code without understanding it, which would hinder their learning process and lead to a superficial understanding of programming concepts. Students need to critically evaluate the suggestions provided by the tool rather than relying on it blindly.

In some of the previous examples, the output was corrected by modifying the prompt to include more detail and qualifying criteria. Missing functions could be avoided if Copilot was told not to use them, while special cases could be accounted for if Copilot was reminded that they could arise. Educationally speaking, this shows that the student still retains a key role in recognizing and integrating the aspects of a problem that give it purpose. It serves as a lesson that a student who is a novice programmer should not be taught to trust AI suggestions blindly. It also points to the potential for using the tool to encourage students to articulate their problem-solving process more clearly.

In several examples, the tool produced a correct result that was nevertheless not the most efficient well-known approach to solving the problem. Copilot was also inconsistent, sometimes suggesting less efficient solutions depending on when it was asked. As an educational matter, inefficient solutions can be practically accessible to students but are not ideal examples to learn from. Such examples show that Copilot remains inferior to a professional or subject matter expert, although this may change as the tools mature.

The most challenging questions posed a significant difficulty for the tool. However, with more detailed explanations, the tool's performance improved. This indicates that while the tool has impressive capabilities, it is not infallible. It also suggests that the tool may be more suited to assisting with simpler tasks and providing a starting point for more complex problems. By contrast, more sophisticated practical questions in Python were solved without a problem. This may be due in part to the popularity of Python as a language and the large breadth of sample code that an AI tool can draw from.

Beyond straight coding problems, Copilot is challenged when domain expertise is needed in addition to programming. This was exemplified by the problem of counting patients in the admissions table, which required the understanding that such a table is likely to contain multiple rows per patient. This is particularly important in informatics education. One of the primary concerns is the potential for misuse of such tools in academic settings. Students might rely heavily on AI-generated code for their assignments, which could blur the lines between academic integrity and original work. There is also the risk that innocuous use of such tools may hamper or replace student learning: in one anecdotal situation, students used Copilot's code suggestions to improve the quality of their code while still developing their solutions themselves; however, students who followed this practice complained of being unable to reproduce the same effectiveness on exams and quizzes, where the use of AI is not an option. The concern is that overreliance on Copilot could lead to a lack of independent coding and problem-solving skills.[3] It is important for educators to clearly define what constitutes acceptable use of these tools in their courses.

There are several strategies that instructors can use to prevent misuse of the tool, some of which were mentioned in the literature analyzing the performance of ChatGPT-3.[6] Instructors should be attentive to student progress: if students who were struggling suddenly submit perfect work, the instructor should change the way that knowledge is evaluated. If students' performance drops after they begin using code-generation tools, the instructor should consider regulating and restricting how those tools are used.

When approaching the question of integrity, a multifactor strategy is advisable. Some assessment formats, such as in-person pencil-and-paper exams or class presentations that are open to question-and-answer, do not permit the use of AI tools. A blend of such approaches may be desirable to reduce concerns about abuse of these tools.

Part of responsible tool use is education and familiarity. It is important for students to understand that even though AI tools can supplement human capabilities, they have not fully supplanted humans; students remain responsible for building their own knowledge. For comparison, calculators have long been helpful in mathematics education, yet a student only learns arithmetic by making the effort to solve problems themselves. We recommend open discussions with students about the responsible use of AI tools, covering both their benefits and limitations, and reinforcing the importance of students' own critical thinking and knowledge. Inviting guest speakers from university honor code units can further remind students of the value of academic integrity. The goal should be to help students learn to use these tools ethically while preparing them to use such tools in their future careers.

Informally, we can claim that students already recognize the value of their personal knowledge. For example, our department regularly conducts an Overview of Skills survey of graduating students as part of their Capstone; our most recent cohort (n = 6) unanimously indicated a preference for more in-person practical laboratories in which instructors and teaching assistants practice coding with students. We view this as evidence that a desire for personal educational growth persists, even when AI tools can automate part of what humans do.


Conclusion

The integration of GitHub Copilot with Visual Studio presents opportunities for enhancing Health Informatics education. However, challenges remain because the tool's output is not always correct, complete, or of high quality, and these challenges need to be addressed. Educators need to be aware of the rapidly changing AI landscape, including what the technology is currently capable of and how it can affect student learning. They need to be ready to adapt their lesson plans to the technology; to inform their students' perspectives on the role of code-generating tools; and to put in place guidelines that encourage the use, rather than the abuse, of such tools. Further research is needed to explore how best to integrate these tools into the curriculum and to fully understand their impact on student learning outcomes.


Clinical Relevance Statement

The clinical relevance of this study lies in assessing the implications of a transformational technology such as AI on health care-specific educational assessments. By presenting the capabilities of the tool in a Health Informatics context, the study shows the need for health educators to adapt, and it provides clarity on what they need to adapt to.


Multiple-Choice Questions

  1. What is an example of a situation when Copilot shows a tendency to produce errors?

    • Basic, straightforward problems in Python and SQL.

    • Python code to read web form input and produce a dynamic web page.

    • When an intuitive choice of function is unexpectedly not available.

    • Copilot produces errors at a negligible rate.

    Correct Answer: The correct answer is option c. The study shows multiple examples in which an intuitive function, such as product or median, is not available in the language; in these cases, Copilot nevertheless uses the function in its solution. However, when warned that the function does not exist, Copilot is able to adopt an acceptable alternative.

  2. If Copilot does not produce a good answer, what strategy can be used to correct the response?

    • Nothing; you cannot improve the AI tool's response.

    • Ask a more complicated question to provide perspective.

    • Include clarifying details in the prompt.

    • Reinstall the tool.

    Correct Answer: The correct answer is option c. As the study demonstrated, providing more information allowed Copilot to correct and refine its answer in some cases, avoiding its originally incorrect response. The AI's answers are not immutable; modifying the prompt, or even asking the same question again, can produce a better result. However, while increased detail in the prompt improves the response, increased complexity in the nature of the result that is sought can prove difficult for the tool.



Conflict of Interest

None declared.

Protection of Human and Animal Subjects

This study did not involve human subjects and therefore did not require approval from an Institutional Review Board or Ethics Committee.


  • References

  • 1 Puryear B, Sprint G. Github Copilot in the classroom: learning to code with AI assistance. J Comput Sci Coll 2022; 38 (01) 37-47
  • 2 Becker BA, Denny P, Finnie-Ansley J, Luxton-Reilly A, Prather J, Santos EA. Programming is hard-or at least it used to be: Educational opportunities and challenges of AI code generation. ACM Technical Symposium on Computer Science Education 2023; 54 (01) 500-506
  • 3 Dakhel AM, Majdinasab V, Nikanjam A, Khomh F, Desmarais MC, Jiang ZM. GitHub Copilot AI pair programmer: asset or liability?. J Syst Softw 2023; 203: 111734
  • 4 Finnie-Ansley J, Denny P, Becker BA, Luxton-Reilly A, Prather J. The robots are coming: Exploring the implication of OpenAI Codex on introductory programming. Proceedings of the 24th Australasian Computing Education Conference; 2022:10–19
  • 5 Zhang B, Liang P, Zhou X, Ahmad A, Waseem M. Practices and challenges of using GitHub Copilot: An empirical study. Proceedings of the 35th International Conference on Software Engineering and Knowledge Management; 2023.
  • 6 Avramovic S, Avramovic I. Exploring the potential benefits and limitations of using AI text generation tools in health informatics education: an examination of ChatGPT's performance on assessments. J Health Adm Educ 2024
  • 7 Imai S. Is Github Copilot a substitute for human pair-programming? An empirical study. Proceedings of the 44th International Conference on Software Engineering. 2022: 319–321
  • 8 Bird C, Ford D, Zimmermann T. et al. Taking flight with Copilot: early insights and opportunities of AI-powered pair-programming tools. ACM Queue 2023; 20 (06) 35-57

Address for correspondence

Sanja Avramovic, PhD
Department of Computer Science, College of Engineering and Computing, George Mason University, 4400 University Drive
Fairfax, VA 22030
United States   

Publication History

Received: 29 May 2024

Accepted: 11 September 2024

Accepted Manuscript online:
13 September 2024

Article published online:
25 December 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany
