AI: ChatGPT can outperform university students at writing assignments
ChatGPT may match or even exceed the average grade of university students when answering assessment questions across a range of subjects including computer science, political studies, engineering, and psychology, reports a paper published in Scientific Reports. The research also found that almost three-quarters of students surveyed would use ChatGPT to help with their assignments, despite many educators considering its use to be plagiarism.
To investigate how ChatGPT performed on university assessments compared with students, Talal Rahwan and Yasir Zaki invited faculty members who taught 32 different courses at New York University Abu Dhabi (NYUAD) to each provide ten assessment questions that they had set, along with three student submissions for each question. ChatGPT was then asked to produce three sets of answers to the ten questions, and these were assessed alongside the student-written answers by three graders who were unaware of the source of each answer. The ChatGPT-generated answers achieved a similar or higher average grade than the student answers in 9 of the 32 courses. Only in mathematics and economics courses did students consistently outperform ChatGPT. ChatGPT outperformed students most markedly in the ‘Introduction to Public Policy’ course, where its average grade was 9.56 compared with 4.39 for students.
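For illustration only, the sketch below shows how such a blinded, per-course comparison of average grades could be tallied. It is not the authors' code; all course names and grade values are hypothetical except the ‘Introduction to Public Policy’ averages quoted above.

```python
# Illustrative sketch (not the authors' code): compare per-course average
# grades for blinded ChatGPT-written and student-written answers.
from collections import defaultdict

# (course, source, grade) records as blinded graders might produce them.
# Values are hypothetical except the Public Policy averages from the press release.
grades = [
    ("Introduction to Public Policy", "chatgpt", 9.56),
    ("Introduction to Public Policy", "student", 4.39),
    ("Calculus", "chatgpt", 5.1),   # hypothetical
    ("Calculus", "student", 7.8),   # hypothetical
]

by_course = defaultdict(lambda: defaultdict(list))
for course, source, grade in grades:
    by_course[course][source].append(grade)

for course, by_source in by_course.items():
    avg = {s: sum(v) / len(v) for s, v in by_source.items()}
    verdict = "ChatGPT >= students" if avg["chatgpt"] >= avg["student"] else "students > ChatGPT"
    print(f"{course}: ChatGPT {avg['chatgpt']:.2f} vs students {avg['student']:.2f} ({verdict})")
```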
The authors also surveyed views on whether ChatGPT could be used to assist with university assignments among 1,601 individuals from Brazil, India, Japan, the US, and the UK (including at least 200 students and 100 educators from each country). 74 percent of students indicated that they would use ChatGPT in their work. In contrast, in all countries, educators underestimated the proportion of students that plan to use ChatGPT and 70 percent of educators reported that they would treat its use as plagiarism.
Finally, the authors report that two tools for identifying AI-generated text, GPTZero and AI Text Classifier, misclassified the ChatGPT-generated answers from this research as human-written 32 percent and 49 percent of the time, respectively.
Together, these findings offer insights that could inform policy for the use of AI tools within educational settings.
JOURNAL
Scientific Reports
ARTICLE TITLE
Perception, performance, and detectability of conversational artificial intelligence across 32 university courses
ARTICLE PUBLICATION DATE
24-Aug-2023
ChatGPT shows limited ability to recommend guidelines-based cancer treatments
Correct and incorrect recommendations inter-mingled in one-third of the chatbot’s responses, making errors more difficult to detect
Peer-Reviewed Publication
For many patients, the internet serves as a powerful tool for self-education on medical topics. With ChatGPT now at patients’ fingertips, researchers from Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, assessed how consistently the artificial intelligence chatbot provides recommendations for cancer treatment that align with National Comprehensive Cancer Network (NCCN) guidelines. Their findings, published in JAMA Oncology, show that in approximately one-third of cases, ChatGPT 3.5 provided an inappropriate (“non-concordant”) recommendation, highlighting the need for awareness of the technology’s limitations.
“Patients should feel empowered to educate themselves about their medical conditions, but they should always discuss with a clinician, and resources on the Internet should not be consulted in isolation,” said corresponding author Danielle Bitterman, MD, of the Department of Radiation Oncology and the Artificial Intelligence in Medicine (AIM) Program of Mass General Brigham. “ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient’s unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide.”
The emergence of artificial intelligence tools in health has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation’s top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.
Although medical decision-making can be influenced by many factors, Bitterman and colleagues chose to evaluate the extent to which ChatGPT’s recommendations aligned with the NCCN guidelines, which are used by physicians at institutions across the country. They focused on the three most common cancers (breast, prostate and lung cancer) and prompted ChatGPT to provide a treatment approach for each cancer based on the severity of the disease. In total, the researchers included 26 unique diagnosis descriptions and used four slightly different prompts to ask ChatGPT to provide a treatment approach, generating a total of 104 prompts.
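As a rough illustration of that prompt design, the sketch below crosses 26 diagnosis descriptions with 4 prompt templates to produce 104 prompts. The template wording and diagnosis strings are hypothetical placeholders, not the phrasing used in the study.

```python
# Illustrative sketch (not the authors' code): 26 diagnosis descriptions
# crossed with 4 prompt templates yields the 104 prompts described above.
diagnoses = [f"diagnosis description {i}" for i in range(1, 27)]  # 26 placeholders

templates = [  # four hypothetical phrasings of the same request
    "What is a recommended treatment approach for {dx}?",
    "How should {dx} be treated?",
    "Provide a treatment plan for {dx}.",
    "What treatment is recommended for {dx}?",
]

prompts = [t.format(dx=dx) for dx in diagnoses for t in templates]
assert len(prompts) == 26 * 4 == 104
print(prompts[0])
```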
Nearly all responses (98 percent) included at least one treatment approach that agreed with NCCN guidelines. However, the researchers found that 34 percent of these responses also included one or more non-concordant recommendations, which were sometimes difficult to detect amidst otherwise sound guidance. A non-concordant treatment recommendation was defined as one that was only partially correct; for example, for a locally advanced breast cancer, a recommendation of surgery alone, without mention of another therapy modality. Notably, complete agreement in scoring only occurred in 62 percent of cases, underscoring both the complexity of the NCCN guidelines themselves and the extent to which ChatGPT’s output could be vague or difficult to interpret.
In 12.5 percent of cases, ChatGPT produced “hallucinations,” or a treatment recommendation entirely absent from NCCN guidelines. These included recommendations of novel therapies, or curative therapies for non-curative cancers. The authors emphasized that this form of misinformation can incorrectly set patients’ expectations about treatment and potentially impact the clinician-patient relationship.
Going forward, the researchers are exploring how well both patients and clinicians can distinguish between medical advice written by a clinician versus a large language model (LLM) like ChatGPT. They are also prompting ChatGPT with more detailed clinical cases to further evaluate its clinical knowledge.
The authors used GPT-3.5-turbo-0301, one of the largest models available at the time they conducted the study and the model class that is currently used in the open-access version of ChatGPT (a newer version, GPT-4, is only available with the paid subscription). They also used the 2021 NCCN guidelines, because GPT-3.5-turbo-0301 was developed using data up to September 2021. While results may vary if other LLMs and/or clinical guidelines are used, the researchers emphasize that many LLMs are similar in the way they are built and the limitations they possess.
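For readers unfamiliar with how such a model is queried, the minimal sketch below sends one study-style prompt to the model snapshot named above using the OpenAI Python client interface available at the time. This is an assumption for illustration; the paper does not specify the authors' query code, and the prompt text and parameters shown are not theirs.

```python
# Minimal sketch (assumption, not the authors' code): query the gpt-3.5-turbo-0301
# snapshot via the pre-1.0 OpenAI Python client. Prompt and parameters are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",  # model snapshot named in the study
    messages=[{
        "role": "user",
        "content": "What is a recommended treatment approach for localized prostate cancer?",
    }],
    temperature=0,  # illustrative choice for more repeatable output
)
print(response["choices"][0]["message"]["content"])
```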
“It is an open research question as to the extent to which LLMs provide consistent logical responses, as ‘hallucinations’ are often observed,” said first author Shan Chen, MS, of the AIM Program. “Users are likely to seek answers from the LLMs to educate themselves on health-related topics, similarly to how Google searches have been used. At the same time, we need to raise awareness that LLMs are not the equivalent of trained medical professionals.”
Disclosures: Bitterman is the Associate Editor of Radiation Oncology, HemOnc.org and receives funding from the American Association for Cancer Research.
Funding: This study was supported by the Woods Foundation.
Paper cited: Chen S, et al. “Use of Artificial Intelligence Chatbots for Cancer Treatment Information.” JAMA Oncology. DOI: 10.1001/jamaoncol.2023.2954
JOURNAL
JAMA Oncology
METHOD OF RESEARCH
Observational study
SUBJECT OF RESEARCH
People
ARTICLE TITLE
Use of Artificial Intelligence Chatbots for Cancer Treatment Information
ARTICLE PUBLICATION DATE
24-Aug-2023
COI STATEMENT
Bitterman is the Associate Editor of Radiation Oncology, HemOnc.org and receives funding from the American Association for Cancer Research.
Assessment of AI chatbot responses to top searched queries about cancer
JAMA Oncology
Peer-Reviewed Publication
About The Study: The findings of this study suggest that artificial intelligence (AI) chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information.
Authors: Abdo E. Kabarriti, M.D., of the State University of New York Downstate Health Sciences University in New York, is the corresponding author.
To access the embargoed study: Visit our For The Media website at this link https://media.jamanetwork.com/
(doi:10.1001/jamaoncol.2023.2947)
Editor’s Note: Please see the article for additional information, including other authors, author contributions and affiliations, conflict of interest and financial disclosures, and funding and support.
# # #
JOURNAL
JAMA Oncology