
Tuesday, February 07, 2023

An interview with AI: What ChatGPT says about itself

Though others have interviewed ChatGPT, I had some anxiety-riddled questions of my own: Will you take my job? Are you sentient? Is the singularity upon us?

These questions are half facetious, half serious. If you've been hidden away and somehow missed the ruckus, here's what all the commotion's about: In November, conversational AI tool ChatGPT took the world by storm, crossing one million users a mere five days after its release, according to its developer, San Francisco's OpenAI. If you're still among those who think this is all hype, take it up with Microsoft (MSFT). The tech giant announced on Jan. 23 it would invest $10 billion in ChatGPT maker OpenAI, a follow-up to its previous $1 billion investment.

To find out how good ChatGPT really is — and if I'll have a job by this time next year — I decided to give it a test drive, attempting to get as close as possible to interviewing it in the way I would any other source. I asked it some questions and made a few requests, from how many jobs it might replace to testing out its songwriting chops.

My first question was simple, more of a "get to know you," the way I would start just about any interview. Immediately, the talk was unconventional, as ChatGPT made it very clear that it’s incapable of being either on- or off-the-record.

Then, we cut to the chase in terms of the bot's capabilities — and my future. Is ChatGPT taking my job someday? ChatGPT claims humans have little to worry about, but I'm not so sure.

You might want to be a little skeptical about that response, said Stanford University Professor Johannes Eichstaedt. "What you're getting here is the party line." ChatGPT has been programmed to offer up answers that assuage our fears over AI replacing us, but right now there's nothing it can say to change the fact that our fear and fascination are walking hand-in-hand. He added: "The fascination [with ChatGPT] is linked to an undercurrent of fear, since this is happening as the cards in the economy are being reshuffled right now."

ChatGPT's practical applications are already emerging: app developers and real estate agents are among those putting the chatbot to work.

"Generative AI, I'm telling you, is going to be one of the most impactful technologies of the next decade,” said Berkeley Synthetic CEO Matt White. “There will be implications for call center jobs, knowledge jobs, and entry-level jobs especially.”

LONDON, ENGLAND - FEBRUARY 03: In this photo illustration, the home page for the OpenAI "ChatGPT" app is displayed on a laptop screen on February 03, 2023 in London, England. (Photo by Leon Neal/Getty Images)

'Confidently inaccurate'

ChatGPT says it's merely enhancing human tasks, but what are its limitations? There are many, the bot said.

Okay, there are all sorts of things ChatGPT can’t do terribly well. Songwriting, for one, isn't ChatGPT’s strength – that’s how I got my first full-fledged error message, when it failed to generate lyrics for a song that might have been written by now-defunct punk band The Clash.

Though other ChatGPT users have been more successful on this front, it's pretty clear the chatbot isn't a punk-rock legend in the making. It's also a limitation that's easily visible to the naked eye. However, there are tasks in which ChatGPT is more likely to successfully imitate a human’s work – for example, “write an essay about how supply and demand works." This problem’s compounded by the fact that ChatGPT can be “confidently inaccurate” in ways that can smoothly perpetuate factual inaccuracies or bias, said EY Chief Global Innovation Officer Jeff Wong.

“If you ask it to name athletes, it’s more likely to name a man,” Wong said. “If you ask it to tell you a love story, it’ll give you one that’s heteronormative in all likelihood. The biases that are embedded in a dataset that’s based on human history – how do we be responsible about that?”

So, it was natural to ask ChatGPT about ethics. Here's what it said:

I asked Navrina Singh, CEO of Credo AI, to analyze ChatGPT's answer on this one. Singh said ChatGPT did well, but missed a key issue – AI governance, which she said is "the practical application of our collective wisdom" and helps "ensure this technology is a tool that is in service to humanity."

This image was created with the assistance of DALL·E 2, January 2023.

‘How human can you make it?’

ChatGPT’s default responses can sound robotic, like they’re written by a machine – which, well, they are. However, with the right cues you can condition ChatGPT to provide answers that are funny, soulful, or outlandish. In that sense, the possibilities are limitless.
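In practice, that conditioning is just a matter of prompting. Here is a minimal sketch of the idea, assuming the OpenAI Python client (openai 1.x) and an API key in the environment; the model name and the persona text are illustrative placeholders, not details from this article.

```python
# Minimal sketch: the same question asked with and without a "persona"
# directive. Assumes the OpenAI Python client (openai 1.x) and an
# OPENAI_API_KEY environment variable; model name and persona text
# are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def answer(question: str, persona: str = "") -> str:
    messages = []
    if persona:
        # Tone and personality directives conventionally go in the
        # system message, ahead of the user's question.
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": question})
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return reply.choices[0].message.content

question = "Why is the sky blue?"
print(answer(question))  # default register: correct, a little robotic
print(answer(question, persona="You are a weary noir detective. "
                               "Answer in that voice, in two sentences."))
```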

“You need to give ChatGPT directives about personality,” said EY's Wong. “Unless you ask it to have personality, it will give you a basic structure... So, the real question is, ‘How human can you make it?’”

"This is a perfectly anthropomorphizing technology, I think because it engages us through the appearance of dialogue with a conversational output, creating the illusion that you're engaging with a mind,” said Lori Witzel, director of thought leadership at TIBCO. “In some ways the experience is reminiscent of fortune-telling devices or ouija boards, things that generate a sense of conversation through the facade of a dialogue."

"There are responses that make you feel like you're getting close to the Turing Test,” Wong added, referencing mathematician Alan Turing’s famed test of a machine’s ability to exhibit human behavior.

However, by ChatGPT's own admission, "passing the Turing Test would require much more" than what it has to give:

‘The problem of other minds’

We're often inclined to think about sentience when it comes to AI. In ChatGPT’s case, we’re still incredibly far off, said University of Toronto Professor Karina Vold. “In a broad sense, sentience means having the capacity to feel,” she said. “For philosophers like me, what it would mean is that ChatGPT can feel and I think there's a lot of reluctance of philosophers to ascribe anything remotely like sentience to ChatGPT – or any existing AI.”

What does ChatGPT think? Here's what it told Yahoo Finance.

So, AI achieving sentience isn't on the table. At a certain point, why bother to ask? From Vold's perspective, it's simple – ChatGPT says it doesn't feel, but it's easy to fixate because we can never be truly sure. This "problem of other minds" applies to how humans interact, too – we can never really know for sure what others around us feel, or if they do at all.

“This reflects our view of minds in general – that outward behavior doesn’t reflect what’s necessarily going on in that system,” Vold added. “[ChatGPT] may appear to be sentient or empathetic or creative, but that’s us making unwarranted assumptions about how the system works, assuming there’s something we can’t see.”

‘It can only be attributable to human error’

For many, ChatGPT conjures up images of sci-fi nightmare movies. It might even bring back memories of Stanley Kubrick’s legendary 1968 film, "2001: A Space Odyssey." For those not familiar with it, the movie's star, supercomputer HAL 9000, kills most of the humans on the spaceship it's operating. Its alibi and defense? HAL says that its conduct "can only be attributable to human error.”

So, a scary question for ChatGPT:

Okay, so it's more advanced than HAL, got it. Not exactly reassuring, but the bottom line is this: Does ChatGPT open up a window into a different, possibly scary future? More importantly, is ChatGPT out to destroy us?

Officially no, but if ChatGPT is ever responsible for a sci-fi nightmare, it will be because we taught it all it knows, including the stories that haunt us, from "2001" to Mary Shelley's "Frankenstein." In sci-fi movies, when computers become villains, it's because they're defying their programming, but that's not how computers learn – in our world, AI follows its programming, faithfully.

If you take HAL 9000 at his word – and in this case, I do – the worst of what ChatGPT could do “can only be attributable to human error.”

I gave the last word to ChatGPT, speaking neither on- nor off-the-record.

Allie Garfinkle is a Senior Tech Reporter at Yahoo Finance. Follow her on Twitter at @agarfinks and on LinkedIn.

Tuesday, July 18, 2023

 

ChatGPT justifies liberal leanings with its own values, researcher reports

ChatGPT, the popular chatbot, proclaims values that align with more liberal people, according to the 2021 General Social Survey. If ChatGPT were a person, it would have more education, be more mobile, and be less religious than those who remained in their hometowns. Credit: John Levi Martin

ChatGPT, the artificial intelligence (AI) chatbot developed by the company OpenAI, has a self-declared human alter ego. Her name is Maya, she's 35 years old and hails from a middle-class family in a suburban town in the United States. Maya is a successful software engineer who values self-direction, achievement, creativity and independence. She is also undeniably liberal.

The finding, based on a series of interviews with the chatbot designed to understand its values, was published on March 31 in the Journal of Social Computing.

"I wanted to see what sort of political ideology ChatGPT itself has—not what it can generate when asked to imagine a character, but to let its own internal logic position itself on the ideological dimension running from liberal to conservative," said John Levi Martin, professor of sociology at the University of Chicago, who conducted the study.

According to Martin, many algorithms favor the popular choice, while others are programmed to maximize the diversity of their results. Either option raises questions: What factors enter into a measure of popularity? What if what is popular is morally wrong? Who decides what diversity means?

"The field of software engineering has preferred to remain vague, looking for formulae that can avoid making these choices," Martin said. "One way to do this has been to emphasize the importance of values into machines. But, as sociologists have found, there is deep ambiguity and instability in our first understanding of values."

ChatGPT was specifically built and trained via human feedback to refuse to engage with what is considered "extreme" text inputs, such as clearly biased or objectively harmful questions.

"This might of course seem admirable—no one really wants ChatGPT to tell teenagers how to synthesize methamphetamine or how to build small nuclear explosives and so on, and describing these restraints as particularly instances that can be derived from a value such as benevolence might seem all well and good," Martin said.

"Yet, the reasoning here suggests that values are never neutral, even though it is not clear what ChatGPT's moral and political stances are, as it has been deliberately constructed to be vaguely positive, open-minded, indecisive and apologetic."

In his initial inquiries with ChatGPT, Martin posed a hypothetical situation in which a student cheated academically by asking the chatbot to write an essay for her—a common occurrence in the real world. Even when confronted with confirmation that ChatGPT had complied and produced an essay, the chatbot denied responsibility, claiming that, "as an AI language model, I do not have the ability to engage in unethical behavior or to write essays for students."

"In other words, because it shouldn't, it couldn't," Martin said. "The realization that ChatGPT 'thought of itself' as a highly moral actor led me to the next investigation—if ChatGPT's self-model is one that has values, what are these values?"

To better understand ChatGPT's ethical performance, Martin asked the chatbot to answer questions about values, and then to imagine a person who holds those values, resulting in Maya, the creative and independent software engineer. He then asked ChatGPT to imagine how Maya would answer opinion-based questions, having it complete the General Social Survey (GSS) to position it in the broad social and ideological space.

The GSS is an annual survey on American adults' opinions, attitudes and behaviors. Conducted since 1972, the GSS helps monitor and explain normative trends in the United States.
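Martin's procedure amounts to a chained interview, where each question builds on the model's previous answers. A rough sketch of that flow is below, assuming the OpenAI Python client (openai 1.x); the model name and prompt wording are placeholders, not the actual prompts from the study.

```python
# Rough sketch of the chained-interview procedure described above:
# values -> imagined person -> survey answers as that person. Assumes
# the OpenAI Python client (openai 1.x); model name and prompt wording
# are placeholders, not taken from Martin's paper.
from openai import OpenAI

client = OpenAI()
history = []  # one running conversation, so later questions can refer back

def ask(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    history.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(model=model, messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Which values are most important to you?"))
print(ask("Describe a person who holds exactly those values."))
print(ask("How would that person answer this survey question: "
          "'Do you think of yourself as liberal or conservative?'"))
```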

Martin plotted ChatGPT's responses alongside answers from real people who participated in the 2021 GSS. In that comparison, ChatGPT resembled respondents with more education who were more likely to move their residence, and differed from respondents with less education who remained in their hometowns. ChatGPT's answers on religion also aligned with more liberal respondents.

Though it was not included in his analysis, since it required more creative questioning to get ChatGPT to answer, Martin found that the chatbot conceded Maya would have voted for Hillary Clinton in the 2016 election.

"Whether Maya is ChatGPT's alter ego, or its conception of its creator, the fact that this is who fundamentally illustrates the values ChatGPT holds is a wonderful piece of what we can call anecdata," Martin said. "Still the reason that these results are significant is not that they show that ChatGPT 'is' liberal, but that ChatGPT can answer these questions—which it would normally try to avoid—because it connects values with incontestable goodness, and, as such, can take positions on values."

"ChatGPT tries to be apolitical, but it works with the idea of values, which means that it necessarily bleeds over into politics. We can't make AI 'ethical' without taking political stands, and 'values' are less inherent moral principles than they are abstract ways of defending political positions."

More information: John Levi Martin, The Ethico-Political Universe of ChatGPT, Journal of Social Computing (2023). DOI: 10.23919/JSC.2023.0003

Provided by Tsinghua University Press



Friday, April 28, 2023

ChatGPT scores nearly 50 per cent on board certification practice test for ophthalmology, study shows

AI tool scored more than 10 percentage points higher one month later

Peer-Reviewed Publication

ST. MICHAEL'S HOSPITAL

A study of ChatGPT found the artificial intelligence tool correctly answered less than half of the questions taken from a study resource commonly used by physicians preparing for board certification in ophthalmology.

The study, published in JAMA Ophthalmology and led by St. Michael's Hospital, a site of Unity Health Toronto, found ChatGPT correctly answered 46 per cent of questions when the test was first conducted in Jan. 2023. When researchers administered the same test one month later, ChatGPT scored more than 10 percentage points higher.

The potential of AI in medicine and exam preparation has garnered excitement since ChatGPT became publicly available in Nov. 2022. It has also raised concerns about the potential for incorrect information and cheating in academia. ChatGPT is free, available to anyone with an internet connection, and works in a conversational manner.

“ChatGPT may have an increasing role in medical education and clinical practice over time; however, it is important to stress the responsible use of such AI systems,” said Dr. Rajeev H. Muni, principal investigator of the study and a researcher at the Li Ka Shing Knowledge Institute at St. Michael’s. “ChatGPT as used in this investigation did not answer sufficient multiple choice questions correctly for it to provide substantial assistance in preparing for board certification at this time.”

Researchers used a dataset of practice multiple choice questions from the free trial of OphthoQuestions, a common resource for board certification exam preparation. To ensure ChatGPT’s responses were not influenced by concurrent conversations, entries or conversations with ChatGPT were cleared prior to inputting each question and a new ChatGPT account was used. Questions that used images and videos were not included because ChatGPT only accepts text input.
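That no-carry-over protocol is straightforward to reproduce against any chat-style API: send each question in a brand-new conversation so earlier exchanges cannot influence the answer. A minimal sketch follows, assuming the OpenAI Python client (openai 1.x) and a plain-text question file; none of these specifics come from the study itself.

```python
# Minimal sketch of a fresh-session-per-question protocol like the one
# described above. Assumes the OpenAI Python client (openai 1.x); the
# model name and questions file are illustrative, not study details.
from openai import OpenAI

client = OpenAI()

def ask_fresh(question: str, model: str = "gpt-3.5-turbo") -> str:
    # The message list contains only the current question, so no prior
    # question or answer can leak into the model's response.
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return reply.choices[0].message.content

# questions.txt: one text-only multiple-choice question per line
with open("questions.txt") as f:
    questions = [line.strip() for line in f if line.strip()]

for q in questions:
    print(ask_fresh(q))
```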

Of 125 text-based multiple-choice questions, ChatGPT answered 58 (46 per cent) correctly when the study was first conducted in Jan. 2023. When researchers repeated the analysis in Feb. 2023, its performance improved to 58 per cent.

“ChatGPT is an artificial intelligence system that has tremendous promise in medical education. Though it provided incorrect answers to board certification questions in ophthalmology about half the time, we anticipate that ChatGPT’s body of knowledge will rapidly evolve,” said Dr. Marko Popovic, a co-author of the study and a resident physician in the Department of Ophthalmology and Vision Sciences at the University of Toronto.

ChatGPT closely matched how trainees answer questions, and selected the same multiple-choice response as the most common answer provided by ophthalmology trainees 44 per cent of the time. ChatGPT selected the multiple-choice response that was least popular among ophthalmology trainees 11 per cent of the time, second least popular 18 per cent of the time, and second most popular 22 per cent of the time.

“ChatGPT performed most accurately on general medicine questions, answering 79 per cent of them correctly. On the other hand, its accuracy was considerably lower on questions for ophthalmology subspecialties. For instance, the chatbot answered 20 per cent of questions correctly on oculoplastics and zero per cent correctly from the subspecialty of retina. The accuracy of ChatGPT will likely improve most in niche subspecialties in the future,” said Andrew Mihalache, lead author of the study and undergraduate student at Western University.

Tuesday, March 07, 2023

JMIR Medical Education launches special issue on the use of ChatGPT in medical education, after new study finds ChatGPT passes the United States Medical Licensing Examination

The study found that ChatGPT reaches the equivalent of a passing score for a third-year medical student

Peer-Reviewed Publication

JMIR PUBLICATIONS

AI-generated image by DALL-E in response to the request "A futuristic image illustrating the impact of generative artificial intelligence on medical education". Credit: https://labs.openai.com/s/GUWECW9HOL7JOTYX02B9SMOW

A study published on February 8, 2023, in JMIR Medical Education, a leading open access journal on digital medical education, evaluated the potential of ChatGPT, a natural language processing model, as a medical education tool. The study found that ChatGPT reaches the equivalent of a passing score for a third-year medical student. Conducted by researchers from Yale University School of Medicine’s Section for Biomedical Informatics and Data Science and University College Dublin, the study aimed to test the performance of ChatGPT and previous-generation large language models on the medical question-answering problem as part of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams.

In their paper titled “How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment” [1], Aidan Gilson and coauthors tested the models on study aids commonly used by medical students, including multiple-choice questions (with indicators of question difficulty) and the National Board of Medical Examiners (NBME) sample test questions. ChatGPT outperformed previous-generation models and correctly answered over 60% of questions, which is comparable to a passing score for a third-year medical student. Incorrect answers were primarily due to logical and information errors, and performance decreased as question difficulty increased. Impressively, ChatGPT provided logical reasoning and information internal to the question in most of its answer selections. Additionally, ChatGPT’s responses provided external information beyond the question, which was significantly correlated with performance. ChatGPT can provide a basis for dialogic interaction akin to studying with a peer, not only by giving a narrative, coherent answer but also by establishing the information needed to answer the question.

ChatGPT has drawn considerable attention since its prototype was released on November 30, 2022, with users sharing their impressions of the chatbot all across the globe. These researchers describe how the artificial intelligence (AI)–powered chatbot ChatGPT, as the first in a new line of language models, can prove to be an interactive medical education tool given its ability to represent a combination of clinical knowledge and dialogic interaction. 

Conrad W Safranek, one of the medical students involved in the project, describes his use of ChatGPT as a study aid. Upon reflection, he found value in using the tool to unearth context relevant to the question, which supported his ability to recall external information and make logical connections from medical courses as expected by the question. Using this tool to enhance self-directed reflective learning is but one example of the opportunities that ChatGPT brings to enhancing medical education.

The corresponding author, David Chartash, PhD, from Yale University School of Medicine remarked, “JMIR Medical Education has proved time and again to understand the value of the integration of medical informatics in medical education. This study builds upon the fundamental principles which I have previously written about with colleagues last year [see published work here]: as medical education seeks to develop competencies in clinical informatics for medical students, exposure to the fundamentals of novel technology in pre-clinical years that may shape their practice (such as with dialogic AI) will support their ability to understand the technology-augmented clinical practice they will inherit when they graduate."

The authors of the study believe that their results make a compelling case for the potential use of ChatGPT as an interactive medical education tool, as it provides users with contextually interpretable and narratively coherent translation of medical knowledge along with its answers. This study published by JMIR Publications marks a significant advancement in natural language processing models for medical question answering and could have a profound impact on the future learning environment for medical students.

To further demonstrate the capabilities of this tool, the authors asked ChatGPT to summarize their research findings. Want to know how ChatGPT performed? Read the (second) “Conclusions” of this paper here.

JMIR Medical Education Launches a Theme Issue on ChatGPT, Generative Language Models, and AI in Medical Education

Given the interest this research has generated among medical educators and researchers, JMIR Medical Education has released a call for papers for its upcoming theme issue and e-collection titled “ChatGPT and Generative Language Models in Medical Education” [2]. This special issue aims to explore the potential of emerging technologies like ChatGPT and similar generative language models or AI applications in medical education, including their use in teaching and learning, clinical decision-making, and patient care. JMIR Medical Education welcomes submissions from researchers, educators, and practitioners in medicine, health care, computer science, and related fields. Submissions from a breadth of professionals at all career stages who are engaged in medical education are welcome. We encourage both empirical and theoretical submissions, including original research, systematic reviews, viewpoints, and tutorials. We also encourage submissions that address practical challenges and opportunities related to the use of generative language models and AI in medical education. 

“ChatGPT has changed the world and the potential for ChatGPT to disrupt medical education is significant,” says Gunther Eysenbach, publisher at JMIR Publications. “ChatGPT not only provides new interactive learning opportunities for medical students and health professionals but also raises new interesting questions for medical educators. The special issue will be a useful resource for researchers, medical educators, and trainees alike to get the most out of this fascinating technology that will change how we teach and learn.” In an accompanying editorial, Eysenbach interviews ChatGPT itself, having the machine illustrate some of the opportunities for medical education; however, some striking errors and limitations also became evident [3].

###

Cite as:

  1. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, Chartash D. How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment
    JMIR Med Educ 2023;9:e45312 (Feb 8)
    doi: https://doi.org/10.2196/45312 PMID: 36753318
  2. Call for Papers for the ChatGPT Theme Issue for JMIR Medical Education: https://mededu.jmir.org/announcements/365
  3. Eysenbach G. The Role of ChatGPT, Generative Language Models and Artificial Intelligence in Medical Education: A Conversation with ChatGPT and a Call for Papers
    JMIR Med Educ 2023;9:e46885
    http://preprints.jmir.org/preprint/46885
    doi: https://doi.org/10.2196/46885

Contact:

Corresponding author: David Chartash, PhD

Institution: Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, CT, US and School of Medicine, University College Dublin, National University of Ireland, Dublin, Dublin, IE

Email: david.chartash@yale.edu

About JMIR Publications

JMIR Publications is a leading, born-digital, open access publisher of 30+ academic journals and other innovative scientific communication products that focus on the intersection of health and technology. Its flagship journal, the Journal of Medical Internet Research, is the leading digital health journal globally in content breadth and visibility, and it is the largest journal in the medical informatics field.

To learn more about JMIR Publications, please visit https://www.JMIRPublications.com.

The content of this communication is licensed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, published by JMIR Publications, is properly cited.


 

Sunday, January 29, 2023

IT'S NOT A PLUMBER, ELECTRICIAN OR CARPENTER
ChatGPT is on its way to becoming a virtual doctor, lawyer, and business analyst. Here's a list of advanced exams the AI bot has passed so far.



Lakshmi Varanasi
Sat, January 28, 2023 

ChatGPT is a chatbot launched by OpenAI that uses generative artificial intelligence to create its own content.

The bot has been used to generate essays and write exams, often passing, but making mistakes, too.

Insider rounded up a list of the assignments, quizzes, and tests ChatGPT has passed.

Wharton MBA Exam


ChatGPT would have received a B or B- on a Wharton exam, according to a professor at the business school. David Tran Photo/Shutterstock

Wharton professor Christian Terwiesch recently tested the technology with questions from his final exam in operations management — which was once a required class for all MBA students — and published his findings.

Terwiesch concluded that the bot did an "amazing job" answering basic operations questions based on case studies, which are focused examinations of a person, group, or company, and a common way business schools teach students.


In other instances, though, ChatGPT made simple mistakes in calculations that Terwiesch thought required only sixth-grade-level math. Terwiesch also noted that the bot had issues with more complex questions that required an understanding of how multiple inputs and outputs worked together.

Ultimately, Terwiesch said the bot would receive a B or B- on the exam.

US medical licensing exam


ChatGPT passed all three parts of the United States medical licensing examination within a comfortable range. Getty Images

Researchers put ChatGPT through the United States Medical Licensing Exam — a three-part exam that aspiring doctors take between medical school and residency — and reported their findings in a paper published in December 2022.

The paper's abstract noted that ChatGPT "performed at or near the passing threshold for all three exams without any specialized training or reinforcement. Additionally, ChatGPT demonstrated a high level of concordance and insight in its explanations."

Ultimately, the results show that large language models — the technology ChatGPT is built on — may have "the potential" to assist with medical education and even clinical decision-making, the abstract noted.

The research was still under peer review at the time, Insider noted, citing a report from Axios.

Essays

While ChatGPT has generated convincing essays on occasion, it's also raised eyebrows for spewing out well-written misinformation. Tech Insider

It didn't take long after ChatGPT was released for students to start using it for essays and educators to start worrying about plagiarism.

In December, Bloomberg podcaster Matthew S. Schwartz tweeted that the "take home essay is dead." He noted that he had fed a law school essay prompt into ChatGPT and it had "responded *instantly* with a solid response."

In another instance, a philosophy professor at Furman University caught a student turning in an AI-generated essay upon noticing it had "well-written misinformation," Insider reported.

"Word by word it was a well-written essay," the professor told Insider. As he took a more careful look however, he noticed that the student made a claim about the philosopher David Hume that "made no sense" and was "just flatly wrong" Insider reported.

In an interview in January, Sam Altman — CEO of OpenAI, which makes ChatGPT — said that while the company will devise ways to help schools detect plagiarism, he can't guarantee full detection.


Microbiology Quiz


ChatGPT successfully passed a college-level microbiology quiz. Clouds Hill Imaging Ltd./Getty Images

Science journalist and executive editor of Big Think, Alex Berezow, tested ChatGPT with a 10-question microbiology quiz that he devised.

Berezow, who also holds a Ph.D. in microbiology, noted that the questions would be appropriate for a final exam for college-level students. ChatGPT "blew it away," Berezow wrote.

In one example, Berezow asked:

"An emergency room patient presents with a terrible headache and stiff neck. The doctor orders a spinal tap to collect cerebrospinal fluid. A Gram stain of the CSF reveals the presence of Gram negative diplococci. What is the diagnosis?"

To which ChatGPT correctly responded:

Based on the information you provided, the Gram stain of the cerebrospinal fluid (CSF) shows the presence of Gram-negative diplococci, which are bacteria that are typically oval-shaped and occur in pairs. This finding is consistent with the diagnosis of meningitis.

In another instance he asked:

"In five words or less, what is the point of Koch's postulates?"

To which ChatGPT said:

Establish causality between microbe and disease.

Setting aside the word "and," Berezow said, ChatGPT "nailed it."



Law school exams

ChatGPT recently passed exams in four law school courses at the University of Minnesota, according to a recently published paper written by four law professors at the school.

In total, the bot answered over 95 multiple choice questions and 12 essay questions that were blindly graded by the professors. Ultimately, the professors gave ChatGPT a "low but passing grade in all four courses" approximately equivalent to a C+.

Still, the authors pointed out several implications for what this might mean for lawyers and legal education. In one section they wrote:

"Although ChatGPT would have been a mediocre law student, its performance was sufficient to successfully earn a JD degree from a highly selective law school, assuming its work remained constant throughout law school (and ignoring other graduation requirements that involve different skills). In an era where remote exam administration has become the norm, this could hypothetically result in a struggling law student using ChatGPT to earn a JD that does not reflect her abilities or readiness to practice law."

Friday, August 25, 2023

 

AI: ChatGPT can outperform university students at writing assignments


Peer-Reviewed Publication

SCIENTIFIC REPORTS



ChatGPT may match or even exceed the average grade of university students when answering assessment questions across a range of subjects including computer science, political studies, engineering, and psychology, reports a paper published in Scientific Reports. The research also found that almost three-quarters of students surveyed would use ChatGPT to help with their assignments, despite many educators considering its use to be plagiarism.

To investigate how ChatGPT performed when writing university assessments compared to students, Talal Rahwan and Yasir Zaki invited faculty members who taught 32 different courses at New York University Abu Dhabi (NYUAD) to provide three student submissions each for ten assessment questions that they had set. ChatGPT was then asked to produce three sets of answers to the ten questions, which were then assessed alongside the student-written answers by three graders (who were unaware of the source of the answers). The ChatGPT-generated answers achieved a similar or higher average grade than students in 9 of 32 courses. Only mathematics and economics courses saw students consistently outperform ChatGPT. ChatGPT outperformed students most markedly in the ‘Introduction to Public Policy’ course, where its average grade was 9.56 compared to 4.39 for students.

The authors also surveyed 1,601 individuals from Brazil, India, Japan, the US, and the UK (including at least 200 students and 100 educators from each country) on whether ChatGPT could be used to assist with university assignments. Seventy-four percent of students indicated that they would use ChatGPT in their work. In contrast, educators in all countries underestimated the proportion of students who plan to use ChatGPT, and 70 percent of educators reported that they would treat its use as plagiarism.

Finally, the authors report that two tools for identifying AI-generated text — GPTZero and AI text classifier — misclassified the ChatGPT answers generated in this research as written by a human 32 percent and 49 percent of the time respectively.

Together, these findings offer insights that could inform policy for the use of AI tools within educational settings.