Is writing with AI at work undermining your credibility?
With over 75% of professionals using AI in their daily work, writing and editing messages with tools like ChatGPT, Gemini, Copilot or Claude has become common practice. Generative AI tools are widely seen as making writing easier, but are they effective for communication between managers and employees?
A new study of 1,100 professionals reveals a critical paradox in workplace communications: AI tools can make managers’ emails more professional, but regular use can undermine trust between them and their employees.
“We see a tension between perceptions of message quality and perceptions of the sender,” said Anthony Coman, Ph.D., a researcher at the University of Florida's Warrington College of Business and study co-author. “Despite positive impressions of professionalism in AI-assisted writing, managers who use AI for routine communication tasks put their trustworthiness at risk when using medium to high levels of AI assistance.”
In the study published in the International Journal of Business Communication, Coman and his co-author, Peter Cardon, Ph.D., of the University of Southern California, surveyed professionals about how they viewed emails that they were told were written with low, medium and high AI assistance. Survey participants were asked to evaluate different AI-written versions of a congratulatory message on both their perception of the message content and their perception of the sender.
While AI-assisted writing was generally seen as efficient, effective, and professional, Coman and Cardon found a “perception gap” in messages that were written by managers versus those written by employees.
“When people evaluate their own use of AI, they tend to rate their use similarly across low, medium and high levels of assistance,” Coman explained. “However, when rating others’ use, magnitude becomes important. Overall, professionals view their own AI use leniently, yet they are more skeptical of the same levels of assistance when used by supervisors.”
While low levels of AI help, such as grammar checks or light editing, were generally acceptable, higher levels of assistance triggered negative perceptions. The perception gap is especially pronounced when employees perceive heavy AI involvement in a message, calling into question their manager’s authorship, integrity, caring and competence.
The impact on trust was substantial: Only 40% to 52% of employees viewed supervisors as sincere when they used high levels of AI, compared to 83% for low-assistance messages. Similarly, while 95% found low-AI supervisor messages professional, this dropped to 69-73% when supervisors relied heavily on AI tools.
The findings reveal employees can often detect AI-generated content and interpret its use as laziness or lack of caring. When supervisors rely heavily on AI for messages like team congratulations or motivational communications, employees perceive them as less sincere and question their leadership abilities.
“In some cases, AI-assisted writing can undermine perceptions of traits linked to a supervisor’s trustworthiness,” Coman noted, specifically citing impacts on perceived ability and integrity, both key components of cognitive-based trust.
The study suggests managers should carefully consider message type, level of AI assistance and relational context before using AI in their writing. While AI may be appropriate and professionally received for informational or routine communications, like meeting reminders or factual announcements, relationship-oriented messages requiring empathy, praise, congratulations, motivation or personal feedback are better handled with minimal technological intervention.
Journal
International Journal of Business Communication
Method of Research
Survey
Subject of Research
People
Article Title
Professionalism and Trustworthiness in AI-Assisted Workplace Writing: The Benefits and Drawbacks of Writing With AI
AI could help emergency rooms predict admissions, driving more timely, effective care
The Mount Sinai Hospital / Mount Sinai School of Medicine
New York, NY [August 11, 2025]— Artificial intelligence (AI) can help emergency department (ED) teams better anticipate which patients will need hospital admission, hours earlier than is currently possible, according to a multi-hospital study by the Mount Sinai Health System.
By giving clinicians advance notice, this approach may enhance patient care and the patient experience, reduce overcrowding and “boarding” (when a patient is admitted but remains in the ED because no bed is available), and enable hospitals to direct resources where they’re needed most. One of the largest prospective evaluations of AI in the emergency setting to date, the study was published in the July 9 online issue of the journal Mayo Clinic Proceedings: Digital Health [https://doi.org/10.1016/j.mcpdig.2025.100249].
In the study, researchers collaborated with more than 500 ED nurses across the seven-hospital Health System. Together, they evaluated a machine learning model trained on data from more than 1 million past patient visits. Over two months, they compared AI-generated predictions with nurses’ triage assessments to see whether AI could help identify likely hospital admissions sooner after the patient arrives.
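For readers curious about what such a model might look like in practice, here is a minimal sketch of an admission classifier trained on triage-time features. The feature names, the model choice (gradient boosting), and the data layout are illustrative assumptions for demonstration only, not the model described in the Mount Sinai study.

```python
# Illustrative sketch only: feature names, model choice, and data layout are
# assumptions, not the study's actual model or inputs.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical triage-time features available shortly after ED arrival.
FEATURES = ["age", "heart_rate", "resp_rate", "systolic_bp", "o2_sat",
            "triage_acuity", "arrival_mode_ambulance", "prior_admissions"]

visits = pd.read_csv("ed_visits.csv")   # hypothetical historical visit data
X = visits[FEATURES]
y = visits["admitted"]                  # 1 = admitted, 0 = discharged

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Probability of admission, available hours before any admission order exists.
p_admit = model.predict_proba(X_test)[:, 1]
print("AUROC:", round(roc_auc_score(y_test, p_admit), 3))
```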
“Emergency department overcrowding and boarding have become a national crisis, affecting everything from patient outcomes to financial performance. Industries like airlines and hotels use bookings to forecast demand and plan. In the ED, we don’t have reservations. Could you imagine airlines and hotels without reservations, solely forecasting and planning from historical trends? Welcome to health care,” says lead author Jonathan Nover, MBA, RN, Vice President of Nursing and Emergency Services, Mount Sinai Health System. “Our goal was to see if AI, combined with input from our nurses, could help hasten admission planning, a reservation of sorts. We developed a tool to forecast admissions needs before an order is placed, offering insights that could fundamentally improve how hospitals manage patient flow, leading to better outcomes.”
The study, involving nearly 50,000 patient visits across Mount Sinai’s urban and suburban hospitals, showed that the AI model performed reliably across these diverse hospital settings. Surprisingly, the researchers found that combining human and machine predictions did not significantly boost accuracy, indicating that the AI system alone was a strong predictor.
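The comparison itself can be summarized in a few lines. The sketch below assumes each visit carries a model probability and a binary nurse triage prediction; the column names and the simple averaging rule are illustrative assumptions, not the study’s exact analysis.

```python
# Illustrative comparison of model, nurse, and combined predictions.
# Column names and the averaging rule are assumptions for demonstration.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("predictions.csv")  # hypothetical: one row per ED visit
# expected columns: admitted (0/1), model_prob (0-1), nurse_pred (0/1)

auc_model = roc_auc_score(df["admitted"], df["model_prob"])
auc_nurse = roc_auc_score(df["admitted"], df["nurse_pred"])
# Naive combination: average the two signals.
auc_both = roc_auc_score(df["admitted"],
                         (df["model_prob"] + df["nurse_pred"]) / 2)

print(f"model {auc_model:.3f} | nurse {auc_nurse:.3f} | combined {auc_both:.3f}")
```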
“We wanted to design a model that doesn’t just perform well in theory but can actually support decision-making on the front lines of care,” says co-corresponding senior author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “By training the algorithm on more than a million patient visits, we aimed to capture meaningful patterns that could help anticipate admissions earlier than traditional methods. The strength of this approach is its ability to turn complex data into timely, actionable insights for clinical teams—freeing them up to focus less on logistics and more on delivering the personal, compassionate care that only humans can provide.”
While the study was limited to one health system over a two-month period, the team hopes the findings will serve as a springboard for future live clinical testing. The next phase involves implementing the AI model into real-time workflows and measuring outcomes such as reduced boarding times, improved patient flow, and operational efficiency.
“We were encouraged to see that AI could stand on its own in making complex predictions. But just as important, this study highlights the vital role of our nurses—more than 500 participated directly—demonstrating how human expertise and machine learning can work hand in hand to reimagine care delivery,” says co-corresponding senior author Robbie Freeman, DNP, RN, NE-BC3, Chief Digital Transformation Officer at Mount Sinai Health System. “This tool isn’t about replacing clinicians; it’s about supporting them. By predicting admissions earlier, we can give care teams the time they need to plan, coordinate, and ultimately provide better, more compassionate care. It’s inspiring to see AI emerge not as a futuristic idea, but as a practical, real-world solution shaped by the people delivering care every day.”
The paper is titled “Comparing Machine Learning and Nurse Predictions for Hospital Admissions in a Multisite Emergency Care System.”
The study’s authors, as listed in the journal, are Jonathan Nover, MBA, RN; Matthew Bai, MD; Prem Tismina; Ganesh Raut; Dhavalkumar Patel; Girish N Nadkarni, MD, MPH; Benjamin S. Abella, MD, MPhil; Eyal Klang, MD, and Robert Freeman, DNP, RN, NE-BC3.
This work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. The research was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463.
-####-
About the Mount Sinai Health System
Mount Sinai Health System is one of the largest academic medical systems in the New York metro area, with 48,000 employees working across seven hospitals, more than 400 outpatient practices, more than 600 research and clinical labs, a school of nursing, and a leading school of medicine and graduate education. Mount Sinai advances health for all people, everywhere, by taking on the most complex health care challenges of our time—discovering and applying new scientific learning and knowledge; developing safer, more effective treatments; educating the next generation of medical leaders and innovators; and supporting local communities by delivering high-quality care to all who need it.
Through the integration of its hospitals, labs, and schools, Mount Sinai offers comprehensive health care solutions from birth through geriatrics, leveraging innovative approaches such as artificial intelligence and informatics while keeping patients’ medical and emotional needs at the center of all treatment. The Health System includes approximately 9,000 primary and specialty care physicians and 10 free-standing joint-venture centers throughout the five boroughs of New York City, Westchester, Long Island, and Florida. Hospitals within the System are consistently ranked by Newsweek’s® “The World’s Best Smart Hospitals, Best in State Hospitals, World Best Hospitals and Best Specialty Hospitals” and by U.S. News & World Report's® “Best Hospitals” and “Best Children’s Hospitals.” The Mount Sinai Hospital is on the U.S. News & World Report® “Best Hospitals” Honor Roll for 2025-2026.
For more information, visit https://www.mountsinai.org or find Mount Sinai on Facebook, Instagram, LinkedIn, X, and YouTube.
Journal
Mayo Clinic Proceedings: Digital Health
Method of Research
Computational simulation/modeling
Subject of Research
People
Article Title
Comparing Machine Learning and Nurse Predictions for Hospital Admissions in a Multisite Emergency Care System
Assessing and understanding creativity in large language models
Beijing Zhongke Journal Publishing Co. Ltd.
Image: A TTCT-inspired dataset was constructed to evaluate LLMs under varied prompts and role-play settings. GPT-4 served as the evaluator to score model outputs. Credit: Beijing Zhongke Journal Publishing Co. Ltd.
In recent years, the realm of artificial intelligence (AI) has witnessed a meteoric rise in the development and sophistication of large language models (LLMs). LLMs have advanced significantly in addressing conventional natural language processing tasks such as reasoning and natural language understanding, and they have demonstrated considerable value in widespread applications. From transforming rudimentary text into compelling narratives, unlocking a new realm of storytelling, to solving complex algorithmic problems, these models have shown a semblance of what could be interpreted as creativity. The practical manifestations of this creativity have penetrated various sectors, including scientific research, where they assist in idea generation; education, where they provide personalized learning experiences; and entertainment, where they create music and art. In many of their applications, LLMs appear able to generate original text, aiding tasks related to imagination and creativity and suggesting that they may indeed possess elements of creativity.
The creativity LLMs exhibit across these broad capabilities is a key reason they are considered powerful. Behind their impressive abilities, however, lies a significant question that warrants careful examination: do these models actually possess real creativity, or is their apparent intelligence merely an illusion, a complex imitation of human thinking created by their training paradigm? This question touches on the very nature of LLM intelligence, which may not be easily explained. Since LLMs have shown considerable creativity, understanding the extent and characteristics of that creativity is essential. Gaining deeper insight into the creativity of LLMs can guide efforts to further improve their performance and enhance people’s understanding of the nature of that creativity, which in turn informs everyday use of these models and underscores the need for an effective method to measure and assess it. Creative abilities are especially critical in the following application scenarios. First, LLMs can inspire humans on creative tasks and provide novel ideas, especially in research idea generation, although it has also been suggested that using LLMs can lead to a homogenization of creativity. Second, humor generation with LLMs offers significant value in both creative and practical applications: by simulating human-like humor, LLMs can assist in content creation for entertainment, marketing, and social media. Finally, LLMs can serve as powerful co-creators in creative writing by generating narrative ideas, suggesting plot developments, or even drafting sections of text that human writers then refine.
Creativity, as a term, traditionally refers to the ability to think innovatively, to make unconventional connections, and to devise solutions that are both novel and effective. Assessing the creativity of LLMs is fraught with challenges. First, creative questions have no reference answers. When people ask an LLM a question such as “What is the speed of light in a vacuum in meters per second?”, the answer can be formally vetted, given the objective nature of the topic. When posed with a prompt such as “What would be the implications if animals could talk?”, however, there is no definitive answer; responses are open and divergent, making it challenging to judge the correctness of the output. Second, because creativity encompasses various aspects, including originality and flexibility, diverse tasks and criteria are needed to measure these qualities effectively in LLMs. Third, differences between LLMs and humans can lead to irrelevant responses or serious logical issues, which must also be assessed. Finally, evaluating creativity requires a delicate balance between accuracy and efficiency, rendering traditional human-based evaluation methods less practical. Addressing these challenges is essential for a robust and sound assessment of creativity in LLMs.
Recognizing the need for a comprehensive assessment of LLMs’ creativity, the researchers behind the paper published in Machine Intelligence Research designed an efficient framework to automatically assess the creativity of LLMs by adapting and modifying the Torrance Tests of Creative Thinking (TTCT), a widely recognized psychometric tool for assessing human creativity. To enhance the credibility of the results and reduce randomness, seven verbal tasks, which use verbal stimuli, were selected. The researchers employed GPT-4, the most advanced LLM, to expand the question set for each task, thereby constructing the testing dataset. To ensure a thorough and objective evaluation that captures creativity’s various manifestations, they combined diverse tasks with a test protocol built on four criteria: fluency, flexibility, originality, and elaboration. The LLMs under test answered questions from the constructed dataset, producing a large set of question-answer pairs, and GPT-4 then served as the evaluator for each answer, since it can effectively assess the openness of responses and identify their shortcomings and errors. With suitable prompt engineering, GPT-4 can evaluate the results across the entire dataset efficiently and effectively, allowing the method to balance efficiency and accuracy.
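As a rough illustration of this LLM-as-evaluator step, the sketch below asks GPT-4 to score a single answer on the four TTCT-derived criteria. The prompt wording, the 1-to-5 scale, the JSON output format, and the exact model name are assumptions for demonstration; the paper’s actual evaluation prompts are more elaborate.

```python
# Sketch of scoring one answer with GPT-4 as the evaluator.
# Prompt wording, scoring scale, and output format are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

CRITERIA = ["fluency", "flexibility", "originality", "elaboration"]

def score_answer(question: str, answer: str) -> dict:
    prompt = (
        "You are grading a creativity test.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Rate the answer from 1 to 5 on each of: {', '.join(CRITERIA)}. "
        "Reply with a JSON object mapping each criterion to an integer."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the model follows the instruction and returns valid JSON.
    return json.loads(resp.choices[0].message.content)

print(score_answer("What would be the implications if animals could talk?",
                   "Courts would need animal interpreters, and zoos would become talk shows."))
```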
The researchers selected six popular LLMs as test subjects, each with a different architecture and parameter scale. In addition to the overall testing, they ran exploratory experiments investigating how the creativity exhibited by LLMs changes with different types of prompts and with the different roles the LLMs are asked to play; a sketch of such prompt variants follows below. They also designed a collaboration mechanism to explore how multiple LLMs working together affects creativity. Finally, they administered psychological instruments related to personality traits to the LLMs, including emotional intelligence (EI), empathy, the Big Five Inventory (BFI) and self-efficacy, because psychological research has shown that human creativity correlates with these traits, and the researchers wanted to verify whether LLMs and humans are consistent in this regard.
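The exploratory conditions amount to systematic variations of the same underlying question. Here is a minimal sketch of how such variants might be generated; the specific wordings and the list of roles are assumptions, not the paper’s exact prompts.

```python
# Sketch of building prompt variants for the exploratory experiments.
# The wordings and the list of roles are illustrative assumptions.
BASE = "What would be the implications if animals could talk?"

def make_variants(question: str) -> dict:
    return {
        "default": question,
        "instructive": f"Be as creative and original as possible. {question}",
        "chain_of_thought": f"{question} Think step by step before answering.",
        # Role-play conditions: prepend a persona instruction.
        **{f"role:{role}": f"You are a {role}. {question}"
           for role in ["scientist", "artist", "teacher", "engineer"]},
    }

for name, prompt in make_variants(BASE).items():
    print(f"[{name}] {prompt}")
```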
The experiments and analysis yielded several conclusions. First, there are significant differences in creative performance among models, even among those of the same scale with an equal number of parameters; this variation exists primarily between different types of models and is reflected mainly in model architecture, parameter settings during training, alignment strategies, and the datasets used for training. Models generally excel on the elaboration metric but tend to be less adept at demonstrating originality. Second, the type of prompt and the specific role-play request given to a model also play a significant role in its creative output: instructive prompts and chain-of-thought prompts significantly increase the level of creativity, and having an LLM play different roles leads to notable differences, with the scientist role showing the highest creativity. Many roles actually score lower than the default scenario, though originality generally improves. Third, collaboration among multiple LLMs can enhance creativity, with the most notable improvement in originality. Finally, the psychological scales revealed consistency between LLMs and humans in factors associated with creativity, such as emotional intelligence (EI), empathy, and self-efficacy.
Section 2 reviews related work in three areas. It first introduces creativity assessment in psychological research, then discusses psychological findings on creativity and personality, and finally addresses the assessment of creativity in large language models.
In Section 3, the researchers design the overall framework for evaluating LLMs’ creativity. They constructed a dataset of 700 questions across seven tasks, derived and modified from the TTCT psychological scale, with the question set expanded via GPT-4. Six models were tested on four criteria using this dataset. The researchers then ran a series of experiments on LLM creativity under different types of prompts and different assigned roles. Finally, they used GPT-4 as the evaluator to obtain the models’ performance results and verified the consistency of the LLM-based evaluation with human judgment.
Section 4 presents the evaluation and results. The researchers conducted a statistical analysis of the creativity scores of six popular LLMs across seven tasks, totaling 700 questions, and examined the results from multiple dimensions. They compared differences in creativity levels between models and compared performance across criteria within the same model. They then experimented with many types of prompts to see whether prompt changes affect the models’ creativity. Because LLMs can play user-specified roles, they selected six typical human identities to explore the impact of role-play on creativity. Finally, they administered psychological scales to the LLMs to investigate the correlation between the models’ personality traits and creativity.
Section 5 presents conclusions and discussion. The researchers argue that the creativity exhibited by LLMs is only an outcome-oriented interpretation; whether AI models possess true creativity from a human cognitive perspective remains an open question in artificial intelligence, and LLMs’ expression of creativity is likely an imitation of human creativity acquired through large-scale learning. Understanding the creativity of LLMs also helps uncover the inner workings of the model “black box” and deepens understanding of the nature of intelligence and cognition. Although analyzing the nature of creativity is difficult, the analysis and evaluation of LLM creative performance is fundamental to studying its core.
See the article:
Assessing and Understanding Creativity in Large Language Models
http://doi.org/10.1007/s11633-025-1546-4
Journal
Machine Intelligence Research
Article Title
Assessing and Understanding Creativity in Large Language Models