A new way to let AI chatbots converse all day without crashing
Researchers developed a simple yet effective solution for a puzzling problem that can worsen the performance of large language models such as ChatGPT
Reports and Proceedings
When a human-AI conversation involves many rounds of continuous dialogue, the powerful large language models that drive chatbots like ChatGPT sometimes start to collapse, causing the bots’ performance to deteriorate rapidly.
A team of researchers from MIT and elsewhere has pinpointed a surprising cause of this problem and developed a simple solution that enables a chatbot to maintain a nonstop conversation without crashing or slowing down.
Their method involves a tweak to the key-value cache (which is like a conversation memory) at the core of many large language models. In some methods, when this cache needs to hold more information than it has capacity for, the first pieces of data are bumped out. This can cause the model to fail.
By ensuring that these first few data points remain in memory, the researchers’ method allows a chatbot to keep chatting no matter how long the conversation goes.
The method, called StreamingLLM, enables a model to remain efficient even when a conversation stretches on for more than 4 million words. When compared to another method that avoids crashing by constantly recomputing part of the past conversations, StreamingLLM performed more than 22 times faster.
This could allow a chatbot to conduct long conversations throughout the workday without needing to be continually rebooted, enabling efficient AI assistants for tasks like copywriting, editing, or generating code.
“Now, with this method, we can persistently deploy these large language models. By making a chatbot that we can always chat with, and that can always respond to us based on our recent conversations, we could use these chatbots in some new applications,” says Guangxuan Xiao, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on StreamingLLM.
Xiao’s co-authors include his advisor, Song Han, an associate professor in EECS, a member of the MIT-IBM Watson AI Lab, and a distinguished scientist of NVIDIA; as well as Yuandong Tian, a research scientist at Meta AI; Beidi Chen, an assistant professor at Carnegie Mellon University; and senior author Mike Lewis, a research scientist at Meta AI. The work will be presented at the International Conference on Learning Representations.
A puzzling phenomenon
Large language models encode data, like words in a user query, into representations called tokens. Many models employ what is known as an attention mechanism that uses these tokens to generate new text.
Typically, an AI chatbot writes new text based on text it has just seen, so it stores recent tokens in memory, called a KV cache, to use later. The attention mechanism builds a grid that includes all tokens in the cache, an “attention map” that maps out how strongly each token, or word, relates to each other token.
Understanding these relationships is one feature that enables large language models to generate human-like text.
But when the cache gets very large, the attention map can become even more massive, which slows down computation.
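To make that concrete, here is a toy sketch of an attention map, with random vectors standing in for a real model's learned keys and queries (this is an illustration, not any actual model's code):

```python
import numpy as np

def attention_map(cache):
    """Toy attention map over cached tokens: row i holds how strongly
    token i attends to every other token (each row sums to 1 via softmax)."""
    scores = cache @ cache.T                         # one score per token pair
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
amap = attention_map(rng.normal(size=(8, 16)))       # 8 cached tokens, 16-dim vectors
print(amap.shape)  # (8, 8) -- the grid grows quadratically with the cache size
```

The quadratic shape is exactly why a very large cache slows computation: doubling the cached tokens quadruples the grid.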
Also, if encoding content requires more tokens than the cache can hold, the model’s performance drops. For instance, one popular model can store 4,096 tokens, yet there are about 10,000 tokens in an academic paper.
To get around these problems, researchers employ a “sliding cache” that bumps out the oldest tokens to add new tokens. However, the model’s performance often plummets as soon as that first token is evicted, rapidly reducing the quality of newly generated words.
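A minimal sketch of the sliding-cache idea, with token ids standing in for cached key-value pairs:

```python
from collections import deque

# Sliding KV cache sketch: fixed capacity, oldest entry evicted first.
cache = deque(maxlen=4)   # deque drops the oldest item automatically at capacity
for token in range(6):
    cache.append(token)
print(list(cache))        # [2, 3, 4, 5] -- tokens 0 and 1 were evicted
```

It is precisely the eviction of token 0 here that, per the researchers' finding, causes quality to collapse.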
In this new paper, researchers realized that if they keep the first token in the sliding cache, the model will maintain its performance even when the cache size is exceeded.
But this didn’t make any sense. The first word in a novel likely has nothing to do with the last word, so why would the first word be so important for the model to generate the newest word?
In their new paper, the researchers also uncovered the cause of this phenomenon.
Attention sinks
Some models use a Softmax operation in their attention mechanism, which assigns each token a score representing how strongly it relates to each other token. The Softmax operation requires all attention scores to sum to 1. Since most tokens aren’t strongly related, their attention scores are very low. The model dumps any remaining attention score into the first token.
The researchers call this first token an “attention sink.”
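The effect can be seen with a toy softmax (the scores are illustrative, not taken from a real model):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Suppose the current token is barely related to any of the 5 cached tokens.
# Softmax still forces the weights to sum to 1, so the leftover attention
# mass has to land somewhere -- and it piles onto one token.
raw_scores = np.array([0.0, -4.0, -4.0, -4.0, -4.0])
weights = softmax(raw_scores)
print(weights.sum())   # 1.0 by construction
print(weights[0])      # ~0.93: the first token soaks up the excess
```

This is the mechanism behind the "attention sink": the sum-to-1 constraint guarantees the excess attention must be parked somewhere.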
“We need an attention sink, and the model decides to use the first token as the attention sink because it is globally visible — every other token can see it. We found that we must always keep the attention sink in the cache to maintain the model dynamics,” Han says.
In building StreamingLLM, the researchers discovered that having four attention sink tokens at the beginning of the sliding cache leads to optimal performance.
They also found that positional encodings must be assigned relative to each token’s place in the cache rather than its place in the original text, even as new tokens are added and others are bumped out. If token 5 is bumped out, token 6 is re-encoded at position 5, since it is now the fifth token in the cache.
By combining these two ideas, they enabled StreamingLLM to maintain a continuous conversation while outperforming a popular method that uses recomputation.
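The eviction half of that policy can be sketched as follows (token ids stand in for cached key-value pairs; attention math and positional handling are omitted, and the four-sink count follows the article):

```python
# Hedged sketch of a StreamingLLM-style cache: the first n_sinks "attention
# sink" tokens are pinned, and only the sliding portion evicts its oldest entry.
def streaming_cache_update(cache, new_token, capacity, n_sinks=4):
    cache = cache + [new_token]
    if len(cache) > capacity:
        # Drop the oldest non-sink token; the sinks at the front are never dropped.
        cache = cache[:n_sinks] + cache[n_sinks + 1:]
    return cache

cache = []
for token in range(10):
    cache = streaming_cache_update(cache, token, capacity=6)
print(cache)  # [0, 1, 2, 3, 8, 9] -- four sinks kept, the rest slides
```

No matter how long the stream grows, the cache stays at its fixed capacity while the sinks remain visible to every new token.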
For instance, when the cache has 256 tokens, the recomputation method takes 63 milliseconds to decode a new token, while StreamingLLM takes 31 milliseconds. However, if the cache size grows to 4,096 tokens, recomputation requires 1,411 milliseconds for a new token, while StreamingLLM needs just 65 milliseconds.
The researchers also explored the use of attention sinks during model training by prepending several placeholder tokens in all training samples.
They found that training with attention sinks allowed a model to maintain performance with only one attention sink in its cache, rather than the four that are usually required to stabilize a pretrained model’s performance.
But while StreamingLLM enables a model to conduct a continuous conversation, the model cannot remember words that aren’t stored in the cache. In the future, the researchers plan to target this limitation by investigating methods to retrieve tokens that have been evicted or enable the model to memorize previous conversations.
StreamingLLM has been incorporated into NVIDIA's large language model optimization library, TensorRT-LLM.
This work is funded, in part, by the MIT-IBM Watson AI Lab, the MIT Science Hub, and the U.S. National Science Foundation.
###
Written by Adam Zewe, MIT News
Paper: “Efficient streaming language models with attention sinks”
https://arxiv.org/pdf/2309.17453.pdf
New chip opens door to AI computing at light speed
Penn Engineers have developed a new chip that uses light waves, rather than electricity, to perform the complex math essential to training AI. The chip has the potential to radically accelerate the processing speed of computers while also reducing their energy consumption.
The silicon-photonic (SiPh) chip’s design is the first to bring together two lines of work: the pioneering research of Nader Engheta, Benjamin Franklin Medal laureate and H. Nedwill Ramsey Professor, on manipulating materials at the nanoscale to perform mathematical computations using light, the fastest possible means of communication; and the SiPh platform, which uses silicon, the cheap, abundant element used to mass-produce computer chips.
The interaction of light waves with matter offers one possible route to computers that surpass the limitations of today’s chips, which are essentially based on the same principles as chips from the earliest days of the computing revolution in the 1960s.
In a paper in Nature Photonics, Engheta’s group, together with that of Firooz Aflatouni, Associate Professor in Electrical and Systems Engineering, describes the development of the new chip. “We decided to join forces,” says Engheta, leveraging the fact that Aflatouni’s research group has pioneered nanoscale silicon devices.
Their goal was to develop a platform for performing what is known as vector-matrix multiplication, a core mathematical operation in the development and function of neural networks, the computer architecture that powers today’s AI tools.
Instead of using a silicon wafer of uniform height, explains Engheta, “you make the silicon thinner, say 150 nanometers,” but only in specific regions. Those variations in height — without the addition of any other materials — provide a means of controlling the propagation of light through the chip, since the variations in height can be distributed to cause light to scatter in specific patterns, allowing the chip to perform mathematical calculations at the speed of light.
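The calculation in question is ordinary vector-matrix multiplication, which the chip performs with light instead of electricity. In software it looks like this (illustrative sizes and hand-picked weights):

```python
import numpy as np

# A neural-network layer's output is just the input vector times a weight matrix.
x = np.array([1.0, 0.5, -2.0])      # input activations
W = np.array([[0.2, -1.0],
              [0.4,  0.3],
              [-0.1, 0.8]])         # weights: 3 inputs -> 2 outputs
y = x @ W
print(y)                            # [ 0.6  -2.45]
```

Training and running a neural network reduces to enormous numbers of such products, which is why accelerating this one operation matters so much.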
Because the chips were produced by a commercial foundry, within that foundry’s fabrication constraints, the design is already ready for commercial applications, Aflatouni says, and could potentially be adapted for use in graphics processing units (GPUs), demand for which has skyrocketed with the widespread interest in developing new AI systems. “They can adopt the Silicon Photonics platform as an add-on,” says Aflatouni, “and then you could speed up training and classification.”
In addition to faster speed and less energy consumption, Engheta and Aflatouni’s chip has privacy advantages: because many computations can happen simultaneously, there will be no need to store sensitive information in a computer’s working memory, rendering a future computer powered by such technology virtually unhackable. “No one can hack into a non-existing memory to access your information,” says Aflatouni.
This study was conducted at the University of Pennsylvania School of Engineering and Applied Science and supported in part by a grant from the U.S. Air Force Office of Scientific Research’s (AFOSR) Multidisciplinary University Research Initiative (MURI) to Engheta (FA9550-21-1-0312) and a grant from the U.S. Office of Naval Research (ONR) to Aflatouni (N00014-19-1-2248).
Other co-authors include Vahid Nikkhah, Ali Pirmoradi, Farshid Ashtiani and Brian Edwards of Penn Engineering.
JOURNAL
Nature Photonics
METHOD OF RESEARCH
Experimental study
SUBJECT OF RESEARCH
Not applicable
ARTICLE TITLE
Inverse-designed low-index-contrast structures on silicon photonics platform for vector-matrix multiplication
ARTICLE PUBLICATION DATE
16-Feb-2024
Widespread machine learning methods behind ‘link prediction’ are performing very poorly
New research indicates that methods used to test the accuracy of link prediction are flawed, and that link prediction does not work as well as common benchmarking tests currently indicate
Peer-Reviewed Publication
As you scroll through any social media feed, you are likely to be prompted to follow or friend another person, expanding your personal network and contributing to the growth of the app itself. The person suggested to you is a result of link prediction: a widespread machine learning (ML) task that evaluates the links in a network — your friends and everyone else’s — and tries to predict what the next links will be.
Beyond being the engine that drives social media expansion, link prediction is also used in a wide range of scientific research, such as predicting the interaction between genes and proteins, and is used by researchers as a benchmark for testing the performance of new ML algorithms.
New research from UC Santa Cruz Professor of Computer Science and Engineering C. “Sesh” Seshadhri published in the journal Proceedings of the National Academy of Sciences establishes that the metric used to measure link prediction performance is missing crucial information, and link prediction tasks are performing significantly worse than popular literature indicates.
Seshadhri and his coauthor Nicolas Menand, a former UCSC undergraduate and master’s student and a current Ph.D. candidate at the University of Pennsylvania, recommend that ML researchers stop using the standard metric for measuring link prediction, known as AUC, and introduce a new, more comprehensive metric for this problem. The research has implications for trustworthiness around decision-making in ML.
AUC’s ineffectiveness
Seshadhri, who works in the fields of theoretical computer science and data mining and is currently an Amazon scholar, has done previous research on ML algorithms for networks. In this previous work he found certain mathematical limitations that were negatively impacting algorithm performance, and in an effort to better understand the mathematical limitations in context, dove deeper into link prediction due to its importance as a testbed problem for ML algorithms.
“The reason why we got interested is because link prediction is one of these really important scientific tasks which is used to benchmark a lot of machine learning algorithms,” Seshadhri said. “What we were seeing was that the performance seemed to be really good… but we had an inkling that there seemed to be something off with this measurement. It feels like if you measured things in a different way, maybe you wouldn’t see such great results.”
Link prediction relies on the ML algorithm’s ability to compute low-dimensional vector embeddings, the process by which the algorithm represents each person in a network as a mathematical vector in space. All of the machine learning occurs as mathematical manipulations of those vectors.
AUC, which stands for “area under the curve” and is the most common metric for measuring link prediction, gives ML algorithms a score from zero to one based on the algorithm’s performance.
In their research, the authors discovered that there are fundamental mathematical limitations to using low-dimensional embeddings for link prediction, and that AUC cannot measure these limitations. Because AUC cannot capture them, the authors conclude that it does not accurately measure link prediction performance.
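For concreteness, here is a sketch of the standard evaluation the paper critiques: score candidate edges by dot products of node embeddings, then compute AUC as a ranking statistic. The embeddings are random stand-ins and the function names are ours, not the paper's:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """AUC as a ranking statistic: the probability that a true edge
    scores higher than a non-edge (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

rng = np.random.default_rng(1)
emb = rng.normal(size=(6, 3))                  # 6 nodes, 3-dim embeddings
score = lambda u, v: float(emb[u] @ emb[v])    # candidate-edge score: dot product
pos = [score(0, 1), score(2, 3)]               # held-out true edges
neg = [score(0, 4), score(1, 5), score(2, 5)]  # sampled non-edges
print(auc(pos, neg))                           # 1.0 is perfect ranking; 0.5 is chance
```

The paper's point is that a high value from this kind of aggregate ranking score can mask the embeddings' structural limitations.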
Seshadhri said these results call into question the widespread use of low dimensional vector embeddings in the ML field, considering the mathematical limitations that his research has surfaced on their performance.
Leading methods fall short
The discovery of AUC’s shortcomings led the researchers to create a new metric to better capture the limitations, which they call VCMPR. They used VCMPR to measure 12 ML algorithms chosen to be representative of the field, including algorithms such as DeepWalk, Node2vec, NetMF, GraphSage, and graph benchmark leader HOP-Rec, and found that the link prediction performance was worse using VCMPR as the metric rather than AUC.
“When we look at the VCMPR scores, we see that the scores of most of the leading methods out there are really poor,” Seshadhri said. “It looks like they're actually not doing a good job when you measure things a different way.”
The results also showed that performance was not only lower across the board: some algorithms that ranked worse than their peers under AUC ranked better under VCMPR, and vice versa.
Trustworthiness in machine learning
Seshadhri suggests that ML researchers use VCMPR to benchmark the link prediction performance of their algorithms, or at the very least stop using AUC as their measure. As metrics are so tightly connected to decision making in ML, using a flawed system to measure performance could lead to flawed decision making about which algorithms to employ in real world ML applications.
“Metrics are so closely tied to what we decide to deploy in the real world — people need to have some trust in that. If you have the wrong way of measuring, how can you trust the results?” Seshadhri said. “This paper is in some sense cautionary: we have to be more careful about how we do our machine learning experiments, and we need to come up with a richer set of measures.”
In academia, using an accurate metric is crucial to creating progress in the ML field.
“This is in some sense a bit of a conundrum for scientific progress. A new result has to supposedly be better than everything previously, otherwise it's not doing anything new — but that all depends on how you measure it.”
Beyond machine learning, there are researchers across a wide range of fields who use link prediction and ML to conduct their research, often with profound potential impact. For example, some biologists use link prediction to determine which proteins are likely to interact as a part of drug discovery. These biologists and other researchers outside of ML depend on the ML experts to create trustworthy tools, as they often cannot become ML experts themselves.
While Seshadhri thinks these results may not be a huge surprise to those deeply involved in the field, he hopes that the larger community of ML researchers, and particularly graduate and Ph.D. students who use the current literature to learn best practices and common wisdom about the field, will take note of these results and proceed with caution in their work. He sees this research, with its skeptical view, as standing in some contrast to a dominant philosophy in ML, which tends to accept a set of metrics and focuses on “pushing the bar” when it comes to progress in the field.
“It's important that we have the skeptical view, are trying to understand deeper, and are constantly asking ourselves ‘Are we measuring things correctly?’”
This research was funded by the National Science Foundation and the Army Research Office.
JOURNAL
Proceedings of the National Academy of Sciences
ARTICLE TITLE
Link prediction using low-dimensional node embeddings: the measurement problem
ARTICLE PUBLICATION DATE
12-Feb-2024
Road features that predict crash sites identified in new machine-learning model
The study used data collected from 9,300 miles of road
AMHERST, Mass. – Issues such as abrupt changes in speed limits and incomplete lane markings are among the most influential factors that can predict road crashes, finds new research by University of Massachusetts Amherst engineers. The study then used machine learning to predict which roads may be the most dangerous based on these features.
Published in the journal Transportation Research Record, the study was a collaboration between UMass Amherst civil and environmental engineers Jimi Oke, assistant professor; Eleni Christofa, associate professor; and Simos Gerasimidis, associate professor; and civil engineers from Egnatia Odos, a publicly owned engineering firm in Greece.
The most influential features included road design issues (such as changes in speed limits that are too abrupt or guardrail issues), pavement damage (cracks that stretch across the road and webbed cracking referred to as “alligator” cracking) and incomplete signage and road markings.
To identify these features, the researchers used a dataset of 9,300 miles of roads across 7,000 locations in Greece. “Egnatia Odos had the real data from every highway in the country, which is very hard to find,” says Gerasimidis.
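A hedged sketch of the general idea, using synthetic data and invented feature names rather than the study's 9,300-mile dataset: rank candidate road features by how well a single decision-tree split (a "stump") on each one predicts crash sites.

```python
import numpy as np

rng = np.random.default_rng(0)
features = ["abrupt speed-limit change", "alligator cracking", "missing markings"]
X = rng.random((200, 3))             # one column of measurements per feature
y = (X[:, 0] > 0.6).astype(int)      # synthetic crash labels driven by feature 0

def stump_accuracy(col, labels):
    """Best accuracy achievable by thresholding one feature column."""
    best = 0.0
    for t in np.unique(col):
        pred = (col > t).astype(int)
        best = max(best, (pred == labels).mean(), ((1 - pred) == labels).mean())
    return best

scores = [stump_accuracy(X[:, j], y) for j in range(len(features))]
best = features[int(np.argmax(scores))]
print(best)  # the most predictive feature in this synthetic setup
```

The study's actual models are richer decision trees over some sixty indicators, but the ranking principle is the same: ask which measurable road feature best separates crash sites from safe ones.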
Oke, who, with Christofa, is also a faculty member in the UMass Transportation Center, suspects the findings may stretch well beyond Greek borders.
“The problem itself is globally applicable—not just to Greece, but to the United States,” he says. Differences in road designs may influence how variables rank, but given the intuitive nature of the features, he suspects that the features themselves would be important regardless of location. “The indicators themselves are universal types of observations, so there’s no reason to believe that they wouldn’t be generalizable to the US.” He also notes that this approach can be readily deployed on new data from other locations as well.
Importantly, it puts decades of road data to good use: “We have all these measures that we can use to predict the crash risk on our roads and that is a big step in improving safety outcomes for everyone,” he says.
There are many future applications for this work. For starters, it will help future research home in on the important features to study. “We had 60-some-odd indicators. But now, we can just really focus our money on capturing the ones that we need,” says Oke. “One could dig deeper to understand how a certain feature actually could contribute to crashes,” and then measure to see if fixing the issue would actively reduce the number of incidents that occur.
He also envisions how this could be used to train AI for real-time road condition monitoring. “You could train models that can identify these features from images and then predict the crash risk as a first step towards an automated monitoring system, and also provide recommendations on what we should fix,” he says.
Gerasimidis adds that this is an exciting, real-world application of AI. “This is a big initiative we are doing here and it has specific engineering outcomes,” he says. “The purpose was to do this AI study and bring it up to [Greek] officials to say ‘look what we can do.’ It is very difficult to use AI and come up with specific results that could be implemented, and I think this study is one of them. It is now up to the Greek officials to utilize these new tools to mitigate the huge problem of car crash fatalities. We are very eager to see our findings lead to improving this problem.”
“This work could serve as the roadmap for future collaborations between academics and engineers on other topics,” he adds. “The mathematical tools along with real data make a truly powerful combination when looking at societal problems.”
JOURNAL
Transportation Research Record Journal of the Transportation Research Board
METHOD OF RESEARCH
Computational simulation/modeling
SUBJECT OF RESEARCH
Not applicable
ARTICLE TITLE
Feature Engineering and Decision Trees for Predicting High Crash-Risk Locations Using Roadway Indicators
Imageomics poised to enable new understanding of life
Research on mimicry in butterflies provides one example
Reports and Proceedings
DENVER – Imageomics, a new field of science, has made stunning progress in the past year and is on the verge of major discoveries about life on Earth, according to one of the founders of the discipline.
Tanya Berger-Wolf, faculty director of the Translational Data Analytics Institute at The Ohio State University, outlined the state of imageomics in a presentation on Feb. 17, 2024, at the annual meeting of the American Association for the Advancement of Science.
“Imageomics is coming of age and is ready for its first major discoveries,” Berger-Wolf said in an interview before the meeting.
Imageomics is a new interdisciplinary scientific field focused on using machine learning tools to understand the biology of organisms, particularly biological traits, from images.
Those images can come from camera traps, satellites, drones – even the vacation photos that tourists take of animals like zebras and whales, said Berger-Wolf, who is director of the Imageomics Institute at Ohio State, funded by the National Science Foundation.
These images contain a wealth of information that scientists couldn’t properly analyze and use before the development of artificial intelligence and machine learning.
The field is new – the Imageomics Institute was just founded in 2021 – but big things are happening, Berger-Wolf told AAAS.
One major area of study that is coming to fruition involves how phenotypes – the observable traits of animals that can be seen in images – are related to their genome, the DNA sequence that produces these traits.
“We are on the cusp of understanding the direct connections of observable phenotype to genotype,” she said.
“We couldn’t do this without imageomics. It is pushing forward both artificial intelligence and biological science.”
Berger-Wolf cited new research on butterflies as one example of the advances that imageomics is making. She and colleagues are studying mimics – butterfly species whose appearance is similar to a different species. One reason for mimicry is to look like a species that predators, such as birds, avoid because their taste is not appealing.
In these cases, birds – as well as humans – can’t tell the species apart by looking at them, even though the butterflies themselves know the difference. However, machine learning can analyze images and learn the very subtle differences in color or other traits that differentiate the types of butterflies.
“We can’t tell them apart because these butterflies didn’t evolve these traits for our benefit. They evolved to signal to their own species and to their predators,” she said.
“The signal is there – we just can’t see it. Machine learning can allow us to learn what those differences are.”
More than that, researchers can use the imageomics approach to alter images of the butterflies and test how extensive the mimics’ differences must be to fool birds. The team plans to print realistic images of the butterflies with subtle differences to see which ones real birds respond to.
This is a new way of using AI.
“We’re not using AI to just recapitulate what we know. We are using AI to generate new scientific hypotheses that are actually testable. It is exciting,” Berger-Wolf said.
Researchers are going even further with the imageomics approach to connect these subtle differences in how the butterflies look to the actual genes that lead to those differences.
“There’s a lot we are going to be learning in the next few years that will push imageomics forward into new areas that we can only imagine now,” she said.
One key goal is to use this new knowledge generated by imageomics to find ways to protect threatened species and the habitats where they live.
“There’s a lot of good that will come from imageomics in the coming years,” Berger-Wolf said.
Berger-Wolf’s AAAS presentation, titled “Imageomics: Images as the Source of Information About Life,” is part of the session “Imageomics: Powering Machine Learning for Understanding Biological Traits.”
METHOD OF RESEARCH
Systematic review
SUBJECT OF RESEARCH
Animals
ARTICLE TITLE
Imageomics: Images as the Source of Information About Life
ARTICLE PUBLICATION DATE
17-Feb-2024
Q&A: AI and sophisticated scams targeting businesses and consumers in 2024
By Dr. Tim Sandle
DIGITAL JOURNAL
February 15, 2024
Generative AI-aided DIY fraud, deceptive donation and investment opportunities, and a resurgence of synthetic identities are among the top ways Experian predicts fraudsters will take advantage of consumers and businesses in 2024.
Experian has launched its annual Future of Fraud Forecast, revealing the top five anticipated fraud threats for the new year.
To learn more, Digital Journal caught up with Kathleen Peters, chief innovation officer at Experian Decision Analytics in North America, to explore the biggest anticipated fraud threats for 2024 and how businesses and consumers can protect themselves.
Digital Journal: Experian recently launched its 2024 future of fraud forecast. Tell us about this year’s fraud predictions.
Kathleen Peters: This year’s predictions reflect the emergence of AI as a tool for fraudsters. We’re seeing deep fake content and social engineering on the horizon as fraudsters become more sophisticated in their use of AI and other technologies. On the other hand, we’re also seeing some old favourites gaining traction. For instance, consumers could fall victim to fraudulent requests for funding social causes or investment schemes.
Consumers, tired of contending with these online threats, are going back to physical bank branches, prompting the need for more on-site identity authentication through methods such as biometric verification. Retail scams, in which people ask for credit for goods they never returned, will increase. The year looks to be one in which businesses and consumers will face varied, clever threats, a pattern that has become the norm in fighting fraud.
DJ: Tell us more about how AI will contribute to fraud?
Peters: The recent growth and accessibility of generative artificial intelligence has made it possible for anyone to use the technology for worthwhile endeavours like research and analysis. It has also given fraudsters a new toolbox from which to develop a variety of scams. Because the technology is easily available and easily manipulated, there is really no limit to the AI-powered schemes a fraudster can create. As a result, Experian predicts AI will be a popular tool in “do-it-yourself” fraud. We will see deepfake emails, voice and video, as well as fake websites – all designed to execute attacks online. Bad actors will be able to launch fraud attacks more quickly, easily and accurately, making it challenging to know what is real.
Social engineering, which has been in practice for a while as a means of creating fake identities, will also become more prevalent with AI. Fraudsters will use generative AI to create fake identities on social media. Once these profiles are online, they can interact with people and companies, appearing as if they are real and making it more difficult to identify and investigate. Given the endless number of profiles generative AI can produce, we expect to see a spike in these types of fraud attacks.
To combat fraud and protect consumers, we believe businesses will need to “fight AI with AI,” to stay ahead of these criminals.
DJ: Do you anticipate new developments in synthetic identity fraud?
Peters: During the pandemic, fraudsters established synthetic identities to commit fraud but abandoned them as they realized there were easier methods of stealing funds through the various aid programs that existed to help companies and consumers. Those dormant identities have since developed several years of history, making it easier to elude detection. We anticipate fraudsters will use those accounts to “bust out” and steal funds over the next year.
Businesses need to incorporate the right fraud prevention tools to detect and distinguish between different fraud types. By recognizing the type of fraud, they can then work with a partner to determine the best steps to treat it. Leveraging data, advanced analytics and technology provides companies with a more in-depth view of fraudulent behavior and helps with the detection of patterns and abnormalities that can flag potential fraud while still providing a frictionless experience for legitimate customers.
DJ: You mentioned retail scams. What is the latest trend?
Peters: Fraudsters continue to view retail as a lucrative opportunity due to the sheer volume of online purchases and returns. The practice of empty returns is a trend to watch this year.
Empty-return fraud occurs when a retailer opens a return package and finds it empty. The customer claims they returned the item and that it must have been lost or stolen during shipping. Fraudsters use this method to keep merchandise, costing companies goods and revenue.
DJ: It’s interesting that people are returning to physical bank locations. What’s driving this?
Peters: Consumers are seeking a refuge from online identity concerns and turning to brick-and-mortar bank branches to feel safer when opening new accounts or asking for financial advice. However, it’s important to keep in mind that human error or oversight can still happen in-person, so lenders will need to leverage more digital identity verification methods to keep legitimate customers safe.
An Experian survey found that 85% of people report that physical biometrics is the most trusted and secure authentication method they’ve recently encountered, yet only 32% of businesses are using this method to combat fraud. We anticipate that lenders will see the benefits and begin using more digital identity tools like physical biometrics to protect their customers’ identities and prevent losses.
DJ: What can consumers do to prevent identity fraud?
Peters: The number of ways fraudsters can perpetrate scams continues to grow, so consumers need to be vigilant and on high alert when it comes to the communications they receive. Whether it is via text message, email, phone call or even social media messages, people will need to do their due diligence when it comes to ensuring they know who they are interacting with.
We encourage people to practice security measures at the individual level that can help prevent identity theft. This includes avoiding using public Wi-Fi networks, using strong and varied passwords for important accounts, shopping on secure websites and avoiding giving out any personal information to those they don’t know.
Consumers should also check their credit reports regularly. Doing so can help them spot fraudulent activity and address it before it becomes a larger problem. They should also consider signing up for identity theft monitoring for peace of mind and an additional layer of security.
DJ: How can businesses fight back?
Peters: Businesses must balance adapting to the new digital environment, and the growing prevalence of technology like generative AI, with protecting themselves from fraud attacks.
Fraudsters and their schemes are becoming more sophisticated, so businesses need to be proactive about reviewing their fraud strategy and implementing the right approach to fraud prevention to protect themselves and their customers. Companies should work with a trusted partner who can help them review their portfolio, identify risk and put into place the right fraud solutions to fit their needs. A partner can help a business incorporate a multilayered approach to fraud prevention that leverages data, advanced analytics and insights to identify multiple types of fraud and mitigate loss.
Companies should also educate employees on emerging fraud trends so they can take extra steps to protect themselves and the company from additional risk.
The Global Flourishing Study releases wave one open research data with the Center for Open Science
Researchers can now access the wave one dataset from the Global Flourishing Study, a five-year longitudinal study of 200,000 individuals in over 20 countries
Charlottesville, VA (February 13, 2024) – The wave one dataset from the Global Flourishing Study (GFS) initiative is now available to researchers.
The GFS, a partnership among Gallup, the Center for Open Science (COS), and researchers at Baylor University and Harvard University, is a $43.4 million, five-year study of more than 200,000 individuals in over 20 countries. The GFS data will become a resource for researchers, journalists, policymakers, and educators worldwide. Data can be accessed through the Open Science Framework (OSF), COS’s free, open source project management and collaboration tool that supports researchers throughout the entire research lifecycle.
While several studies have tracked respondents over time in a single country, the scope and breadth of GFS is unprecedented.
Project co-director Dr. Byron Johnson, Distinguished Professor of the Social Sciences and Director of the Institute for Studies of Religion at Baylor, commented: “On a project of this scale and scope, it is essential that the data be made available not just to the academic community, but to a truly global audience. This is why our partnership with the Center for Open Science (COS) is so critically important to the success of the GFS. Guided by the principles of the Open Science Framework (OSF), COS is uniquely qualified to ensure that the access to this data resource takes place in a manner that is transparent, ethical, and reproducible.”
The wave one dataset release will offer valuable insights over the next several years.
“The Open Science Framework and the expertise of COS make this possible,” said Dr. Huajin Wang, COS’s Director of Programs. “COS is thrilled to be stewarding the data access process and ensuring that everyone around the globe is able to access this unprecedented data.”
Researchers can access GFS data in three ways:
- Preregistration: Preregister an analysis plan now to receive access to the wave one dataset at cos.io/gfs.
- Registered Report: Access is also available to those who submit a Registered Report to a journal. With Registered Reports, a journal reviews the preregistration plan and agrees to publish the findings regardless of the outcome, protecting against publication bias.
- Public release: Those wishing to receive the data without preregistration can access the non-sensitive data for each wave a year after the initial release.
“The GFS will be an incredible longitudinal data resource on the study of human well-being. I cannot wait to see what our research team and other teams around the world, facilitated by COS, will learn from it,” said project co-director Dr. Tyler VanderWeele, the John L. Loeb and Frances Lehman Loeb Professor of Epidemiology and Director of the Human Flourishing Program at Harvard.
As Gallup’s CEO Jon Clifton remarked, “The GFS initiative is more than a study; it’s a commitment to understanding the human condition. The release of this dataset is a significant step towards that goal.”
Watch the recordings of previous webinars to learn more about using the sample dataset from this study, preregistration and Registered Reports, and how to gain access to the wave one dataset. Sign up for the GFS newsletter for news about the data and to register for webinars in the coming months.
For more information on GFS’s data access, visit: cos.io/gfs.
About the Global Flourishing Study
The Global Flourishing Study is the product of collaboration among researchers from Harvard University, Baylor University, Gallup, and the Center for Open Science to address significant limitations in current studies of human flourishing. The project is based on the creation of an important new data resource: a global, probability-based panel of more than 200,000 participants from over 20 geographically and culturally diverse countries. Funders include the John Templeton Foundation, Templeton Religion Trust, Templeton World Charity Foundation, the Well-Being for Planet Earth Foundation, the Fetzer Institute, Well Being Trust, the Paul L. Foster Family Foundation, and the David & Carol Myers Foundation.
About Baylor’s Institute for Studies of Religion
Launched in 2004, Baylor’s Institute for Studies of Religion (ISR) initiates, supports, and conducts research on religion, involving scholars and projects spanning the intellectual spectrum: history, psychology, sociology, economics, anthropology, political science, philosophy, epidemiology, theology, and religious studies. The ISR mandate extends to all religions, everywhere, and throughout history, and embraces the study of religious effects on prosocial behavior, family life, population health, economic development, and social conflict.
About Harvard’s Human Flourishing Program
Founded in 2016, the Human Flourishing Program at Harvard’s Institute for Quantitative Social Science aims to study and promote human flourishing, and to develop systematic approaches to the synthesis of knowledge across disciplines. The program’s research contributes to the broad question of how knowledge from the quantitative social sciences can be integrated with that of the humanities on questions of human flourishing and how best to carry out this synthesis of knowledge across disciplines.
About Gallup
Gallup is a global analytics and advice firm with more than 80 years of experience measuring public opinion and human development. In the organization’s own research and in working partnerships with government, nonprofit and philanthropic organizations, Gallup develops indicators to track key global development and social responsibility trends over time.
About the Center for Open Science
Founded in 2013, COS is a nonprofit culture change organization with a mission to increase openness, integrity, and reproducibility of scientific research. COS pursues this mission by building communities around open science practices, supporting metascience research, and developing and maintaining free, open source software tools, including the Open Science Framework (OSF). Learn more at cos.io.
METHOD OF RESEARCH
Survey
SUBJECT OF RESEARCH
People