Approximately 225,000 compromised OpenAI credentials were found listed for sale on the dark web last year. If misused, these credentials could grant unauthorized access to sensitive data transmitted to ChatGPT.
Discovery of Compromised ChatGPT Accounts
Researchers at Group-IB discovered the compromised ChatGPT accounts between January and October 2023. The findings were subsequently published in Group-IB’s Hi-Tech Crime Trends Report 2023/2024.
Origin of Stolen Credentials
The stolen credentials were traced back to logs available for purchase on dark web marketplaces. These logs originated from devices infected with infostealers such as LummaC2, Raccoon, and RedLine. These malicious tools are designed to seek out and gather sensitive information stored on infected devices, including login credentials and financial data.
Surge in Leaked Credentials
Group-IB’s research indicated a 36% surge in leaked ChatGPT access credentials between the first and last five months of the study. The number of infected hosts rose from just under 96,000 between January and May to over 130,000 between June and October 2023. October, the final month of the study, recorded the highest number of stolen OpenAI credentials, at 33,080.
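As a quick sanity check on the reported growth rate, the rounded host counts above can be plugged into a short calculation. The figures below are the article’s approximations, not Group-IB’s exact counts, so the result is only indicative.

```typescript
// Rough check of the reported ~36% surge, using the rounded host counts cited above.
// These are approximations ("just under 96,000", "over 130,000"), not Group-IB's exact figures.
const hostsJanToMay = 96_000;   // infected hosts, January-May 2023
const hostsJunToOct = 130_000;  // infected hosts, June-October 2023

const growth = (hostsJunToOct - hostsJanToMay) / hostsJanToMay;
console.log(`Approximate growth: ${(growth * 100).toFixed(1)}%`); // ~35.4%, broadly in line with the reported 36%
```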
Most Common Sources of Infostealer Logs
Between June and October 2023, LummaC2 emerged as the most prevalent source of infostealer logs containing ChatGPT credentials, with 70,484 cases, followed by Raccoon and RedLine with fewer than 23,000 cases each.
This represents a shift from Group-IB’s earlier data covering June 2022 to May 2023, which identified Raccoon as the leading stealer of OpenAI credentials (with over 78,000 infections), followed by Vidar and RedLine.
“A large number of enterprises are incorporating ChatGPT into their operational processes. Employees often input classified correspondences or utilize the bot to refine proprietary code,” stated Dmitry Shestakov, Group-IB’s Head of Threat Intelligence. “Considering that ChatGPT’s default configuration retains all conversations, this could inadvertently provide a wealth of sensitive intelligence to threat actors if they manage to acquire account credentials.”
Risky Employee Behavior with Generative AI
Recent statistics on ChatGPT account compromises have shed light on the risky behavior of enterprise employees when using generative AI. A report by LayerX in June 2023 found that 6% of enterprise employees had pasted sensitive data into generative AI applications at least once, and 4% did so weekly. Of the exposed data, 43% was internal business data, 31% was source code, and 12% was personally identifiable information (PII).
A parallel study conducted by Cyberhaven in the same month, which specifically focused on ChatGPT, echoed these findings. The research discovered that 4.7% of employees pasted sensitive data into ChatGPT. Furthermore, incidents of confidential data leakage to ChatGPT per 100,000 employees saw a 60.4% increase between March 4 and April 15 of that year.
Recent Trends in Gen AI Usage
In a more recent report from February 2024, Menlo Security noted an 80% surge in attempts by enterprise employees to upload files to generative AI sites between July and December 2023. This may be attributable to OpenAI’s introduction of a feature in October that allowed premium users to upload files directly to ChatGPT.
Alarmingly, nearly 40% of attempted sensitive inputs to generative AI applications included confidential documents, according to Menlo Security, and over half included PII.
OpenAI itself was not immune to data leaks. In March 2023, a vulnerability exposed the names, email addresses, and payment information of 1.2% of ChatGPT Plus subscribers.
Recommended Preventive Measures
Menlo Security advises organizations to adopt a layered approach to prevent sensitive information from leaking through generative AI use. This could involve copy-and-paste controls that block large volumes of text or known proprietary code from being pasted into input fields, as well as applying security controls to generative AI applications at the group level rather than blocking generative AI sites domain by domain.
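As a rough illustration of what such a copy-and-paste control might look like in practice, the sketch below shows a hypothetical browser-extension content script that cancels oversized pastes and pastes containing markers of proprietary material. The size threshold and the marker patterns are illustrative assumptions, not part of any vendor’s product, and the script is assumed to be scoped to generative AI domains via the extension manifest.

```typescript
// Minimal sketch of a paste control for generative AI sites, assuming it runs as a
// browser-extension content script scoped (via the extension manifest) to AI chat domains.
// The threshold and patterns below are illustrative placeholders, not vendor recommendations.

const MAX_PASTE_CHARS = 2_000; // hypothetical limit on pasted text size

// Hypothetical markers suggesting proprietary code or internal documents.
const SENSITIVE_PATTERNS: RegExp[] = [
  /\bCONFIDENTIAL\b/i,
  /\bINTERNAL USE ONLY\b/i,
  /-----BEGIN (RSA |EC )?PRIVATE KEY-----/,
];

document.addEventListener(
  "paste",
  (event: ClipboardEvent) => {
    const text = event.clipboardData?.getData("text/plain") ?? "";

    const tooLarge = text.length > MAX_PASTE_CHARS;
    const looksSensitive = SENSITIVE_PATTERNS.some((pattern) => pattern.test(text));

    if (tooLarge || looksSensitive) {
      // Cancel the paste before it reaches the chat input field.
      event.preventDefault();
      event.stopPropagation();
      console.warn("Paste blocked by data-loss-prevention policy.");
    }
  },
  true // capture phase, so the check runs before the page's own handlers
);
```

A real deployment would pair a client-side check like this with proxy- or gateway-level inspection, since browser-only controls are easy to bypass.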