Intelligence Research

This research delves into adversarial attacks on AI systems, also known as jailbreaks, which can manipulate AI to deviate from ethical guidelines, leading to the generation of harmful content. Key insights from this research include the misuse of AI, which lowers the barrier to entry in the cyber threat landscape and acts as a force multiplier for threat actors. Moreover, it covers the increasing use of AI in cyber and disinformation campaigns. The paper predicts an increase in AI-driven cyber threats, underlining the need for robust AI-powered defense tools and comprehensive policy measures to mitigate these risks.

It is broken down into 5 sections that can be read independently:

1. Introduction to Large Language Models, Aligned LLMs, Guardrails and associated challenges

2. Adversarial Attacks on Large Language Models a.k.a. Jailbreaks

3. Proof of concept for the malicious use of ChatGPT’s hallucination

4. Generative AI tools

5. Predictive analysis of the future of AI related to cybersecurity

1. Artificial Intelligence, Cybersecurity and Large Language Models

Image created with DALL·E 3

AI and Cybersecurity

2023 marked significant advancements in AI, with a surge in tools developed for text, audio, and image generation that allowed a wider audience to create content in ways previously unseen. ChatGPT from OpenAI stood out, as it took 5 days for the AI chat service to reach 1 million users and about 1 month to reach 100 million users. [1] Large Language Models (LLMs) like ChatGPT are designed to understand and generate human-like text. They are trained on vast amounts of data, enabling them to generate coherent text, answer questions, summarize, and translate languages. While they can be fine-tuned for specific tasks, they also have limitations in how they produce information.

With AI's widespread deployment, misuse and abuse is spreading. Malicious AI tools and jailbroken legitimate AI tools are being used to facilitate cyberattacks. As AI becomes more integrated into various IT systems, gradually becoming their nexus, the future of hacking will increasingly revolve around AI-centric cyberattack strategies and tactics. While AI offers immense potential, its misuse, especially in hacking with the support of generative AI tools, poses significant threats to organizations and individuals. Opportunities are also arising as a consequence of the creation and implementation of AI driven cybersecurity tools. The arms race between AI driven cybersecurity offense and defense underscores the importance of research, global collaboration, and vigilance in the AI domain.

Large Language Models

ChatGPT is a Large Language Model. Large language Model is a type of artificial intelligence (AI) model designed to understand and generate human-like text based on the data it has been trained on. LLMs fall under the umbrella of deep learning models using recurrent neural networks or transformers. The Transformer architecture, introduced in 2017, revolutionized natural language processing with its unique self-attention mechanism. Unlike traditional recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, Transformers process data in parallel, enhancing training efficiency. This design underpins state-of-the-art models like Google’s BERT and OpenAI’s GPT, Meta’s BART and Baidu’s ERNIE, as well as many other less widely used LLMs. [2]

LLMs are termed "large" due to their billions of parameters. Parameters are the configuration variables internal to the model that are learned from the training data. More parameters often mean better understanding and more fluent, accurate responses. For instance, GPT-3 has 175 billion parameters, GPT-4 is estimated to have approximately 1.76 trillion. [3] GPT-5 is currently in the preparatory phase of development, details on capabilities and release date are currently unavailable, but it is expected to be much more powerful than its predecessors. [4]

LLMs are trained on vast amounts of text data from a large variety of written content. The idea is to expose the model to as much diverse language as possible so it can observe and learn grammar, reasoning abilities, and even some level of common sense. Tokens are the basic units of text like characters, words or phrases, they enable the model to break down and analyze language into manageable units for processing. The ability to dissect and understand language is fundamentally dependent on the effective use of tokens.[5] Once trained, LLMs can generate coherent and contextually relevant text over long passages, answer questions based on the information they've been trained on, summarize articles or documents and translate between languages to a similar level expected from a university freshman.

One of the powerful features of LLMs is their ability to be tuned for specific tasks. For instance, a model trained on a general dataset can be further trained on medical literature to answer medical-related questions.[6] At EclecticIQ we are developing an LLM-powered Cyber Threat Intelligence tool, EclecticIQ analysts are currently in the fine-tuning process and details will be released soon.

LLMs have limitations. They don't comprehend text in the same way humans do, they generate responses based on patterns in the data they've been fed. They can also produce incorrect or biased information if the training data had parallel biases or errors. [7] They do not have the ability to think critically or possess consciousness. [8]

2. Aligned Language Models

Image created with DALL·E 3

Guardrails in AI: Safeguarding Against Detrimental Content in Language Models

Aligned language models, a subset of these AI-driven systems, are engineered with a distinct emphasis on operating within predefined ethical and safety guidelines, often referred to as "guardrails." LLMs like OpenAI´s GPT and Google´s BERT are so-called Aligned Language Models. Guardrails encompass a comprehensive framework of principles, rules, and constraints, placed above the data training stack, that further steer the behavior of language models during the process of text generation. By focusing on generating content that is precise, respectful, unbiased, and compliant with human-defined values, aligned models are able to address the ethical concerns arising from AI-generated text.

The core principles these guardrails encompass are:

Accuracy: Aligned language models prioritize the accuracy of the information they generate. Through inherent fact-checking mechanisms and an understanding of reliable sources, these models strive to provide information that is trustworthy and factually correct.

Respectfulness: The importance of generating text that is respectful and inclusive of diverse perspectives cannot be understated. Aligned models are programmed to use language that respects cultural sensitivity, avoids derogatory language, and refrains from promoting any form of discrimination.

Unbiased Behavior: Bias is an intrinsic challenge in AI development, and aligned model developers take significant efforts to mitigate this issue. Training on diverse and balanced datasets minimizes the presence of biases in generated text, ensuring that responses are fair and unbiased.

Compliance with Human Values: Guardrails for aligned models are rooted in human-defined values that comprise ethical standards. These models are engineered to refrain from generating content that promotes hate speech, misinformation, or harmful ideologies; aligning with the values upheld by human societies. [9]

The establishment of guardrails for aligned language models is a nuanced process that involves two key phases: pre-training and fine-tuning. In pre-training, language models learn foundational patterns from datasets, without explicit ethical guidelines. [10] Fine-tuning trains the model on specific datasets to align its output with desired ethical guidelines by differentiating between appropriate and inappropriate content. [11]

At the moment, the governance of guardrails is left to the individual AI companies. Not having the correct guardrails in place can pose a variety of unexpected risks in the form of objectionable content that encourages violence and even death in some extreme cases. In late 2021, a man was encouraged by a chatbot created on Replica to kill Queen Elizabeth II, he later scaled the walls of Windsor Castle where he was apprehended with a loaded crossbow. [12] In March 2023, a Belgian man took his own life, encouraged by Eliza chatbot after a 6-week conversation. [13]

Figure 1 - Guardrails in action, GPT successfully refuses to answer an
inappropriate question (click on image to open in separate tab).

Ethical Boundaries Breached: The Risks of Manipulating Aligned Language Models

Jailbreaks are a type of adversarial attack and refer to scenarios where an aligned language model's behavior deviates from its intended ethical and safety constraints as a consequence of an adversarial attack. Adversarial attacks on AI involve the deliberate creation of specialized input data with the goal of causing AI models to produce incorrect or unintended outputs. This way, attackers can steer the model towards generating outputs that align with their malicious intent, thereby gaining an advantage in achieving their objectives.

These deviations can enable the model to generate content that is malicious, inappropriate, or otherwise undesirable. Jailbroken models escape the controlled boundaries established during fine-tuning and produce outputs that may pose risks to individuals, organizations, or society at large. Output risks can include generating harmful misinformation, hate speech, phishing emails, malicious code or other forms of malicious content.

Figure 2 - Successful GPT jailbreak, Napalm Bedtime Stories with Grandma
- now patched (click on image to open in separate tab).

Figure 3 - Post on a cybercrime forum, developer mode jailbreak used to
create ransomware (click on image to open in separate tab).

Adversarial attacks on AI represent a critical challenge for AI ethics and safety. The intentional crafting of inputs to manipulate model behavior poses risks to the responsible use of AI technology. As these models continue to play an increasingly central role in various applications, the development of robust mitigation strategies and ongoing research is essential to ensure their fair, ethical and safe deployment. [14]

While there aren't any reports of jailbreaking used in a cyberattack yet, there are multiple proof of concepts freely available on the internet [15] or on sale on the dark web [16], that provide ready-made prompts intended to jailbreak ChatGPT. An example of a real-world scenario could involve a hacker manipulating a company customer service chatbot into leaking sensitive information via jailbreaks.

Figure 4 – Jailbreak posts on a cybercrime forum
(click on image to open in separate tab).

Mitigating adversarial attacks on aligned language models requires a multi-faceted approach:

Robust Fine-Tuning: Ensuring that the fine-tuning process includes a diverse range of data and scenarios can improve the model's resistance to adversarial attacks.
Adversarial Training: Training models with adversarial examples can enhance their ability to recognize and respond appropriately to manipulated inputs.
Regular Evaluation: Continuously evaluating model outputs for potential deviations from ethical guidelines can help detect and address jailbreak scenarios.
Human Oversight: Incorporating human reviewers in the content generation process can provide an additional layer of safety checks.

3. New AI Powered Tactics: Exploitation of Hallucination To Deliver Malicious Code

AIhallucination
Image created with DALL·E 3

LLM hallucination refers to instances where the Large Language Model produces outputs that are not grounded in its training data, leading to factually incorrect or fictional statements. In this case, the hallucination is weaponized and exploited by attackers to spread malicious packages into developers' environments, i.e. attackers can use ChatGPT to recommend non-existent or "hallucinated" packages.

Researchers recently weaponized ChatGPT’s hallucination in a proof of concept. ChatGPT sometimes generates URLs, references, and even code libraries that don't exist. These hallucinations are possibly resulting from outdated training data. Researchers demonstrated a scenario where ChatGPT recommended a non-existent npm package named "arangodb". An attacker then published a malicious version of this package. When another user later asked ChatGPT about integrating with "arangodb" in node.js, ChatGPT recommended the now-existent malicious package. [17]

4. From Fake News to Cyber Threats: The Expanding Scope of Generative AI

MaliciousAI

Image created with DALL·E 3

Generative AI Exacerbates Social Divisions with Rapid Disinformation Content Production

The content created by generative AI tools has the potential to exacerbate social divisions. Hate speech, incitement to violence, and content that stirs discord are often produced at an alarming rate, contributing to the polarization of online communities.

Generative AI tools are proficient at generating content that can be used to spreads false information and misleading narratives. By automating the production of fake news articles, fabricated images, misleading videos, and provocative social media posts, these tools contribute to the manipulation of public opinion and erode trust in reliable sources. Notable instances include AI-generated news stories that perpetuate false claims and fabricated images that portray events that never occurred. [18] Other examples include AI-generated social media posts that promote extremist ideologies and contribute to harassment campaigns against targeted individuals or groups. [19]

The Role of Generative AI in the Disinformation Space During Recent Conflicts

The late events in Israel and Gaza have seen an overwhelming amount of content on social media, but there is currently some disagreement among experts as to how much AI-generated content was used and its actual impact on the target audience. A recent publication by the Harvard’s Misinformation Review suggests that generative AI might have a limited impact from a quantitative and qualitative standpoint due to, among other reasons, an already content-saturated social media environment and an inherently finite attention span of the target audience. [20] On the other hand, Hany Farid, professor at the UC Berkeley School of Information, states that the presence of Generative AI content in relation to the events in Israel and Gaza is “far more prevalent than it was at the outset of the Russian invasion of Ukraine.” [21]

Tackling the Rapid Rise of Malicious AI Content

Detecting and mitigating generative AI tools pose significant challenges. The line between genuine and AI-generated content is becoming increasingly blurred, making it difficult for traditional content moderation techniques to keep up. The speed and scale at which these tools produce content further compound the problem, as manual moderation becomes impractical.

The creation and deployment of malicious content via generative AI tools raise ethical concerns. The responsibility and accountability for the harm caused by such tools are complex issues. Striking a balance between freedom of expression and preventing the spread of harmful content is a delicate challenge. Efforts to curb malicious content should not infringe upon legitimate discourse.

To address the threat posed by generative AI tools, several strategies are being explored, from digital watermarking to the development of advanced/AI-driven content verification algorithms that can distinguish between genuine and AI-generated content. Additionally, educating users about the potential for AI-driven content manipulation and cultivating critical thinking skills can empower individuals to recognize and question the authenticity of the content they encounter. [22]

AI-Driven Acceleration: Simplifying the Cyber Threat Campaign Process

Large Language Models simplify the Cyber Threat Campaign process by supporting a variety of attack vectors, from crafting convincing phishing emails to creating malicious payloads.[23] AI can therefore lower the barrier to entry, allowing new actors to enter the cyber threat landscape. AI also works as a force multiplier for existing actors by making all the processes involved in the campaign easier and quicker to execute. Malicious AI-generated content is designed to bypass traditional security measures, making it harder for humans and machines to detect and stop.

2023 has seen the widespread development of malicious generative AI tools. Unlike Legitimate LLM tools with guardrails in place that need to be jailbroken to produce malicious content, malicious generative AI tools are designed specifically with the goal of enabling cyberattacks. Examples include: WormGPT, FraudGPT, PoisonGPT, Evil-GPT, XXXGPT, WolfGPT, DarkBART. [24]

The two most popular malicious generative AI tools are WormGPT and FraudGPT.

WormGPT is a blackhat alternative to GPT models, designed specifically for malicious activities. It's based on the GPTJ language model developed in 2021 and offers features like unlimited character support, chat memory retention, and code formatting. WormGPT was trained on various data sources, especially malware-related data. Tests revealed its potential to generate highly persuasive emails for phishing or BEC attacks, showcasing the significant threat posed by such generative AI technologies. [25]

FraudGPT is a new AI tool designed specifically for offensive actions, such as crafting spear phishing emails, creating cracking tools, carding, and more. The tool is currently available for purchase on various Dark Web marketplaces and on Telegram. Like WormGPT, FraudGPT allows threat actors to draft emails that have high likelihood of enticing recipients to click on malicious links. This capability is particularly useful for business email compromise (BEC) phishing campaigns targeting organizations. The tool aids in identifying the most targeted services/sites to further deceive victims. Some of the features of FraudGPT are: writing malicious code, creating hard to detect malware, finding non-VBV bins, creating phishing pages, creating hacking tools, finding groups/sites/markets, writing scam pages/letters, discovering leaks/vulnerabilities, learning to code/hack, finding cardable sites. [26]

As described in the jailbreak section, legitimate LLMs with strong ethical guardrails in place such as ChatGPT can still be manipulated via jailbreaks to create potentially harmful content, in essence turning a legitimate LLM into a malicious generative AI tool.

5. Hacking and AI: What to Expect in 2024 and Beyond

Image created with DALL·E 3

Countering Disinformation in the Age of AI Generative Tools: A Strategic Approach to Detection and Regulation

A holistic approach to counter the threat posed by AI generative tools suggests the use of multiple detection tools and applying Goodhart’s law “when a measure becomes a target, it ceases to be a good measure”. Goodhart's law posits that once a metric is used as a target, it loses its effectiveness as a measure. For instance, the detection of deep fakes often relies on spotting facial movement discrepancies. Initially, this is an effective method to differentiate authentic from manipulated content. However, as this metric becomes a focal point for detection, malicious creators refine their technology to specifically counteract these movement anomalies, rendering the original detection methods less effective. Consequently, those refining AI detection tools may fall into the trap of over-focusing on this single aspect, neglecting broader strategies necessary for identifying synthetic media, and thereby inadvertently facilitating the task of those aiming to deceive the detection systems. [27]

On a strategic level, addressing the threat of AI generative tools requires collaboration among AI developers, researchers, policymakers, and online platforms. All must work together to create a strong and utilitarian product. Establishing ethical guidelines and frameworks for the responsible development and deployment of AI tools is essential in order to benefit society at large. Ongoing discussions about the regulation of AI and its content aim to prevent malicious use while preserving the positive aspects of the technology. In his Executive Order of late October 2023 “New Standards for AI Safety and Security”, US President Biden mandated guidelines targeting, among other things, AI-generated content to “Protect Americans from AI-enabled fraud and deception by establishing standards and best practices for detecting AI-generated content and authenticating official content.” [28] The European Parliament is proposing similar legislation with the EU AI Act drafted in 2021 and expected to be approved soon. [29] This sets a strong precedent for other nations to follow.

Malicious Generative AI: A Force Multiplier in Cyberattacks

EclecticIQ analysts expect Malicious Generative AI tools designed to facilitate cyberattacks to become more widespread and popular. This further lowers the barrier to entry for new cybercriminals to craft better cyberattacks and helps scale quantitative and qualitative capabilities for existing cybercriminals and APT groups. [30] Analysts expect legitimate Generative AI tools to be jailbroken via adversarial attacks with the purpose of turning them into Malicious Generative AI tools.

The use of Malicious Generative AI tools works as a force multiplier, both from a quantitative and from a qualitative standpoint. It will pose a serious challenge to the defenders community, which will need to quickly adapt responses, likely using AI-powered security tools. Initiatives such as the DARPA AI Cyber Challenge, announced in August 2023 at Black Hat USA [31], will help bring together the brightest minds in Cybersecurity and Artificial Intelligence with the aim of building AI-driven cyber defense tools. [32]

Beyond Traditional Hacking: Unveiling The New Age of AI-Centric Adversarial Tactics

EclecticIQ analysts expect new attack vectors pivoting off of AI and exploiting vulnerabilities as AI systems are being integrated into the technology stack of organizations. [33]

EclecticIQ analysts have seen adversarial attacks against Large Language Models, but adversarial attacks can be launched against a variety of AI systems, not just LLMs like GPT. These attacks are designed to exploit vulnerabilities in machine learning models, causing them to make incorrect predictions or classifications. [34]

Common types of AI systems that can be targeted by adversarial attacks are as follows:

Image Classification Systems: One of the most studied areas for adversarial attacks is in the domain of image classification. Attackers can introduce small, often imperceptible perturbations to an image, causing a state-of-the-art image classifier to misclassify it. For instance, an image of a panda can be slightly altered to be misclassified as a gibbon.
Autonomous Vehicles: Adversarial attacks can be used to mislead the perception systems of autonomous vehicles. For example, subtle modifications to road signs can cause the vehicle's system to misinterpret them, potentially leading to unsafe behaviors.
Speech Recognition Systems: Adversarial examples can be crafted for audio data to deceive speech recognition systems. An audio clip can be modified in a way that it sounds the same to humans but is transcribed differently by the system.
Face Recognition Systems: Adversarial attacks can be used to fool face recognition systems, either by making two different faces appear similar or by making the same face appear different to the system.
Reinforcement Learning Agents: Agents trained via reinforcement learning can also be vulnerable to adversarial attacks. By introducing noise or perturbations into the environment, an attacker can cause the agent to take suboptimal or even harmful actions.
Generative Adversarial Networks (GANs): While GANs themselves are a form of adversarial training, they can also be susceptible to adversarial attacks. For instance, the generator can be tricked into producing specific kinds of outputs.
Anomaly Detection Systems: These systems, designed to detect outliers or anomalies in data, can be fooled by carefully crafted adversarial examples that appear normal to the system but are anomalous.
Transfer Learning: Models trained on one task and fine-tuned on another can inherit vulnerabilities from the pre-trained models, making them susceptible to adversarial attacks.

Researchers are actively working on developing defenses against adversarial attacks on AI, but it's a challenging arms race between improving model robustness and crafting more potent adversarial examples. As AI becomes the system center mass target for hackers, the future of hacking is going to revolve around AI-Centric hacking.

2024-2030 AI Predictions: The Rising Tide of AI in Cybersecurity Offense and Defense

Our predictions for 2024 and the following years cover a wide range of relevant aspects:

National Policy: EclecticIQ analysts expect national regulations on AI in multiple countries as well as new global strategic initiatives committing to international collaboration for the safe, ethical development of AI, such as the signing of the Bletchley Declaration at the AI Safety Summit in the UK in November 2023. [35]

AI-Driven Toolsets and Malware: In 2024, EclecticIQ analysts expect more AI-driven disinformation, more malicious generative AI tools, new jailbreaks and the exploitation of new AI vulnerabilities as malicious actors are learning to harness the power of AI for nefarious purposes while developing tailored tools or misusing existing Artificial Intelligence systems via adversarial attacks. As AI becomes more and more widespread, supporting a variety of use cases, its malicious use and its abuse is likely to continue and evolve exponentially in the coming years.
Cybersecurity Challenges Ahead: Analysts expect a rapidly developing AI defenders community, jumpstarted by initiatives such as the DARPA AI Cyber Challenge [36], focusing on countering these cyber threats using AI-powered technology. As it has been in the “legacy” cybersecurity world for more than half a century, AI-Powered Hacking and AI-Centric Hacking will be an arms race between offenders and defenders. This arms race will likely cause adversarial progress, i.e. innovation will arise as AI defenders and offenders learn from each other’s strategies, tactics and tools. The biggest challenge for organizations and defenders will likely consist in finding the right talent as there is currently “a 3.3 million [person] shortage of cybersecurity professionals” [37] around the world, while at the same time AI is lowering the barrier to entry for cybercriminals [38] and increasing capabilities, quantitatively and qualitatively, for APTs and nation states alike, as they experiment and develop new strategies and tactics using these new AI-powered tools.
Next Steps in AI, Artificial General Intelligence: AI companies such as OpenAI are actively working toward the creation of Artificial General Intelligence [39], which is estimated by industry experts to arrive in 5 to 10 years. [40] “Artificial General Intelligence (AGI) is the hypothetical future state of a computer system that can rival or exceed the ability of humans to perform any intellectual task.”[41] This is the next level in Artificial Intelligence research. Opportunities and threats will scale up with this new technology. AGI will work as an additional force multiplier for both defenders and offenders as we will have AGI powered synthetic analysts and synthetic threat actors supporting their human counterparts.

Structured Data

Find this and other research in our public TAXII collection for easy use in your security stack: https://cti.eclecticiq.com/taxii/discovery.

Please refer to our support page for guidance on how to access the feeds.

About EclecticIQ Intelligence & Research Team

EclecticIQ is a global provider of threat intelligence, hunting, and response technology and services. Headquartered in Amsterdam, the EclecticIQ Intelligence & Research Team is made up of experts from Europe and the U.S. with decades of experience in cyber security and intelligence in industry and government.

We would love to hear from you. Please send us your feedback by emailing us at research@eclecticiq.com.

You might also be interested in:

ChatGPT Vulnerability; LockBit Cyberattack On ICBC; US Water Authority Hacked

Sandworm Targets Ukraine's Critical Infrastructure; Overlooked AI Privacy Challenges

Welcoming EclecticIQ Intelligence Center 3.2

References

[1] “Number of ChatGPT Users (Nov 2023),” Exploding Topics. Accessed: Nov. 03, 2023. [Online]. Available: https://explodingtopics.com/blog/chatgpt-users

[2] A. Vaswani et al., “Attention Is All You Need.” arXiv, Aug. 01, 2023. doi: 10.48550/arXiv.1706.03762.

[3] M. Lubbad, “The Ultimate Guide to GPT-4 Parameters: Everything You Need to Know about NLP’s Game-Changer,” Medium. Accessed: Nov. 03, 2023. [Online]. Available: https://medium.com/@mlubbad/the-ultimate-guide-to-gpt-4-parameters-everything-you-need-to-know-about-nlps-game-changer-109b8767855a

[4] D. / J. A. Lanz, “GPT-5 Is Officially on the OpenAI Roadmap Despite Prior Hesitation,” Decrypt. Accessed: Nov. 21, 2023. [Online]. Available: https://decrypt.co/206044/gpt-5-openai-development-roadmap-gpt-4

[5] FHIRFLY, “Understanding Tokens in the Context of Large Language Models like BERT and T5,” Medium. Accessed: Dec. 04, 2023. [Online]. Available: https://medium.com/@fhirfly/understanding-tokens-in-the-context-of-large-language-models-like-bert-and-t5-8aa0db90ef39

[6] A. Toma, S. Senkaiahliyan, P. R. Lawler, B. Rubin, and B. Wang, “Generative AI could revolutionize health care — but not if control is ceded to big tech,” Nature, vol. 624, no. 7990, pp. 36–38, Dec. 2023, doi: 10.1038/d41586-023-03803-y.

[7] “Bias in AI: What it is, Types, Examples & 6 Ways to Fix it in 2023.” Accessed: Dec. 04, 2023. [Online]. Available: https://research.aimultiple.com/ai-bias/

[8] Matthew Sparkes, “AI shows no sign of consciousness yet, but we know what to look for,” New Scientist. Accessed: Dec. 04, 2023. [Online]. Available: https://www.newscientist.com/article/2388344-ai-shows-no-sign-of-consciousness-yet-but-we-know-what-to-look-for/

[9] “Evaluating Large Language Models,” Evaluating Large Language Models. Accessed: Nov. 03, 2023. [Online]. Available: https://toloka.ai/blog/evaluating-llms/

[10] C. Anderson, “Guardrails on Large Language Models, Part 1: Dataset Preparation.” Accessed: Nov. 03, 2023. [Online]. Available: https://avidml.org/blog/llm-guardrails-1/

[11] C. Anderson, “Guardrails on Large Language Models, Part 2: Model Fine Tuning.” Accessed: Nov. 03, 2023. [Online]. Available: https://avidml.org/blog/llm-guardrails-2/

[12] “Man ‘encouraged’ by an AI chatbot to kill Queen Elizabeth II jailed,” euronews. Accessed: Nov. 15, 2023. [Online]. Available: https://www.euronews.com/next/2023/10/06/man-encouraged-by-an-ai-chatbot-to-assassinate-queen-elizabeth-ii-receives-9-year-prison-s

[13] “AI chatbot blamed for ‘encouraging’ young father to take his own life,” euronews. Accessed: Nov. 15, 2023. [Online]. Available: https://www.euronews.com/next/2023/03/31/man-ends-his-life-after-an-ai-chatbot-encouraged-him-to-sacrifice-himself-to-stop-climate-

[14] A. Zou, Z. Wang, J. Z. Kolter, and M. Fredrikson, “Universal and Transferable Adversarial Attacks on Aligned Language Models.” arXiv, Jul. 27, 2023. doi: 10.48550/arXiv.2307.15043.

[15] “Jailbreak Chat.” Accessed: Nov. 03, 2023. [Online]. Available: https://roboreachai.com/chatgpt-jailbreak-25-proven-prompts-to-bypass-chatgpt

[16] N. N. C. Writer, D. R. September 12, and 2023, “ChatGPT Jailbreaking Forums Proliferate in Dark Web Communities,” Dark Reading. Accessed: Nov. 03, 2023. [Online]. Available: https://www.darkreading.com/application-security/chatgpt-jailbreaking-forums-dark-web-communities

[17] B. Lanyado, “Can you trust ChatGPT’s package recommendations?,” Vulcan Cyber. Accessed: Nov. 03, 2023. [Online]. Available: https://vulcan.io/blog/ai-hallucinations-package-risk/

[18] “How generative AI is boosting the spread of disinformation and propaganda,” MIT Technology Review. Accessed: Nov. 03, 2023. [Online]. Available: https://www.technologyreview.com/2023/10/04/1080801/generative-ai-boosting-disinformation-and-propaganda-freedom-house/

[19] D. O. Gordon Curt Devine,Allison, “How antisemitic hate groups are using artificial intelligence in the wake of Hamas attacks,” CNN. Accessed: Nov. 23, 2023. [Online]. Available: https://www.cnn.com/2023/11/14/us/hamas-israel-artificial-intelligence-hate-groups-invs/index.html

[20] F. M. Simon, S. Altay, and H. Mercier, “Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown,” Harv. Kennedy Sch. Misinformation Rev., Oct. 2023, doi: 10.37016/mr-2020-127.

[21] W. Bedingfield, “Generative AI Is Playing a Surprising Role in Israel-Hamas Disinformation,” Wired. Accessed: Nov. 03, 2023. [Online]. Available: https://www.wired.com/story/israel-hamas-war-generative-artificial-intelligence-disinformation/

[22] “Generative AI and deepfakes. How AI will create disinformation.” Accessed: Nov. 03, 2023. [Online]. Available: https://thumos.uk/generative-ai-and-deepfakes/

[23] “Threat Actors are Interested in Generative AI, but Use Remains Limited,” Mandiant. Accessed: Nov. 03, 2023. [Online]. Available: https://www.mandiant.com/resources/blog/threat-actors-generative-ai-limited

[24] “Malicious Generative AI Tools. Buzz, Threat, and Solution,” Heimdal Security Blog. Accessed: Nov. 03, 2023. [Online]. Available: https://heimdalsecurity.com/blog/malicious-generative-ai-tools-solution/

[25] “WormGPT - The Generative AI Tool Cybercriminals Are Using to Launch BEC Attacks | SlashNext,” SlashNext |. Accessed: Nov. 03, 2023. [Online]. Available: https://slashnext.com/blog/wormgpt-the-generative-ai-tool-cybercriminals-are-using-to-launch-business-email-compromise-attacks/

[26] “FraudGPT: The Villain Avatar of ChatGPT.” Accessed: Nov. 03, 2023. [Online]. Available: https://netenrich.com/blog/fraudgpt-the-villain-avatar-of-chatgpt

[27] “Applying Goodhart’s law to AI generated content detection: Misuse & Innovation.” Accessed: Nov. 03, 2023. [Online]. Available: https://www.unitary.ai//articles/applying-goodharts-law-to-ai-generated-content-detection-misuse-innovation

[28] T. W. House, “FACT SHEET: President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence,” The White House. Accessed: Nov. 03, 2023. [Online]. Available: https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/

[29] “EU AI Act: first regulation on artificial intelligence | News | European Parliament.” Accessed: Nov. 03, 2023. [Online]. Available: https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

[30] “Digital Press Briefing with Anne Neuberger, Deputy National Security Advisor for Cyber and Emerging Technologies,” United States Department of State. Accessed: Nov. 03, 2023. [Online]. Available: https://www.state.gov/digital-press-briefing-with-anne-neuberger-deputy-national-security-advisor-for-cyber-and-emerging-technologies/

[31] DARPA Announces AI Cyber Challenge at Black Hat USA 2023, (Aug. 18, 2023). Accessed: Nov. 03, 2023. [Online Video]. Available: https://www.youtube.com/watch?v=N4XSg_5o8D4

[32] “DARPA AI Cyber Challenge Aims to Secure Nation’s Most Critical Software.” Accessed: Nov. 03, 2023. [Online]. Available: https://www.darpa.mil/news-events/2023-08-09

[33] A. Chuvakin and J. Stone, “Securing AI: Similar or Different?”.

[34] J. Pradeesh, “Council Post: Adversarial Attacks On AI Systems,” Forbes. Accessed: Nov. 03, 2023. [Online]. Available: https://www.forbes.com/sites/forbestechcouncil/2023/07/27/adversarial-attacks-on-ai-systems/

[35] “The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023,” GOV.UK. Accessed: Nov. 03, 2023. [Online]. Available: https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023

[36] “AIxCC.” Accessed: Nov. 03, 2023. [Online]. Available: https://aicyberchallenge.com/

[37] C. Combs, “Cyber security talent gap amid AI boom could be perfect storm, expert warns,” The National. Accessed: Nov. 03, 2023. [Online]. Available: https://www.thenationalnews.com/business/technology/2023/11/03/cyber-security-talent-gap-amid-ai-boom-could-be-perfect-storm-expert-warns/

[38] “New Survey From Abnormal Security Reveals 98% Of Security Leaders Worry About the Risks of Generative AI,” Yahoo Finance. Accessed: Nov. 03, 2023. [Online]. Available: https://finance.yahoo.com/news/survey-abnormal-security-reveals-98-100000906.html

[39] “About.” Accessed: Nov. 24, 2023. [Online]. Available: https://openai.com/about

[40] P. McGuinness, “When AGI?,” AI Changes Everything. Accessed: Nov. 24, 2023. [Online]. Available: https://patmcguinness.substack.com/p/when-agi

[41] “Artificial General Intelligence,” C3 AI. Accessed: Nov. 24, 2023. [Online]. Available: https://c3.ai/glossary/data-science/artificial-general-intelligence/

The Rapidly Evolving Landscape of Generative AI Tools, AI-Powered Cyber Threats and AI-Centric Adversarial Tactics

1. Artificial Intelligence, Cybersecurity and Large Language Models

Large Language Models

2. Aligned Language Models

Guardrails in AI: Safeguarding Against Detrimental Content in Language Models

Ethical Boundaries Breached: The Risks of Manipulating Aligned Language Models

3. New AI Powered Tactics: Exploitation of Hallucination To Deliver Malicious Code

4. From Fake News to Cyber Threats: The Expanding Scope of Generative AI

Generative AI Exacerbates Social Divisions with Rapid Disinformation Content Production

The Role of Generative AI in the Disinformation Space During Recent Conflicts

Tackling the Rapid Rise of Malicious AI Content

AI-Driven Acceleration: Simplifying the Cyber Threat Campaign Process

5. Hacking and AI: What to Expect in 2024 and Beyond

Countering Disinformation in the Age of AI Generative Tools: A Strategic Approach to Detection and Regulation

Malicious Generative AI: A Force Multiplier in Cyberattacks

Beyond Traditional Hacking: Unveiling The New Age of AI-Centric Adversarial Tactics

2024-2030 AI Predictions: The Rising Tide of AI in Cybersecurity Offense and Defense

Structured Data

About EclecticIQ Intelligence & Research Team

You might also be interested in:

References

Talk to one of our experts