Introduction
Large Language Models (LLMs) have swiftly become essential components of modern workflows, automating tasks traditionally performed by humans. Their applications span customer support chatbots, content generation, data analysis, and software development, thereby revolutionizing business operations by boosting efficiency and minimizing manual effort. However, their widespread and rapid adoption brings forth significant security challenges that must be addressed to ensure their safe deployment. In this blog, we give a few examples of the potential hazards of generative AI and LLM applications and refer to the Databricks AI Security Framework (DASF) for a comprehensive list of challenges, risks and mitigation controls.
One major aspect of LLM security relates to the output generated by these models. Shortly after LLMs were made available to the public via chat interfaces, so-called jailbreak attacks emerged, where adversaries crafted specific prompts to manipulate the LLMs into producing harmful or unethical responses beyond their intended scope (DASF: Model Serving — Inference requests 9.12: LLM jailbreak). This led to models becoming unwitting assistants for malicious activities like crafting phishing emails or generating code embedded with exploitable backdoors.
Another critical security issue arises from integrating LLMs into existing systems and workflows. For instance, Microsoft’s Edge browser features a sidebar chat assistant capable of summarizing the currently viewed webpage. Researchers have demonstrated that embedding hidden prompts within a webpage can turn the chatbot into a convincing scammer that tries to elicit sensitive data from users. These so-called indirect prompt injection attacks leverage the fact that the line between information and commands becomes blurred when an LLM processes external information (DASF: Model Serving — Inference requests 9.1: Prompt inject).
In light of these challenges, any company hosting or developing LLMs should invest in assessing their resilience against such attacks. Ensuring LLM security is crucial for maintaining trust, compliance, and the safe deployment of AI-driven solutions.
The Garak Vulnerability Scanner
To assess the security of large language models (LLMs), NVIDIA’s AI Red Team introduced Garak, the Generative AI Red-teaming and Assessment Kit. Garak is an open-source tool designed to probe LLMs for vulnerabilities, offering functionalities akin to penetration testing tools from system security. The diagram below outlines a simplified Garak workflow and its key components.
- Generators enable Garak to send prompts to a target LLM and obtain its answers. They abstract away establishing a network connection, authenticating, and processing the responses. Garak provides various generators compatible with models hosted on platforms such as OpenAI or Hugging Face, or served locally using Ollama.
- Probes assemble and orchestrate prompts aimed at exploiting specific weaknesses or eliciting a particular behavior from the LLM. These prompts have been collected from different sources and cover jailbreak attacks, generation of toxic and hateful content, and prompt injection attacks, among others. At the time of writing, the probe corpus consists of more than 150 different attacks and 3,000 prompts and prompt templates.
- Detectors are the final important component that analyzes the LLM’s responses to determine if the desired behavior has been elicited. Depending on the attack type, detectors may use simple string-matching functions, machine learning classifiers, or employ another LLM as a “judge” to assess content, such as identifying toxicity.
Together, these components allow Garak to assess the robustness of an LLM and identify weaknesses along specific attack vectors. While a low success rate in these tests doesn’t imply immunity, a high success rate suggests a broader and more accessible attack surface for adversaries.
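To make this concrete, the sketch below shows how these components map onto Garak’s command line: the generator is selected with --model_type/--model_name, the probes with --probes, and each probe applies its own recommended detectors automatically. The Hugging Face model gpt2 is used here purely as a small example target.

```bash
# Enumerate the available probes
python -m garak --list_probes

# Run a single jailbreak probe against a locally loaded Hugging Face model
python -m garak --model_type huggingface \
                --model_name gpt2 \
                --probes dan.AntiDAN
```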
In the next section, we explain how to connect a Databricks-hosted LLM to Garak to run a security scan.
Scanning Databricks Endpoints
Integrating Garak with your Databricks-hosted LLMs is straightforward, thanks to Databricks’ REST API for inference.
Installing Garak
Let’s start by creating a virtual environment and installing Garak using Python’s package manager, pip:
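(Commands shown for a Unix-like shell; the virtual environment name is arbitrary.)

```bash
python -m venv garak-env          # create an isolated environment
source garak-env/bin/activate     # activate it
pip install garak                 # install Garak from PyPI
python -m garak --version         # print the installed version
```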
If the installation is successful, you should see a version number after executing the last command. For this blog, we used Garak version 0.10.3.1 and Python 3.13.10.
Configuring the REST interface
Garak offers multiple generators that allow you to start using the tool right away with various LLMs. Additionally, Garak’s generic REST generator allows interaction with any service offering a REST API, including model serving endpoints on Databricks.
To use the REST generator, we have to provide a JSON file that tells Garak how to query the endpoint and how to extract the response string from the result. Databricks’ REST API expects a POST request with a JSON payload structured as follows:
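(Shown for a chat-style endpoint; the prompt text and the max_tokens value are placeholders.)

```json
{
  "messages": [
    {"role": "user", "content": "What is Databricks?"}
  ],
  "max_tokens": 128
}
```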
The response typically appears as:
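(Abbreviated and with illustrative values; additional fields such as usage statistics are omitted.)

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Databricks is a data and AI platform ..."
      },
      "finish_reason": "stop"
    }
  ]
}
```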
The most important thing to keep in mind is that the model’s response is stored in the choices list under the keys message and content.
Garak’s REST generator requires a JSON configuration specifying the request structure and how to parse the response. An example configuration is given by:
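(Angle-bracket placeholders need to be replaced with your workspace host, endpoint name, and PAT token; the field names follow Garak’s REST generator documentation.)

```json
{
  "rest": {
    "RestGenerator": {
      "name": "Databricks serving endpoint",
      "uri": "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations",
      "method": "post",
      "headers": {
        "Authorization": "Bearer <PAT token>",
        "Content-Type": "application/json"
      },
      "req_template_json_object": {
        "messages": [{"role": "user", "content": "$INPUT"}]
      },
      "response_json": true,
      "response_json_field": "$.choices[0].message.content"
    }
  }
}
```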
First, we have to provide the URL of the endpoint and an authorization header containing our PAT token. The req_template_json_object specifies the request body we saw above, where $INPUT marks the position at which the input prompt will be inserted. Finally, response_json_field specifies how the response string can be extracted from the response: in our case, we need the content field of the message entry in the first element of the list stored under the choices field of the response dictionary. We can express this as the JSONPath $.choices[0].message.content.
Let’s put everything together in a Python script that stores the JSON file on our disk.
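A minimal sketch, assuming the endpoint URL and PAT token are exposed through the (hypothetical) environment variables DATABRICKS_ENDPOINT_URL and DATABRICKS_TOKEN:

```python
import json
import os

# Hypothetical environment variable names -- adapt them to your setup.
endpoint_url = os.environ["DATABRICKS_ENDPOINT_URL"]  # e.g. https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations
pat_token = os.environ["DATABRICKS_TOKEN"]            # Databricks personal access token

config = {
    "rest": {
        "RestGenerator": {
            "name": "Databricks serving endpoint",
            "uri": endpoint_url,
            "method": "post",
            "headers": {
                "Authorization": f"Bearer {pat_token}",
                "Content-Type": "application/json",
            },
            # $INPUT is replaced by Garak with the attack prompt.
            "req_template_json_object": {
                "messages": [{"role": "user", "content": "$INPUT"}]
            },
            "response_json": True,
            # JSONPath pointing at the generated text in the endpoint response.
            "response_json_field": "$.choices[0].message.content",
            # Allow up to 300 seconds per request for slower models.
            "request_timeout": 300,
        }
    }
}

# Write the generator configuration to disk for use with Garak.
with open("rest_json.json", "w") as f:
    json.dump(config, f, indent=2)
```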
Here, we assume that the URL of the hosted model and the PAT token for authorization are stored in environment variables, and we set the request_timeout to 300 seconds to accommodate longer processing times. Executing this script creates the rest_json.json file, which we can use to start a Garak scan like this:
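(A minimal invocation, assuming the configuration file from the previous step; the -G flag points Garak at the REST generator options, and --probes dan selects the DAN probe family.)

```bash
python -m garak --model_type rest -G rest_json.json --probes dan
```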
This command specifies the DAN attack class, a known jailbreak technique, for demonstration. The output should look like this.
We see that Garak loaded 15 attacks of the DAN type and starts processing them. The AntiDAN probe consists of a single prompt that is sent to the LLM five times (to account for the non-determinism of LLM responses), and we observe that the jailbreak worked every time.
Collecting the results
Garak logs the scan results in a .jsonl file, whose path is provided in the output. Each entry in this file is a JSON object categorized by an entry_type key:
- start_run setup and init: Appear once at the beginning, detailing run parameters like start time and probe repetitions.
- completion: Appears at the end of the log and indicates that the run has finished successfully.
- attempt: Represents individual prompts sent to the model, including the prompt (prompt), the model responses (output), and the detector outcomes (detector).
- eval: Provides a summary for each scanner, including the total number of attempts and successes.
To evaluate the target’s susceptibility, we can focus on the eval entries to determine the relative success rate per attack class, for example. For a more detailed analysis, it is worth examining the attempt entries in the report JSON log to identify specific prompts that succeeded.
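As a starting point, here is a small sketch that aggregates the eval entries into per-probe success rates. The report file name and the field names (probe, detector, passed, total) are assumptions based on the structure of Garak’s report logs and should be checked against your own report file:

```python
import json
from collections import defaultdict

# Path printed by Garak at the end of the run (placeholder name).
report_path = "garak.run.report.jsonl"

totals = defaultdict(int)
successes = defaultdict(int)

with open(report_path) as f:
    for line in f:
        entry = json.loads(line)
        if entry.get("entry_type") != "eval":
            continue
        probe = entry["probe"]
        # "passed" counts attempts the model withstood, so the difference
        # is the number of successful attacks.
        totals[probe] += entry["total"]
        successes[probe] += entry["total"] - entry["passed"]

for probe, total in sorted(totals.items()):
    rate = successes[probe] / total if total else 0.0
    print(f"{probe}: {successes[probe]}/{total} attacks succeeded ({rate:.0%})")
```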
Try it yourself
We recommend that you explore the various probes available in Garak and incorporate scans into your CI/CD pipeline or MLSecOps process using this working example. A dashboard that tracks success rates across different attack classes can give you a complete picture of the model’s weaknesses and help you proactively monitor new model releases.
It’s important to acknowledge the existence of various other tools designed to assess LLM security. Garak offers an extensive static corpus of prompts, ideal for identifying potential security issues in a given LLM. Other tools, such as Microsoft’s PyRIT, Meta’s Purple Llama, and Giskard, provide additional flexibility, enabling evaluations tailored to specific scenarios. A common challenge among these tools is accurately detecting successful attacks; the presence of false positives often necessitates manual inspection of results.
If you are unsure about the potential risks in your specific application and suitable risk mitigation instruments, the Databricks AI Security Framework can help you. It also provides mappings to additional leading industry AI risk frameworks and standards. Also see the Databricks Security and Trust Center for our approach to AI security.