Databricks Assistant is a context-aware AI assistant natively available in the Databricks Data Intelligence Platform. It is designed to simplify SQL and data analysis by helping generate SQL queries, explain complex code, and automatically fix errors.
In this blog, we follow up on Databricks Assistant Tips & Tricks for Data Engineers, shifting our focus to SQL and data analysts. We’ll explore how the Assistant reinforces best practices, improves performance, and helps transform semi-structured data into usable formats. Stay tuned for future posts covering data scientists and more, as we explore how Databricks Assistant is democratizing data by simplifying complex workflows and making advanced analytics more accessible to everyone.
Best Practices
Below are a few best practices to help analysts use the Assistant more effectively, ensuring more accurate responses, smoother iterations, and improved efficiency.
- Use @ mention table names: Be as specific as possible in your prompts and @ mention tables to ensure the Assistant references the correct catalog and schema. This is especially helpful in workspaces with multiple schemas or catalogs containing similarly named tables.
- Add row-level examples in UC comments: As of today, the Assistant only has access to metadata, not actual row-level values. By including representative row-level examples in Unity Catalog comments, analysts can give the Assistant additional context, leading to more precise suggestions for tasks like generating regex patterns or parsing JSON structures (see the sketch after this list).
- Keep table descriptions up to date: Regularly refining table descriptions in Unity Catalog enhances the Assistant’s understanding of your data model.
- Use Cmd+I for quick iteration: The inline Assistant is ideal for making targeted adjustments without unnecessary rewrites. Pressing Cmd + I at the end of a cell ensures the Assistant only modifies the code below the cursor, unless specified otherwise. This allows users to iterate quickly on prompts, refine responses, and adjust suggestions without disrupting the rest of their code. Additionally, users can highlight specific lines to fine-tune the Assistant’s focus.
- Get examples of advanced functions: When documentation provides only basic use cases, the Assistant can offer more tailored examples based on your specific needs. For instance, if you’re working with struct aggregation in a DLT (Delta Live Tables) batch or streaming pipeline, you can ask the Assistant for a more detailed implementation, including guidance on applying it to your data, adjusting parameters, and handling edge cases to ensure it works in your workflow.
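To make the Unity Catalog comment tip above concrete, here is a minimal sketch, assuming a hypothetical sales.movies table with a genres column that stores JSON strings; adapt the names and sample values to your own schema.

```sql
-- Embed a representative raw value in the column comment so the Assistant
-- has real structure to work from (it only sees metadata, not row values).
ALTER TABLE sales.movies ALTER COLUMN genres
  COMMENT 'JSON array of genre objects, e.g. [{"id": 18, "name": "Drama"}, {"id": 80, "name": "Crime"}]';

-- A table-level description also sharpens the Assistant's picture of the data model.
COMMENT ON TABLE sales.movies IS 'One row per movie; genres stored as a JSON string.';
```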
Common Use Cases
With these best practices in mind, let’s take a closer look at some of the specific challenges SQL and data analysts face daily. From query optimization and handling semi-structured data to generating SQL commands from scratch, the Databricks Assistant simplifies SQL workflows, making data analysis less complex and more efficient.
Converting SQL Dialects
SQL dialects vary across platforms, with differences in functions, syntax, and even core concepts like DDL statements and window functions. Analysts working across multiple environments, whether migrating from Hive to Databricks SQL or translating queries written for Postgres or BigQuery, often spend time adapting them manually.
For example, let’s take a look at how the Assistant converts a Hive DDL statement into Databricks-compatible SQL. The original query fails because SORTED BY isn’t supported in Databricks SQL. As we can see here, the Assistant seamlessly replaces the broken clause with USING DELTA, ensuring the table is created with Delta Lake, which offers optimized storage and indexing. This allows analysts to migrate Hive queries without manual trial and error.
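Here is a hedged sketch of this kind of rewrite; the trips table and its columns are invented, and the “before” statement is a typical Hive bucketing DDL rather than the exact query from the example.

```sql
-- Hive DDL that fails in Databricks SQL (SORTED BY bucketing is not supported for Delta):
-- CREATE TABLE trips (id BIGINT, pickup_time TIMESTAMP, fare DOUBLE)
--   CLUSTERED BY (id) SORTED BY (pickup_time) INTO 8 BUCKETS;

-- Assistant-style rewrite: drop the Hive clause and create a Delta table instead.
CREATE TABLE trips (
  id BIGINT,
  pickup_time TIMESTAMP,
  fare DOUBLE
) USING DELTA;
```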
Refactoring Queries
Long, nested SQL queries can be difficult to read, debug, and maintain, especially when they involve deeply nested subqueries or complex CASE WHEN logic. With Databricks Assistant, analysts can easily refactor these queries into common table expressions (CTEs) to improve readability. Let’s take a look at an example where the Assistant converts a deeply nested query into a more structured format using CTEs.
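The sketch below shows the shape of that refactor, using an invented trips table and threshold; the Assistant’s output will mirror your own query.

```sql
-- Before: the aggregation is buried in a nested subquery.
-- SELECT * FROM (
--   SELECT pickup_zone, AVG(fare) AS avg_fare FROM trips GROUP BY pickup_zone
-- ) t WHERE t.avg_fare > 20;

-- After: each step is a named CTE, easier to read, test, and debug.
WITH zone_fares AS (
  SELECT pickup_zone, AVG(fare) AS avg_fare
  FROM trips
  GROUP BY pickup_zone
),
expensive_zones AS (
  SELECT pickup_zone, avg_fare
  FROM zone_fares
  WHERE avg_fare > 20
)
SELECT * FROM expensive_zones;
```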
Writing SQL Window Functions
SQL window functions are traditionally used for ranking, aggregation, and calculating running totals without collapsing rows, but they can be tricky to use correctly. Analysts often struggle with the PARTITION BY and ORDER BY clauses, choosing the right ranking function (RANK, DENSE_RANK, ROW_NUMBER), or implementing cumulative and moving averages efficiently.
The Databricks Assistant helps by generating the correct syntax, explaining function behavior, and suggesting performance optimizations. Let’s see an example where the Assistant calculates a rolling 7-day fare total using a window function.
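A minimal sketch of that query, assuming a hypothetical taxi_trips table with pickup_zone, trip_date, and fare_amount columns:

```sql
-- Rolling 7-day fare total per zone: the frame spans the current day
-- plus the six preceding days, measured on the timestamp order key.
SELECT
  pickup_zone,
  trip_date,
  SUM(fare_amount) OVER (
    PARTITION BY pickup_zone
    ORDER BY CAST(trip_date AS TIMESTAMP)
    RANGE BETWEEN INTERVAL 6 DAYS PRECEDING AND CURRENT ROW
  ) AS rolling_7day_fare
FROM taxi_trips;
```

Note the RANGE frame: unlike ROWS BETWEEN 6 PRECEDING, it is keyed on actual dates, so the total stays correct even when some days have no trips.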
Converting JSON into Structured Tables
Analysts often work with semi-structured data like JSON, which needs to be transformed into structured tables for efficient querying. Manually extracting fields, defining schemas, and handling nested JSON objects can be time-consuming and error-prone. Since the Databricks Assistant does not have direct access to raw data, adding Unity Catalog metadata, such as table descriptions or column comments, can help improve the accuracy of its suggestions.
In this example, there is a column containing genre data stored as JSON, with both genre IDs and names embedded. Using the Databricks Assistant, you can quickly flatten this column, extracting individual fields into separate columns for easier analysis.
To ensure accurate results, first check the JSON structure in Catalog Explorer and provide a sample format in a column comment that the Assistant can reference. This extra step helps the Assistant generate a more tailored, accurate response.
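A hedged sketch of the resulting query, assuming a hypothetical movies table whose genres column holds strings like [{"id": 18, "name": "Drama"}, {"id": 80, "name": "Crime"}]:

```sql
-- Parse the JSON string into an array of structs, then explode it so
-- each genre becomes its own row with typed id and name columns.
SELECT
  m.movie_id,
  genre.id   AS genre_id,
  genre.name AS genre_name
FROM movies AS m
LATERAL VIEW explode(
  from_json(m.genres, 'ARRAY<STRUCT<id: INT, name: STRING>>')
) exploded AS genre;
```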
A similar approach can be used when attempting to generate regex expressions or complex SQL transformations. By first providing a clear example of the expected input format—whether it’s a sample JSON structure, text pattern, or SQL schema—analysts can guide the Assistant to produce more accurate and relevant suggestions.
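The same idea, sketched for regex with an invented orders table whose order_code column follows a pattern like ORD-2024-00123 (documented in its column comment):

```sql
-- With the sample format documented, the Assistant can propose precise extractions.
SELECT
  order_code,
  regexp_extract(order_code, 'ORD-(\\d{4})-(\\d+)', 1) AS order_year,
  regexp_extract(order_code, 'ORD-(\\d{4})-(\\d+)', 2) AS order_seq
FROM orders;
```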
Optimizing SQL Queries
In last year’s Databricks Assistant Year in Review blog, we highlighted the introduction of /optimize, which helps refine SQL queries by identifying inefficiencies like missing partition filters, high-cost joins, and redundant operations. By proactively suggesting improvements before running a query, /optimize ensures that users minimize unnecessary computation and improve performance upfront.
Now, we’re expanding on that with /analyze—a feature that examines query performance after execution, analyzing run statistics, detecting bottlenecks, and offering intelligent recommendations.
In the example below, the Assistant analyzes the amount of data being read and suggests an optimal partitioning strategy to improve performance.
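As an illustration of the kind of recommendation involved (the sales table and sale_date partition column are invented), the fix often amounts to partitioning on a commonly filtered column so queries prune files instead of scanning the full table:

```sql
-- Rebuild the table partitioned by the date column the analysis filters on.
CREATE OR REPLACE TABLE sales_partitioned
USING DELTA
PARTITIONED BY (sale_date)
AS SELECT * FROM sales;

-- Queries that filter on sale_date now read only the matching partitions.
SELECT SUM(amount)
FROM sales_partitioned
WHERE sale_date >= DATE'2025-01-01';
```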
Try Databricks Assistant Today!
Describe your task in natural language and let the Assistant generate SQL queries, explain complex code, and automatically fix errors.
Also, check out our latest tutorial on EDA in Databricks Notebooks, where we demonstrate how the Assistant can streamline data cleaning, filtering, and exploration.