Over the last few months, I’ve been exploring different ways to pair LLMs with structured and unstructured data workflows. The goal is simple: make data analysis faster, more intuitive, and accessible beyond technical teams – without sacrificing quality. Here’s a quick breakdown of the three core patterns I keep coming back to, along with where things are headed: GenBI and Agentic Analytics.

Pattern 1: LLM-to-SQL

Turn plain-English questions into SQL logic

This pattern is all about translating natural language into SQL, but it’s more than just basic filters or keyword matches. When you give it structured data, the model can handle fairly complex logic—joins, filters, aggregations, even time windows—and actually write the kind of queries you’d expect from an experienced data analyst. I’ve used this to accelerate exploratory work, quickly test hypotheses, and even prototype dashboard logic before handing it off to engineering. What’s exciting is that you can also expose this capability through a chatbot interface, giving marketers and business users a way to ask questions like “Which campaigns are driving conversions under 10 minutes?” and get meaningful answers without ever touching SQL. It’s a powerful way to bridge the gap between data and decision-making, especially when paired with tools like Fabric, Synapse, or Databricks.


Sample Sentiment Data:

| name          | email               | comment                          |
|---------------|---------------------|----------------------------------|
| Alice Johnson | alice@example.com   | Great product, loved it!         |
| Bob Smith     | bob@example.com     | Had some issues with the setup.  |
| Charlie Ray   | charlie@example.com | Customer service was excellent.  |
| Dana Lee      | dana@example.com    | Would recommend to others.       |

Sample Question: “Classify the sentiment of the customer comment as Positive, Neutral, or Negative”
LLM-Generated SQL Query:

SELECT
    name,
    email,
    CASE
        WHEN comment RLIKE '(?i)excellent|awesome|great|fantastic|love|good'
            THEN 'Positive'
        WHEN comment RLIKE '(?i)bad|terrible|horrible|hate|awful'
            THEN 'Negative'
        ELSE 'Neutral'
    END AS sentiment
FROM customer_comments
LIMIT 100;

Output:

| name          | email               | sentiment |
|---------------|---------------------|-----------|
| Alice Johnson | alice@example.com   | Positive  |
| Bob Smith     | bob@example.com     | Neutral   |
| Charlie Ray   | charlie@example.com | Positive  |
| Dana Lee      | dana@example.com    | Neutral   |


This approach focuses on generating queries, not analyzing data. It’s fast and works well for structured, rule-based logic – but it misses nuance since the LLM isn’t reasoning over the data itself.

Code Sample:

from pyspark.sql import SparkSession  # `spark` is assumed pre-created (Fabric/Synapse/Databricks notebook)
from pyspark.sql.types import StructType, StructField, StringType
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI

print(f"\n{'='*100}")
print(f" Example 1 - Sentiment labeling (LLM to SQL)")
print(f"{'='*100}")

# ------------------------------------------------------
# 1. Create Sample Customer Comment Dataset
# ------------------------------------------------------
# Each record contains a name, email, and a comment for analysis.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("email", StringType(), True),
    StructField("comment", StringType(), True)
])

data = [
    ("Alice Johnson", "alice@example.com", "Great product, loved it!"),
    ("Bob Smith", "bob@example.com", "Had some issues with the setup."),
    ("Charlie Ray", "charlie@example.com", "Customer service was excellent."),
    ("Dana Lee", "dana@example.com", "Would recommend to others."),
]

df = spark.createDataFrame(data, schema)

# Preview schema and data
print(" Original DataFrame Schema:")
df.printSchema()
print("\n Sample Data:")
df.show(truncate=False)


df.createOrReplaceTempView("customer_comments")  # Register for SQL querying


# ------------------------------------------------------
# 2. Initialize Azure OpenAI LLM
# ------------------------------------------------------
# Replace deployment_name and other config with your own Azure OpenAI setup.
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_version="2024-12-01-preview",
    openai_api_key=AZURE_OPENAI_API_KEY,
    deployment_name="o1"
)
# ------------------------------------------------------
# 3. Define Prompt Template for Spark SQL Generation
# ------------------------------------------------------
# This tells the LLM how to convert natural language into Spark SQL.
prompt = PromptTemplate.from_template("""
You are a data expert working with Spark SQL.

Given the table schema:
{schema}

Write a Spark SQL query to answer the following question:
{question}

Use **Spark SQL syntax**.
- DO NOT use `TOP`, use `LIMIT`.
- DO NOT use square brackets around column or table names.

Return only the SQL query.

SQLQuery:
""")

# ------------------------------------------------------
# 4. Prepare LangChain Pipeline (Runnable)
# ------------------------------------------------------
# This connects user question → prompt → LLM → SQL output
# We pass the static schema description as part of the input.
schema_description = """
Table: customer_comments
Columns:
- name (STRING): Customer's full name
- email (STRING): Customer's email address
- comment (STRING): Free-form feedback text
"""

custom_chain = (
    {"schema": lambda _: schema_description, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# ------------------------------------------------------
# 5. Ask Question in Natural Language
# ------------------------------------------------------
# The LLM will return a valid SQL query against the Spark table.
question = "Classify the sentiment of the customer comment as Positive, Neutral, or Negative."

response = custom_chain.invoke(question)

# ------------------------------------------------------
# 6. Extract and Execute Generated SQL
# ------------------------------------------------------
generated_sql = response.strip().split("SQLQuery:")[-1].strip()
print("\n Generated SQL:\n", generated_sql)

# Run the query and display results
try:
    result = spark.sql(generated_sql)
    result.show(truncate=False)
except Exception as e:
    print(" Query failed:", e)

Pattern 2: LLM-on-Rows

Use LLMs to label or enrich each row like a human would


This pattern takes it a step further by having the LLM analyze each row individually, using full-text reasoning instead of relying on simple keywords. It’s closer to how a human would interpret sentiment – looking at context, tone, and nuance. I’ve found this approach especially useful when you need more accuracy and depth, like distinguishing between constructive feedback and actual complaints. It comes with a higher compute cost, but the boost in quality is worth it for use cases like customer feedback analysis, content moderation, or tagging nuanced behaviors across datasets.

Notice how Bob Smith’s feedback – previously labeled Neutral by the regex approach – is now correctly flagged as Negative, thanks to nuance the LLM can interpret.

Output:

| name          | comment                          | sentiment |
|---------------|----------------------------------|-----------|
| Alice Johnson | Great product, loved it!         | Positive  |
| Bob Smith     | Had some issues with the setup.  | Negative  |
| Charlie Ray   | Customer service was excellent.  | Positive  |
| Dana Lee      | Would recommend to others.       | Positive  |

Code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI

print(f"\n{'='*100}")
print(f" Example 2 - Sentiment labeling (LLM to SQL + row level LLM)")
print(f"{'='*100}")

# ------------------------------------------------------
# 1. Create Example Dataset with User Comments
# ------------------------------------------------------
schema = StructType([
    StructField("name", StringType(), True),
    StructField("email", StringType(), True),
    StructField("comment", StringType(), True)
])

data = [
    ("Alice Johnson", "alice@example.com", "Great product, loved it!"),
    ("Bob Smith", "bob@example.com", "Had some issues with the setup."),
    ("Charlie Ray", "charlie@example.com", "Customer service was excellent."),
    ("Dana Lee", "dana@example.com", "Would recommend to others."),
]

df = spark.createDataFrame(data, schema)

# Preview schema and data
print("Original DataFrame Schema:")
df.printSchema()
print("\nSample Data:")
df.show(truncate=False)

# ------------------------------------------------------
# 2. Initialize Azure OpenAI Client
# ------------------------------------------------------
llm = AzureChatOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_version="2024-12-01-preview",
    openai_api_key=AZURE_OPENAI_API_KEY,
    deployment_name="o1"
)
# ------------------------------------------------------
# 3. Define Sentiment Classification Prompt
# ------------------------------------------------------
prompt = ChatPromptTemplate.from_template("""
You are a sentiment analysis assistant. 
Classify the sentiment of the customer comment as Positive, Neutral, or Negative.

Comment: "{comment}"

Sentiment:
""")

# ------------------------------------------------------
# 4. Build LangChain Pipeline for Inference
# ------------------------------------------------------
sentiment_chain = (
    {"comment": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# ------------------------------------------------------
# 5. Analyze Each Comment and Classify Sentiment
# ------------------------------------------------------
rows = df.select("name", "email", "comment").collect()
results = []

for row in rows:
    name = row["name"]
    email = row["email"]
    comment = row["comment"]
    try:
        sentiment = sentiment_chain.invoke(comment)
    except Exception as e:
        sentiment = f"Error: {e}"
    results.append((name, email, comment, sentiment))

# ------------------------------------------------------
# 6. Create a New DataFrame with Sentiment Results
# ------------------------------------------------------
sentiment_schema = StructType([
    StructField("name", StringType(), True),
    StructField("email", StringType(), True),
    StructField("comment", StringType(), True),
    StructField("sentiment", StringType(), True),
])

result_df = spark.createDataFrame(results, sentiment_schema)

# Preview result schema and content
print("\n Final Result Schema with Sentiment:")
result_df.printSchema()
print("\n Sentiment Classification Output:")
result_df.show(truncate=False)
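
Because every row triggers its own LLM call, the loop above slows down quickly as the table grows. A sketch of one mitigation I’ve used (assuming your Azure OpenAI quota tolerates a few concurrent requests – tune max_workers to your rate limits) is to fan the calls out with a thread pool and rebuild the DataFrame from the results:

from concurrent.futures import ThreadPoolExecutor

def classify(row):
    # Classify one comment; return an error label instead of failing the whole batch
    try:
        sentiment = sentiment_chain.invoke(row["comment"])
    except Exception as e:
        sentiment = f"Error: {e}"
    return (row["name"], row["email"], row["comment"], sentiment)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(classify, rows))

result_df = spark.createDataFrame(results, sentiment_schema)
result_df.show(truncate=False)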

Pattern 3: Chained Retrieval-Augmented Generation (RAG)

Add context to the model by retrieving related examples or rules

This pattern combines semantic search with LLM reasoning, allowing the model to retrieve relevant examples and generate insights based on meaning, not just keywords. I use FAISS to embed data, and when a question is asked, the model pulls semantically related records to provide a context-rich response. For instance, asking, “Show me positive, negative, and neutral reviews,” triggers a search based on sentiment, not just matching terms. This is powerful when building chat interfaces or LLM agents that can act as analysts—retrieving, reasoning, and delivering answers with real data context.

Output:


Based on the provided comments:
• Positive reviews:
  – “Customer service was excellent.”  
  – “Would recommend to others.”  
  – “Great product, loved it!”

• Negative review:
  – “Had some issues with the setup.”

There are no clearly neutral comments in the set you provided.

Code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

from langchain_community.vectorstores import FAISS
from langchain_openai import AzureOpenAIEmbeddings, AzureChatOpenAI
from langchain.chains import RetrievalQA
from langchain_core.documents import Document

print(f"\n{'='*100}")
print(f" Example 3 - Sentiment labeling, semantic retrieval (LLM + embedding)")
print(f"{'='*100}")

# ----------------------------------------
# 1. Define schema and create sample DataFrame
# ----------------------------------------
schema = StructType([
    StructField("name", StringType(), True),
    StructField("email", StringType(), True),
    StructField("comment", StringType(), True),
])

data = [
    ("Alice Johnson", "alice@example.com", "Great product, loved it!"),
    ("Bob Smith", "bob@example.com", "Had some issues with the setup."),
    ("Charlie Ray", "charlie@example.com", "Customer service was excellent."),
    ("Dana Lee", "dana@example.com", "Would recommend to others."),
]

df = spark.createDataFrame(data, schema)

# Preview schema and data
print("📄 Original DataFrame Schema:")
df.printSchema()
print("\n📊 Sample Data:")
df.show(truncate=False)

# ----------------------------------------
# 2. Convert comments to LangChain documents
# ----------------------------------------
rows = df.select("name", "email", "comment").collect()

documents = [
    Document(
        page_content=row["comment"],
        metadata={"name": row["name"], "email": row["email"]}
    )
    for row in rows
]

# ----------------------------------------
# 3. Initialize Azure OpenAI Embeddings and FAISS store
# converts each document into a vector that captures its semantic meaning.
# FAISS (Facebook AI Similarity Search) used for efficient similarity search and clustering of dense vectors
# ----------------------------------------
embedding_model = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-ada-002",  # Replace with your Azure deployment name
    openai_api_version="2023-05-15",             # Use the version your resource supports
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_key=AZURE_OPENAI_API_KEY
)

vectorstore = FAISS.from_documents(documents, embedding=embedding_model)

# ----------------------------------------
# 4. Setup RAG-style QA chain with Azure OpenAI LLM
# ----------------------------------------

llm = AzureChatOpenAI(
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    openai_api_version="2024-12-01-preview",
    openai_api_key=AZURE_OPENAI_API_KEY,
    deployment_name="o1"
)
retriever = vectorstore.as_retriever()

rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)




# Debug: inspect which documents the retriever returns for a sample query
try:
    # Perform the retrieval using the query
    response = retriever.get_relevant_documents("Show me positive, negative and neutral customer reviews")
    
    # Check if we get any documents back
    if response:
        print(f"\n\nEmbedded Documents: {len(response)} documents:")
        
        # Print the content of the retrieved documents
        for doc in response:
            print(f"Content: {doc.page_content}")
            print(f"Metadata: {doc.metadata}")

    else:
        print("No documents retrieved.")
    
except Exception as e:
    # Catch any errors that occur during retrieval
    print(f"Error during retrieval: {str(e)}")
# ----------------------------------------
# 5. Ask sentiment-focused question
# ----------------------------------------
question = "Show me positive, negative and nutural customer reviews"

response = rag_chain.invoke({"query": question})

# ----------------------------------------
# 6. Output result
# ----------------------------------------



print(f"Query: {question}\n")
print("Answer:")
print(response["result"])



Final Thoughts

Agentic Analytics Workflows are the next evolution – combining logic, automation, and simple interfaces so business teams can do more with data. These workflows don’t just analyze; they act. They surface insights, trigger follow-ups, and connect across systems and platforms.
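
To make that concrete, here’s a minimal sketch of what “agentic” can look like in code: the three patterns above become tools an LLM agent chooses between. The wrapper functions (run_sql_question, classify_comment) are hypothetical stand-ins for the chains built earlier, and the wiring uses LangChain’s classic agent interface – exact APIs vary by version, so treat this as a shape rather than a finished implementation.

from langchain.agents import initialize_agent, AgentType, Tool

# Hypothetical wrappers around the chains from Patterns 1-3
def run_sql_question(question: str) -> str:
    sql = custom_chain.invoke(question)                 # Pattern 1: NL -> Spark SQL
    return spark.sql(sql).limit(20).toPandas().to_string()

def classify_comment(comment: str) -> str:
    return sentiment_chain.invoke(comment)              # Pattern 2: row-level reasoning

tools = [
    Tool(name="query_customer_data", func=run_sql_question,
         description="Answer structured questions about the customer_comments table."),
    Tool(name="classify_sentiment", func=classify_comment,
         description="Classify the sentiment of a single customer comment."),
    Tool(name="search_similar_comments",
         func=lambda q: rag_chain.invoke({"query": q})["result"],   # Pattern 3: semantic retrieval
         description="Find and summarize comments semantically related to a topic."),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
print(agent.run("Summarize how customers feel about setup, and include two example comments."))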

What makes it practical is that it’s accessible through everyday tools like dashboards or chatbots. No waiting on dev teams – just direct access to real-time, automated decisions. With the right AI foundation, this is how GenBI starts to deliver real business value.