Ask, Don't Query: How WorkWhile Built an Analyst-for-Everyone

January 29, 2026

By Karim Ezzedeen, Tan Nguyen & Alan Armen

At WorkWhile, data is at the core of every decision we make – from optimizing fill rate to tracking and monitoring KPIs. But as our datasets and teams grew, we found a recurring friction point: SQL knowledge wasn't evenly distributed across the organization.

Operations managers, customer success reps, and even product analysts often needed insights locked behind SQL queries only the data team could write. This created a constant Slack back-and-forth – small data questions piling up and analysts spending more time writing boilerplate queries than actually exploring insights and setting strategies forward.

To solve this, we built WorkWhileGPT – an intelligent, context-aware agent that translates plain English questions into validated SQL queries, executes them safely, and returns results and visualizations in seconds.

1. Problem, Motivation, and Objectives

Before WorkWhileGPT, our data workflow looked like this:

Someone asks: "Hey, how many shifts were completed in Atlanta last week?"
A data analyst translates it into SQL.
The query is reviewed, executed, shared, and often forgotten.

This model didn't scale. It centralized knowledge in a handful of people and produced dozens of ad-hoc queries a week with no long-term reuse.

We wanted to:

Democratize data access – empower everyone to query responsibly.
Preserve accuracy and context – ensure queries reflect current schema and business rules.
Educate by osmosis – show users the actual SQL, helping them learn over time.
Govern requests centrally – track every data pull in one searchable place.

2. Architecture Overview

WorkWhileGPT's design emphasizes modularity, transparency, and safety. It is a Slack-native agent – built to operate entirely within Slack as its primary interface – powered by specialized tools that handle reasoning and execution.

When a user mentions @WorkWhileGPT in Slack, the flow is:

Message captured → passed to the Agentic Layer (LLM core)
LLM fetches contextual metadata from the Semantic Layer
Generates a draft SQL query
Runs validation through the Representation Layer
Returns results + Query summary + Metabase visualization link

This architecture allows fine-grained logging at each step and makes the system auditable and debuggable.

WorkWhileGPT Architecture

3. User-GPT Interface Layer: Slack

Slack is WorkWhile's communication backbone, so we built WorkWhileGPT as a Slack bot with minimal permissions – it only reads messages where it's explicitly mentioned and replies in-thread.

This decision provides:

Contextual isolation: Every data conversation stays in its own Slack thread.
Governance: Centralized logging of all questions, SQL outputs, and responses.
Collaboration: Data engineers can review and refine queries transparently.

Over time, this single Slack channel has become our company-wide data request hub – searchable and traceable.

4. The Tool-Centric Framework: Three Layers of Intelligence

We organized WorkWhileGPT's components into three functional layers – each responsible for a distinct class of tasks: semantics, reasoning, and representation.

Note: While the Agentic Layer drives the orchestration at runtime, we describe the Semantic Layer first below for clarity – since it defines the contextual foundation that powers WorkWhileGPT's reasoning.

4.1 Semantic Layer – Context-Aware Understanding

The semantic layer grounds the model in WorkWhile's data reality. It provides metadata, business mappings, and historical patterns.

1. Database Schema Tool

At the heart of this layer lies our structured JSON schema dictionary. Each table is described with field-level granularity, data types, key relationships, and cardinalities.

Example excerpt:


{
  "tables": [
    {
      "name": "address",
      "description": "Provides address details on all parties using WorkWhile (workers, companies, shifts, etc.)",
      "columns": [
        {
          "name": "id",
          "description": "Unique address ID",
          "data_type": "integer",
          "primary_key": true
        },
        {
          "name": "city",
          "description": "City name",
          "data_type": "text"
        },
        {
          "name": "country",
          "description": "Country ISO code",
          "data_type": "text",
          "distinct_values": ["US", "CA"]
        },
        {
          "name": "created_at",
          "description": "Timestamp when address was created",
          "data_type": "timestamp"
        }
      ],
      "foreign_keys": [{"table": "company", "field": "company_id"}],
      "deprecated": false
    }
  ]
}

Each update to the production schema automatically refreshes this metadata. This ensures the model always references the most recent structure – and never joins or filters on deprecated fields.

2. Market / Company / Position Mappings

When users mention business entities – "in Chicago", "for ACME Co.", or "for Delivery Associates" – WorkWhileGPT performs real-time lookups against our database to retrieve canonical IDs and codes.

Example live mapping:


{
  "market_mapping": {"Atlanta": 12, "Chicago": 15},
  "company_mapping": {"ACME": 104, "HomeSupply": 118},
  "position_mapping": {"Delivery Associate": 101, "Shift Lead": 203}
}

By resolving these entities dynamically, WorkWhileGPT always works with up-to-date, canonical identifiers instead of relying on text strings from user input.

This design choice dramatically improves query robustness and accuracy. Rather than filtering or joining on potentially inconsistent names (e.g., "Acme", "ACME Co", "ACME INC"), WorkWhileGPT translates them into their stable numeric keys before constructing the SQL. That makes generated queries:

Resilient to typos, capitalization, and naming variations
Safer and faster, since joins on integer IDs are less error-prone and more performant
Future-proof, because ID mappings are refreshed directly from source-of-truth tables at runtime

In practice, this approach has eliminated an entire class of human-prone errors – stale joins, invalid filters, or mismatched entity names – while keeping queries tightly aligned with WorkWhile's master data model.

3. Historical Data Requests

To keep the model consistent with our analytical conventions, WorkWhileGPT maintains a curated set of prior data requests. Each entry includes the original prompt, summary, and validated SQL.

Example:


DataRequest(
    name="2025.01.23 - Number of shifts / spots filled in the last week",
    prompt="How many shifts did we fill last week?",
    summary="Counts filled shift spots within the last 7 days; filters out internal company_id (3,86,151), locationless, and cancelled shifts.",
    query="""SELECT count(assoc.*)
              FROM worker_shift_assoc assoc
              LEFT JOIN work_shift ws ON ws.id = assoc.shift_id
              WHERE ws.starts_at >= (now() - interval '7 days')
                AND ws.cancelled_at IS NULL
                AND ws.locationless IS FALSE
                AND ws.company_id NOT IN (3,86,151)
                AND assoc.status IN ('paid','completed','approved','admin_review','employer_review','employer_approved');"""
)

These act as few-shot examples, letting the LLM mimic validated query logic and adopt consistent filtering patterns.

4.2 Agentic Layer – The Conversational Brain

This layer orchestrates how WorkWhileGPT reasons, communicates, and applies WorkWhile's business rules.

Default System Prompt

The system prompt defines both capabilities and boundaries. It includes detailed behavioral instructions, such as:

Always validate SQL syntax before execution.
Never expose raw user or internal company data unless explicitly permitted.
Always filter out internal company IDs (3, 86, 151) by default.
Use correct field names and prefer indexed columns for filters.
Return outputs in the format: Summary → SQL → Metabase Link.

This deterministic scaffolding ensures predictable and compliant behavior – essential when the agent interfaces directly with production data.

4.3 Representation Layer – From SQL to Insight

Once the reasoning layer produces a candidate query, the representation layer validates, executes, and formats it into a consumable insight.

SQL Generation + Validation

WorkWhileGPT generates SQL via the LLM, then runs it through a validation module. This module checks:

Table and column existence
Join consistency
Limit clauses for safety
Syntax correctness

If validation fails, the agent regenerates until success. The result: every query that reaches a user is guaranteed to run safely.

Metabase Integration

Validated queries are persisted as Metabase "questions" using its REST API. This enables immediate visualization and sharing: users can click the returned link to view graphs, modify filters, export results, and even build or update dashboards. Typical Slack response:

Here's your answer:
Summary: We filled 412 shifts in the past 7 days (excluding internal companies).
SQL: [View query →]
Visualization: Open in Metabase

5. OpenRouter: Modular Model Access

Flexibility was a core design goal. We didn't want to hard-code ourselves to a single LLM vendor. Using OpenRouter, we can seamlessly switch among OpenAI, Anthropic, Google, xAI, and open-source models without rewriting code.

Here's the core of our integration:


api_key = config.OPENROUTER_API_KEY
provider = OpenRouterProvider(api_key=api_key)

model = OpenAIChatModel(
    model_name=model_name,
    provider=provider,
)

model_settings = OpenAIChatModelSettings(
    temperature=0.4,           # lower = more deterministic
    top_p=1,                   # use nucleus sampling
    max_tokens=30000,          # long queries supported
    frequency_penalty=0.8,     # reduce repetition
    presence_penalty=-1.2,     # encourage new topics
    stop=None,
    parallel_tool_calls=True,  # allows multi-tool reasoning
)

This configuration layer abstracts away model differences while giving us fine-grained control over generation behavior. We currently default to Gemini-2.5-pro, but routinely test ChatGPT 4-o and Gemini 1.5 Pro for cost–performance balance.

6. Evals: Quantifying Reliability

Since WorkWhileGPT directly interfaces with production databases, reliability isn't optional - it's enforced. To maintain accuracy and safety, we integrated an automated evaluation suite directly into our CI/CD pipeline.

Each deployment runs five randomized test questions selected from a curated evaluation set of 10-15 canonical queries. Every test entry in this set has:

A known gold-standard SQL query, prevalidated by our data team.
A deterministic expected answer, representing ground truth.

The evaluation process measures not only whether WorkWhileGPT's SQL compiles, but also whether the semantic meaning and resulting output align with the gold-standard answer.

Metrics Tracked

Validation pass rate: Checks for SQL syntax, safety clauses, and table integrity (no missing or deprecated columns).
Execution success: Confirms the query runs without exceptions or performance timeouts.
Semantic accuracy: Compares the actual result returned by WorkWhileGPT's generated SQL with the expected gold-standard output.
Latency: Measures total round-trip time from prompt ingestion to validated response.

Go/No-Go Logic

Each eval run yields a binary go/no-go outcome based on answer comparison. We classify a test as a pass (go) if:

The SQL query validates and executes successfully, and
The result matches the expected output within an acceptable margin of error.

All queries in the eval set are designed to return a single-value answer (e.g., an aggregate count, ratio, or ID). This enables precise, programmatic comparison between the test output and gold-standard answer.

For Text-Based Answers

Questions like "Who is our biggest customer?" are expected to yield an exact text match (e.g., "ACME"). For these, there is zero tolerance - any mismatch is considered a failure.

For Numeric Answers

For quantitative questions - e.g., "How many shifts were filled last week?" - we allow a ±10% margin of error. This small tolerance accounts for the transient nature of operational data, where counts can fluctuate minute-to-minute (due to ongoing shifts, late updates, or delayed job completions). By comparing results within this range, we mitigate the risk of marking a valid WorkWhileGPT response as incorrect simply because the database state changed between the time the gold query and test query executed.

This approach strikes a balance between strict validation and practical realism, ensuring that WorkWhileGPT is accurate where it matters and resilient where slight temporal drift is expected.

Case ID	Inputs	Metadata	Expected Output	Output	Scores	Metrics	Duration (s)
1	Avg Fill Rate Company X for the past 30 days	Tags: Fill, Company	75.58%	53.32%	Syntax: 1.00, Accuracy: 0.00	Input tokens: 94,672, Output tokens: 14,740, Reasoning tokens: 10,302, Requests: 6	87.6
2	Number of workers blocked by company X	Tags: Worker, Company	5,010	5,010.0	Syntax: 1.00, Accuracy: 1.00	Input tokens: 86,802, Output tokens: 6,262, Reasoning tokens: 5,764, Requests: 5	45.7
3	Customer with highest revenue from W2 Shifts	Tags: Revenue, Company	X	X	Syntax: 1.00, Accuracy: 1.00	Input tokens: 161,362, Output tokens: 10,676, Reasoning tokens: 8,266, Requests: 4	53.4

Regression Baseline

All eval results are logged and compared to previous runs, forming a longitudinal regression baseline. A new deployment is approved only if:

No new semantic or validation failures are introduced, and
The average accuracy remains within historical variance bounds.

Over time, this has evolved into a quantitative health check for WorkWhileGPT's reasoning layer, ensuring that each release is at least as trustworthy – and often more performant – than the last.

7. Results and Impact

Since rollout:

Over 90% of routine data questions are now self-served.
Even for requests needing manual analysis, WorkWhileGPT reduces the time it takes to prepare initial queries for the task by 55% on average.
SQL literacy across teams has risen – many users now copy and modify the generated queries to explore further.
Centralized Slack logs have become a living catalog of business metrics and definitions.

Unexpectedly, WorkWhileGPT also became an onboarding ally, serving as a knowledgebase for new hires to ask questions such as:

"What does the worker_shift_assoc table represent?"
"How do I join shifts to locations?"

The agent answers with schema context and example joins, effectively serving as an interactive data guide.

8. Broader Vision

WorkWhileGPT started as an internal unlock: give every team member at WorkWhile a fast, reliable way to ask questions about our data without waiting in line for SQL experts. However, internally, we've always viewed this product as step one of a much larger shift - giving every person in our ecosystem their own data analyst.

For WorkWhile HQ employees, that means turning Slack into a self-serve analytics interface. For our flex workers, the same idea becomes more personal - and more empowering. Imagine a worker being able to ask:

"How did my hourly earnings trend over the last 6 weeks?"
"Which shift position is best for my schedule for the next 3 days?"
"If I want to earn $X next month, how many shifts like the ones I usually take would I need?"

This is not about turning workers into analysts, but rather giving workers agency through clarity. A worker-facing "analyst" can summarize patterns, quantify tradeoffs, and answer "what changed?" in plain language - with the same emphasis on transparency and trust that shaped WorkWhileGPT internally.

9. Conclusion and Next Steps

WorkWhileGPT has evolved from a time-saving experiment into a foundational part of WorkWhile's data ecosystem.

It not only democratized SQL but also unlocked new ways to learn, document, and explore data. What began as a tool to reduce analyst load now accelerates onboarding, enforces governance, and cultivates data fluency company-wide.

Next on our roadmap:

Lightweight RAG layer for dynamic schema updates.
Query optimization learning, where the agent learns from historical runtime metrics.
Embedding-based semantic memory, allowing users to reference "similar past questions."
Richer Visualizations, enabling WorkWhileGPT to go beyond static tables and generate adequate visual representations of its output data.
File import and export workflows to support ad-hoc data enrichment and manipulation.

In short, WorkWhileGPT has turned Slack into WorkWhile's conversational data warehouse - a bridge between human intuition and structured insight.

WorkWhile Careers