Vibe Coding in Microsoft Fabric

Can ChatGPT actually help you build a complete data platform in Microsoft Fabric if you pretend to know absolutely nothing? That’s exactly what I set out to test. I gave myself a simple scenario: I’m a data engineer with a SQL database that needs to be transformed into a proper medallion architecture with bronze, silver, and gold layers—but I’m going to rely entirely on ChatGPT to guide me through it.

Spoiler alert: it actually worked way better than I expected. But there are some important lessons here about vibe coding versus proper engineering. Let’s dive in.

The Starting Point: A SQL Database in Fabric

I started by creating a SQL database in Microsoft Fabric and loading it with sample data. If you go to your Fabric workspace and create a new SQL database, you’ll find an option to fill it with sample data. I used the Worldwide Importers dataset—you know, the one with bikes, frames, and all that good stuff.

The database had about 10 tables with customer information, products, addresses, and sales orders. Pretty standard stuff for a demo. Now the question was: could ChatGPT guide me from this raw SQL database all the way to a dimensional model ready for Power BI?

Step 1: Building the Bronze Layer with Pipelines

I asked ChatGPT: “I have no clue what to do. Please help me create a solution.” And to its credit, it came back with a pretty reasonable plan: create a lakehouse, build a pipeline, use copy data activities to ingest the data.

Now mind you, I would personally use notebooks with Python or Spark for this kind of work. But ChatGPT suggested pipelines, and for someone who doesn’t know Fabric, that’s actually not a bad recommendation. It’s low-code and fairly intuitive.

Creating the Bronze Lakehouse

Following ChatGPT’s instructions, I created a new lakehouse called “bronze lakehouse” and then built a data pipeline. The pipeline used a copy data activity to pull tables from the SQL database into the lakehouse.

Here’s where things got a bit tricky. ChatGPT told me to use a “destination folder” in the lakehouse, but that option wasn’t showing up in the interface. After some back and forth (and a screenshot sent to ChatGPT), it corrected itself and told me to create a new table in a schema called “bronze” instead.

[Screenshot: Pipeline configuration, showing the copy data activity set up to load the first table]

After running the pipeline, I refreshed my lakehouse and there it was—a bronze schema with a product table. Success! But now I had 9 more tables to go.

The Challenge: Loading All 10 Tables

When I asked ChatGPT how to get all the other tables, it initially suggested just copying the copy data activity multiple times. No. Absolutely not. This is where I had to push back, because that violates one of my core principles: DRY (Don’t Repeat Yourself).

ChatGPT then suggested parameterizing the pipeline, which is much better. We created a parameter called “table_list” (though ChatGPT kept changing the parameter name between responses, which was annoying) and set up a ForEach loop to iterate through all the tables. The parameter’s default value was simply a JSON array of table names (abbreviated here):

[
  "Product",
  "Customer", 
  "Address",
  "SalesOrderHeader",
  "SalesOrderDetail"
]

I had to work through a few errors here. The ForEach loop initially wasn’t configured correctly, and a typo in the dynamic content meant the source looked for a table literally named “@item” instead of resolving @item() to the current table name. After fixing those issues, the pipeline ran successfully and loaded all 10 tables into the bronze schema.

Was this the best solution? No. Ideally, you’d query the SQL database metadata (using sys.tables or similar) to automatically discover which tables exist. But for vibe coding? This was actually pretty decent.
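
For comparison, here’s a minimal sketch of what that metadata-driven discovery could look like from a notebook. This is my own illustration, not something ChatGPT produced, and the JDBC URL, credentials, and schema filter are placeholders for whatever your environment actually requires:

# Sketch only: discover source tables dynamically instead of hard-coding the list.
# Connection details are placeholders; use whatever authentication your SQL
# database requires (Fabric SQL databases typically use Entra ID).
jdbc_url = "jdbc:sqlserver://<your-sql-endpoint>:1433;database=<your-database>"

tables_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("query", "SELECT name FROM sys.tables")  # add a schema filter if needed
    .option("accessToken", "<token>")  # placeholder credential
    .load()
)

table_list = [row["name"] for row in tables_df.collect()]
print(table_list)  # this list could feed the pipeline's table_list parameter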

Step 2: The Silver Layer (Sort Of)

ChatGPT suggested creating a separate lakehouse for silver, but I pushed back. For a small project like this, I’d rather use schema separation within the same lakehouse. So I created a “silver” schema and a “gold” schema in the same lakehouse.
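
If you want the same schema-based separation, one option (my own sketch, assuming a schema-enabled lakehouse; you can also add schemas through the lakehouse UI) is a couple of Spark SQL statements in a notebook:

# Create silver and gold schemas alongside the existing bronze schema.
# Assumes the lakehouse is schema-enabled and attached as the notebook's default lakehouse.
spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold")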

Here’s where ChatGPT went a bit off the rails. It started suggesting data cleaning operations—removing duplicates, standardizing formats, and so on. But when I looked at the actual code it generated, it was basically just copying data from bronze to silver without any real transformations.
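
For contrast, here’s roughly what a real silver-layer step could look like. This is my own sketch rather than ChatGPT’s output, and the column names and cleaning rules are just assumptions based on the sample Customer table:

from pyspark.sql.functions import col, trim, initcap, to_date

# Example silver transformation: deduplicate and standardize a few columns.
df_customer = spark.table("bronze.Customer")

silver_customer = (
    df_customer
    .dropDuplicates(["CustomerID"])  # keep one row per customer
    .withColumn("FirstName", initcap(trim(col("FirstName"))))  # tidy casing and whitespace
    .withColumn("LastName", initcap(trim(col("LastName"))))
    .withColumn("ModifiedDate", to_date(col("ModifiedDate")))  # standardize to a date
)

silver_customer.write.format("delta").mode("overwrite").saveAsTable("silver.Customer")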

For this dataset (which was already clean sample data), there wasn’t much cleaning needed anyway. So I decided to skip most of the silver layer work and jump straight to creating the gold dimensional model. That’s where things got interesting.

Step 3: The Gold Layer—Where ChatGPT Actually Impressed Me

I asked ChatGPT to help me create a dimensional model with facts and dimensions. And honestly? It nailed it.

ChatGPT correctly identified that I needed:

  • A fact table (FactSales) with one row per sales order line item
  • Dimension tables for Customer, Product, Address, and Date
  • Proper surrogate keys and relationships

It even understood that the fact table should include things like OrderDateKey, DueDateKey, and ShipDateKey (though the actual field names in the source were slightly different).

Getting the Schema Information

Before ChatGPT could generate the dimensional modeling code, it needed to know what fields existed in my bronze tables. So I created a notebook to list all the schemas. ChatGPT provided code that looped through the tables and printed out their column structures.

# In a Fabric notebook the `spark` session is already available, so no extra setup is needed.
# Loop through every table in the bronze schema and print its column structure.
tables = spark.catalog.listTables("bronze")
for table in tables:
    df = spark.table(f"bronze.{table.name}")
    print(f"\n{table.name}:")
    df.printSchema()

I copied all that schema information and fed it back to ChatGPT, asking it to generate a notebook that would create my dimensional model.

The Dimensional Modeling Notebook

ChatGPT generated a complete PySpark notebook that created all my dimension tables and the fact table. I’m not going to lie—the code wasn’t pretty. It was repetitive, definitely not metadata-driven, and violated every DRY principle I believe in.

But here’s the thing: I just copy-pasted it and ran it. And it worked. In less than a minute, I had a complete dimensional model in my gold schema.

# Example of what ChatGPT generated (simplified)
from pyspark.sql.functions import *

# Load bronze tables
df_product = spark.table("bronze.Product")
df_customer = spark.table("bronze.Customer")

# Create DimProduct
dim_product = df_product.select(
    col("ProductID").alias("ProductKey"),
    col("Name").alias("ProductName"),
    col("Color"),
    col("Size")
).dropDuplicates()

# Write to gold
dim_product.write.format("delta").mode("overwrite").saveAsTable("gold.DimProduct")

The notebook created dimensions for customers, products, and addresses. It also attempted a date dimension, though that only captured three dates from the actual data (not ideal for Power BI, which needs a proper calendar table).
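
If you want a proper calendar, a generated date dimension only takes a few lines of PySpark. This is my own sketch, not the generated code, and the date range is arbitrary:

from pyspark.sql.functions import explode, sequence, to_date, lit, date_format, year, month, dayofmonth

# Generate one row per day across a fixed range, then derive the usual attributes.
dim_date = (
    spark.range(1)
    .select(explode(sequence(to_date(lit("2020-01-01")), to_date(lit("2030-12-31")))).alias("Date"))
    .withColumn("DateKey", date_format("Date", "yyyyMMdd").cast("int"))
    .withColumn("Year", year("Date"))
    .withColumn("Month", month("Date"))
    .withColumn("Day", dayofmonth("Date"))
    .withColumn("MonthName", date_format("Date", "MMMM"))
)

dim_date.write.format("delta").mode("overwrite").saveAsTable("gold.DimDate")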

Most impressively, it created a fact table that joined the sales order headers and details, bringing in the necessary foreign keys. ChatGPT genuinely understood dimensional modeling concepts and applied them correctly.
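
The pattern behind that fact table is the classic header-to-detail join plus integer date keys that line up with the date dimension. Here’s a simplified sketch of the idea (my own reconstruction with assumed column names, not ChatGPT’s exact code):

from pyspark.sql.functions import col, date_format

# Join order headers to their line items and derive integer date keys so
# FactSales can link to DimDate on OrderDateKey / DueDateKey / ShipDateKey.
header = spark.table("bronze.SalesOrderHeader")
detail = spark.table("bronze.SalesOrderDetail")

fact_sales = (
    detail.join(header, on="SalesOrderID", how="inner")
    .select(
        col("SalesOrderID"),
        col("SalesOrderDetailID"),
        col("CustomerID").alias("CustomerKey"),
        col("ProductID").alias("ProductKey"),
        date_format("OrderDate", "yyyyMMdd").cast("int").alias("OrderDateKey"),
        date_format("DueDate", "yyyyMMdd").cast("int").alias("DueDateKey"),
        date_format("ShipDate", "yyyyMMdd").cast("int").alias("ShipDateKey"),
        col("OrderQty"),
        col("UnitPrice"),
        col("LineTotal"),
    )
)

fact_sales.write.format("delta").mode("overwrite").saveAsTable("gold.FactSales")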

[Screenshot: The completed gold layer, with fact and dimension tables in the gold schema]

The Reality Check: Should You Actually Do This?

Let’s be clear about something: this experiment worked, but it’s not how you should build production data platforms.

The code ChatGPT generated has major issues:

  • It’s extremely repetitive (violates DRY principles)
  • It’s not metadata-driven or configurable
  • It’s hard to maintain and scale
  • Some solutions (like the date dimension) are incomplete

In a real project, you’d want to create reusable patterns, use configuration files, implement proper error handling, and build incremental load processes. You definitely wouldn’t copy-paste the same transformation logic for every single table.
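
To make that concrete, here’s a rough sketch of the more maintainable, config-driven shape I’d aim for instead. The table and column names are illustrative only, and in a real project the config would live in a file or metadata table rather than inline:

from pyspark.sql.functions import col

# A small config drives every dimension load instead of copy-pasting the logic.
dimension_config = {
    "DimProduct": {
        "source": "bronze.Product",
        "columns": {"ProductID": "ProductKey", "Name": "ProductName", "Color": "Color"},
    },
    "DimCustomer": {
        "source": "bronze.Customer",
        "columns": {"CustomerID": "CustomerKey", "FirstName": "FirstName", "LastName": "LastName"},
    },
}

def build_dimension(name: str, source: str, columns: dict) -> None:
    # Load the source table, rename and select columns, deduplicate, write to gold.
    df = (
        spark.table(source)
        .select([col(src).alias(dst) for src, dst in columns.items()])
        .dropDuplicates()
    )
    df.write.format("delta").mode("overwrite").saveAsTable(f"gold.{name}")

for dim_name, cfg in dimension_config.items():
    build_dimension(dim_name, cfg["source"], cfg["columns"])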

But—and this is important—if you’re just learning Microsoft Fabric, or if you need to quickly prototype something, or if you’re testing an idea, ChatGPT can actually be incredibly helpful. It got me from zero to a working dimensional model in about half an hour. A junior data engineer might take a full day to do the same thing manually.

What I Learned About Vibe Coding with ChatGPT

Here’s what surprised me during this experiment:

ChatGPT actually understands dimensional modeling. It correctly identified which tables should become dimensions, how to structure a fact table, and what fields to include. That’s legitimately impressive.

It can recover from errors. When I sent screenshots of error messages, ChatGPT usually figured out what went wrong and suggested fixes. Though it did have some connectivity issues during my recording, which was frustrating.

The UI changes too fast. ChatGPT’s knowledge of specific menu locations and interface elements was sometimes outdated. Microsoft Fabric’s UI changes every week, so this isn’t surprising.

You still need to understand the fundamentals. While ChatGPT generated working code, you need to know enough to recognize when something doesn’t make sense. If you blindly trust everything it suggests, you’ll end up with a brittle, hard-to-maintain system.

My Take: Learn the Proper Way, But Don’t Ignore the Tools

I don’t advocate for vibe coding as your primary development approach. I really believe you should learn Microsoft Fabric properly, understand data engineering principles, and build maintainable solutions. That’s why I create these videos every week—to teach you the right way to do things.

But ChatGPT is a powerful tool for learning and prototyping. It can help you understand concepts, generate starter code, and get unstuck when you’re not sure what to do next. Just don’t ship that generated code to production without refactoring it properly.

Think of ChatGPT like training wheels on a bike. They’re useful when you’re learning, but eventually you need to take them off and ride properly. And please, for the love of all that is holy, always implement DRY principles and make your code metadata-driven before deploying anything to production 🙂

Have you tried using ChatGPT or other AI tools to help build your Fabric solutions? What worked well and what didn’t? Let me know in the comments below!
