What Is Data Modeling?

Last updated June 14, 2026

Key takeaways

Data modeling is the process of defining your data's entities, attributes, relationships, and — in analytics — the metrics and join paths between them, so data is stored, related, and queried consistently.
It runs across three levels: conceptual (the business entities and how they relate), logical (attributes, keys, and structure independent of any database), and physical (the actual tables, types, and indexes in a specific system).
Two traditions dominate: entity-relationship modeling for normalized transactional databases, and dimensional modeling (facts and dimensions) for analytics and warehousing — most stacks use both at different layers.
In the analytics stack, modeling happens twice: transformation tools like dbt model and shape the data in the warehouse, and a semantic layer models the metrics, dimensions, joins, and access rules on top of it.
The semantic layer is where data modeling becomes the source of truth for every consumer — BI, embedded apps, spreadsheets, and AI agents — defining each metric once so the same question returns the same number everywhere.
Cube is the agentic analytics platform built on a semantic layer; its open-source core, Cube Core (Apache 2.0), is where you model metrics, dimensions, joins, and access rules as code — the layer that makes AI answers trustworthy. As Brex put it, the semantic layer is what makes the AI useful.

"Data modeling" is one of those foundational terms that everyone in data uses and few stop to define precisely. This is a plain-language explainer for data leaders and practitioners: what data modeling actually is, the levels and traditions it spans, where it happens in a modern analytics stack, and why — in the AI era — modeling decisions made once in a semantic layer are what decide whether you can trust an agent's answer.

TL;DR

Data modeling is the process of defining how your data is structured, related, and accessed — the entities your business cares about, their attributes, the relationships between them, and, in analytics, the metrics and join paths that turn raw tables into business-ready numbers. It runs across three levels (conceptual, logical, physical) and two traditions (entity-relationship for transactional systems, dimensional for analytics). In a modern stack it happens twice: transformation tools like dbt model the data in the warehouse, and a semantic layer models the metrics, dimensions, and access rules on top — defining each metric once so every BI tool, embedded app, spreadsheet, and AI agent returns the same number. Cube is the agentic analytics platform built on a semantic layer; its open-source core, Cube Core, is where that model lives.

A working definition

Data modeling is the process of defining how data is organized, related, and accessed. You identify the entities a business cares about — customers, orders, products, sessions — describe their attributes, specify the relationships between them, and, for analytics, define the metrics and join paths that turn those raw entities into numbers people can reason about. The output is a blueprint: something the rest of the stack designs, builds, and queries against.

The payoff is consistency. A good model means data is stored once, related correctly, and interpreted the same way wherever it's used — in the application database, the warehouse, the BI tool, and the spreadsheet. The alternative is the familiar failure mode: every system re-encodes the same business logic in its own way, and "active customer" ends up meaning one thing in the product, another in finance, and a third in a one-off query.

Data modeling is defined by what it produces, not where it lives. A model is not a database — it's the specification a database (or a semantic layer, or a report) is built from. The same conceptual model can be implemented in PostgreSQL today and Snowflake tomorrow without the underlying business meaning changing.

The three levels of data modeling

Data modeling is usually described across three levels of abstraction, moving from business meaning to implementation detail:

Conceptual model. The highest level. It names the core entities and how they relate — "a customer places many orders; an order contains many line items" — in terms a business stakeholder can read. No attributes, keys, or database specifics yet.
Logical model. Adds detail without committing to a platform: attributes for each entity, primary and foreign keys, cardinality, and normalization decisions. It's precise enough to review for correctness but still independent of any particular database.
Physical model. Translates the logical model into a concrete schema for a specific system: real tables, column types, indexes, partitioning, and constraints, tuned to the engine you're running — PostgreSQL, Snowflake, BigQuery, Redshift, or Databricks.

The three aren't rival approaches; they're stages of the same work, each with a different audience. Conceptual is for the business, logical for data designers, physical for the engineers who implement it. Skipping the upper levels is how teams end up with schemas that are technically valid but don't match how the business actually thinks.
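One way to see the three levels as stages of the same work is to write each one down for the same tiny domain. The sketch below is illustrative only: the table names, the attributes, the abstract type labels, and the choice of SQLite as the target engine are all assumptions made for the example, not part of any standard.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Conceptual level: entities and relationships in business terms only.
CONCEPTUAL = [
    ("customers", "place many", "orders"),
    ("orders", "contain many", "line_items"),
]

# Logical level: attributes and keys, with abstract (engine-neutral) types.
@dataclass(frozen=True)
class Attribute:
    name: str
    kind: str                         # "id", "text", "date", or "money"
    primary_key: bool = False
    references: Optional[str] = None  # "table.column" when this is a foreign key

LOGICAL = {
    "customers": [
        Attribute("customer_id", "id", primary_key=True),
        Attribute("name", "text"),
    ],
    "orders": [
        Attribute("order_id", "id", primary_key=True),
        Attribute("customer_id", "id", references="customers.customer_id"),
        Attribute("ordered_on", "date"),
    ],
    "line_items": [
        Attribute("line_item_id", "id", primary_key=True),
        Attribute("order_id", "id", references="orders.order_id"),
        Attribute("amount_cents", "money"),
    ],
}

# Physical level: commit to one engine's concrete types (SQLite in this sketch).
SQLITE_TYPES = {"id": "INTEGER", "text": "TEXT", "date": "TEXT", "money": "INTEGER"}

def to_ddl(table: str, attributes: list) -> str:
    """Render one logical entity as a CREATE TABLE statement for SQLite."""
    columns = []
    for attr in attributes:
        column = f"{attr.name} {SQLITE_TYPES[attr.kind]}"
        if attr.primary_key:
            column += " PRIMARY KEY"
        if attr.references:
            ref_table, ref_column = attr.references.split(".")
            column += f" REFERENCES {ref_table}({ref_column})"
        columns.append(column)
    return f"CREATE TABLE {table} ({', '.join(columns)})"

# Build the physical schema in an in-memory database.
conn = sqlite3.connect(":memory:")
for table, attributes in LOGICAL.items():
    conn.execute(to_ddl(table, attributes))

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)
```

Swapping SQLITE_TYPES for another engine's type map would change the physical model while leaving the conceptual and logical definitions untouched, which is the point of keeping the levels separate.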

Two traditions: entity-relationship and dimensional modeling

Underneath those levels sit two long-standing modeling traditions, each optimized for a different workload.

Entity-relationship (ER) modeling, formalized by Peter Chen in 1976, represents data as entities with attributes and the relationships between them. Its instinct is normalization — store each fact once, reference it by key — which minimizes duplication and keeps writes consistent. That makes ER modeling the natural fit for transactional (OLTP) systems: the order-entry database behind your app.

Dimensional modeling, associated with Ralph Kimball, organizes data for analysis instead of transactions. It splits data into fact tables — the numeric events you measure, like orders or page views — surrounded by dimension tables that hold the descriptive context you slice by, like date, product, or region. This star-schema shape denormalizes deliberately so that aggregations are fast and the model is easy to reason about, which is why it underpins most reporting warehouses.

Aspect	Entity-relationship modeling	Dimensional modeling
Optimized for	Transactions (OLTP), consistent writes	Analytics and reporting, fast reads
Shape	Normalized entities and relationships	Fact tables surrounded by dimensions (star schema)
Goal	Avoid duplication, protect integrity	Make aggregation fast and queries intuitive
Typical home	Application and source databases	Data warehouse and analytics models
Trade-off	Slower, multi-join analytical queries	Some redundancy by design

These aren't mutually exclusive. A typical stack normalizes in the source systems and then builds dimensional models in the warehouse for analytics — the same underlying facts, modeled differently for different jobs.
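To make the star-schema shape concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fact_orders, dim_product, dim_date), the columns, and the sample rows are invented for illustration; a real warehouse would carry far more dimensions and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context to group and filter by.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, iso_date TEXT, quarter TEXT);

-- Fact table: one row per measured event, with keys into each dimension.
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount_cents INTEGER
);

INSERT INTO dim_product VALUES
    (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Plan', 'Software');
INSERT INTO dim_date VALUES
    (20260105, '2026-01-05', '2026-Q1'), (20260410, '2026-04-10', '2026-Q2');
INSERT INTO fact_orders VALUES
    (1, 1, 20260105, 10000), (2, 2, 20260105, 5000),
    (3, 3, 20260105, 3000), (4, 3, 20260410, 3000);
""")

# A typical analytical question: one join per dimension, then aggregate the fact.
revenue_by_category = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.amount_cents)
    FROM fact_orders f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()

for row in revenue_by_category:
    print(row)
```

The category and quarter values are stored redundantly on the dimension rows rather than normalized further, which is the deliberate trade-off the table above describes: some duplication in exchange for short, predictable join paths.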

How data modeling works in practice

Designing a model follows a recognizable arc, regardless of tradition:

Requirement analysis. Understand what questions the data has to answer and what the business actually means by its key terms.
Conceptual modeling. Sketch the main entities and relationships at a level a stakeholder can confirm.
Logical modeling. Add attributes, keys, and structure; decide how far to normalize.
Physical modeling. Implement the schema in a specific database, with the types, indexes, and constraints that engine needs.

In an analytics stack, though, modeling doesn't stop at the warehouse schema — it happens again, one layer up. Transformation tools like dbt model and shape the raw data in the warehouse into clean, tested tables. Then a semantic layer models the metrics and dimensions on top of those tables: what "revenue" means, which dimensions you slice it by, the join paths that connect entities, and the access rules that govern who sees what. The first kind of modeling produces trustworthy tables; the second produces trustworthy answers.
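The second kind of modeling can be sketched as a small registry that defines a metric once and compiles it to SQL on request. This is a toy illustration, not Cube's or dbt's actual API: the Metric class, the compile_query function, the revenue definition, and the orders table are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    expression: str      # aggregate SQL expression
    table: str
    filters: tuple = ()  # conditions that are always part of the definition

# The single place where "revenue" is defined (assumed here to be net of
# refunds and limited to completed orders).
METRICS = {
    "revenue": Metric(
        name="revenue",
        expression="SUM(amount_cents - refund_cents)",
        table="orders",
        filters=("status = 'completed'",),
    ),
}
DIMENSIONS = {"region": "region"}

def compile_query(metric_name: str, dimension_names: list) -> str:
    """Turn a (metric, dimensions) request into SQL from the shared definitions."""
    metric = METRICS[metric_name]
    dims = [DIMENSIONS[d] for d in dimension_names]
    select = [f"{expr} AS {name}" for expr, name in zip(dims, dimension_names)]
    select.append(f"{metric.expression} AS {metric.name}")
    sql = f"SELECT {', '.join(select)} FROM {metric.table}"
    if metric.filters:
        sql += " WHERE " + " AND ".join(metric.filters)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, region TEXT, status TEXT,
    amount_cents INTEGER, refund_cents INTEGER
);
INSERT INTO orders VALUES
    (1, 'EMEA', 'completed', 10000, 0),
    (2, 'EMEA', 'completed', 5000, 1000),
    (3, 'AMER', 'completed', 7000, 0),
    (4, 'AMER', 'cancelled', 9000, 0);
""")

# Two different consumers asking for the same metric get the same SQL.
dashboard_sql = compile_query("revenue", ["region"])
agent_sql = compile_query("revenue", ["region"])
revenue_by_region = dict(conn.execute(dashboard_sql).fetchall())
total_revenue = conn.execute(compile_query("revenue", [])).fetchone()[0]
print(revenue_by_region, total_revenue)
```

Because the refund and status rules live in the metric definition rather than in each consumer's query, changing what "revenue" means is a one-line edit that every consumer picks up.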

Where data modeling meets the semantic layer

This second modeling step is the one that's changed the most. For years, metric definitions lived wherever they were convenient — inside a BI tool's reports, in handwritten SQL, in a spreadsheet formula. Each was a small, local data model, and none of them agreed with the others. That's the "three different numbers for active users in one meeting" problem, and it's a modeling problem, not a tooling problem.

A semantic layer fixes it by making the metric model the shared source of truth. You model your metrics, dimensions, join paths, and access rules once, as code, and every consumer — a BI dashboard, an analytics feature embedded in your product, a spreadsheet, or an AI agent — requests them from that single model. Modeling stops being something each tool does privately and becomes a governed asset the whole stack reads from. Because it's defined as code, the model gets the same rigor as the rest of your engineering: version control, code review, CI/CD, and isolated environments.

So in 2026, "data modeling" for analytics increasingly means modeling the semantic layer. The entity and dimensional thinking is still there underneath — facts, dimensions, keys, join paths — but the deliverable is a governed business model that serves many tools, not a schema for one database.

Why data modeling matters more in the AI era

When the consumer asking a question is an AI agent answering on behalf of a person, the quality of your data model stops being a back-office concern and becomes load-bearing.

Here's the structural reason. Point a large language model at raw, unmodeled tables and it has to re-derive your business on every prompt. A table named orders doesn't encode whether revenue is gross or net, includes tax, or excludes refunds; the join graph has fan-outs and three tables that all look like "the customer"; and nothing in a SELECT distinguishes a correct query from one that leaks another tenant's data. So "what was revenue last quarter?" can return three different numbers across three sessions. No amount of prompt engineering fixes that — it's a missing model.
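The fan-out problem is easy to reproduce. In the sketch below (the orders and line_items tables and their rows are made up for the example), both queries are valid SQL and both look like plausible answers to "what was revenue?", but the one that joins to line items counts each order total once per line item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total_cents INTEGER);
CREATE TABLE line_items (line_item_id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);

-- Order 1 has three line items, order 2 has one.
INSERT INTO orders VALUES (1, 10000), (2, 5000);
INSERT INTO line_items VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'C'), (4, 2, 'A');
""")

# Summing at the grain of the orders table gives the intended number.
correct_revenue = conn.execute(
    "SELECT SUM(total_cents) FROM orders"
).fetchone()[0]

# Joining to a one-to-many table first repeats each order row per line item,
# so the same total is added several times.
fanned_out_revenue = conn.execute("""
    SELECT SUM(o.total_cents)
    FROM orders o
    JOIN line_items li ON li.order_id = o.order_id
""").fetchone()[0]

print(correct_revenue, fanned_out_revenue)
```

Nothing in the second query is syntactically wrong, which is why a model that records the grain of each table and the safe join paths is needed to rule it out.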

A modeled semantic layer is that model. The agent selects from certified metrics by name instead of authoring SQL from scratch, so answers are consistent, governed, and explainable — you can see which named metrics produced a number rather than auditing a wall of generated SQL. This is the foundation of agentic analytics: AI-native BI where agents do the analytical work over a governed model. It's also not theoretical — Brex evaluated approaches for grounding AI on their data, chose Cube, and built Brex Spaces, an embedded AI financial analyst, on top of it. Their one-line summary is the cleanest case for modeling done right: the semantic layer is what makes the AI useful.
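Selecting from certified metrics by name can be sketched as follows. This is a simplified illustration under stated assumptions, not how Cube or any particular product implements it: the CERTIFIED_METRICS catalog, the run_metric function, the tenant_id column, and the sample data are all hypothetical.

```python
import sqlite3

# The governed catalog: the only expressions a caller is allowed to request.
CERTIFIED_METRICS = {
    "net_revenue": "SUM(amount_cents - refund_cents)",
    "order_count": "COUNT(*)",
}

class UnknownMetricError(ValueError):
    """Raised when a caller asks for a metric that is not in the catalog."""

def run_metric(conn: sqlite3.Connection, metric_name: str, tenant_id: str) -> dict:
    # The caller (for example, an AI agent) supplies a name, never raw SQL.
    if metric_name not in CERTIFIED_METRICS:
        raise UnknownMetricError(f"'{metric_name}' is not a certified metric")
    # The tenant filter is added by the layer itself, so a caller cannot omit it.
    sql = f"SELECT {CERTIFIED_METRICS[metric_name]} FROM orders WHERE tenant_id = ?"
    value = conn.execute(sql, (tenant_id,)).fetchone()[0]
    # Returning the metric name alongside the value keeps the answer explainable.
    return {"metric": metric_name, "tenant_id": tenant_id, "value": value}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, tenant_id TEXT,
    amount_cents INTEGER, refund_cents INTEGER
);
INSERT INTO orders VALUES
    (1, 'acme', 10000, 0),
    (2, 'acme', 5000, 1000),
    (3, 'globex', 7000, 0);
""")

acme_answer = run_metric(conn, "net_revenue", "acme")
globex_answer = run_metric(conn, "net_revenue", "globex")
print(acme_answer, globex_answer)
```

Asking twice for net_revenue returns the same number because the expression is fixed in the catalog, and a request such as run_metric(conn, "revenue_guess", "acme") fails loudly instead of producing an invented calculation.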

Common misconceptions about data modeling

A few myths are worth retiring:

"It's a one-time, up-front artifact." Models should evolve as the business does. A metric definition or entity relationship that was right last year may not be right now; treating the model as living, version-controlled code is how you keep it accurate.
"There's one correct technique." There isn't. Normalized ER models suit transactional systems; dimensional models suit analytics. Most stacks use both at different layers, and the right choice depends on the workload.
"It's only for big projects." Even a small system benefits from a clear model — and the moment more than one tool reads the same metrics, defining them once becomes the difference between consistent numbers and a debugging session.

Where Cube fits

Cube is the agentic analytics platform built on a semantic layer. Its open-source foundation, Cube Core (Apache 2.0), is where the analytics data modeling happens: you model metrics, dimensions, joins, and access rules once, as code, and serve them over SQL, REST, GraphQL, an MCP server for AI agents, and DAX/MDX for spreadsheet tools. Row-level, multi-tenant security is applied at compile time, pre-aggregation caching keeps queries fast, and the model is SQL-first and extensible at query time — governed definitions stay fixed while tools and agents build ad-hoc calculations on top. On top of that foundation, the platform adds AI agent interfaces, workbooks, dashboards, and embedded surfaces, so the same model powers both internal business intelligence for your teams and embedded analytics for your customers. That's why 400+ companies build on Cube across both use cases.

Two clarifications come up immediately. First, dbt is a partner, not something the semantic layer replaces: dbt models and transforms the data, while the semantic layer models the metrics and serves them. Model in dbt and serve via Cube, which reads dbt models. (The one overlap is the dbt Semantic Layer, MetricFlow, which is an alternative to Cube Core rather than to the platform as a whole.) Second, the semantic layer does not replace your warehouse: it sits on top of Snowflake, BigQuery, Redshift, or Databricks, which remain your storage and compute.

Our verdict

Data modeling is the process of defining your data's entities, attributes, relationships, metrics, and join paths so data is stored consistently and interpreted the same way everywhere. It spans three levels — conceptual, logical, physical — and two traditions — entity-relationship for transactional systems, dimensional for analytics. In a modern stack the highest-leverage modeling now lives in the semantic layer, where metrics are defined once and served to BI, embedded apps, spreadsheets, and AI agents from a single governed model. That's what turns an AI that demos well into one you can trust in production — and it's where Cube, built on the open-source Cube Core, fits.

Methodology

This explainer describes data modeling as the term is used in 2026, weighted toward the parts that matter when many tools, and increasingly AI agents, consume the same data: the conceptual/logical/physical levels, the entity-relationship and dimensional traditions, and the shift of analytics modeling into a governed semantic layer that defines metrics once. As the publisher, Cube builds a semantic layer and an agentic analytics platform on top of it, so we have an obvious interest here; we've tried to define the concept neutrally and be explicit about where Cube fits versus the broader category. Product-specific capabilities move quickly, so treat them as version-dependent and confirm against current documentation.

Frequently asked questions

What is data modeling?
Data modeling is the process of defining how data is structured, related, and accessed. You identify the entities your business cares about (customers, orders, products), their attributes, the relationships between them, and, in analytics, the metrics and join paths that turn raw tables into business-ready numbers. The result is a blueprint that keeps data stored consistently and interpreted the same way across applications, databases, and analytics tools.

What are the three levels of data modeling?
Conceptual, logical, and physical. A conceptual model captures the business entities and how they relate, with no database detail. A logical model adds attributes, primary and foreign keys, and structure, but stays independent of any specific system. A physical model translates that into the actual tables, column types, indexes, and constraints of a particular database such as PostgreSQL, Snowflake, or BigQuery.

What is the difference between conceptual, logical, and physical data models?
They differ in level of detail and audience. The conceptual model is for business stakeholders: it names entities and relationships in plain terms. The logical model is for data designers: it specifies attributes, keys, and normalization without committing to a platform. The physical model is for engineers: it is the implementable schema with types, indexes, and partitioning tuned to one database engine.

What is the difference between entity-relationship and dimensional modeling?
Entity-relationship (ER) modeling normalizes data into entities and relationships to avoid duplication, which suits transactional systems that write a lot. Dimensional modeling organizes data into fact tables (events and measurements) surrounded by dimension tables (the context you slice by), which suits analytics and reporting because it makes aggregation fast and queries intuitive. Most analytics stacks normalize in source systems and use dimensional or star-schema models in the warehouse.

How does data modeling relate to a semantic layer?
A semantic layer is where analytics data modeling lives as the shared source of truth. Instead of each BI tool, app, or query re-encoding what "revenue" means, you model metrics, dimensions, join paths, and access rules once in the semantic layer, and every consumer requests them from there. So data modeling in 2026 isn't just designing tables; it's defining the governed business model that a semantic layer serves to every downstream tool and AI agent.

Is data modeling the same as database design?
Database design is one application of data modeling: turning a logical model into a physical schema for a specific database. But data modeling is broader: it includes the conceptual and logical work upstream of any database, and in analytics it extends downstream into modeling metrics and dimensions in transformation tools and the semantic layer. A model can outlive any single database it's implemented in.

Does data modeling still matter for AI and natural-language analytics?
More than ever. Pointed at raw tables, a large language model has to re-derive your join paths and metric definitions on every prompt, so the same question can return different numbers. A modeled semantic layer lets the agent select from certified metrics by name instead of inventing SQL, which is what makes its answers consistent, governed, and explainable. As Brex summarized it, the semantic layer is what makes the AI useful.

What tools are used for data modeling?
It depends on the layer. Conceptual and logical modeling use ER and diagramming tools. Physical modeling lives in the database and migration tooling. In the analytics stack, transformation tools like dbt model and shape data in the warehouse, while a semantic layer such as Cube models the metrics, dimensions, joins, and access rules on top. Cube reads dbt models, so you model in dbt and serve through Cube.

What is dimensional modeling used for?
Analytics and data warehousing. Dimensional modeling arranges data into fact tables (the numeric events you measure, like orders or page views) surrounded by dimension tables that hold the descriptive context you group and filter by, like date, product, or region. This star-schema shape makes aggregations fast and the model easy to reason about, which is why it underpins most reporting warehouses and, by extension, the metrics defined in a semantic layer.

What are common data modeling mistakes?
Treating the model as a one-time artifact instead of something that evolves with the business; over- or under-normalizing for the workload; and, costliest of all in analytics, letting every tool re-model the same metrics, so "active users" means three different things in three places. The fix for the last one is to model metrics once in a semantic layer and have every BI tool, embedded app, and AI agent read from it.
