When I was at IBM, Anna Gonzales was the thought leader on the token architecture for the Carbon Design System. It was tight, disciplined, and opinionated. And it worked. The structure encoded decisions the team had already fought for and made: what the system constrained, what it left open, where a component’s responsibility ended and a product team’s began.
You can read the system’s philosophy in how the tokens are organized.
The tokens aren’t what made the system work. The convictions are what made it work. The design tokens were an artifact of that conviction.
That distinction is the argument.
The W3C Design Tokens Community Group published its first stable spec at the end of 2025. It’s genuinely good work: years of collaboration to solve a hard coordination problem: how to share design decisions without everything fracturing every time someone changes a color.
The spec is deliberately minimal. It defines tokens, their types, and a reference system that lets one token point to another — so color.text.primary can reference color.palette.black, and changing the palette propagates everywhere. It adds $extends for group inheritance, a Color module with modern color space support, and a Resolver module for theming and context.
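A minimal sketch of that reference mechanism, assuming the spec’s `$value`/`$type` JSON shape and `{dotted.path}` alias strings; the token names here are invented for illustration, not taken from any real system:

```typescript
// A token's $value may be a literal or an alias string like
// "{color.palette.black}" that points at another token in the tree.
type TokenNode = { $value: string; $type?: string };
type TokenTree = { [key: string]: TokenNode | TokenTree };

const tokens: TokenTree = {
  color: {
    palette: {
      black: { $value: "#161616", $type: "color" },
    },
    text: {
      // Alias: changing palette.black propagates here automatically.
      primary: { $value: "{color.palette.black}", $type: "color" },
    },
  },
};

// Resolve a dotted path, following alias references until a literal remains.
function resolve(tree: TokenTree, path: string): string {
  const node = path
    .split(".")
    .reduce<any>((acc, key) => acc?.[key], tree) as TokenNode | undefined;
  if (!node || typeof node.$value !== "string") {
    throw new Error(`Unknown token: ${path}`);
  }
  const alias = node.$value.match(/^\{(.+)\}$/);
  return alias ? resolve(tree, alias[1]) : node.$value;
}

console.log(resolve(tokens, "color.text.primary")); // "#161616"
```

Repoint `color.palette.black` at a new hex and every alias downstream picks it up: that propagation is the coordination problem the spec exists to solve.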
It’s a sensible, focused infrastructure that solves real problems. It’s the kind of foundation that would underpin exactly the discipline Anna built at IBM.
What practitioners have built around that foundation is something else.
The spec is agnostic about tiers. It defines how to express token relationships, not how many layers you should have.
The community is far less agnostic. Three-tier has become doctrine: primitive tokens at the base, semantic tokens that make purposeful claims about use, component tokens scoped to specific components. This is “mature token architecture” in most design systems discourse.
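Concretely, the doctrine chains aliases downward, with each layer narrowing the claim. All names below are invented to illustrate the pattern, not drawn from any real system:

```typescript
// The three-tier doctrine as a flat map; "{...}" strings are aliases.
const threeTier: Record<string, string> = {
  // Tier 1: primitive — a raw value, no opinion about use.
  "color.blue.600": "#0f62fe",
  // Tier 2: semantic — a purposeful claim about use.
  "color.action.primary": "{color.blue.600}",
  // Tier 3: component — the claim scoped to one component.
  "button.primary.background": "{color.action.primary}",
};
```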
The gap between doctrine and practice is instructive.
Three-tier is what gets taught and recommended. Two tiers is what the major systems actually implement.
Carbon’s architecture doesn’t map neatly onto the primitive/semantic/component model - it has its own layering logic built around UI depth. Polaris has moved away from its component token layer. Material Design 3 publishes reference tokens and system tokens, and stops there.
Three-tier is the aspiration. Two-tier is what survives contact with a real system.
That gap should be a signal. The canonical systems couldn’t fully sustain the doctrine. And yet the doctrine keeps getting taught as the definition of maturity.
The problem isn’t the spec.
The spec - unavoidably - creates a thing to make. “We’re implementing the W3C spec” can start to feel like a north star, when a real north star is missing.
At J.P. Morgan, debates over token naming strategy and architecture kept arriving before a simpler question had been answered: what is this design system for?
Naming debates aren’t a path to that clarity. They can be a replacement for it.
For Anna at IBM, the tokens were downstream. Philosophy came first. Tokens encoded the philosophy.
Where I’ve seen more struggle - at JPM, at some of the design systems I’ve worked with at Knapsack - is when that order is inverted: taxonomy comes first, and the thinking is supposed to emerge from it. Sometimes it does. Often the taxonomy becomes the only explicit structure in the system, and so it becomes load-bearing.
Which leads to a design system whose most deeply held opinion is how to name its hover state.
Teams run naming-convention workshops because mature systems have naming conventions, and produce token JSON because good systems produce token JSON. That’s a cargo-cult pattern: the mechanism becomes the mission.
There is a failure mode you can spot from the numbers alone: token counts that scale combinatorially with component complexity.
The Tetrisly design system ran into exactly this problem. Their button component reached over 500 tokens, enumerating every property of every state of every variant: background, border, text, icon; default, hover, focus, active, disabled; primary, secondary, danger, ghost; large, medium, small; dark mode, high contrast.
Before long you end up with button-background-color-primary-large-hover-dark, and hundreds of siblings.
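The arithmetic behind that explosion is easy to sketch. The axis values below are illustrative, not Tetrisly’s actual token set, but the multiplication is the point:

```typescript
// Each axis multiplies the token count; none of them add.
const axes = {
  property: ["background", "border", "text", "icon"],
  variant: ["primary", "secondary", "danger", "ghost"],
  size: ["large", "medium", "small"],
  state: ["default", "hover", "focus", "active", "disabled"],
  mode: ["light", "dark", "high-contrast"],
};

// Total distinct tokens if every combination gets its own name.
const count = Object.values(axes).reduce((n, values) => n * values.length, 1);
console.log(count); // 4 * 4 * 3 * 5 * 3 = 720, for one component
```

Add one more size or one more mode and the total jumps by hundreds, which is how a single button crosses 500 tokens without anyone deciding it should.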
The spec supports this. But at this point the abstraction provides no value over well-organized CSS. The overhead is real: tooling dependency, Figma sync, governance process. But there’s no additional leverage when your variable names map one-to-one to CSS properties you have to write anyway.
The promise of tokens is leverage: fewer, more powerful constructs that express more than flat specifics. Five hundred tokens for a button is a clear failure of that promise; you may as well be writing CSS. Tetrisly acknowledged as much, and has deliberately thinned its component tier until the model sits much closer to two tiers than three.
Phillip Lovelace recently argued that tokens are even more important in an AI-driven workflow - token taxonomy can be an API the AI agent consumes. Semantic naming lets AI stop guessing your brand.
This is a worthwhile floor argument. AI generating UI from a token file produces more consistent output than generating from nothing. The W3C spec makes that even more reliable.
A floor isn’t a ceiling.
AI can traverse a token graph and resolve a name to a hex value. It can’t tell you why that color is right for a primary hover state, whether a destructive action should use the same token, or whether a payment confirmation should defer to stricter contrast constraints.
Those aren’t AI limitations.
Those are limitations of the information tokens carry.
Tokens encode what things look like. Not why. Not when. Not the conditions that change the answer.
The convictions in the best systems come from the decisions that precede them. An AI with access to those decisions - rules, intent, context - can do more interesting things than resolve color aliases.
Tight token architecture delivers real value. Tokens are a powerful artifact of thinking, but not a substitute for it. The W3C spec describes something of genuine worth, when it’s built in the right order.
The design systems that work treat tokens as output. Philosophy first, constraints second, governance third. Tokens encode the decisions that have been made. But only if those decisions have been made.
Systems that struggle have the sequence backwards. And the quality of the spec actually makes the inversion easier. It’s a rigorous blueprint for the mechanism, and the foundation is left implicit.
Tokens with system conviction are infrastructure. Tokens that substitute for it are dogma. The difference is everything.
Further reading:
Frost, B. The Many Faces of Themeable Design Systems. bradfrost.com
Gonzales, A. Introducing Figma variables and a consolidated “All themes” library! Carbon Design Blog, Aug 2023.
Design Tokens Technical Reports. W3C Community Group, Oct 2025.
Lovelace, P. Design Systems Are Having Their Moment. Design Systems Collective, Feb 2026.
