The model landscape: families & providers

The model landscape: families, providers, and how to think about them.

This entry gives you a vendor-neutral map of the large-model landscape as it stands in early 2026: who builds the major model families, what differentiates a "family" from a "model," and a durable mental model for placing any new release without memorizing a leaderboard that goes stale within months.

STEP 1

A family is a lineage, not a single model.

Providers ship model families. A family is a named lineage that is periodically retrained and re-released, usually with size or capability tiers inside each generation. When you read "Claude," "GPT," "Gemini," "Llama," or "Mistral," you are reading a family name, not a specific artifact. The thing you actually call over an API is a dated, versioned snapshot inside that family: a particular generation, a particular tier (small / mid / frontier), and often a particular date suffix.

This matters because almost every statement about "which model is best" is really a statement about a specific snapshot on a specific task at a specific date. The family is stable; the snapshot is not. Build your mental model around families and tiers, and treat the specific best-on-benchmark snapshot as a volatile detail you re-check when you actually deploy.
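The family / generation / tier / date breakdown can be sketched as a small data structure. This is a minimal illustration, not any provider's real naming scheme: the `ModelSnapshot` class, the `examplefam` family, and the hyphen-joined identifier format are all hypothetical, and real identifiers differ from provider to provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSnapshot:
    """One concrete, callable artifact inside a family (hypothetical schema)."""
    family: str                        # stable lineage name
    generation: str                    # changes with each retrain / re-release
    tier: str                          # "small", "mid", or "frontier"
    date_suffix: Optional[str] = None  # dated snapshot, if the provider uses one

    def identifier(self) -> str:
        # Assumed format for illustration only; check provider docs for real IDs.
        parts = [self.family, self.generation, self.tier]
        if self.date_suffix:
            parts.append(self.date_suffix)
        return "-".join(parts)


def same_family(a: ModelSnapshot, b: ModelSnapshot) -> bool:
    """The family is the stable part; everything else may change per release."""
    return a.family == b.family


old = ModelSnapshot("examplefam", "3", "mid", "20250601")
new = ModelSnapshot("examplefam", "4", "mid", "20260115")

print(old.identifier())       # examplefam-3-mid-20250601
print(same_family(old, new))  # True: same lineage, different snapshot
```

Any claim that one model "beats" another attaches to a full identifier like the ones printed above, not to the family name alone.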

Throughout this section, named models and rankings are illustrative and time-stamped. The field re-baselines every few months. Treat any specific "X beats Y" claim as perishable and re-verify against current provider docs and independent evaluations before you commit.

STEP 2

The major providers, grouped by how they ship.

As of early 2026 the landscape sorts into a few recognizable groups. This is a map, not a ranking.

Frontier closed-weight labs

Anthropic (Claude family), OpenAI (GPT family), and Google DeepMind (Gemini family) each ship proprietary frontier models accessed primarily through hosted APIs. They compete at the top of capability, invest heavily in safety and alignment work, and release tiered line-ups (a small, fast tier; a balanced mid tier; a top frontier tier) within each generation. Weights are not published; you consume them as a service.

Open-weight leaders

Meta (Llama family) and Mistral AI (Mistral family) popularized capable models whose weights you can download and run yourself, under licenses ranging from permissive to source-available-with-conditions. A broader ecosystem, including the labs behind the Qwen, DeepSeek, and Gemma lines, publishes strong open-weight models, several of which are competitive with closed frontier models on many tasks. "Open weight" is now a first-class part of the landscape, not a fringe option.

Platform and infrastructure players

Cloud providers and independent inference companies do not necessarily train frontier models but make many of them available, sometimes with their own fine-tunes or hosting guarantees. They are part of the access layer (covered in the inference-providers entry) rather than the model layer, but in practice you often choose a model and a host together.

STEP 3

The axes that actually differentiate models.

Instead of memorizing names, learn the axes. Any model — today's or next quarter's — sits somewhere on each of these, and the rest of this section is essentially one entry per axis.

  • Open vs closed weights. Can you download and run it, or only call it as a service? Drives control, cost structure, privacy, and lock-in.
  • Modality. Text only, or also vision, audio, and other inputs/outputs? Drives what problems it can even attempt.
  • Size / tier. Frontier vs mid vs small. Drives the cost / quality / latency trade-off that dominates production economics.
  • Reasoning vs direct. Does it spend extra inference-time compute "thinking" before answering? Drives accuracy on hard multi-step problems, and extra cost and latency on easy ones.
  • Specialization. General-purpose vs tuned for code, embeddings, retrieval, or a domain.

A useful habit: when a new model launches, do not ask "is it the best?" Ask "where does it sit on these five axes, and does that combination fit the job I have?" That question stays answerable even as the leaderboard churns.
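One way to make that question concrete is to record the five axes as fields and check them against what a job needs. The sketch below is only an illustration: `ModelProfile`, `JobRequirements`, `mismatches`, and both example models are hypothetical, and the rule that a general-purpose model satisfies any specialization is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List


class Tier(Enum):
    SMALL = 1
    MID = 2
    FRONTIER = 3


@dataclass(frozen=True)
class ModelProfile:
    name: str
    open_weights: bool          # axis 1: open vs closed weights
    modalities: FrozenSet[str]  # axis 2: e.g. {"text", "vision"}
    tier: Tier                  # axis 3: size / tier
    reasoning: bool             # axis 4: reasoning vs direct
    specialization: str         # axis 5: "general", "code", "embeddings", ...


@dataclass(frozen=True)
class JobRequirements:
    must_self_host: bool
    modalities: FrozenSet[str]
    min_tier: Tier
    needs_reasoning: bool
    specialization: str


def mismatches(model: ModelProfile, job: JobRequirements) -> List[str]:
    """Return the axes on which the model fails the job; empty means it fits."""
    problems = []
    if job.must_self_host and not model.open_weights:
        problems.append("weights: job needs self-hosting, model is closed-weight")
    missing = job.modalities - model.modalities
    if missing:
        problems.append("modality: missing " + ", ".join(sorted(missing)))
    if model.tier.value < job.min_tier.value:
        problems.append("tier: model is below " + job.min_tier.name)
    if job.needs_reasoning and not model.reasoning:
        problems.append("reasoning: job needs a reasoning model")
    # Simplifying assumption: a general-purpose model can cover any specialization.
    if model.specialization not in ("general", job.specialization):
        problems.append("specialization: tuned for " + model.specialization)
    return problems


# Hypothetical models and job, for illustration only.
hosted_frontier = ModelProfile(
    "hypothetical-hosted-frontier", False, frozenset({"text", "vision"}),
    Tier.FRONTIER, True, "general")
local_small = ModelProfile(
    "hypothetical-local-small", True, frozenset({"text"}),
    Tier.SMALL, False, "code")
job = JobRequirements(
    must_self_host=True, modalities=frozenset({"text"}),
    min_tier=Tier.SMALL, needs_reasoning=False, specialization="code")

for candidate in (hosted_frontier, local_small):
    print(candidate.name, mismatches(candidate, job))
```

In this example the frontier model fails the job on the weights axis while the small local model fits, which is the point of the habit: fit depends on the combination of axes, not on rank.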

STEP 4

Why the landscape moves so fast — and what stays stable.

The volatile layer is the frontier capability ranking: which family currently holds the top of a given benchmark changes on a months-long cadence, and benchmark leadership rarely translates cleanly to your specific workload anyway. The stable layer is the structure: the existence of tiered families, the open/closed split, the modality and reasoning axes, and the separation between who trains a model and who serves it.

Practical consequence: write your systems against the stable structure. Abstract the specific model behind a thin interface so swapping a snapshot is a config change, keep an evaluation set that reflects your task, and re-run it whenever you consider a new release. The teams that stay current are not the ones who memorize every launch — they are the ones whose architecture makes a model swap cheap and whose evals tell them whether the swap was actually an improvement.
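A minimal sketch of the thin-interface idea follows. Everything named here is an assumption made for illustration: the `TextModel` protocol, the `BACKENDS` registry, the config keys, and the two stand-in backends, which return canned transformations instead of calling a real provider SDK.

```python
import json
from typing import Callable, Dict, Protocol


class TextModel(Protocol):
    """The only surface the application is allowed to depend on."""
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in backend; a real one would wrap a provider SDK call."""
    def __init__(self, snapshot_id: str) -> None:
        self.snapshot_id = snapshot_id

    def complete(self, prompt: str) -> str:
        return f"[{self.snapshot_id}] {prompt}"


class ShoutModel:
    """Second stand-in backend, to show a swap."""
    def __init__(self, snapshot_id: str) -> None:
        self.snapshot_id = snapshot_id

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Hypothetical registry: maps a config value to a backend constructor.
BACKENDS: Dict[str, Callable[[str], TextModel]] = {
    "echo": EchoModel,
    "shout": ShoutModel,
}


def load_model(config_text: str) -> TextModel:
    """Build the model named in config; swapping snapshots touches only config."""
    config = json.loads(config_text)
    return BACKENDS[config["backend"]](config["snapshot_id"])


def summarize(model: TextModel, text: str) -> str:
    """Application code: knows the interface, not the vendor or snapshot."""
    return model.complete(f"Summarize: {text}")


config_a = '{"backend": "echo", "snapshot_id": "examplefam-3-mid"}'
config_b = '{"backend": "shout", "snapshot_id": "otherfam-1-small"}'

print(summarize(load_model(config_a), "hi"))  # [examplefam-3-mid] Summarize: hi
print(summarize(load_model(config_b), "hi"))  # SUMMARIZE: HI
```

The design choice is that `summarize` never imports a vendor library or names a snapshot, so moving to a new release means editing the config string and re-running your evaluation set, not rewriting application code.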

The single most transferable skill in this whole section: maintain a small, representative evaluation set for your own use case. It outlasts every individual model and is the only reliable arbiter of "is the new one better for me."
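As a rough sketch of what such an evaluation set can look like in code: the cases, the substring-match scoring rule, and both stand-in "models" below are invented for illustration. A real setup would call actual models and would likely use a task-appropriate metric.

```python
from typing import Callable, List, Tuple

# Each case is (input, substring the answer must contain). Hypothetical
# examples for a support-ticket triage task.
EvalCase = Tuple[str, str]

EVAL_SET: List[EvalCase] = [
    ("My card was charged twice", "billing"),
    ("App crashes on login", "bug"),
    ("How do I export my data?", "how-to"),
]


def score(model_fn: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(1 for prompt, expected in cases if expected in model_fn(prompt))
    return passed / len(cases)


def current_model(prompt: str) -> str:
    """Stand-in for the deployed snapshot (keyword rules, not a real model)."""
    text = prompt.lower()
    if "charged" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    return "other"


def candidate_model(prompt: str) -> str:
    """Stand-in for a new release under consideration."""
    if prompt.lower().startswith("how"):
        return "how-to"
    return current_model(prompt)


def should_swap(current: Callable[[str], str],
                candidate: Callable[[str], str],
                cases: List[EvalCase],
                min_gain: float = 0.0) -> bool:
    """Swap only if the candidate beats the current model by more than min_gain."""
    return score(candidate, cases) > score(current, cases) + min_gain


print(score(current_model, EVAL_SET))
print(score(candidate_model, EVAL_SET))
print(should_swap(current_model, candidate_model, EVAL_SET))  # True
```

Because the harness takes plain callables, the same set can be re-run unchanged against every new snapshot you consider.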