A 2024 SEC review found funds running AI-driven strategies outperformed peers by an average of twelve percent, and a PwC survey put the alpha uplift from alternative data and machine learning at roughly twenty percent for the same year. Both numbers are real, but they describe four different layers of a stack — signal, sizing, execution, risk — and each layer runs a different model with a different failure mode. Mistake the layers for one black box and every "AI hedge fund" headline reads as magic; read them as a stack and the same headline becomes a thirty-second triage.
At a glance
The trading stack splits cleanly into four decisions: what to trade, how much, how to get filled, and what would blow up. Each owns a different model class. None of them is "the AI."
| Layer | Decision | Dominant ML | Press-reported users |
|---|---|---|---|
| Signal | What to buy or sell, when | Deep learning on price + NLP on text + alt data | Renaissance, Two Sigma, DE Shaw, Man GLG |
| Sizing | How much capital to allocate | Convex optimisation + ML-conditioned weights | AQR, DE Shaw |
| Execution | How to fill a parent order | Reinforcement learning over the order book | Jane Street, HRT, Citadel Securities |
| Risk | What positions to cut or hedge | Stress models + multi-agent rebalancers | Citadel, Jane Street |
The headline "AI hedge fund" almost always refers to two or three of these layers, not all four. A press release naming an LSTM is talking about Signal; one bragging about implementation shortfall is talking about Execution; one citing the EU AI Act is usually worried about Risk. The reader who knows which layer is in scope reads the rest of the release accurately.
The four-layer stack
Each layer below answers exactly one question. The layers matter more than the union of all the models because the failure modes do not generalise: a model that hallucinates a stock thesis fails at the Signal layer; a model that over-allocates fails at Sizing; a model that walks the book fails at Execution. The risk team's job is to keep those failures local.
Signal — picking the trade
Deep learning on order books and prices
The dominant signal model on short horizons is some flavour of recurrent or attention-based network — LSTMs and GRUs on stationary transforms of limit-order-book features, and increasingly transformer variants on tick-level price sequences. Press accounts describe Renaissance's Medallion fund using deep learning to estimate factor effectiveness rather than raw price moves: when does momentum work, when does value, conditioned on regime. Two Sigma is reported to correlate the same factor signals with macro indicators for multi-asset portfolios; DE Shaw is reported to use NLP on CEO tone in earnings transcripts to predict earnings beats. The constant across these accounts is that the model is a feature extractor, not a policy.
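As a rough illustration of what "stationary transforms of limit-order-book features" can mean, the sketch below turns raw top-of-book snapshots into log returns and a size imbalance. The snapshot layout and the function names are assumptions made for this example, not any firm's feature set; a real pipeline would use many more book levels and features.

```python
import math
from typing import List, Tuple

# Hypothetical snapshot layout: (bid_price, bid_size, ask_price, ask_size)
Snapshot = Tuple[float, float, float, float]

def log_returns(mid_prices: List[float]) -> List[float]:
    """Log returns are roughly stationary where raw price levels are not."""
    return [math.log(b / a) for a, b in zip(mid_prices, mid_prices[1:])]

def book_imbalance(bid_size: float, ask_size: float) -> float:
    """Top-of-book imbalance in [-1, 1]; positive means more resting bid size."""
    total = bid_size + ask_size
    return 0.0 if total == 0 else (bid_size - ask_size) / total

def feature_rows(snapshots: List[Snapshot]) -> List[Tuple[float, float]]:
    """One (log_return, imbalance) row per step after the first snapshot."""
    mids = [(bid_px + ask_px) / 2 for bid_px, _, ask_px, _ in snapshots]
    rets = log_returns(mids)
    imbs = [book_imbalance(bid_sz, ask_sz) for _, bid_sz, _, ask_sz in snapshots[1:]]
    return list(zip(rets, imbs))
```

Rows like these would be the input sequence to a recurrent or attention-based model; the network then acts as the feature extractor the paragraph above describes, and the trading decision is made downstream.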
NLP on transcripts, news, and analyst flow
Large language models — and not just frontier general models — read earnings transcripts, sell-side notes, and news feeds at a scale that would have required a research analyst per ticker a decade ago. The publicly reported example most often cited is Man GLG's monitoring of Chinese news sentiment about Versace: when sentiment shifted from +0.4 to -0.7 in the wake of a 2018 backlash, the parent's stock dropped roughly 14% within days; Man GLG's signal led the move. The interesting detail is that the signal was not "the model predicted the drop" — it was "the model read the news the day it was published instead of in the next morning's analyst summary." Latency on text is where this layer wins.
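The latency point can be made concrete with a minimal shift detector over a stream of sentiment scores. This is a sketch under stated assumptions: the scores are taken to come from some upstream NLP model in the range [-1, 1], and the window and threshold values are illustrative, not reported parameters from any fund.

```python
from statistics import mean
from typing import List, Optional

def first_shift_index(scores: List[float], window: int = 3, drop: float = 0.5) -> Optional[int]:
    """Return the index of the first score at which the trailing-window mean
    has fallen by at least `drop` versus the window immediately before it."""
    for end in range(2 * window, len(scores) + 1):
        previous = mean(scores[end - 2 * window : end - window])
        current = mean(scores[end - window : end])
        if previous - current >= drop:
            return end - 1
    return None  # no shift large enough was seen
```

With `window=2` and a series that moves from +0.4 to -0.7, the detector fires on the first negative reading. The edge in the example above comes from running a check like this as each article is published, not from the sophistication of the check itself.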
Alternative data
By 2025, the Lowenstein Sandler Alternative Data Report put adoption of alt data at 90% of surveyed funds, up from 62% in 2023; Morgan Stanley's rule of thumb is roughly $1M of alt-data spend per $1B AUM in year one, scaling to $3M by year three. The categories that actually move money — credit-card panels, satellite imagery, geolocation, web scrape — are inputs to the same kind of signal models, not models themselves. The ML job is to turn a noisy panel of consumer transactions into a same-store-sales forecast, two to three weeks ahead of the company's own pre-announcement. The challenge is universal coverage: a credit-card panel that only sees one demographic produces a biased forecast everywhere it is applied. See document parsing for RAG for the same problem shape in a different domain.
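One common response to the coverage problem is post-stratification: reweight each panel segment by its share of the real customer base instead of its share of the panel. The sketch below assumes hypothetical segment names and shares, and it is only one of several possible corrections.

```python
from typing import Dict

def naive_growth(segment_growth: Dict[str, float], panel_share: Dict[str, float]) -> float:
    """Growth estimate that trusts the panel's own demographic mix."""
    return sum(segment_growth[s] * panel_share[s] for s in panel_share)

def post_stratified_growth(segment_growth: Dict[str, float],
                           population_share: Dict[str, float]) -> float:
    """Growth estimate reweighted to the target population's mix."""
    return sum(segment_growth[s] * population_share[s] for s in population_share)

# Hypothetical panel that over-samples younger cardholders.
growth = {"under_35": 0.10, "35_plus": 0.00}
panel = {"under_35": 0.80, "35_plus": 0.20}
population = {"under_35": 0.40, "35_plus": 0.60}

biased = naive_growth(growth, panel)
corrected = post_stratified_growth(growth, population)
```

In this made-up case the naive estimate is about twice the reweighted one, which is the size of error a skewed panel can introduce before any model is fitted. Reweighting only helps when the segments are observed at all; a demographic the panel never sees cannot be recovered this way.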
Execution — filling the order without leaking
Once Signal has decided to buy a million shares, Execution decides how to actually get filled without moving the price against itself. The classical baseline is the Almgren–Chriss model from 2000, which assumes a linear market-impact function and solves for the optimal liquidation trajectory in closed form. Almgren–Chriss is still the benchmark every RL paper compares to, twenty-six years later, because the closed-form solution is honest about its assumptions.
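The closed-form trajectory can be sketched in a few lines. The version below uses the small-time-step approximation of the Almgren–Chriss solution, in which remaining holdings follow x(t) = X · sinh(κ(T − t)) / sinh(κT) with κ ≈ sqrt(λσ²/η); here λ is risk aversion, σ is volatility, and η is the temporary-impact coefficient. The parameter names and example values are assumptions for illustration.

```python
import math
from typing import List

def almgren_chriss_holdings(total_shares: float, horizon: float, steps: int,
                            risk_aversion: float, volatility: float,
                            temp_impact: float) -> List[float]:
    """Remaining shares at each of steps + 1 evenly spaced times in [0, horizon]."""
    kappa = math.sqrt(risk_aversion * volatility ** 2 / temp_impact)
    times = [horizon * j / steps for j in range(steps + 1)]
    if kappa * horizon < 1e-8:
        # Risk-neutral limit: the trajectory degenerates to a straight line.
        return [total_shares * (1 - t / horizon) for t in times]
    denom = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - t)) / denom for t in times]

# Illustrative values only: kappa works out to 2.0 here.
risk_averse = almgren_chriss_holdings(1_000_000, 1.0, 10, 2.0, 1.0, 0.5)
risk_neutral = almgren_chriss_holdings(1_000_000, 1.0, 10, 0.0, 1.0, 0.5)
```

A risk-averse trader sells faster early to reduce exposure to price variance, while the risk-neutral case spreads the order evenly. An RL execution agent is typically judged against the cost of following a schedule like this one.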
Recent research papers — arXiv 2507.06345 from mid-2025 on combined market and limit orders, and a 2025 A3C+LSTM framework from the International Conference on Digital Society and Intelligent Computing — show RL agents beating the Almgren–Chriss benchmark in simulated order books featuring noise traders, tactical responders, and a strategic counterparty. The reward is implementation shortfall, the agent's actions are order placements at varying depths, and the agent learns to wait when the book is thin and to lift when the book deepens. The same papers are blunt about the limit: trained models tend to memorise their training environment instead of learning a generalisable policy. RL for tool use covers the underlying credit-assignment problem; reward design and reward hacking covers what goes wrong when the simulator does not match production.
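The reward signal described above is simple to state in code. This sketch computes implementation shortfall against the arrival price for the filled quantity only; it ignores the opportunity cost of unfilled shares, fees, and rebates, which a fuller definition would include. The function names are hypothetical.

```python
from typing import List, Tuple

Fill = Tuple[float, float]  # (price, quantity)

def implementation_shortfall(arrival_price: float, fills: List[Fill], side: int = 1) -> float:
    """Cost versus the arrival price in currency units.
    side = +1 for a buy (paying above arrival is a cost), -1 for a sell."""
    return sum(side * (price - arrival_price) * qty for price, qty in fills)

def shortfall_bps(arrival_price: float, fills: List[Fill], side: int = 1) -> float:
    """The same cost expressed in basis points of the arrival-price notional."""
    filled = sum(qty for _, qty in fills)
    if filled == 0:
        return 0.0
    cost = implementation_shortfall(arrival_price, fills, side)
    return 1e4 * cost / (arrival_price * filled)

def reward(arrival_price: float, fills: List[Fill], side: int = 1) -> float:
    """RL reward: lower shortfall is better, so the reward is its negative."""
    return -implementation_shortfall(arrival_price, fills, side)
```

Because the reward is computed inside a simulated order book, an agent can score well by exploiting quirks of the simulator, which is the memorisation problem the papers describe.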
Production usage at the firms most often named — Jane Street, Hudson River Trading, Citadel Securities — sits behind a wall of secrecy, but the public signal is consistent: market makers run RL-derived execution policies under hard rule-based guards, not raw RL policies in the wild. The guard rails are the load-bearing safety; the RL is the optimisation.
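The "policy inside guards" arrangement might look like the following. This is a hypothetical sketch: the `Guard` fields, the limits, and the choice to reject out-of-band prices instead of repricing them are all assumptions, since no firm has published its actual guard logic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Order:
    qty: int
    limit_price: float

@dataclass(frozen=True)
class Guard:
    max_child_qty: int   # hard cap on any single child order
    price_band: float    # max fractional distance from the reference price
    deadline_s: float    # no new orders after this many seconds

def apply_guards(proposed: Order, ref_price: float, elapsed_s: float,
                 guard: Guard) -> Optional[Order]:
    """Return the order that may be sent, or None if the proposal is blocked."""
    if elapsed_s >= guard.deadline_s or proposed.qty <= 0:
        return None
    low = ref_price * (1 - guard.price_band)
    high = ref_price * (1 + guard.price_band)
    if not (low <= proposed.limit_price <= high):
        return None  # reject instead of silently repricing the policy's order
    return Order(min(proposed.qty, guard.max_child_qty), proposed.limit_price)
```

The learned policy only ever proposes; the deterministic function decides what reaches the market. That separation is what lets the guard be audited and tested independently of the model.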
Sizing and risk — the layer regulators care about
Sizing
Sizing is where ML-derived signals meet a convex optimiser. The model output from Signal becomes one of many inputs to a portfolio-construction problem with constraints on gross exposure, sector concentration, factor neutrality, and turnover. The ML contribution at this layer is usually signal-conditioned weights — when the signal is in a high-confidence regime, the optimiser is allowed to take more risk; when the regime is uncertain, gross is cut. In 2025, AQR and DE Shaw were reported to use RL-derived sizing on momentum strategies to short overvalued AI-sector names, which is sizing more than signal — the signal was old, the size response was new.
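A deliberately simplified sketch of signal-conditioned sizing follows. It replaces the full constrained optimiser with the unconstrained mean-variance answer for uncorrelated assets (weight proportional to expected return over variance) and then scales gross exposure by a confidence score. The linear confidence-to-gross mapping and the gross limits are assumptions for illustration.

```python
from typing import Dict

def size_positions(alpha: Dict[str, float], variance: Dict[str, float],
                   confidence: float, min_gross: float = 0.5,
                   max_gross: float = 2.0) -> Dict[str, float]:
    """Signed portfolio weights whose gross exposure grows with confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    # Unconstrained mean-variance direction, assuming uncorrelated assets.
    raw = {name: alpha[name] / variance[name] for name in alpha}
    gross = sum(abs(w) for w in raw.values())
    if gross == 0:
        return {name: 0.0 for name in raw}
    # High-confidence regime: allow more gross; uncertain regime: cut it.
    target_gross = min_gross + confidence * (max_gross - min_gross)
    return {name: w * target_gross / gross for name, w in raw.items()}
```

The relative weights come from the signal and the risk estimates, while the overall size comes from the regime confidence. A production system would add the sector, factor-neutrality, and turnover constraints listed above, which is where a convex solver becomes necessary.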
Risk
Risk is the layer most likely to be described in the press as multi-agent. The pattern reported at Citadel and Jane Street through the 2025 vol spikes was one agent that watches tail-risk metrics and triggers hedges, and a second agent that rebalances positions on drawdowns. The architecture, when described publicly, is essentially the supervisor–worker pattern: a risk supervisor sets the policy envelope, specialised workers execute within it. The shared blackboard of positions and exposures is essentially what shared memory and blackboard describes for general agent systems.
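The supervisor-worker shape with a shared blackboard can be sketched as below. Everything here is hypothetical: the field names, the thresholds, and the action labels are stand-ins chosen to show the control flow, not a description of any firm's risk system.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Blackboard:
    positions: Dict[str, float]  # signed notional per instrument
    tail_risk: float             # e.g. expected shortfall as a fraction of equity
    drawdown: float              # current drawdown as a positive fraction

@dataclass(frozen=True)
class Envelope:
    max_tail_risk: float
    max_drawdown: float
    cut_fraction: float          # share of each position to cut on a breach

def hedge_worker(board: Blackboard, env: Envelope) -> List[str]:
    """Watches tail risk and requests a hedge when the envelope is breached."""
    return ["BUY_HEDGE"] if board.tail_risk > env.max_tail_risk else []

def rebalance_worker(board: Blackboard, env: Envelope) -> List[str]:
    """Cuts every position by a fixed fraction when drawdown breaches the limit."""
    if board.drawdown <= env.max_drawdown:
        return []
    for name in board.positions:
        board.positions[name] *= 1 - env.cut_fraction
    return ["CUT_POSITIONS"]

def supervise(board: Blackboard, env: Envelope) -> List[str]:
    """The supervisor owns the envelope; workers act only within it."""
    actions: List[str] = []
    for worker in (hedge_worker, rebalance_worker):
        actions += worker(board, env)
    return actions
```

Both workers read the same blackboard and neither sets its own limits, so the supervisor's envelope is the single place where policy changes.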
This is also the layer regulators want explainability for. The EU AI Act's high-risk classification of trading models, taking full effect in 2026, demands documentation of training data, model behaviour under stress, and human-override paths. None of the firms above will publish their stress models, but the audit trail is going to exist in 2026 in a way it did not in 2022.
Cross-cutting comparison: same four axes, different answers
Latency separates the layers more than the model class does. Signal is allowed to think for seconds to days depending on horizon; Execution operates in microseconds; Risk is in the seconds-to-minutes regime, fast enough to react to a drawdown but slow enough to reason about exposures. The same neural network architecture deployed at Signal speed (a daily-horizon LSTM) and at Execution speed (a microsecond-budget feed-forward net) is genuinely two different deployments, not one.
Interpretability does not follow the latency ordering. Signal models can be opaque, since they only need to be backtested and risk-checked, and Execution models can be opaque as long as the guards around them are not. Risk models are the layer where opacity is no longer acceptable, because the audit trail attaches at that layer. That is also why Signal teams hire ML researchers and Risk teams hire actuaries.
Data dependency tracks adversarial pressure: the layer most dependent on data quality is the layer most attractive to adversaries. Signal lives or dies by alt-data integrity; Execution lives or dies by feed integrity; Risk lives or dies by accurate position reporting. Each layer has its own data-quality discipline, and the disciplines do not transfer cleanly between layers: a Signal-grade alt-data team has different competencies from a Risk-grade position-keeping team. Regulatory exposure follows the interpretability requirement: it is heaviest at Risk, where the audit trail attaches.
When AI is the wrong answer
The most common AI-in-trading failure is the one the press covers least: a backtested signal that worked on history and dies the day the regime shifts. RL papers on execution are open about this — generalisation across order-book conditions is the active research frontier, not a solved problem. The same dynamic at the Signal layer is why a credit-card panel that worked through the 2024 consumer-spending environment can stop working when 2026's environment differs in unmodeled ways. See reward design and reward hacking for the analogous problem in trained agents — the proxy reward and the proxy backtest fail for the same reason.
Data leakage is the failure mode that, per recent industry incident retrospectives, is roughly ten times more likely to reach the catastrophic tier than the more discussed LLM hallucination — partly because leakage failures only surface in production. A signal that inadvertently used a future field during training will look perfect in every backtest and lose money on day one of live deployment. The Execution layer is least exposed to leakage (the order book is the source of truth) and most exposed to feed corruption; the Signal layer is most exposed to leakage and least exposed to feed corruption.
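A basic defence against the "future field" failure is a point-in-time check: every feature carries the timestamp at which it actually became known, and anything published after the decision time is flagged. The sketch below assumes a hypothetical `FeatureValue` record; real feature stores enforce the same rule at join time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass(frozen=True)
class FeatureValue:
    name: str
    value: float
    available_at: datetime  # when the value was actually published or known

def leaked_features(features: List[FeatureValue], decision_time: datetime) -> List[str]:
    """Names of features that were not yet knowable at decision_time."""
    return [f.name for f in features if f.available_at > decision_time]

# Hypothetical training row for a decision made on 1 March 2024 at the open.
decision = datetime(2024, 3, 1, 9, 30)
row = [
    FeatureValue("card_spend_feb", 1.07, datetime(2024, 2, 28, 18, 0)),
    FeatureValue("q1_reported_revenue", 5.2e9, datetime(2024, 4, 25, 16, 5)),
]
leaks = leaked_features(row, decision)
```

The reported-revenue field is the kind of leak that makes a backtest look perfect: it describes the period being predicted but was published weeks after the trade would have been placed.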
Audit-time opacity is the failure mode that did not exist before 2026 and now dominates the Risk layer. A model that the regulator cannot understand is no longer a deployable model; the EU AI Act treats trading models as high-risk by default. The technical answer is the same as it is for AI systems in general — interpretable surrogate models, documented training data, replayable evaluation pipelines — but the legal answer changes the calculus on whether opacity is worth the alpha.
FAQ
Do funds let AI pull the trigger autonomously?
At the Execution layer, yes — once Signal has approved a parent order, the order-routing policy is mostly autonomous, often RL-derived, under hard rule-based guards on volume, price band, and time. At the Signal layer, much less so — most production systems require a human-readable thesis or factor decomposition before a position is taken, and many shops require a human approval on novel trades. The autonomy gradient runs Execution → Sizing → Risk → Signal, from most to least.
Why do quant funds still hire PhDs if AI does this?
Because the model is not the moat — the data, the labelling discipline, the simulator, and the risk model are. Building a competitive Signal model requires choosing what to predict, finding the data that predicts it, cleaning it without leaking, and proving the result is not a backtest artifact. The PhDs do the choosing, finding, cleaning, and proving; the ML library is a commodity.
Is the alpha edge from the model or the data?
Almost always the data, and almost always at the Signal layer. The press tends to credit the model because the model is glamorous; the firms credit the data because the data is hard to replicate. A signal generated from public price feeds and a public ML library is, by definition, low alpha — anyone can run the same model on the same input. The defensible edge is unique data, treated with discipline.
What happens to AI strategies when the regime shifts?
The honest answer is that most of them stop working, the way human strategies do. The mitigation is at the Risk layer, not the Signal layer: regime detectors that cut gross when conditions diverge from the training distribution, and ensemble strategies whose components are diverse enough that not all of them break at once. See debate, voting and ensembles for the general result that engineered diversity matters more than the number of components.
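A regime detector of the kind described can be as simple as a z-score on one monitored statistic. The sketch below compares live realised volatility with its training-period distribution and scales gross exposure down as the gap widens. The single-feature check and the soft and hard thresholds are assumptions; production detectors would watch many statistics at once.

```python
from statistics import mean, stdev
from typing import List

def gross_multiplier(training_vol: List[float], live_vol: float,
                     z_soft: float = 2.0, z_hard: float = 4.0) -> float:
    """Scale factor in [0, 1] for gross exposure.
    1.0 inside z_soft, 0.0 beyond z_hard, linear in between."""
    mu = mean(training_vol)
    sigma = stdev(training_vol)  # needs at least two training observations
    if sigma == 0:
        return 1.0 if live_vol == mu else 0.0
    z = abs(live_vol - mu) / sigma
    if z <= z_soft:
        return 1.0
    if z >= z_hard:
        return 0.0
    return (z_hard - z) / (z_hard - z_soft)
```

The detector does not need to know why the regime changed. It only needs to notice that the live inputs no longer resemble what the signal was trained on, and to reduce size before the losses make the same point.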
Further reading
On this wiki
- RL for Tool Use & Multi-Step Tasks — the credit-assignment problem that RL execution agents inherit.
- Reward Design & Reward Hacking — why an execution simulator can train the wrong policy.
- Supervisor / Worker Orchestration — the pattern Risk teams use in production.
- Debate, Voting & Ensembles — why ensembled signals survive regime shifts that single signals do not.
- Evals 101 — the discipline behind a backtest you can trust.
- Guardrails 101 — the hard rule-based wrappers around an RL policy.
- FinRL vs TensorTrade vs ABIDES-Gym vs ElegantRL — the four open-source RL-for-trading frameworks that sit inside the execution layer of this stack, and what the simulation-contract framing reveals about which one is honest under stress.
External sources
- arXiv 2507.06345 — Reinforcement Learning for Trade Execution with Market and Limit Orders (2025).
- arXiv 2412.20138 — TradingAgents: Multi-Agents LLM Financial Trading Framework.
- IOSCO CR/01/2025 — Artificial Intelligence in capital markets, regulatory report.
- CFA Institute Research Foundation, Chapter 5: Deep Learning — 2025 survey of deep-learning use in investment management.