AI in the Trading Stack: What Hedge Funds Actually Run on the Decision

AI in trading is not one bot; it is a four-layer stack — signal, sizing, execution, risk — and each layer runs a different model with different failure modes. Map the layers and any "AI hedge fund" headline becomes legible in thirty seconds.

By Agentic AI Wiki

A 2024 SEC review found funds running AI-driven strategies outperformed peers by an average of twelve percent, and a PwC survey put the alpha uplift from alternative data and machine learning at roughly twenty percent for the same year. Both numbers are real, but neither says where in the trading process the gain came from. That process is a stack of four layers (signal, sizing, execution, risk), and each layer runs a different model with a different failure mode. Mistake the layers for one black box and every "AI hedge fund" headline reads as magic; read them as a stack and the same headline becomes a thirty-second triage.

At a glance

The trading stack splits cleanly into four decisions: what to trade, how much, how to get filled, and what would blow up. Each owns a different model class. None of them is "the AI."

Layer | Decision | Dominant ML | Press-reported users
Signal | What to buy or sell, when | Deep learning on price + NLP on text + alt data | Renaissance, Two Sigma, DE Shaw, Man GLG
Sizing | How much capital to allocate | Convex optimisation + ML-conditioned weights | AQR, DE Shaw
Execution | How to fill a parent order | Reinforcement learning over the order book | Jane Street, HRT, Citadel Securities
Risk | What positions to cut or hedge | Stress models + multi-agent rebalancers | Citadel, Jane Street

[Figure: Reported 2024 alpha uplift (annualised, vs flat baseline). Horizontal bar chart: PwC alt-data funds +20%; SEC AI funds, outperformance vs peers, +12%; S&P 500 baseline 0%.]
The two often-cited 2024 numbers — PwC's +20% alpha for alt-data funds and the SEC's +12% outperformance for AI-driven funds — sit on different axes; the bar chart puts them next to a flat baseline for scale.

The headline "AI hedge fund" almost always refers to two or three of these layers, not all four. A press release naming an LSTM is talking about Signal; one bragging about implementation shortfall is talking about Execution; one citing the EU AI Act is usually worried about Risk. The reader who knows which layer is in scope reads the rest of the release accurately.

The four-layer stack

[Figure: The four-layer AI trading stack. Top to bottom: Signal (deep learning, NLP, alternative data); Sizing (portfolio optimisation conditioned on ML); Execution (reinforcement learning over the limit order book); Risk (stress models and rebalancing agents). A parent order flows top-down; risk signals flow bottom-up.]
The four layers and the ML technique that dominates each. The parent order flows top-down; risk signals flow back up.

Each layer below answers exactly one question. The reason the layers matter — rather than the union of all the models — is that the failure modes do not generalise: a model that hallucinates a stock thesis fails at the Signal layer; a model that over-allocates fails at Sizing; a model that walks the book fails at Execution. The risk team's job is to keep those failures local.

Signal — picking the trade

Deep learning on order books and prices

The dominant signal model on short horizons is some flavour of recurrent or attention-based network — LSTMs and GRUs on stationary transforms of limit-order-book features, and increasingly transformer variants on tick-level price sequences. Press accounts describe Renaissance's Medallion fund using deep learning to estimate factor effectiveness rather than raw price moves: when does momentum work, when does value, conditioned on regime. Two Sigma is reported to correlate the same factor signals with macro indicators for multi-asset portfolios; DE Shaw is reported to use NLP on CEO tone in earnings transcripts to predict earnings beats. The constant across these accounts is that the model is a feature extractor, not a policy.
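To make "stationary transforms of limit-order-book features" concrete, here is a minimal sketch of two such transforms: log returns of the mid price and top-of-book size imbalance. The choice of features and every name below (`book_imbalance`, `log_returns`, `feature_rows`) are illustrative assumptions, not a description of what any named fund runs.

```python
import math


def book_imbalance(bid_size: float, ask_size: float) -> float:
    """Top-of-book imbalance in [-1, 1]; positive means more resting bid size."""
    total = bid_size + ask_size
    if total == 0:
        return 0.0
    return (bid_size - ask_size) / total


def log_returns(mid_prices):
    """Log returns of the mid price: a transform of a drifting price level
    that is much closer to stationary than the level itself."""
    return [math.log(b / a) for a, b in zip(mid_prices, mid_prices[1:])]


def feature_rows(snapshots):
    """snapshots: list of (bid_px, bid_sz, ask_px, ask_sz) tuples.

    Returns one (log_return, imbalance) row per step after the first,
    which is the kind of input a recurrent model would consume.
    """
    mids = [(bid_px + ask_px) / 2 for bid_px, _, ask_px, _ in snapshots]
    rets = log_returns(mids)
    imbs = [book_imbalance(bid_sz, ask_sz) for _, bid_sz, _, ask_sz in snapshots[1:]]
    return list(zip(rets, imbs))
```

The point of the sketch is the shape of the pipeline: the network sees bounded, roughly stationary features rather than raw prices, which is consistent with treating the model as a feature extractor rather than a policy.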

NLP on transcripts, news, and analyst flow

Large language models — and not just frontier general models — read earnings transcripts, sell-side notes, and news feeds at a scale that would have required a research analyst per ticker a decade ago. The publicly reported example most often cited is Man GLG's monitoring of Chinese news sentiment about Versace: when sentiment shifted from +0.4 to -0.7 in the wake of a 2018 backlash, the parent's stock dropped roughly 14% within days; Man GLG's signal led the move. The interesting detail is that the signal was not "the model predicted the drop" — it was "the model read the news the day it was published instead of in the next morning's analyst summary." Latency on text is where this layer wins.
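A sentiment shift of the size quoted above is easy to flag once scores exist; the hard part is producing the scores quickly. As a hedged illustration only, the sketch below flags days whose score departs from a trailing average. The function name, window, and threshold are assumptions chosen for the example, and the per-day scores are presumed to come from some upstream NLP model that is not shown.

```python
def sentiment_shift(daily_scores, window=3, threshold=0.5):
    """Return the indexes of days whose sentiment score differs from the
    mean of the previous `window` days by more than `threshold`."""
    flagged = []
    for i in range(window, len(daily_scores)):
        baseline = sum(daily_scores[i - window:i]) / window
        if abs(daily_scores[i] - baseline) > threshold:
            flagged.append(i)
    return flagged


# Illustrative series: stable at +0.4, then a drop to -0.7.
scores = [0.4, 0.4, 0.4, 0.4, -0.7, -0.7]
first_alert = sentiment_shift(scores)[0]  # index of the first flagged day
```

A detector this simple only matters if it runs on the day the text is published, which is the latency argument the paragraph above makes.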

Alternative data

By 2025, the Lowenstein Sandler Alternative Data Report put adoption of alt data at 90% of surveyed funds, up from 62% in 2023; Morgan Stanley's rule of thumb is roughly $1M of alt-data spend per $1B AUM in year one, scaling to $3M by year three. The categories that actually move money (credit-card panels, satellite imagery, geolocation, web scrape) are inputs to the same kind of signal models, not models themselves. The ML job is to turn a noisy panel of consumer transactions into a same-store-sales forecast, two to three weeks ahead of the company's own pre-announcement. The challenge is representative coverage: a credit-card panel that only sees one demographic produces a biased forecast everywhere it is applied. See document parsing for RAG for the same problem shape in a different domain.
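One standard correction for an unrepresentative panel is post-stratification: weight each group's panel average by that group's share of the target population rather than its share of the panel. The sketch below is an illustration with made-up numbers; the group names and growth figures are hypothetical, and this is a swapped-in textbook technique, not a method attributed to any fund or vendor.

```python
def naive_mean(panel_mean_by_group, panel_share_by_group):
    """Average weighted by who happens to be in the panel."""
    total = sum(panel_share_by_group.values())
    return sum(panel_mean_by_group[g] * s / total for g, s in panel_share_by_group.items())


def poststratified_mean(panel_mean_by_group, population_share_by_group):
    """Average weighted by each group's share of the target population.

    Raises KeyError if the panel has no data for a population group:
    reweighting cannot repair a group the panel never observes.
    """
    total = sum(population_share_by_group.values())
    return sum(
        panel_mean_by_group[g] * s / total
        for g, s in population_share_by_group.items()
    )


# Hypothetical spending growth by group, as seen in the panel.
growth = {"under_35": 0.10, "over_35": 0.02}
panel_mix = {"under_35": 0.8, "over_35": 0.2}       # panel skews young
population_mix = {"under_35": 0.4, "over_35": 0.6}  # actual customer base

biased = naive_mean(growth, panel_mix)
corrected = poststratified_mean(growth, population_mix)
```

With these inputs the naive estimate is pulled toward the over-represented group, while the reweighted estimate reflects the assumed population mix.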

Execution — filling the order without leaking

[Figure: Reinforcement-learning trade-execution loop. The RL agent (policy π(a | s); A3C, PPO, or DQN, guarded by hard rules) receives the limit-order-book state and emits an action: market order, limit order at some depth, or wait. The order-book environment (noise, tactical, and strategic counterparties plus a market-impact model) ticks forward and returns the next state s' and a reward r equal to the negative implementation shortfall. The closed-form Almgren–Chriss solution remains the published benchmark.]
The RL execution loop. The state is the order book, the action is the order placement, the reward is the negative implementation shortfall.

Once Signal has decided to buy a million shares, Execution decides how to actually get filled without moving the price against itself. The classical baseline is the Almgren–Chriss model from 2000, which assumes a linear market-impact function and solves for the optimal liquidation trajectory in closed form. Almgren–Chriss is still the benchmark every RL paper compares to, twenty-six years later, because the closed-form solution is honest about its assumptions.
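For reference, the closed-form schedule can be written in a few lines. The sketch below uses the continuous-time form of the Almgren–Chriss trajectory, x(t) = X · sinh(κ(T − t)) / sinh(κT) with κ = sqrt(λσ² / η), which approximates the discrete solution when the time step is small. Parameter names and the example values are illustrative assumptions, and units are not calibrated to any real market.

```python
import math


def almgren_chriss_holdings(total_shares, horizon, n_steps,
                            risk_aversion, volatility, temp_impact):
    """Shares still held at each of n_steps + 1 evenly spaced times.

    risk_aversion is lambda, volatility is sigma, temp_impact is eta
    (the linear temporary-impact coefficient).
    """
    kappa = math.sqrt(risk_aversion * volatility ** 2 / temp_impact)
    times = [horizon * k / n_steps for k in range(n_steps + 1)]
    if kappa * horizon < 1e-9:
        # Risk-neutral limit: the schedule degenerates to a straight line.
        return [total_shares * (1 - t / horizon) for t in times]
    denom = math.sinh(kappa * horizon)
    return [total_shares * math.sinh(kappa * (horizon - t)) / denom for t in times]


# Illustrative parameters only.
schedule = almgren_chriss_holdings(
    total_shares=1_000_000, horizon=1.0, n_steps=10,
    risk_aversion=1e-6, volatility=0.95, temp_impact=2.5e-6,
)
child_sizes = [a - b for a, b in zip(schedule, schedule[1:])]
```

A risk-averse trader (larger λ) front-loads the schedule, so the child orders shrink over time; with λ = 0 the schedule collapses to equal slices. That transparency about assumptions is what makes it a useful baseline for learned policies.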

Recent research papers — arXiv 2507.06345 from mid-2025 on combined market and limit orders, and a 2025 A3C+LSTM framework from the International Conference on Digital Society and Intelligent Computing — show RL agents beating the Almgren–Chriss benchmark in simulated order books featuring noise traders, tactical responders, and a strategic counterparty. The reward is implementation shortfall, the agent's actions are order placements at varying depths, and the agent learns to wait when the book is thin and to lift when the book deepens. The same papers are blunt about the limit: trained models tend to memorise their training environment instead of learning a generalisable policy. RL for tool use covers the underlying credit-assignment problem; reward design and reward hacking covers what goes wrong when the simulator does not match production.
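The reward signal described above is simple to state in code. This is a minimal sketch of implementation shortfall measured against the arrival price, under stated simplifications: it ignores fees and the opportunity cost of any unfilled quantity, and the function names and fill format are assumptions for illustration, not taken from the cited papers.

```python
def implementation_shortfall(fills, arrival_price, side):
    """Execution cost in currency units relative to the arrival price.

    fills: list of (price, quantity) child fills.
    side: +1 for a buy parent order, -1 for a sell.
    Positive values mean the fills were worse than the arrival price.
    """
    return sum(side * (price - arrival_price) * qty for price, qty in fills)


def reward(fills, arrival_price, side):
    """RL reward: the negative of the shortfall, so lower cost is better."""
    return -implementation_shortfall(fills, arrival_price, side)


# A buy filled slightly above the 100.00 arrival price.
example_fills = [(100.02, 600), (100.05, 400)]
cost = implementation_shortfall(example_fills, arrival_price=100.00, side=+1)
```

Because the reward depends only on simulated fills, an agent can score well by exploiting quirks of the simulator's impact model, which is one way the memorisation problem noted above shows up.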

Production usage at the firms most often named — Jane Street, Hudson River Trading, Citadel Securities — sits behind a wall of secrecy, but the public signal is consistent: market makers run RL-derived execution policies under hard rule-based guards, not raw RL policies in the wild. The guard rails are the load-bearing safety; the RL is the optimisation.
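The "policy proposes, rules decide" arrangement can be sketched as a thin wrapper around whatever the learned policy outputs. Everything here is hypothetical: the `Order` and `Guards` types, the two specific checks (child size cap and price band), and the decision to reject rather than reprice are assumptions for illustration, since no firm's actual guard logic is public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    side: int           # +1 buy, -1 sell
    quantity: int
    limit_price: float


@dataclass(frozen=True)
class Guards:
    max_child_quantity: int   # hard cap on any single child order
    max_price_band: float     # max fractional distance from the reference price


def apply_guards(proposed, reference_price, guards):
    """Return an order that satisfies the hard rules, or None to block it.

    The learned policy never talks to the venue directly; only orders
    returned from this function would be sent.
    """
    if proposed.quantity <= 0:
        return None
    band = guards.max_price_band * reference_price
    if abs(proposed.limit_price - reference_price) > band:
        return None  # outside the price band: reject outright
    quantity = min(proposed.quantity, guards.max_child_quantity)
    return Order(proposed.side, quantity, proposed.limit_price)
```

The design choice worth noting is that the guard is deterministic and independent of the model, so it can be reviewed and tested without reference to how the policy was trained.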

Sizing and risk — the layer regulators care about

Sizing

Sizing is where ML-derived signals meet a convex optimiser. The model output from Signal becomes one of many inputs to a portfolio-construction problem with constraints on gross exposure, sector concentration, factor neutrality, and turnover. The ML contribution at this layer is usually signal-conditioned weights — when the signal is in a high-confidence regime, the optimiser is allowed to take more risk; when the regime is uncertain, gross is cut. In 2025, AQR and DE Shaw were reported to use RL-derived sizing on momentum strategies to short overvalued AI-sector names, which is sizing more than signal — the signal was old, the size response was new.
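The signal-conditioned part of sizing can be illustrated without a full optimiser. The sketch below covers only the gross-exposure step: map a confidence score to a gross target, then rescale a set of weights to that target. It is not a convex optimiser and omits the sector, factor, and turnover constraints mentioned above; the function names, the linear mapping, and the bounds are all assumptions.

```python
def target_gross(confidence, min_gross=0.5, max_gross=2.0):
    """Map a signal confidence in [0, 1] to a gross-exposure target.

    Low confidence cuts gross toward min_gross; high confidence allows
    more risk, up to max_gross. Inputs outside [0, 1] are clamped.
    """
    c = min(max(confidence, 0.0), 1.0)
    return min_gross + c * (max_gross - min_gross)


def scale_to_gross(weights, gross):
    """Rescale signed weights so the sum of absolute weights equals gross."""
    current = sum(abs(w) for w in weights.values())
    if current == 0:
        return dict(weights)
    return {name: w * gross / current for name, w in weights.items()}


# Hypothetical two-name long/short book at medium confidence.
raw = {"LONG_NAME": 0.6, "SHORT_NAME": -0.4}
sized = scale_to_gross(raw, target_gross(0.5))
```

In a production setting the gross target would more plausibly enter the optimiser as a constraint rather than a post-hoc rescale, but the rescale shows the dependency clearly: the signal's confidence, not just its direction, changes the size of the book.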

Risk

Risk is the layer most likely to be described in the press as multi-agent. The pattern reported at Citadel and Jane Street through the 2025 vol spikes was one agent that watches tail-risk metrics and triggers hedges, and a second agent that rebalances positions on drawdowns. The architecture, when described publicly, is essentially the supervisor–worker pattern: a risk supervisor sets the policy envelope, specialised workers execute within it. The shared blackboard of positions and exposures is essentially what shared memory and blackboard describes for general agent systems.
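A toy version of the supervisor and worker split might look like the following. The blackboard is a plain dictionary of shared risk metrics, the supervisor owns the policy envelope, and each worker only decides whether its own limit is breached. All names, metrics, and thresholds are hypothetical; the public descriptions do not go to this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    max_var: float        # tail-risk limit, e.g. a value-at-risk figure
    max_drawdown: float   # fractional drawdown limit


def tail_risk_worker(board, envelope):
    """Requests a hedge when the tail-risk metric breaches its limit."""
    return ["hedge"] if board["var"] > envelope.max_var else []


def drawdown_worker(board, envelope):
    """Requests a rebalance when drawdown breaches its limit."""
    return ["reduce_gross"] if board["drawdown"] > envelope.max_drawdown else []


def supervisor(board, envelope):
    """Runs each worker against the shared board and collects their actions."""
    actions = []
    for worker in (tail_risk_worker, drawdown_worker):
        actions.extend(worker(board, envelope))
    return actions


# Shared blackboard of current risk metrics (illustrative values).
blackboard = {"var": 1_200_000.0, "drawdown": 0.03}
pending = supervisor(blackboard, Envelope(max_var=1_000_000.0, max_drawdown=0.05))
```

Keeping the envelope in one place means a change to risk policy is a change to one object, and every worker decision can be logged against the envelope that was in force at the time.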

This is also the layer regulators want explainability for. The EU AI Act's high-risk classification of trading models, taking full effect in 2026, demands documentation of training data, model behaviour under stress, and human-override paths. None of the firms above will publish their stress models, but the audit trail is going to exist in 2026 in a way it did not in 2022.

Cross-cutting comparison: same four axes, different answers

[Figure: Cross-cutting comparison of the Signal, Execution, and Risk layers across four axes.]

Axis | Signal | Execution | Risk
Latency budget per decision | sec → days | µs → ms | sec → min
Interpretability required | Low | Medium (model may be opaque; guards may not) | High
Data dependency (what kills it) | Alt-data integrity | Feed integrity | Position accuracy
Regulatory exposure, EU AI Act (2026) | Low | Medium | High-risk by default
The same four axes — latency, interpretability, data dependency, regulatory exposure — land in radically different places per layer.

Latency separates the layers more than the model class does. Signal is allowed to think for seconds to days depending on horizon; Execution operates in microseconds to milliseconds; Risk is in the seconds-to-minutes regime, fast enough to react to a drawdown but slow enough to reason about exposures. Even the same model family deployed at Signal speed (a daily-horizon LSTM) and at Execution speed (a microsecond-budget feed-forward net) amounts to two different deployments, not one.

Interpretability does not track latency. Signal models can be opaque, since they only need to be backtested and risk-checked, and Execution models can be opaque as long as the guards around them are not. Risk models are the layer where opaque is no longer acceptable, because the audit trail attaches at that layer. That is also why Signal teams hire ML researchers and Risk teams hire actuaries.

Data dependency tracks adversarial pressure: the layer most dependent on data quality is the layer most attractive to adversaries. Signal lives or dies by alt-data integrity; Execution lives or dies by feed integrity; Risk lives or dies by accurate position reporting. Each layer has its own data-quality discipline, and the disciplines do not transfer cleanly between layers — a Signal-grade alt-data team has different competencies from a Risk-grade position-keeping team. Regulatory exposure mirrors the same gradient.

When AI is the wrong answer

[Figure: Failure mode × stack layer, severity heatmap.]

Layer | Overfitting | Regime shift | Data leakage | Audit opacity
Signal | Strong | Strong | Strong | Weak
Sizing | Medium | Medium | Weak | Medium
Execution | Strong | Medium | Medium (feed) | Medium
Risk | Weak | Strong | Medium | Strong

Legend: Weak = peripheral concern; Medium = actively managed; Strong = dominant failure mode.
Where each failure mode bites hardest. The Signal layer takes overfitting, regime shift, and data leakage; Execution takes overfitting; Risk takes regime shift and the audit-time opacity hit.

The most common AI-in-trading failure is the one the press covers least: a backtested signal that worked on history and dies the day the regime shifts. RL papers on execution are open about this — generalisation across order-book conditions is the active research frontier, not a solved problem. The same dynamic at the Signal layer is why a credit-card panel that worked through the 2024 consumer-spending environment can stop working when 2026's environment differs in unmodeled ways. See reward design and reward hacking for the analogous problem in trained agents — the proxy reward and the proxy backtest fail for the same reason.

Data leakage is the failure mode that, per recent industry incident retrospectives, is roughly ten times more likely to reach the catastrophic tier than the more discussed LLM hallucination — partly because leakage failures only surface in production. A signal that inadvertently used a future field during training will look perfect in every backtest and lose money on day one of live deployment. The Execution layer is least exposed to leakage (the order book is the source of truth) and most exposed to feed corruption; the Signal layer is most exposed to leakage and least exposed to feed corruption.
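The "future field" failure usually comes from joining on the period a value describes instead of the date it became public. A minimal point-in-time filter, sketched below, keys every lookup on the publication timestamp. The record layout (`period_end`, `published_at`, `value`) and the function names are assumptions for illustration.

```python
from datetime import datetime


def point_in_time(records, as_of):
    """Keep only records that had been published by the decision time."""
    return [r for r in records if r["published_at"] <= as_of]


def latest_known_value(records, as_of):
    """Most recently published value visible at as_of, or None if none was."""
    visible = point_in_time(records, as_of)
    if not visible:
        return None
    return max(visible, key=lambda r: r["published_at"])["value"]


# Hypothetical quarterly figures: Q1 ends 31 March but is published 5 May.
revenue = [
    {"period_end": datetime(2023, 12, 31), "published_at": datetime(2024, 2, 6), "value": 410.0},
    {"period_end": datetime(2024, 3, 31), "published_at": datetime(2024, 5, 5), "value": 455.0},
]

# A decision on 15 April must not see the Q1 figure yet.
seen_in_april = latest_known_value(revenue, datetime(2024, 4, 15))
seen_in_june = latest_known_value(revenue, datetime(2024, 6, 1))
```

A backtest that joined on `period_end` would hand the April decision the Q1 number three weeks early, which is exactly the kind of error that looks perfect in every backtest and fails live.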

Audit-time opacity is the failure mode that did not exist before 2026 and now dominates the Risk layer. A model that the regulator cannot understand is no longer a deployable model; the EU AI Act treats trading models as high-risk by default. The technical answer is the same as it is for AI systems in general — interpretable surrogate models, documented training data, replayable evaluation pipelines — but the legal answer changes the calculus on whether opacity is worth the alpha.

FAQ

Do funds let AI pull the trigger autonomously?

At the Execution layer, yes — once Signal has approved a parent order, the order-routing policy is mostly autonomous, often RL-derived, under hard rule-based guards on volume, price band, and time. At the Signal layer, much less so — most production systems require a human-readable thesis or factor decomposition before a position is taken, and many shops require a human approval on novel trades. The autonomy gradient runs Execution → Sizing → Risk → Signal, from most to least.

Why do quant funds still hire PhDs if AI does this?

Because the model is not the moat — the data, the labelling discipline, the simulator, and the risk model are. Building a competitive Signal model requires choosing what to predict, finding the data that predicts it, cleaning it without leaking, and proving the result is not a backtest artifact. The PhDs do the choosing, finding, cleaning, and proving; the ML library is a commodity.

Is the alpha edge from the model or the data?

Almost always the data, and almost always at the Signal layer. The press tends to credit the model because the model is glamorous; the firms credit the data because the data is hard to replicate. A signal generated from public price feeds and a public ML library is, by definition, low alpha — anyone can run the same model on the same input. The defensible edge is unique data, treated with discipline.

What happens to AI strategies when the regime shifts?

The honest answer is that most of them stop working, the way human strategies do. The mitigation is at the Risk layer, not the Signal layer: regime detectors that cut gross when conditions diverge from the training distribution, and ensemble strategies whose components are diverse enough that not all of them break at once. See debate, voting and ensembles for the general result that engineered diversity matters more than the number of components.
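A regime detector of the kind described can be as simple as measuring how far a live feature has drifted from its training distribution and scaling gross down as the drift grows. The sketch below uses a z-score distance on a single feature with a linear ramp between a soft and a hard threshold; the feature, thresholds, and function names are illustrative assumptions, and real detectors would typically watch many features at once.

```python
import statistics


def regime_distance(training_values, live_values):
    """Distance of the live mean from the training mean, in units of
    training standard deviations. Needs at least two training values."""
    mu = statistics.fmean(training_values)
    sigma = statistics.stdev(training_values)
    if sigma == 0:
        return 0.0
    return abs(statistics.fmean(live_values) - mu) / sigma


def gross_multiplier(distance, soft=1.0, hard=3.0):
    """1.0 inside the soft threshold, 0.0 beyond the hard threshold,
    and a linear ramp in between."""
    if distance <= soft:
        return 1.0
    if distance >= hard:
        return 0.0
    return (hard - distance) / (hard - soft)


# Illustrative values for one feature, e.g. a realised-volatility measure.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
multiplier = gross_multiplier(regime_distance(train, [6.0, 6.0]))
```

The multiplier would then scale whatever gross target the Sizing layer produced, so a strategy running outside its training distribution shrinks automatically rather than waiting for a drawdown to force the cut.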
