How to backtest a portfolio: a methodologically clean tutorial
A backtest is only as honest as the procedure that produced it. The 8-step methodology MEDGE Capital uses to avoid look-ahead bias, survivorship bias and curve fitting.
To backtest a portfolio honestly, follow 8 steps: define the universe as a rule (not a list), match the data frequency to the rebalance frequency, include realistic transaction costs (~5 bps round-trip), eliminate look-ahead bias, fix rebalancing rules, report the metric constellation (not a single Sharpe), split in-sample and out-of-sample, and stress test on labelled crisis windows (2008 GFC, 2020 COVID, 2022 Bear). The literature on backtest overfitting (Bailey et al. 2014, López de Prado 2018) shows ~80% of published quant strategies fail to replicate out-of-sample, and the typical cause is procedural, not market regime change.
Below is the methodology MEDGE Capital uses internally and exposes to users in the Portfolio Analyzer — opinionated about what matters (rebalancing frequency, costs, in-sample vs out-of-sample disclosure) and silent about what does not (broker UI, "favourite formula").
1. Define the universe before you look at performance
Decide which assets are eligible — and write it down — before you compute a single number. If the universe is curated by post-hoc performance (e.g. "the 10 best-performing US sector ETFs of the last 10 years") the backtest is a foregone conclusion.
- Universe should be a rule, not a list: "any S&P 500 sector ETF with TER < 0.20%" is a rule; "VTI / QQQ / GLD" is a list.
- Survivorship bias: include funds that no longer exist over the window if your data vendor supports it. Most retail tools (and MarketStack EOD) do not; disclose this.
- Inclusion bias: a "global ex-US ETF" launched in 2018 cannot be in a 2008-2026 backtest. Either start the window in 2018 or use a proxy with a longer history.
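A minimal sketch of "rule, not list", including the inception-date check that guards against inclusion bias. The `Fund` record, its field names, the tickers and the `eligible` function are all hypothetical, made up for illustration; they are not MEDGE's data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Fund:
    ticker: str       # hypothetical tickers, for illustration only
    category: str
    ter: float        # total expense ratio as a fraction (0.0010 = 0.10%)
    inception: date

def eligible(fund: Fund, window_start: date, max_ter: float = 0.0020) -> bool:
    """Rule-based eligibility using only facts known at window_start."""
    return (
        fund.category == "sp500_sector"
        and fund.ter < max_ter
        and fund.inception <= window_start   # no fund enters before it existed
    )

candidates = [
    Fund("AAA", "sp500_sector", 0.0010, date(1998, 12, 16)),
    Fund("BBB", "sp500_sector", 0.0035, date(2005, 1, 3)),   # TER too high
    Fund("CCC", "sp500_sector", 0.0012, date(2018, 6, 1)),   # launched too late
    Fund("DDD", "global_ex_us", 0.0008, date(2001, 3, 1)),   # wrong category
]

universe = [f.ticker for f in candidates if eligible(f, date(2008, 1, 1))]
```

The rule never looks at performance, so it can be written down and frozen before the first return is computed.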
2. Choose the frequency to match the strategy, not the data
Daily data is widely available and seductive. It is also massive over-sampling for a portfolio that rebalances monthly or quarterly. Backtest at the frequency you intend to operate at, plus one notch finer for cost simulation.
- Monthly rebalance → daily data for the return path + monthly snapshots for the rebalance decision.
- Quarterly rebalance → daily data with quarterly snapshots.
- Annual rebalance → monthly data is fine; daily is overkill.
- Daily / weekly rebalance → daily data, and please re-examine the strategy: at that frequency, under-costed over-trading is a more likely explanation for a good Sharpe than the strategy itself.
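The "daily path, monthly snapshots" pattern can be sketched as follows. This assumes a plain weekday calendar with no exchange holidays, which is a simplification; `business_days` and `monthly_snapshots` are illustrative names.

```python
from datetime import date, timedelta

def business_days(start: date, end: date) -> list[date]:
    """Weekdays from start to end inclusive (holidays ignored: an assumption)."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:          # Monday=0 .. Friday=4
            days.append(d)
        d += timedelta(days=1)
    return days

def monthly_snapshots(days: list[date]) -> list[date]:
    """First available trading day of each calendar month in `days`."""
    snapshots = []
    seen = set()
    for d in days:
        key = (d.year, d.month)
        if key not in seen:
            seen.add(key)
            snapshots.append(d)
    return snapshots

days = business_days(date(2024, 1, 1), date(2024, 3, 29))
rebalance_dates = monthly_snapshots(days)
```

Returns are tracked on every element of `days`; decisions are taken only on `rebalance_dates`.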
3. Bake in realistic transaction costs from day one
A 5 bps round-trip per rebalance kills more "Sharpe 2.0 strategies" than any regime change. Use a flat round-trip estimate calibrated on your broker. MEDGE defaults to 5 bps, which is roughly the median for European discount brokers on liquid ETFs.
A backtest with zero costs is not a backtest. It is a stylised idea. State the cost assumption next to every reported number.
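One way to apply a flat round-trip cost, sketched under a stated assumption: every unit of weight moved from one asset to another is one sell plus one buy, so it is charged one round trip. The function name and the weight dictionaries are illustrative, not MEDGE's implementation.

```python
def rebalance_cost(old_weights, new_weights, round_trip_bps=5.0):
    """Cost of a rebalance as a fraction of portfolio value.

    Assumption: one-way turnover (half the sum of absolute weight changes)
    is charged the full round-trip rate.
    """
    tickers = set(old_weights) | set(new_weights)
    traded = sum(
        abs(new_weights.get(t, 0.0) - old_weights.get(t, 0.0)) for t in tickers
    )
    one_way_turnover = traded / 2.0
    return one_way_turnover * round_trip_bps / 10_000.0

# Moving 10% of the portfolio from A to B at 5 bps round-trip.
cost = rebalance_cost({"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.5})
gross_return = 0.01
net_return = gross_return - cost
```

Here the cost is 0.5 bps of portfolio value for a single rebalance. It is small once and material when compounded over many rebalances with high turnover.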
4. Eliminate look-ahead bias
The rebalance decision at time t must use only information available at t — close-of-business t-1 at the latest. The two most common bugs:
- Using the current period's return to score that same period: the rebalance happens at the open of the period, while the return is measured close-to-close over the period, so it is not yet known when the decision is taken.
- Using a slowly published metric as if it were live: the Fed's Senior Loan Officer Opinion Survey (SLOOS) is published about 6 weeks after quarter end; using it at quarter end is look-ahead.
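The first bug is easy to reproduce. The sketch below uses a toy trend rule (hold the asset when the trailing mean return is positive) on made-up returns; the only difference between the two functions is whether the signal window includes the period being scored.

```python
def lagged_signal_returns(returns, lookback=3):
    """Clean version: the decision for period t uses periods t-lookback .. t-1."""
    out = []
    for t in range(lookback, len(returns)):
        history = returns[t - lookback:t]            # excludes returns[t]
        invested = sum(history) / lookback > 0
        out.append(returns[t] if invested else 0.0)
    return out

def lookahead_signal_returns(returns, lookback=3):
    """Buggy version: the window ends at t, so the signal sees returns[t]."""
    out = []
    for t in range(lookback, len(returns)):
        window = returns[t - lookback + 1:t + 1]     # includes returns[t]
        invested = sum(window) / lookback > 0
        out.append(returns[t] if invested else 0.0)
    return out

returns = [0.02, -0.01, 0.03, -0.05, 0.01, 0.03, -0.02, 0.06]  # made-up data
clean = lagged_signal_returns(returns)
biased = lookahead_signal_returns(returns)
```

On this series the biased version sidesteps the -5% period because it has already seen it, which flatters the result. The same off-by-one index is the usual culprit in real backtests.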
5. Decide rebalancing rules and stick to them
Rebalancing can be calendar-based (first business day of month/quarter), threshold-based (>5% drift from target) or a hybrid of the two. MEDGE supports all three, but the default is monthly calendar because it is the dullest and most reproducible.
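A threshold rule can be sketched as below. It assumes "5% drift" means an absolute gap of 5 percentage points between current and target weight; a relative definition is equally plausible, so state which one you use. The function names and tickers are illustrative.

```python
def drifted_weights(weights, period_returns):
    """Weights after one period of returns with no trading."""
    grown = {t: w * (1.0 + period_returns[t]) for t, w in weights.items()}
    total = sum(grown.values())
    return {t: v / total for t, v in grown.items()}

def needs_rebalance(current, target, threshold=0.05):
    """True if any asset is more than `threshold` (absolute) away from target."""
    return any(abs(current[t] - target[t]) > threshold for t in target)

target = {"EQ": 0.60, "BD": 0.40}

after_rally = drifted_weights(target, {"EQ": 0.30, "BD": -0.05})
after_quiet_month = drifted_weights(target, {"EQ": 0.05, "BD": 0.00})

rally_triggers = needs_rebalance(after_rally, target)
quiet_triggers = needs_rebalance(after_quiet_month, target)
```

Whatever the rule, fix the threshold before running the backtest; tuning it afterwards is another form of curve fitting.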
6. Report the metrics that matter — together
Every single-number metric hides at least one failure mode. Report the constellation:
- CAGR + annualised vol + Sharpe: the headline triple.
- Maximum Drawdown + Calmar: the discomfort metric.
- CVaR 95 + CVaR 99: the tail.
- Sortino + Omega: the asymmetry.
- Monthly heatmap + rolling 63-day Sharpe: the consistency.
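Part of the constellation can be computed from a plain list of periodic returns with the standard library alone. This is a sketch under stated assumptions: simple (not log) returns, a zero risk-free rate by default, and a historical CVaR estimate. Sortino, Omega and the rolling views are left out for brevity, and the function names are illustrative.

```python
import math
import statistics

def cagr(returns, periods_per_year=252):
    """Compound annual growth rate from periodic simple returns."""
    growth = math.prod(1.0 + r for r in returns)
    years = len(returns) / periods_per_year
    return growth ** (1.0 / years) - 1.0

def annualised_vol(returns, periods_per_year=252):
    """Sample standard deviation scaled by the square root of time."""
    return statistics.stdev(returns) * math.sqrt(periods_per_year)

def sharpe(returns, rf_per_period=0.0, periods_per_year=252):
    """Annualised mean excess return per unit of annualised volatility."""
    excess = [r - rf_per_period for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

def max_drawdown(returns):
    """Largest peak-to-trough decline of cumulative value (negative number)."""
    peak = value = 1.0
    worst = 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst

def cvar(returns, level=0.95):
    """Historical CVaR: mean of the worst (1 - level) share of returns."""
    ordered = sorted(returns)
    n_tail = max(1, round(len(ordered) * (1.0 - level)))
    return statistics.mean(ordered[:n_tail])

def constellation(returns, periods_per_year=252):
    """Bundle the metrics so none of them is reported alone."""
    growth = cagr(returns, periods_per_year)
    mdd = max_drawdown(returns)
    return {
        "cagr": growth,
        "vol": annualised_vol(returns, periods_per_year),
        "sharpe": sharpe(returns, periods_per_year=periods_per_year),
        "max_drawdown": mdd,
        "calmar": growth / abs(mdd) if mdd < 0 else float("inf"),
        "cvar_95": cvar(returns, 0.95),
        "cvar_99": cvar(returns, 0.99),
    }

quarterly = [0.10, -0.20, 0.05, 0.10]          # made-up quarterly returns
report = constellation(quarterly, periods_per_year=4)
```

With only four observations the tail metrics collapse to the single worst quarter, which is itself a reminder to report the sample size next to the numbers.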
7. Split in-sample / out-of-sample, and disclose which is which
The single biggest source of overfitting is calibrating a strategy on the same window where you measure performance. The remedy is to split: estimate parameters on the first 70% of the window, measure on the last 30%, and report both. MEDGE's Compare module supports manual in-sample / out-of-sample splits.
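A minimal sketch of the 70/30 discipline: the parameter is chosen on the in-sample slice only, then frozen and measured on both slices. The toy trend rule, the candidate lookbacks and the return series are assumptions for illustration and say nothing about how the Compare module is implemented.

```python
def split_in_out(series, in_sample_pct=70):
    """Chronological split; integer arithmetic avoids float rounding at the cut."""
    cut = len(series) * in_sample_pct // 100
    return series[:cut], series[cut:]

def total_return(returns):
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value - 1.0

def trend_strategy(returns, lookback):
    """Toy rule: hold in period t if the previous `lookback` returns sum to > 0."""
    return [
        returns[t] if sum(returns[t - lookback:t]) > 0 else 0.0
        for t in range(lookback, len(returns))
    ]

def calibrate_and_report(returns, candidates=(1, 2, 3)):
    in_sample, out_of_sample = split_in_out(returns)
    # Parameter selection sees the in-sample slice only.
    best = max(candidates, key=lambda k: total_return(trend_strategy(in_sample, k)))
    return {
        "lookback": best,
        "in_sample": total_return(trend_strategy(in_sample, best)),
        "out_of_sample": total_return(trend_strategy(out_of_sample, best)),
    }

returns = [0.02, -0.01, 0.03, -0.05, 0.01, 0.03, -0.02, 0.06, 0.01, -0.03,
           0.02, 0.04, -0.01, -0.02, 0.03, 0.01, -0.04, 0.02, 0.05, -0.01]
report = calibrate_and_report(returns)
```

The out-of-sample slice here restarts the lookback window from scratch, so its first few periods are not traded; carrying history across the cut is a reasonable alternative as long as no out-of-sample return feeds the calibration.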
8. Stress test on labelled crisis windows
A "regime-neutral" strategy should survive 2008, 2020 and 2022. The MEDGE Crisis Library presets (2008 GFC, 2020 COVID, 2022 Bear, 2018 Q4) backtest your live weights on the labelled windows with one click. The honest test is not whether the strategy beats 60/40 on average; it is whether it beats 60/40 in each of the three regimes: 2008, 2020 and 2022.
In-sample optimisation will always beat 60/40. Out-of-sample, across three regimes, far fewer strategies do.
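The "beats 60/40 in every window" test can be sketched as below. The window dates are illustrative assumptions (approximate S&P 500 peak-to-trough dates and a calendar quarter), not the Crisis Library's definitions, and the return data is made up.

```python
from datetime import date

# Assumed window boundaries, for illustration only.
CRISIS_WINDOWS = {
    "2008 GFC": (date(2007, 10, 9), date(2009, 3, 9)),
    "2018 Q4": (date(2018, 10, 1), date(2018, 12, 31)),
    "2020 COVID": (date(2020, 2, 19), date(2020, 3, 23)),
    "2022 Bear": (date(2022, 1, 3), date(2022, 10, 12)),
}

def window_return(dated_returns, start, end):
    """Compounded return of (date, return) pairs falling inside [start, end]."""
    value = 1.0
    for d, r in dated_returns:
        if start <= d <= end:
            value *= 1.0 + r
    return value - 1.0

def stress_report(strategy, benchmark, windows=CRISIS_WINDOWS):
    """Per-window comparison; no averaging across windows."""
    report = {}
    for label, (start, end) in windows.items():
        s = window_return(strategy, start, end)
        b = window_return(benchmark, start, end)
        report[label] = {"strategy": s, "benchmark": b, "beats": s > b}
    return report

# Made-up one-observation-per-window series.
strategy = [(date(2008, 9, 15), -0.05), (date(2020, 3, 16), -0.10), (date(2022, 6, 13), -0.03)]
benchmark = [(date(2008, 9, 15), -0.08), (date(2020, 3, 16), -0.07), (date(2022, 6, 13), -0.04)]

report = stress_report(strategy, benchmark)
survives_all = all(row["beats"] for row in report.values())
```

Reporting each window separately is the point: a strategy that wins two windows and loses one badly can still look fine on average.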
What MEDGE Capital does
The Portfolio Analyzer wires every step above as a default: monthly rebalance, 5 bps costs, in-sample window equals the backtest window with an opt-in 70/30 split, the metric constellation above on every report, and the Crisis Library accessible from the same module. The full methodology lives at /methodology and the per-metric definitions at /glossary.
Related glossary terms
Sharpe Ratio
The Sharpe Ratio measures a portfolio's excess return over the risk-free rate per unit of total volatility, annualised.
Maximum Drawdown
Maximum Drawdown (MDD) is the largest peak-to-trough decline in a portfolio's cumulative value over a measurement window.
Monte Carlo Simulation
Monte Carlo simulation generates a large number of random portfolio return paths to estimate the probability distribution of future outcomes given a return model.
CVaR (Conditional Value at Risk)
CVaR (Conditional Value at Risk) is the average loss conditional on the VaR threshold being breached at a given confidence level.