Tutorial·02 Jun 2026·9 min

How to backtest a portfolio: a methodologically clean tutorial

A backtest is only as honest as the procedure that produced it. This is the 8-step methodology MEDGE Capital uses to avoid look-ahead bias, survivorship bias and curve fitting.

To backtest a portfolio honestly, follow 8 steps: define the universe as a rule (not a list), match the data frequency to the rebalance frequency, include realistic transaction costs (~5 bps round-trip), eliminate look-ahead bias, fix rebalancing rules, report the metric constellation (not a single Sharpe), split in-sample and out-of-sample, and stress test on labelled crisis windows (2008 GFC, 2020 COVID, 2022 Bear). The literature on backtest overfitting (Bailey et al. 2014, López de Prado 2018) shows ~80% of published quant strategies fail to replicate out-of-sample, and the typical cause is procedural, not market regime change.

Below is the methodology MEDGE Capital uses internally and exposes to users in the Portfolio Analyzer — opinionated about what matters (rebalancing frequency, costs, in-sample vs out-of-sample disclosure) and silent about what does not (broker UI, "favourite formula").

1. Define the universe before you look at performance

Decide which assets are eligible — and write it down — before you compute a single number. If the universe is curated by post-hoc performance (e.g. "the 10 best-performing US sector ETFs of the last 10 years") the backtest is a foregone conclusion.

·Universe should be a rule, not a list: "any S&P 500 sector ETF with TER < 0.20%" is a rule; "VTI / QQQ / GLD" is a list.
·Survivorship bias: include funds that were closed or delisted during the window if your data vendor supports it. Most retail tools (and MarketStack EOD) do not; disclose the gap when that is the case.
·Inclusion bias: a "global ex-US ETF" launched in 2018 cannot be in a 2008-2026 backtest. Either start the window in 2018 or use a proxy with a longer history.
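A minimal sketch of a rule-based universe follows, assuming a hypothetical `Fund` record with a category, a TER and an inception date. The tickers and dates are invented for illustration; the point is that the filter reads only fund attributes, never past returns, and that it drops anything launched after the window starts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fund:
    ticker: str        # hypothetical identifier, for illustration only
    category: str
    ter_pct: float     # total expense ratio, in percent
    inception: date


def eligible_universe(funds, category, max_ter_pct, window_start):
    """Apply the rule using fund attributes only, never past performance."""
    return [
        f.ticker
        for f in funds
        if f.category == category
        and f.ter_pct < max_ter_pct
        and f.inception <= window_start  # guards against inclusion bias
    ]


funds = [
    Fund("SECT_A", "us_sector", 0.10, date(1998, 12, 16)),
    Fund("SECT_B", "us_sector", 0.35, date(2000, 6, 1)),    # fails the TER cap
    Fund("SECT_C", "us_sector", 0.12, date(2018, 6, 18)),   # launched after window start
    Fund("GLOBAL_X", "global_ex_us", 0.08, date(2005, 1, 3)),
]

universe = eligible_universe(funds, "us_sector", 0.20, date(2008, 1, 1))
```

Here only `SECT_A` passes. Changing the rule changes the universe in a way a reader can audit, which a hand-picked list does not allow.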

2. Choose the frequency to match the strategy, not the data

Daily data is widely available and seductive. It is also massive over-sampling for a portfolio that rebalances monthly or quarterly. Backtest at the frequency you intend to operate at, plus one notch finer for cost simulation.

·Monthly rebalance → daily data for return path + monthly snapshots for rebalance decision.
·Quarterly rebalance → daily data with quarterly snapshots.
·Annual rebalance → monthly data is fine; daily is overkill.
·Daily / weekly rebalance → daily data, and please re-examine the strategy: at that frequency, under-counted trading costs are a likelier explanation for a good Sharpe than the strategy itself.
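The "daily path, periodic snapshot" idea above can be sketched as follows. This is an illustration under a simplifying assumption: the calendar is weekdays only and ignores exchange holidays. Both function names are hypothetical.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Weekdays from start to end inclusive (exchange holidays ignored)."""
    out = []
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            out.append(day)
        day += timedelta(days=1)
    return out


def rebalance_dates(days, months_per_period=1):
    """First available day of each period in a sorted list of daily dates."""
    picked = []
    last_period = None
    for d in days:
        period = (d.year, (d.month - 1) // months_per_period)
        if period != last_period:
            picked.append(d)
            last_period = period
    return picked


days = business_days(date(2024, 1, 1), date(2024, 6, 30))
monthly = rebalance_dates(days, months_per_period=1)
quarterly = rebalance_dates(days, months_per_period=3)
```

The return path is still computed on every entry of `days`; only the rebalance decision is restricted to `monthly` or `quarterly`.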

3. Bake in realistic transaction costs from day one

A 5 bps round-trip per rebalance kills more "Sharpe 2.0 strategies" than any regime change. Use a flat round-trip estimate calibrated on your broker. MEDGE defaults to 5 bps, which is roughly the median for European discount brokers on liquid ETFs.

A backtest with zero costs is not a backtest. It is a stylised idea. State the cost assumption next to every reported number.
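One way to apply a flat round-trip cost is sketched below. The convention is an assumption, not necessarily the one MEDGE uses: one-way turnover is half the sum of absolute weight changes, and each unit of turnover pays one round trip (one asset sold, another bought).

```python
ROUND_TRIP_BPS = 5.0  # the article's default; calibrate on your own broker


def rebalance_cost(current_weights, target_weights, round_trip_bps=ROUND_TRIP_BPS):
    """Cost of one rebalance as a fraction of portfolio value.

    Assumed convention: one-way turnover = 0.5 * sum(|target - current|),
    charged at the round-trip rate.
    """
    tickers = set(current_weights) | set(target_weights)
    turnover = 0.5 * sum(
        abs(target_weights.get(t, 0.0) - current_weights.get(t, 0.0))
        for t in tickers
    )
    return turnover * round_trip_bps / 10_000.0


def net_return(gross_return, cost):
    """Period return after paying the rebalance cost at the start of the period."""
    return (1.0 - cost) * (1.0 + gross_return) - 1.0


drifted = {"EQUITY": 0.66, "BONDS": 0.34}
target = {"EQUITY": 0.60, "BONDS": 0.40}
cost = rebalance_cost(drifted, target)   # 6% turnover at 5 bps, about 0.3 bps
```

Under this convention a drift correction is cheap, while a strategy that turns over the whole portfolio every trading day would pay roughly 252 × 5 bps, about 12.6% a year before compounding. That gap is why the cost assumption belongs next to every reported number.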

4. Eliminate look-ahead bias

The rebalance decision at time t must use only information available at t — close-of-business t-1 at the latest. The two most common bugs:

·Using the current period's return to score that same period: the rebalance happens at the open of the period, but the close-to-close return over the period is only known at its end.
·Using a slow-published metric as if it were live: SLOOS (the Federal Reserve's Senior Loan Officer Opinion Survey) publishes 6 weeks after quarter end; using it at quarter end is look-ahead.
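A point-in-time lookup makes the second bug hard to commit. The sketch below assumes each observation is stored with its period end and that the publication lag is known; the function name and the survey values are hypothetical.

```python
from datetime import date, timedelta


def latest_known(observations, decision_date, publication_lag_days):
    """Most recent value already published strictly before decision_date.

    observations: list of (period_end, value) tuples.
    A value counts as known only if period_end + lag falls before the
    decision date, i.e. it was available by the prior close.
    """
    lag = timedelta(days=publication_lag_days)
    known = [(end, v) for end, v in observations if end + lag < decision_date]
    if not known:
        return None
    return max(known, key=lambda pair: pair[0])[1]


# Hypothetical quarterly readings, published about 6 weeks after quarter end.
survey = [
    (date(2023, 12, 31), 14.5),
    (date(2024, 3, 31), 15.6),
]

at_quarter_end = latest_known(survey, date(2024, 4, 1), publication_lag_days=42)
after_release = latest_known(survey, date(2024, 5, 15), publication_lag_days=42)
```

On 1 April the lookup still returns the Q4 reading; the Q1 value only becomes visible once its release date has passed.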

5. Decide rebalancing rules and stick to them

Choose one of three rule types: calendar-based (first business day of month/quarter), threshold-based (>5% drift from target) or hybrid. MEDGE supports all three, but the default is monthly calendar because it is the dullest and most reproducible.
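A threshold rule can be sketched in a few lines. The reading of "5% drift" as an absolute distance of five percentage points from the target weight is an assumption here; a relative band is an equally valid choice as long as it is fixed in advance.

```python
def drifted_weights(target_weights, asset_returns):
    """Weights after one period of buy-and-hold drift away from the targets."""
    grown = {t: w * (1.0 + asset_returns[t]) for t, w in target_weights.items()}
    total = sum(grown.values())
    return {t: v / total for t, v in grown.items()}


def needs_rebalance(current_weights, target_weights, threshold=0.05):
    """True if any weight is more than `threshold` (absolute) away from target."""
    return any(
        abs(current_weights[t] - target_weights[t]) > threshold
        for t in target_weights
    )


target = {"EQUITY": 0.60, "BONDS": 0.40}
calm = drifted_weights(target, {"EQUITY": 0.03, "BONDS": 0.00})
rally = drifted_weights(target, {"EQUITY": 0.30, "BONDS": 0.00})

trade_after_calm = needs_rebalance(calm, target)    # drift under 1 point
trade_after_rally = needs_rebalance(rally, target)  # equity drifts to about 66%
```

A hybrid rule would simply evaluate `needs_rebalance` only on the calendar dates, trading less often than either rule alone.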

6. Report the metrics that matter — together

Every single-number metric hides at least one failure mode. Report the constellation:

·CAGR + annualised vol + Sharpe — the headline triple.
·Maximum Drawdown + Calmar — the discomfort metric.
·CVaR 95 + CVaR 99 — the tail.
·Sortino + Omega — the asymmetry.
·Monthly heatmap + rolling 63d Sharpe — the consistency.
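Most of the constellation can be computed from a plain list of periodic returns. The sketch below uses common textbook definitions, which may differ in detail from the ones at /glossary: historical CVaR as the mean of the worst tail, Sortino with a zero target, Omega with a zero threshold. The heatmap and rolling Sharpe are left out for brevity, and the sample returns are synthetic.

```python
import math
import statistics


def metric_constellation(returns, periods_per_year=252, risk_free=0.0,
                         tail_fraction=0.05):
    """Headline, drawdown, tail and asymmetry metrics from simple returns."""
    n = len(returns)
    nan = float("nan")

    growth = math.prod(1.0 + r for r in returns)
    cagr = growth ** (periods_per_year / n) - 1.0
    vol = statistics.stdev(returns) * math.sqrt(periods_per_year)
    excess = statistics.mean(returns) - risk_free / periods_per_year
    sharpe = excess * periods_per_year / vol if vol > 0 else nan

    # Maximum drawdown: worst peak-to-trough fall of the cumulative value.
    value, peak, max_dd = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        max_dd = min(max_dd, value / peak - 1.0)
    calmar = cagr / abs(max_dd) if max_dd < 0 else nan

    # Historical CVaR: average of the worst `tail_fraction` of periods.
    k = max(1, int(n * tail_fraction))
    cvar = sum(sorted(returns)[:k]) / k

    # Asymmetry: downside deviation and gain/loss ratio around zero.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n)
    sortino = excess * math.sqrt(periods_per_year) / downside if downside > 0 else nan
    gains = sum(max(r, 0.0) for r in returns)
    losses = sum(max(-r, 0.0) for r in returns)
    omega = gains / losses if losses > 0 else nan

    return {
        "cagr": cagr, "vol": vol, "sharpe": sharpe,
        "max_drawdown": max_dd, "calmar": calmar,
        "cvar": cvar, "sortino": sortino, "omega": omega,
    }


# Synthetic quarterly returns, for illustration only.
sample = [0.10, -0.05, 0.02, -0.10]
report = metric_constellation(sample, periods_per_year=4, tail_fraction=0.25)
```

Returning the whole dictionary, rather than a single ratio, keeps the numbers together in every report.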

7. Split in-sample / out-of-sample, and disclose which is which

The single biggest source of overfitting is calibrating a strategy on the same window where you measure performance. The remedy is to split: estimate parameters on the first 70% of the window, measure on the last 30%, and report both. MEDGE's Compare module supports manual in-sample / out-of-sample splits.
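The split itself is simple; what matters is that it is chronological and never shuffled. A minimal sketch, with a hypothetical function name and synthetic placeholder data:

```python
def chronological_split(returns, in_sample_fraction=0.70):
    """Split a time-ordered series without shuffling; the earlier part is in-sample."""
    if not 0.0 < in_sample_fraction < 1.0:
        raise ValueError("in_sample_fraction must be strictly between 0 and 1")
    cut = round(len(returns) * in_sample_fraction)
    return returns[:cut], returns[cut:]


# Ten years of synthetic monthly returns, for illustration only.
monthly_returns = [0.01 * ((i % 5) - 2) for i in range(120)]
in_sample, out_of_sample = chronological_split(monthly_returns)
```

Parameters are then estimated on `in_sample` only, and both segments are reported side by side with their labels.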

8. Stress test on labelled crisis windows

A "regime-neutral" strategy should survive 2008, 2020 and 2022. The MEDGE Crisis Library presets (2008 GFC, 2020 COVID, 2022 Bear, 2018 Q4) backtest your live weights on the labelled windows with one click. The honest test is not whether the strategy beats 60/40 on average; it is whether it beats 60/40 in each of the 2008, 2020 and 2022 regimes.

In-sample optimisation will always beat 60/40. Out-of-sample, across three regimes, far fewer strategies do.
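The per-window comparison can be sketched as below. The window boundaries are illustrative assumptions rather than the Crisis Library's definitions, and the return series are synthetic.

```python
from datetime import date

# Illustrative boundaries only; not MEDGE's preset definitions.
CRISIS_WINDOWS = {
    "2008 GFC": (date(2007, 10, 9), date(2009, 3, 9)),
    "2020 COVID": (date(2020, 2, 19), date(2020, 3, 23)),
    "2022 Bear": (date(2022, 1, 3), date(2022, 10, 12)),
}


def window_return(dated_returns, start, end):
    """Compound the returns whose dates fall inside [start, end]."""
    growth = 1.0
    for day, r in dated_returns:
        if start <= day <= end:
            growth *= 1.0 + r
    return growth - 1.0


def beats_in_every_window(strategy, benchmark, windows=CRISIS_WINDOWS):
    """Per-window verdicts, plus whether the strategy wins in all of them."""
    verdicts = {
        name: window_return(strategy, s, e) > window_return(benchmark, s, e)
        for name, (s, e) in windows.items()
    }
    return verdicts, all(verdicts.values())


# Synthetic (date, return) pairs, one observation per window.
strategy = [(date(2008, 10, 1), -0.05), (date(2020, 3, 10), -0.08), (date(2022, 6, 1), -0.02)]
benchmark = [(date(2008, 10, 1), -0.10), (date(2020, 3, 10), -0.06), (date(2022, 6, 1), -0.04)]

verdicts, passes = beats_in_every_window(strategy, benchmark)
```

In this toy data the strategy wins two windows out of three, so it fails the all-windows test even though it would look better than the benchmark on average.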

What MEDGE Capital does

The Portfolio Analyzer wires every step above as a default: monthly rebalance, 5 bps costs, in-sample window equals the backtest window with an opt-in 70/30 split, the metric constellation above on every report, and the Crisis Library accessible from the same module. The full methodology lives at /methodology and the per-metric definitions at /glossary.

Related glossary terms

Sharpe Ratio

The Sharpe Ratio measures a portfolio's excess return over the risk-free rate per unit of total volatility, annualised.

Maximum Drawdown

Maximum Drawdown (MDD) is the largest peak-to-trough decline in a portfolio's cumulative value over a measurement window.

Monte Carlo Simulation

Monte Carlo simulation generates a large number of random portfolio return paths to estimate the probability distribution of future outcomes given a return model.

CVaR (Conditional Value at Risk)

CVaR (Conditional Value at Risk) is the average loss conditional on the VaR threshold being breached at a given confidence level.
