Why intraday data?
Only a fraction of the huge amount of information produced in financial markets will inform investment decisions. When assessing and estimating risk, end-of-day data is the standard. However, with the objective of extracting a maximal amount of information from available price movements, investors are increasingly using intraday data. This is expected to increase the accuracy of estimates and, in turn, produce better long-term investment results.
To illustrate the impact of various data frequencies, the graph below displays a snapshot of price movements for the March 2022 S&P 500 futures contract at observation frequencies ranging from end-of-day to 1-minute bars. Sampling every 10 minutes yields 138 data points per day, compared with a single data point per day for end-of-day close prices. The green area in the graph indicates when the underlying US cash equity market is open, and the red area indicates the hour when the S&P 500 futures contract is closed for trading.
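The data-point arithmetic above can be sketched in a few lines. The 23-hour trading day comes from the note on ES futures hours at the end of this article; the function name `bars_per_day` is only illustrative.

```python
def bars_per_day(bar_minutes: int, trading_hours: float = 23.0) -> int:
    """Number of complete bars in one trading day of the given length."""
    return int(trading_hours * 60) // bar_minutes

# Compare a few sampling frequencies for a 23-hour futures session.
for minutes in (1, 5, 10, 30, 60):
    print(f"{minutes:>2}-minute bars: {bars_per_day(minutes)} per day")
```

At the 10-minute frequency this gives 6 x 23 = 138 bars, against one observation per day for end-of-day closes.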
Intraday distribution of volatility
To further emphasize the point that relevant information might be missed when focusing solely on end-of-day data, we have broken down volatility at intraday intervals. The graph below displays the proportion of the realized variance of S&P 500 futures contracts over the previous 15 years attributable to each 30-minute interval during the day.
The graph shows that the variance changes throughout the trading day, with the largest price movements around the open and close of the underlying cash market. Notably, 37% of the variance occurs outside official US stock market trading hours. This is partly because many price-moving events occur outside official trading hours, with a larger proportion of volatility arising in the hours before the US market opens than in the hours after it closes. This could be explained by trading activity in Europe and Asia.
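A minimal sketch of how such a breakdown could be computed is shown here, assuming the share of each bucket is its sum of squared log returns divided by the total. The bucket labels and the toy returns are made up for illustration and are not the data behind the graph.

```python
from collections import defaultdict

def variance_share_by_bucket(labelled_returns):
    """Share of the total sum of squared returns falling in each bucket.

    labelled_returns: iterable of (bucket_label, log_return) pairs,
    pooled over many days. Labels are 'HH:MM' bucket start times.
    """
    sums = defaultdict(float)
    for bucket, r in labelled_returns:
        sums[bucket] += r * r
    total = sum(sums.values())
    return {bucket: s / total for bucket, s in sums.items()}

# Made-up 30-minute returns for three buckets (Eastern Time start times).
sample = [
    ("09:30", 0.004), ("09:30", -0.003),
    ("12:00", 0.001), ("12:00", -0.001),
    ("03:00", 0.002), ("03:00", 0.001),
]
shares = variance_share_by_bucket(sample)

# Share outside cash market hours (9:30-16:00 ET); zero-padded 'HH:MM'
# strings compare correctly as plain text.
outside = sum(s for b, s in shares.items() if not ("09:30" <= b < "16:00"))
print(shares, outside)
```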
What is the true measure of volatility?
Before presenting various estimation methods and measuring their respective accuracies, we first need to define precisely what is meant by volatility.
Following the work of researchers such as Andersen and Bollerslev, we assume that the log returns of assets can be described by an expression of the form $r_t = \sigma_t z_t$, where $\sigma_t$ is the unobservable underlying asset volatility, which varies with time, and $z_t$ is a source of random noise. The quantity we forecast is $\sigma_t$.
A trading day can be split into time intervals of, say, 1 hour or 5 minutes, and log returns can be calculated over each interval. For example, at the 30-minute interval we calculate the returns from 9:00-9:30, 9:30-10:00, and so on. This framework shows that, provided certain conditions hold, the sum of squared returns converges to the realized variance as the sampling frequency increases. In symbols, letting $r_{t,k,N}$ denote the log return over bucket $k$ on day $t$, which has been split into $N$ buckets, we have:
$$\sum_{k=1}^{N} r_{t,k,N}^{2} \longrightarrow \sigma_t^{2} \quad \text{as } N \to \infty$$
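As a minimal sketch of the sum-of-squared-returns calculation described above, the following computes the quantity for one day at several sampling frequencies. The function names and the toy prices are illustrative assumptions, not the data or code used in this study.

```python
import math

def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def realized_variance(prices, step=1):
    """Sum of squared log returns, sampling every `step`-th price.

    Assumes (len(prices) - 1) is a multiple of `step`, so the last
    price of the day is always included.
    """
    sampled = prices[::step]
    return sum(r * r for r in log_returns(sampled))

# Made-up prices at 7 equally spaced times within one day (6 returns).
prices = [100.0, 100.4, 100.1, 100.6, 100.2, 100.5, 100.3]
rv_fine = realized_variance(prices, step=1)    # 6 buckets
rv_coarse = realized_variance(prices, step=3)  # 2 buckets
rv_daily = realized_variance(prices, step=6)   # 1 bucket (open to close)
print(rv_fine, rv_coarse, rv_daily)
```

In this toy series the open-to-close return is small even though prices move considerably during the day, so the single-bucket figure understates the intraday variation that the finer sampling picks up.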
It follows that a good proxy for the unobservable volatility of an asset can be obtained by considering returns over highly refined time buckets. Two guiding principles are:
1. Higher frequency returns should provide better estimates
2. Returns should not display strong autocorrelation
Principle 1 suggests that the obvious proxy for realized volatility would be 1-minute returns, as this is the highest frequency used in our analysis. However, due to the ‘bid-ask bounce’, where trade prices oscillate between fixed bid and ask levels, there is considerable negative autocorrelation at the 1-minute frequency, which violates principle 2.
How do these two principles hold up in our example? The chart below shows the lag-1 autocorrelation for S&P 500, 10Y Treasury Note and WTI Crude Oil futures. In each case, the magnitude of the autocorrelation is higher at the 1-minute frequency than at the 5-minute frequency, and considerably so for the 10Y Treasury Note futures. Based on these observations, we take the volatility computed from 5-minute intervals as the proxy for the true volatility, and use it as the reference point for measuring the accuracy of the various volatility estimates.
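The bid-ask bounce effect can be reproduced in a small simulation. This is a sketch under stated assumptions: the mid price follows a random walk, each trade prints at the bid or the ask with equal probability, and simple price changes stand in for log returns. The parameter values `step_vol` and `half_spread` are hypothetical, and a "step" plays the role of one minute.

```python
import random

def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den

def price_changes(prices, step):
    """Price changes when sampling every `step`-th trade price."""
    sampled = prices[::step]
    return [b - a for a, b in zip(sampled, sampled[1:])]

random.seed(0)       # fixed seed so the run is repeatable
n_steps = 20000
step_vol = 0.01      # hypothetical std. dev. of the mid-price change per step
half_spread = 0.01   # hypothetical half bid-ask spread

mid = 100.0
trades = []
for _ in range(n_steps + 1):
    # Each trade prints at the bid or the ask with equal probability.
    trades.append(mid + random.choice((-half_spread, half_spread)))
    mid += random.gauss(0.0, step_vol)

ac_1 = lag1_autocorrelation(price_changes(trades, 1))
ac_5 = lag1_autocorrelation(price_changes(trades, 5))
print(f"lag-1 autocorrelation, every step:    {ac_1:.3f}")
print(f"lag-1 autocorrelation, every 5 steps: {ac_5:.3f}")
```

With these parameters the theoretical lag-1 autocorrelations are about -1/3 at every step and -1/7 at every fifth step: the bounce contributes the same amount of noise at both frequencies, while the variance of the genuine price move grows with the interval length.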
Methodologies to estimate volatility
Below we present 14 volatility estimates and calculations based on the returns of the S&P 500 futures from 2007 to 2022. The methodologies can be split into three broad segments and seven specific categories.
How to best estimate volatility
When evaluating the various estimates, we use the realized volatility computed from 5-minute intervals as the benchmark. The chart below shows this realized volatility, annualized, over the full sample period. As expected, the largest spikes in volatility occur during major sell-offs, such as in March 2020 and during the Global Financial Crisis in 2008.
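For reference, annualizing a daily realized variance is commonly done by scaling with the number of trading days per year and taking the square root. The sketch below assumes 252 trading days; the article does not state which convention was used for the chart.

```python
import math

TRADING_DAYS_PER_YEAR = 252  # assumption: a common convention, not from the source

def annualized_volatility(daily_realized_variance):
    """Annualized volatility from one day's realized variance."""
    return math.sqrt(TRADING_DAYS_PER_YEAR * daily_realized_variance)

# A daily realized variance of 0.0001 corresponds to a 1% daily volatility.
print(f"{annualized_volatility(0.0001):.4f}")
```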
To assess the accuracy of the various estimation methodologies, we use standard techniques: comparisons of mean squared errors and mean absolute errors, as well as regression models. Interestingly, the ranking of the methodologies is very similar across these evaluation techniques.
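The error measures named above can be sketched as follows. The regression shown is an assumption, since the article does not specify its regression form: it regresses realized volatility on the forecast, where an unbiased forecast would give an intercept near 0 and a slope near 1. The sample numbers are made up.

```python
def mean_absolute_error(forecasts, realized):
    """Average absolute difference between forecast and realized values."""
    return sum(abs(f - r) for f, r in zip(forecasts, realized)) / len(forecasts)

def mean_squared_error(forecasts, realized):
    """Average squared difference between forecast and realized values."""
    return sum((f - r) ** 2 for f, r in zip(forecasts, realized)) / len(forecasts)

def ols_fit(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up annualized volatility forecasts and realized values.
forecasts = [0.10, 0.20, 0.15]
realized = [0.12, 0.18, 0.15]

mae = mean_absolute_error(forecasts, realized)
mse = mean_squared_error(forecasts, realized)
intercept, slope = ols_fit(forecasts, realized)
print(mae, mse, intercept, slope)
```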
Below are the results for the mean absolute errors. The high-frequency intraday models all outperform the end-of-day and GARCH models by a considerable margin. The end-of-day methodologies ‘Constant’ and ‘Expanding Window’ are, as expected, rather poor. EWMA improves upon simple rolling windows, GARCH(1, 1) improves upon EWMA, and GJR-GARCH improves accuracy further.
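To make the relationship between the end-of-day models concrete, here is a minimal sketch of their variance recursions under a zero-mean (no drift) assumption. It is a filter with given parameters, not a fitted model: the parameter values are hypothetical, and the EWMA decay of 0.94 is the commonly cited RiskMetrics daily value rather than anything stated in this article.

```python
def gjr_garch_variance_path(returns, omega, alpha, beta, gamma=0.0,
                            initial_variance=None):
    """One-step-ahead conditional variances for zero-mean returns.

    sigma2[t] is the forecast for period t using returns up to t - 1:
        sigma2[t] = omega + (alpha + gamma * [r < 0]) * r**2 + beta * sigma2[t-1]
    With gamma = 0 this is the GARCH(1, 1) recursion.
    """
    if initial_variance is None:
        # Simple starting value: the sample mean of squared returns.
        initial_variance = sum(r * r for r in returns) / len(returns)
    sigma2 = [initial_variance]
    for r in returns[:-1]:
        leverage = gamma * r * r if r < 0 else 0.0
        sigma2.append(omega + alpha * r * r + leverage + beta * sigma2[-1])
    return sigma2

def ewma_variance_path(returns, lam=0.94, initial_variance=None):
    """EWMA as the special case omega = 0, alpha = 1 - lam, beta = lam."""
    return gjr_garch_variance_path(returns, 0.0, 1.0 - lam, lam, 0.0,
                                   initial_variance)

# Made-up daily returns and hypothetical (not estimated) parameters.
daily_returns = [0.01, -0.02, 0.005]
ewma = ewma_variance_path(daily_returns, initial_variance=0.0001)
garch = gjr_garch_variance_path(daily_returns, 1e-6, 0.08, 0.90,
                                initial_variance=0.0001)
gjr = gjr_garch_variance_path(daily_returns, 1e-6, 0.08, 0.90, gamma=0.05,
                              initial_variance=0.0001)
print(ewma, garch, gjr)
```

The GJR term raises the variance forecast only after negative returns, which is how that model captures the tendency for volatility to rise more after sell-offs.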
The hierarchy of the results is almost exactly as would be predicted based on the assumptions that more sophisticated end-of-day models produce better estimates, and intraday models extract more information and hence produce better results.
Furthermore, the hierarchy of the intraday models also corresponds to the theory. Higher-frequency estimates are generally more accurate, but the negative autocorrelation at the highest available frequency has an adverse impact on the results.
What implications do these results have for investors? Many would agree that it is not prudent to proclaim any quantitative method universally superior. Any chosen approach needs to fit the investor’s specific situation and investment objective. However, our results show that the accuracy of volatility estimates can be improved by extracting information from intraday data. This holds particularly true when comparing the intraday results with traditional end-of-day methods.
However, making efficient use of intraday data is often easier said than done. Investors need both access to high-quality, operationally ready data and cutting-edge technological infrastructure to take full advantage of data-driven investment processes.
ES (E-mini S&P 500) futures are open 23 hours per day, meaning 6 x 23 = 138 data points at the 10-minute frequency.
US cash market hours are 9:30-16:00 Eastern Time (ET), labeled ‘America/New_York’ in the chart.
The S&P 500 futures contract is closed for trading from 17:00 to 18:00 Eastern Time (ET).
Various academic studies show that ignoring drift may lead to more accurate volatility estimates, owing to the inherent difficulty of estimating expected returns.