Over the last week I downloaded historical data for several intervals (5, 10, 15, 30 minutes, 1 hour and 4 hours) and instruments (FXCM, COMEX, S&P500). When I started checking the completeness of the market data, I found a large number of missing quotes for every instrument and interval.
For example, for symbol="AUDCAD.FXCM" and interval=15 minutes, about 2 hours of data are missing:
...
HIX_15_AUDCAD.FXCM,2010-01-10 19:45:00,0.95594,0.95499,0.95559,0.95594,3286,176,\r\n
HIX_15_AUDCAD.FXCM,2010-01-10 22:00:00,0.95581,0.95509,0.95575,0.95538,3531,245,\r\n
...
Or two gaps in the data for symbol="COP" and interval=15 minutes:
...
HIX_15_COP,2012-03-30 16:45:00,76.0100,76.0100,76.0100,76.0100,6801405,246,\r\n
HIX_15_COP,2012-03-30 18:30:00,76.1000,76.1000,76.1000,76.1000,6887502,350,\r\n
HIX_15_COP,2012-03-30 19:00:00,75.8900,75.8900,75.8900,75.8900,6890502,3000,\r\n
...
And many others.
I've put a CSV file (80 MB) on my Dropbox that lists all the gaps (from 30 minutes to 6 hours) in the 15-minute interval data. Here is the link:
https://www.dropbox.com/s/l6aujnggobsiurn/export_diff.csv?dl=0
The file has four columns: the symbol name ("code"), the size of the gap ("difference"), the date and time of the quote after the gap ("trade_timestamp"), and the date and time of the quote before the gap ("lag_date").
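For anyone who wants to reproduce the check, here is a minimal Python sketch of this kind of gap detection. It is not the exact procedure used to build the file: the function name find_gaps is made up for illustration, and it assumes the lines are sorted by time within each symbol, that the first two fields are the symbol and the timestamp as in the excerpts above, and that "difference" means the time between two consecutive quotes.

```python
import csv
from datetime import datetime, timedelta


def find_gaps(lines, interval_minutes=15):
    """Yield (code, difference, trade_timestamp, lag_date) for each gap.

    A gap is reported whenever two consecutive quotes of the same symbol
    are further apart than the expected bar interval.
    """
    expected = timedelta(minutes=interval_minutes)
    last_seen = {}  # symbol -> timestamp of the previous quote
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        code = row[0]
        ts = datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S")
        prev = last_seen.get(code)
        if prev is not None and ts - prev > expected:
            yield code, ts - prev, ts, prev
        last_seen[code] = ts


# The quotes from the two examples above (trailing "\r\n" text omitted).
sample = """\
HIX_15_AUDCAD.FXCM,2010-01-10 19:45:00,0.95594,0.95499,0.95559,0.95594,3286,176,
HIX_15_AUDCAD.FXCM,2010-01-10 22:00:00,0.95581,0.95509,0.95575,0.95538,3531,245,
HIX_15_COP,2012-03-30 16:45:00,76.0100,76.0100,76.0100,76.0100,6801405,246,
HIX_15_COP,2012-03-30 18:30:00,76.1000,76.1000,76.1000,76.1000,6887502,350,
HIX_15_COP,2012-03-30 19:00:00,75.8900,75.8900,75.8900,75.8900,6890502,3000,
"""

for code, difference, trade_timestamp, lag_date in find_gaps(sample.splitlines()):
    print(code, difference, trade_timestamp, lag_date)
```

Note that this sketch knows nothing about trading hours, so overnight and weekend closures would be reported as gaps too.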
In addition, there is no interval data (only daily data) for the following symbols from the COMEX market:
"@VGA#"
"@VGQ#"
"@VGS#"
"@VSQ#"
"@VSS#"
"@VSY#"
So my questions are:
1. Why are there so many gaps in the market data?
2. Is it possible to fix all the gaps in the interval data for 5, 10, 15, 30 minutes, 1 hour and 4 hours?
3. Why is there no interval data for the COMEX symbols listed above?
Best regards,
Evgeniy