I recently tried to benchmark the quality of Zerodha market data collected via the Kiteconnect Python API against another vendor, eSignal, using 1-minute open, high, low, close (OHLC) bars.
I took one day of Zerodha trade data for 85 stocks, pulsed every 0.5 seconds, and built 1-minute open, high, low and close bars from it (let's call this z). I already had the same bars from eSignal (call it e), but eSignal computes them from all the ticks in the minute. The data streamed by Zerodha is pulsed instead, so it might miss or delay information between two pulses.
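A minimal sketch of the bucketing step, assuming each pulsed snapshot has been reduced to a (receive_time, last_price) pair sorted by time. `build_ohlc` is a hypothetical helper, not part of the Kiteconnect API, and the timestamps and prices are made up:

```python
from datetime import datetime

def build_ohlc(ticks):
    """Bucket time-sorted (timestamp, price) snapshots into 1-minute OHLC bars.

    Returns {minute_start: (open, high, low, close)}.
    """
    bars = {}
    for ts, price in ticks:
        minute = ts.replace(second=0, microsecond=0)  # floor to the minute
        if minute not in bars:
            bars[minute] = [price, price, price, price]  # first snapshot sets all four
        else:
            bar = bars[minute]
            bar[1] = max(bar[1], price)  # high
            bar[2] = min(bar[2], price)  # low
            bar[3] = price               # close = latest snapshot
    return {m: tuple(b) for m, b in bars.items()}

# Three half-second snapshots in 09:15 and one in 09:16 (made-up values)
ticks = [
    (datetime(2000, 1, 1, 9, 15, 0, 0), 100.0),
    (datetime(2000, 1, 1, 9, 15, 0, 500000), 100.5),
    (datetime(2000, 1, 1, 9, 15, 1, 0), 99.8),
    (datetime(2000, 1, 1, 9, 16, 0, 0), 100.2),
]
bars = build_ohlc(ticks)
print(bars[datetime(2000, 1, 1, 9, 15)])  # (100.0, 100.5, 99.8, 99.8)
```

Because a pulsed feed only sees the price at each pulse, the high and low built this way can only be as extreme as the snapshots it happened to catch.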
Please see the table below for the overall results of this matching experiment. The numbers are the percentage of minutes in which z and e matched exactly, or in which one was lower or higher than the other, as per the column name.
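The comparison behind such a table can be sketched as follows, assuming both sources have been reduced to dictionaries mapping a minute key to an (open, high, low, close) tuple. `compare_bars` and the sample values are illustrative only:

```python
def compare_bars(z, e):
    """For each field, the % of common minutes where z == e, z < e and z > e."""
    fields = ("open", "high", "low", "close")
    minutes = sorted(set(z) & set(e))  # only minutes present in both sources
    if not minutes:
        return {}
    result = {}
    for i, name in enumerate(fields):
        counts = {"equal": 0, "z_lower": 0, "z_higher": 0}
        for m in minutes:
            zv, ev = z[m][i], e[m][i]
            if zv == ev:
                counts["equal"] += 1
            elif zv < ev:
                counts["z_lower"] += 1
            else:
                counts["z_higher"] += 1
        result[name] = {k: 100.0 * v / len(minutes) for k, v in counts.items()}
    return result

z = {"09:15": (100.0, 100.5, 99.8, 99.8), "09:16": (99.9, 100.1, 99.7, 100.0)}
e = {"09:15": (100.0, 100.6, 99.8, 99.8), "09:16": (99.9, 100.1, 99.75, 100.0)}
print(compare_bars(z, e)["low"])  # {'equal': 50.0, 'z_lower': 50.0, 'z_higher': 0.0}
```

Exact equality on floats works when both sources report the same tick-size multiples, but converting prices to integer paise before comparing is a safer choice.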
The good news is that high, low and close match exactly more than 80% of the time. I knew pulsed data can miss ticks between pulses, so I was not expecting a 100% match.
But what caught my attention was that z lows were lower than e lows around 4.88% of the time! If one misses ticks in a minute, one can get a higher low than the "true" low and a lower high than the "true" high, but how can one go beyond the true high and low of a minute?
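The argument can be made concrete: if z is built from a subset of the ticks that e used for the same minute, z's low can never be below e's low and z's high can never be above e's high. A minimal sketch, assuming bars are (open, high, low, close) tuples keyed by minute; the helper `impossible_minutes` (hypothetical) flags the minutes that violate this bound, and the sample bars are made up:

```python
import random

def impossible_minutes(z, e):
    """Return the minutes where z's range extends beyond e's range.

    If z were built from a subset of the ticks e saw in the same minute,
    z_low >= e_low and z_high <= e_high would always hold, so any flagged
    minute needs another explanation.
    """
    flagged = []
    for m in sorted(set(z) & set(e)):
        _, z_high, z_low, _ = z[m]
        _, e_high, e_low, _ = e[m]
        if z_low < e_low or z_high > e_high:
            flagged.append(m)
    return flagged

# Premise check: subsampling ticks (like a pulsed feed) can only narrow the range.
random.seed(0)
all_ticks = [100 + random.uniform(-1, 1) for _ in range(600)]
pulsed = all_ticks[::5]  # keep every 5th tick
assert min(pulsed) >= min(all_ticks) and max(pulsed) <= max(all_ticks)

z = {"09:15": (100.0, 100.5, 99.7, 99.9), "09:16": (99.9, 100.1, 99.8, 100.0)}
e = {"09:15": (100.0, 100.6, 99.8, 99.9), "09:16": (99.9, 100.2, 99.8, 100.0)}
print(impossible_minutes(z, e))  # ['09:15']
```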
So I got curious and wanted to see whether the minutes where z lows are lower, or z highs are higher, than their e counterparts follow a pattern. In particular, I had in mind that if the timestamps of the two sources are not synchronized, z could capture the low of the previous minute, which might actually be lower than the low of the current minute.
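To see how unsynchronized clocks can produce exactly this, here is a small simulation under a stated assumption: z stamps each tick on receipt, so a tick delayed by a few seconds near the end of one minute lands in the next minute's bucket. The prices, the 3-second delay and the helper `bucket_lows` are all made up for illustration:

```python
from datetime import datetime, timedelta

def bucket_lows(ticks, delay=timedelta(0)):
    """Per-minute low after shifting every timestamp by `delay` (receipt lag)."""
    lows = {}
    for ts, price in ticks:
        minute = (ts + delay).replace(second=0, microsecond=0)
        lows[minute] = min(price, lows.get(minute, price))
    return lows

t = datetime(2000, 1, 1, 9, 14, 0)
true_ticks = [
    (t + timedelta(seconds=58), 98.0),  # low of 09:14, near the end of the minute
    (t + timedelta(seconds=65), 99.5),
    (t + timedelta(seconds=90), 99.2),  # true low of 09:15
]
e_lows = bucket_lows(true_ticks)                        # bucketed by true time
z_lows = bucket_lows(true_ticks, timedelta(seconds=3))  # stamped 3 s late on receipt
m = datetime(2000, 1, 1, 9, 15)
print(e_lows[m], z_lows[m])  # 99.2 98.0
```

With no delay the two bucketings agree; with the delay, the 09:14 low of 98.0 is attributed to 09:15 and z reports a lower low than e, even though no price was invented.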
Here is the histogram of where, within the minute, these lower lows and higher highs occur.
The histogram decays, which squares with the hypothesis that the previous minute's data is spilling over into this minute. The likelihood of data spilling "n seconds" into the next minute falls as n increases, because it is plausible to assume that in any real system the probability of an extreme delay falls as the delay grows.
Note that the two histograms (lows and highs) are consistent with each other. These spillovers happen about 5-7% of the time, but when they do, they can run for several seconds and, as per this graph, even tens of seconds.
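A histogram like this can be sketched as follows, assuming z is available as (receive_time, price) pairs and e's lows as a dictionary keyed by minute start. `low_locations`, `histogram` and `at` are hypothetical helpers, the 5-second bin width is an arbitrary choice, the data is made up, and the highs case is symmetric (track the maximum and compare with `>`):

```python
from collections import Counter
from datetime import datetime

def low_locations(z_ticks, e_lows):
    """Second-of-minute at which z set its low, for minutes where z_low < e_low."""
    best = {}  # minute start -> (lowest price so far, second it first occurred)
    for ts, price in sorted(z_ticks):
        minute = ts.replace(second=0, microsecond=0)
        if minute not in best or price < best[minute][0]:
            best[minute] = (price, ts.second)
    return [sec for m, (low, sec) in best.items() if m in e_lows and low < e_lows[m]]

def histogram(seconds, width=5):
    """Count locations in width-second bins: 0-4 s, 5-9 s, and so on."""
    return dict(sorted(Counter(s // width * width for s in seconds).items()))

def at(h, mi, s):
    return datetime(2000, 1, 1, h, mi, s)

z_ticks = [(at(9, 15, 1), 98.0), (at(9, 15, 30), 99.2),
           (at(9, 16, 12), 97.0), (at(9, 17, 40), 99.0)]
e_lows = {at(9, 15, 0): 99.2, at(9, 16, 0): 97.5, at(9, 17, 0): 99.0}
locs = low_locations(z_ticks, e_lows)
print(locs, histogram(locs))  # [1, 12] {0: 1, 10: 1}
```

Under the spillover hypothesis, most of the mass should sit in the first bins and decay from there.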
Could someone from the Kiteconnect tech team look into this? I am happy to provide the exact data I measured so you can tally it with your own records for that time.
Regards, Siddharth
PS: I timestamp the Zerodha data myself on receipt. Please let me know whether the feed carries an exchange timestamp, or a Zerodha timestamp of when the data packet was created.
Thanks for the in-depth analysis, Siddharth. As I mentioned, 10 seconds is indeed very anomalous. Our team is looking into this.
In the meanwhile, we have a new version of the WebSocket API coming out soon (with breaking changes) that'll include a timestamp field. That should help detect deviations immediately at the receiving end.
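Once each tick carries such a timestamp, the receiving end can check every tick as it arrives. A minimal sketch, assuming the tick's timestamp and the local receive time are both available as naive datetimes in the same time zone. `check_tick` is a hypothetical helper, the 2-second threshold is arbitrary, and the name of the new field is not given here, so look it up in the API documentation when the new version ships:

```python
from datetime import datetime, timedelta

def check_tick(received_at, exchange_ts, max_lag=timedelta(seconds=2)):
    """Return (lag, is_late, crosses_minute) for a single tick.

    crosses_minute is True when bucketing by receive time would put the tick
    in a different 1-minute bar than bucketing by the tick's own timestamp,
    which is the kind of spillover seen in the histograms.
    """
    lag = received_at - exchange_ts
    crosses_minute = (received_at.replace(second=0, microsecond=0)
                      != exchange_ts.replace(second=0, microsecond=0))
    return lag, lag > max_lag, crosses_minute

lag, late, crosses = check_tick(datetime(2000, 1, 1, 9, 15, 4),
                                datetime(2000, 1, 1, 9, 14, 58))
print(lag.total_seconds(), late, crosses)  # 6.0 True True
```

Building the bars from the tick's own timestamp rather than the receive time would remove this source of mismatch altogether.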
Thanks a lot, Kailash. Happy to share any more detail with your team if they need it. This timestamp field will be stupendous! Is there a way to subscribe to this announcement via email so I don't miss it?
We will announce it on the forum by next week.
The timestamp feature will be added in future releases.