Delay in Zerodha streaming data quantified accurately using co-lo tick-data!

sidverm
Hi all,

Consider this my final submission in my effort to understand how delayed the data captured through Zerodha's KiteConnect API is.

I managed to get hold of a one-day sample of data for a few stocks, captured at a co-located (co-lo) server near NSE.

This study was done on 13 Nifty 50 stocks on 06-Oct-2017. On one hand I had Zerodha's data captured via the Python API, which is known to be pulsed at 0.5 seconds, i.e. for each instrument they send only the latest available order book and last-trade information every 0.5 seconds (and only if it has changed relative to the previous snapshot they sent).

On the other hand I had access to co-lo data, which has delays on the order of microseconds because the datacenters are located right next to the exchange. Using the timestamps of the co-lo data and those of the Zerodha tick data, one can compute the delays.

The challenge is that there is no unique identifier of a market data snapshot present in both datasets that could act as a foreign key to join on. So one has to rely on assumptions. Here is the algorithm I followed:
  1. Take each Zerodha order-book snapshot, including the last-trade information.
  2. Go backwards in time in the co-lo data from the 'receive_time' of the Zerodha tick to find a tick that matches exactly in 22 fields! (the exact same bid and ask prices and quantities at all levels of the order book, as well as the exact trade price and quantity)
  3. Sometimes there may be multiple ticks in the co-lo data that match the Zerodha tick. It is impossible to pin down which one was picked up by Zerodha, so we note down the earliest match as well as the latest match.
  4. We then draw a Kernel Density Estimate (an advanced, smoothed histogram) of the delay (in seconds) for both the earliest match and the latest match.
The good news is that I was able to find at least one match among the 8 million co-lo ticks for each of the 800,000 ticks in the Zerodha data! (Zerodha's data is certainly accurate, in the sense that not a single bit turned out to be corrupted in this vast exercise.)
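
The matching steps above can be sketched roughly as follows. This is only an illustration, not the exact code used for the study: the function name `match_delays`, the `(timestamp, fields)` tuple layout and the `max_lookback` window (how far back to search before giving up) are all assumptions made for the example. Timestamps are taken to be seconds as floats, and `fields` stands in for the 22 order-book and last-trade values.

```python
from bisect import bisect_right

def match_delays(colo_ticks, zerodha_ticks, max_lookback=5.0):
    """Match each Zerodha tick to co-lo ticks by exact field equality.

    colo_ticks    : list of (timestamp, fields), sorted by timestamp
    zerodha_ticks : list of (receive_time, fields)
    max_lookback  : assumed search window in seconds (hypothetical parameter)

    Returns one entry per Zerodha tick: either
    (delay_to_earliest_match, delay_to_latest_match) or None if no match.
    """
    colo_times = [t for t, _ in colo_ticks]
    results = []
    for receive_time, fields in zerodha_ticks:
        # Index of the last co-lo tick at or before receive_time.
        i = bisect_right(colo_times, receive_time) - 1
        earliest = latest = None
        # Walk backwards in time while still inside the lookback window.
        while i >= 0 and receive_time - colo_times[i] <= max_lookback:
            if colo_ticks[i][1] == fields:
                if latest is None:
                    latest = colo_times[i]   # first hit going backwards
                earliest = colo_times[i]     # keeps moving further back
            i -= 1
        if latest is None:
            results.append(None)
        else:
            results.append((receive_time - earliest, receive_time - latest))
    return results
```

Note that the earliest match gives the larger delay and the latest match the smaller one, so the two numbers bracket the true delay of each tick.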

Here is the plot that we obtained:


This gives a pretty accurate picture of the distribution of the delay. The delay varies from tick to tick, and you can see that it frequently goes up to 2 seconds. The average appears to be under 1 second.
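
For anyone who wants to reproduce this kind of summary on their own delay measurements, here is a minimal sketch using only the Python standard library. It is not the code behind the plot above. The function names `gaussian_kde` and `percentile` are made up for this example, the kernel is assumed to be Gaussian, and the default bandwidth uses the common normal-reference rule of thumb (1.06 * sigma * n^(-1/5)), which is an assumption and may differ from what a plotting library picks.

```python
import math
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed default for this sketch).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

def percentile(samples, p):
    """Nearest-rank percentile, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Illustrative, made-up delays in seconds (NOT the study's data).
delays = [0.2, 0.4, 0.5, 0.6, 0.8, 1.0, 1.5, 2.0]
density = gaussian_kde(delays)
mean_delay = statistics.mean(delays)
p99_delay = percentile(delays, 99)
```

Evaluating `density` on a grid of delay values gives the smoothed curve. Doing this once for the earliest-match delays and once for the latest-match delays gives the two curves described in step 4.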
  • nithin
    Thanks @sidverm for the detailed analysis :smile: . Approximately 1 second or less is the benchmark, considering connectivity over the internet and the bandwidth restrictions we have at Indian data centers. We are doing a bunch of optimizations that should make it faster in the next couple of months.
  • sidverm
    sidverm edited November 2017
    Thanks @nithin. Until I measure it myself, I don't feel comfortable using any data. Kind of an obsessive-compulsive disorder :)

    It will certainly help if you optimize further and bring not only the average but also, say, the 99th-percentile delay under 1 second.
  • sauravkedia
    sauravkedia edited November 2017
    @sidverm great job. It sorts out a lot of gray areas for all of us.

    @nithin, many of us operate in intraday markets, and for us speed of execution is important. From what it appears, even users who are hosted on the cloud, despite having access to high-speed internet, are currently facing this latency.

    Perhaps, as a next step in your evolution, you could consider setting up a datacenter (not a co-located one) to host/co-locate users within your infrastructure and further cut down on latencies. I am talking about a middle ground between co-location at the exchange and web-based trading here.

    I had a similar setup earlier, and the speed was much higher.
  • sidverm
    @sauravkedia Having a datacenter to host users within Zerodha's infrastructure would be awesome! Thanks for the idea. Hope they are listening.