Streaming data captured at two server locations differs significantly!

sidverm
Market Data Streaming Team,

This is to bring to your attention the experiment I conducted yesterday.

I captured live streaming market data by running the Python-based API on two servers: one in Taiwan and one in Mumbai.

I was expecting each tick to be received in Taiwan on the order of tens of milliseconds after it is received in Mumbai. But it appears that the delay is typically on the order of hundreds of milliseconds, and can run into seconds too! Not only that, the "delay" is more likely to be negative than positive, i.e. a tick is received in Taiwan before it is received in Mumbai more than 50% of the time!

But the most disappointing finding was that a significant fraction of ticks received in Taiwan were never received in Mumbai and vice versa!

This experiment was conducted over 2 hours of data on 06-Nov-2017. I used only the trade data, i.e. last_price and last_quantity, for over 1500 instruments. Since there is currently no "send time" timestamp in the streamed data, I relied on the receive-time timestamps taken on each of my servers. Both servers are on Google Cloud, so I expect their system clocks to be closely synchronized. Also, since a tick has no unique identifier, inferring which tick in Mumbai matches which tick in Taiwan is an imperfect exercise.

I tried two matching algorithms. In the first ("backward"), I take a tick received in Taiwan and look backwards in time from its Taiwan receive time for a tick in Mumbai of the same instrument that matches exactly in last_price and last_quantity. I call the first such tick "matched". In the second ("nearest"), I allow for the possibility that a tick is received in Mumbai later than in Taiwan, so I search both backwards and forwards in time and take the nearest tick that matches.
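The two matching rules above can be sketched roughly as follows. This is only an illustration, not the exact code I ran: the tuple layout `(recv_time, instrument, last_price, last_quantity)` and the names `build_index` and `match_tick` are assumptions made for the example, and the sketch allows the same Mumbai tick to be matched by more than one Taiwan tick.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(ticks):
    """Group receive times by (instrument, last_price, last_quantity).

    `ticks` is an iterable of (recv_time, instrument, last_price,
    last_quantity) tuples; this layout is assumed for illustration.
    """
    index = defaultdict(list)
    for recv_time, instrument, price, qty in ticks:
        index[(instrument, price, qty)].append(recv_time)
    for times in index.values():
        times.sort()
    return index


def match_tick(index, tick, mode="backward"):
    """Return the receive time of the matching tick in `index`, or None.

    mode="backward": latest candidate received at or before the tick.
    mode="nearest":  closest candidate in time, before or after.
    """
    recv_time, instrument, price, qty = tick
    times = index.get((instrument, price, qty))
    if not times:
        return None
    pos = bisect_right(times, recv_time)  # candidates with time <= recv_time
    before = times[pos - 1] if pos > 0 else None
    if mode == "backward":
        return before
    after = times[pos] if pos < len(times) else None
    if before is None:
        return after
    if after is None:
        return before
    # Ties go to the earlier candidate.
    return before if recv_time - before <= after - recv_time else after
```

Usage would be to build the index once from the Mumbai capture, then call `match_tick` for every Taiwan tick; a `None` result counts as "not found".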

Despite putting no restriction on how far forward or backward in time the matching algorithm could search, there were a significant number of ticks in Taiwan that found no match in Mumbai!

This table summarizes the statistics of ticks not found:


If we search only backwards, more than 10% of ticks in Taiwan were never found in Mumbai! That prompted me to search "nearest". Still, 2.3% of ticks are lost! Then I tried ignoring last_quantity. Only then does almost every tick find a match, but this is clearly not a valid matching, because if last_quantity is different then it is certainly a different trade.

Then I wanted to see the delay between Taiwan and Mumbai in receiving the "matched" tick. The following is the histogram for the "backward" matching algorithm.




But because we saw that the "backward" algorithm infers a tick drop rate of more than 10%, it is more likely that ticks sometimes arrive in Taiwan earlier than in Mumbai. So we stick with matching both price and quantity while we search "nearest".



Note that the area under the chart on the negative side looks larger than that on the positive side, at least visually! I would assume Zerodha's data centers lie close to the exchanges in Mumbai, so a tick should arrive at Google Cloud servers in Mumbai within 1 or 2 ms and take 20-30 ms to reach Taiwan. But maybe geographical delay plays only a small part compared to some larger "noise" in tick data delay. (See this thread I posted last week, which suggests there is typically a huge delay of hundreds of milliseconds, and that it runs into tens of seconds at times!)

This calls for a deeper understanding of how market data is disseminated by the Kite Connect API to its subscribers. Is it a level playing field, agnostic of the location of the subscriber? Is there an unreasonable delay in tick data, and one that varies randomly with time?
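To compare the negative and positive sides numerically rather than by eye, a summary along these lines could be computed. It is a sketch only: `delay_summary` is a hypothetical helper, and it assumes each delay is the Taiwan receive time minus the Mumbai receive time in milliseconds, with `None` standing for a Taiwan tick that found no match.

```python
from statistics import median


def delay_summary(delays_ms):
    """Summarize signed delays (Taiwan minus Mumbai, in ms).

    A negative delay means the tick was received in Taiwan first.
    None entries are ticks with no match in Mumbai.
    """
    total = len(delays_ms)
    matched = [d for d in delays_ms if d is not None]
    if not matched:
        return {
            "unmatched_fraction": 1.0 if total else None,
            "negative_fraction": None,
            "median_abs_delay_ms": None,
        }
    negative = sum(1 for d in matched if d < 0)
    return {
        "unmatched_fraction": (total - len(matched)) / total,
        "negative_fraction": negative / len(matched),
        "median_abs_delay_ms": median(abs(d) for d in matched),
    }
```

A `negative_fraction` above 0.5 would correspond to the negative side of the histogram being the larger one.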
  • revendar
    This is awesome!
    Something to do with Zerodha DC locations?!
  • sidverm
    sidverm edited November 2017
    I think this can only be explained if Zerodha's servers are
    1) multiple
    2) not synchronized with each other
    3) not designed with the capacity to carry the load they do