Kite Connect API - Volume Discrepancy between WebSocket Ticks and Historical API for NFO Futures

Subhro
Hello Zerodha Team / Community,

I am using the Kite Connect Python API (v3) and the KiteTicker WebSocket to receive live market data and build candles for NSE Futures & Options (NFO) contracts. I see consistent discrepancies between the candle data calculated from the WebSocket tick stream and the data returned by the historical_data API endpoint for the same completed candle interval. The difference is mainly in Volume, but sometimes also in OHLC.

Context:

API: Kite Connect Python API (v3)

WebSocket: KiteTicker (Python library)

Instrument Type: NFO Futures (Example: HAL25MAYFUT, Instrument Token: 14647554)

Intervals: 10-minute and 1-hour (aligned to 9:15 AM market open)

Goal: Log accurate OHLCV data for strategy backtesting and potential live trading, requiring consistent data between real-time aggregation and historical verification.

Methodology:

WebSocket Candle Aggregation (ws_... values):

Subscribe to instrument tokens via KiteTicker in MODE_FULL.

Receive ticks via the on_ticks callback.

For each interval:

ws_open: ltp of the first tick in the interval.

ws_high/ws_low: Max/Min ltp seen during the interval.

ws_close: ltp of the last tick processed before the next interval begins.

ws_volume: Calculated as EndVol - StartVol, where StartVol is the cumulative volume_traded from the last tick before the interval started, and EndVol is the cumulative volume_traded from the last tick before the next interval started.
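
To make the aggregation steps above concrete, here is a minimal sketch of that logic, not my exact production code. It assumes each tick is a dict carrying `last_price` (the ltp), `volume_traded` (cumulative volume) and `exchange_timestamp`. The names `bucket_start`, `Candle` and `CandleBuilder` are illustrative helpers, not part of the Kite Connect library.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional

MARKET_OPEN = time(9, 15)  # intervals are aligned to the 9:15 AM open


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Start of the interval containing ts, counted from 9:15 on ts's date."""
    anchor = datetime.combine(ts.date(), MARKET_OPEN)
    elapsed = int((ts - anchor).total_seconds() // 60)
    return anchor + timedelta(minutes=(elapsed // minutes) * minutes)


@dataclass
class Candle:
    start: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int


class CandleBuilder:
    """Builds candles for one instrument from full-mode style tick dicts."""

    def __init__(self, minutes: int):
        self.minutes = minutes
        self.current: Optional[Candle] = None
        self.last_cum_volume: Optional[int] = None  # cumulative volume of latest tick
        self.start_volume: int = 0                  # StartVol of the open candle

    def on_tick(self, tick: dict) -> Optional[Candle]:
        """Feed one tick; returns the completed candle when a new interval begins."""
        ts = tick["exchange_timestamp"]      # assumed field name
        price = tick["last_price"]           # assumed field name for ltp
        cum_volume = tick["volume_traded"]
        start = bucket_start(ts, self.minutes)
        finished = None
        if self.current is None or start != self.current.start:
            if self.current is not None:
                # EndVol: cumulative volume of the last tick before this interval
                self.current.volume = self.last_cum_volume - self.start_volume
                finished = self.current
            # StartVol: cumulative volume of the last tick before the interval.
            # The very first tick has no predecessor, so its own value is used.
            if self.last_cum_volume is None:
                self.start_volume = cum_volume
            else:
                self.start_volume = self.last_cum_volume
            self.current = Candle(start, price, price, price, price, 0)
        else:
            candle = self.current
            candle.high = max(candle.high, price)
            candle.low = min(candle.low, price)
            candle.close = price
        self.last_cum_volume = cum_volume
        return finished
```

In the on_ticks callback, each tick for the instrument would be passed to `on_tick()`, and any returned candle logged as the ws_... values. With this approach, the first candle after start-up only counts volume from the first tick received, so it will be incomplete.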

API Candle Fetch (api_... values):

Shortly after an interval completes (e.g., after 11:25:00 for the 11:15 candle), call the kiteconnect_instance.historical_data() endpoint.

Request data for the specific instrument_token, the specific interval (e.g., '10minute'), and set from_date and to_date to cover only that completed interval (e.g., from_date='YYYY-MM-DD 11:15:00', to_date='YYYY-MM-DD 11:24:59').

Extract the open, high, low, close, and volume fields from the returned candle data.
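
The fetch step looks roughly like the sketch below. It assumes `kite` is an authenticated KiteConnect instance and that historical_data() returns a list of dicts with `date`, `open`, `high`, `low`, `close` and `volume` keys. `fetch_completed_candle` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime, timedelta


def fetch_completed_candle(kite, instrument_token: int, start: datetime,
                           minutes: int = 10, interval: str = "10minute"):
    """Fetch the single candle that starts at `start`; return None if it is absent."""
    # Cover only the completed interval, e.g. 11:15:00 to 11:24:59.
    end = start + timedelta(minutes=minutes) - timedelta(seconds=1)
    candles = kite.historical_data(instrument_token, start, end, interval)
    for candle in candles:
        # Assumption: returned dates may be timezone-aware, so compare
        # on the naive wall-clock time.
        if candle["date"].replace(tzinfo=None) == start:
            return candle
    return None
```

For the example above this would be called as `fetch_completed_candle(kite, 14647554, datetime(2025, 4, 28, 11, 15))` shortly after 11:25:00.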

Observed Discrepancy:

When comparing the ws_... values and the api_... values for the same completed interval, I consistently observe differences:

Volume: The ws_volume (calculated from tick deltas) is almost always higher than the api_volume. This is the most significant and consistent discrepancy.

OHLC: Differences are also observed in OHLC values. Often ws_open differs slightly from api_open, and ws_close differs from api_close. High and Low match more frequently but can also differ occasionally.

Example Volume Data (HAL25MAYFUT - 14647554 on 2025-04-28):

Time      | ws_volume (Calculated) | api_volume (Fetched) | Difference
11:05:00  | 83700                  | 58350                | 25350
11:15:00  | 75450                  | 73800                | 1650
11:25:00  | 54300                  | 52500                | 1800
11:35:00  | 74400                  | 72450                | 1950
11:45:00  | 41250                  | 41100                | 150
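
For anyone who wants to check the figures, this small sketch recomputes the differences from the two volume columns, along with the difference as a percentage of api_volume. The name `volume_diffs` is illustrative only.

```python
# (time, ws_volume, api_volume) for HAL25MAYFUT on 2025-04-28, from the post
rows = [
    ("11:05:00", 83700, 58350),
    ("11:15:00", 75450, 73800),
    ("11:25:00", 54300, 52500),
    ("11:35:00", 74400, 72450),
    ("11:45:00", 41250, 41100),
]


def volume_diffs(rows):
    """Return (time, ws - api, percent of api_volume) for each row."""
    return [(t, ws - api, round(100 * (ws - api) / api, 2)) for t, ws, api in rows]


for t, diff, pct in volume_diffs(rows):
    print(f"{t}  diff={diff:>6}  ({pct}% of api_volume)")
```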


Could you please help me understand why this discrepancy occurs and what the best way to fix it would be?