I am using the Kite Connect Python API (v3) and the KiteTicker WebSocket to receive live market data and build candles for NSE Futures & Options (NFO) contracts. I see consistent discrepancies between the candles I calculate from the WebSocket tick stream (mainly Volume, but sometimes OHLC as well) and the data returned by the historical_data API endpoint for the same completed candle interval.
Intervals: 10-minute and 1-hour (aligned to 9:15 AM market open)
Goal: Log accurate OHLCV data for strategy backtesting and potential live trading, requiring consistent data between real-time aggregation and historical verification.
Methodology:
WebSocket Candle Aggregation (ws_... values):
Subscribe to instrument tokens via KiteTicker in MODE_FULL.
Receive ticks via the on_ticks callback.
For each interval:
ws_open: ltp of the first tick in the interval.
ws_high/ws_low: Max/Min ltp seen during the interval.
ws_close: ltp of the last tick processed before the next interval begins.
ws_volume: Calculated as EndVol - StartVol, where StartVol is the cumulative volume_traded from the last tick before the interval started, and EndVol is the cumulative volume_traded from the last tick before the next interval started.
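The aggregation steps above can be sketched roughly as follows. This is only a minimal illustration, not my exact production code: it assumes each tick is a dict carrying `last_price`, `volume_traded` and `exchange_timestamp` (the key names used by recent pykiteconnect releases in full mode; older releases may name them differently), that ticks for one instrument arrive in timestamp order, and `CandleBuilder` and `bucket_start` are hypothetical names.

```python
from datetime import datetime, timedelta

SESSION_OPEN = (9, 15)  # intervals are aligned to the 9:15 market open


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Return the start of the interval containing ts, aligned to 9:15."""
    open_ts = ts.replace(hour=SESSION_OPEN[0], minute=SESSION_OPEN[1],
                         second=0, microsecond=0)
    elapsed = int((ts - open_ts).total_seconds() // 60)
    return open_ts + timedelta(minutes=(elapsed // minutes) * minutes)


class CandleBuilder:
    """Builds OHLCV candles for one instrument from full-mode ticks."""

    def __init__(self, minutes: int):
        self.minutes = minutes
        self.current = None       # candle currently being built
        self.last_cum_vol = None  # cumulative volume of the previous tick
        self.completed = []       # finished candles, oldest first

    def on_tick(self, tick: dict) -> None:
        price = tick["last_price"]
        cum_vol = tick["volume_traded"]
        start = bucket_start(tick["exchange_timestamp"], self.minutes)

        if self.current is None or start != self.current["start"]:
            if self.current is not None:
                self.completed.append(self.current)
            # StartVol: cumulative volume of the last tick before this interval.
            # For the very first tick seen there is no earlier tick, so the
            # first candle's volume is a lower bound only.
            start_vol = cum_vol if self.last_cum_vol is None else self.last_cum_vol
            self.current = {"start": start, "open": price, "high": price,
                            "low": price, "close": price,
                            "start_vol": start_vol, "volume": 0}

        candle = self.current
        candle["high"] = max(candle["high"], price)
        candle["low"] = min(candle["low"], price)
        candle["close"] = price
        # EndVol - StartVol, updated on every tick inside the interval
        candle["volume"] = cum_vol - candle["start_vol"]
        self.last_cum_vol = cum_vol
```

In the `on_ticks` callback I would route each tick to the builder for its `instrument_token`. Note that a candle is only closed when the first tick of the next interval arrives.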
API Candle Fetch (api_... values):
Shortly after an interval completes (e.g., after 11:25:00 for the 11:15 candle), call the kiteconnect_instance.historical_data() endpoint.
Request data for the specific instrument_token, the specific interval (e.g., '10minute'), and set from_date and to_date to cover only that completed interval (e.g., from_date='YYYY-MM-DD 11:15:00', to_date='YYYY-MM-DD 11:24:59').
Extract the open, high, low, close, and volume fields from the returned candle data.
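For reference, the fetch step looks roughly like the sketch below. `candle_window` and `fetch_api_candle` are hypothetical helper names, and `kite` is an already authenticated KiteConnect instance. I am assuming that `historical_data()` accepts datetime objects and returns a list of dicts with `date`, `open`, `high`, `low`, `close` and `volume` keys, with `date` as a timezone-aware datetime; that is my understanding of the client, so please correct me if it is wrong.

```python
from datetime import datetime, timedelta


def candle_window(start: datetime, minutes: int):
    """Return (from_date, to_date) covering exactly one completed candle."""
    end = start + timedelta(minutes=minutes) - timedelta(seconds=1)
    return start, end


def fetch_api_candle(kite, instrument_token: int, start: datetime,
                     minutes: int, interval: str):
    """Fetch the single historical candle that begins at `start`, or None."""
    from_date, to_date = candle_window(start, minutes)
    rows = kite.historical_data(instrument_token, from_date, to_date, interval)
    for row in rows:
        # Compare on naive wall-clock time; this assumes `start` is already
        # expressed in exchange local time.
        if row["date"].replace(tzinfo=None) == start:
            return {k: row[k] for k in ("open", "high", "low", "close", "volume")}
    return None
```

For the 11:15 ten-minute candle this requests 11:15:00 to 11:24:59 with interval '10minute'. For the hourly candle I believe the interval string is '60minute'.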
Observed Discrepancy:
When comparing the ws_... values and the api_... values for the same completed interval, I consistently observe differences:
Volume: The ws_volume (calculated from tick deltas) is almost always higher than the api_volume. This is the most significant and consistent discrepancy.
OHLC: Differences are also observed in OHLC values. Often ws_open differs slightly from api_open, and ws_close differs from api_close. High and Low match more frequently but can also differ occasionally.
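The comparison itself is simple. A minimal sketch, assuming both candles are dicts with `open`, `high`, `low`, `close` and `volume` keys (`compare_candles` and `price_tol` are hypothetical names):

```python
def compare_candles(ws: dict, api: dict, price_tol: float = 0.0) -> dict:
    """Return {field: ws_value - api_value} for every field that differs."""
    diffs = {}
    for field in ("open", "high", "low", "close"):
        delta = ws[field] - api[field]
        if abs(delta) > price_tol:
            diffs[field] = delta
    vol_delta = ws["volume"] - api["volume"]
    if vol_delta != 0:
        diffs["volume"] = vol_delta  # positive means WebSocket volume is higher
    return diffs
```

An empty result means the two candles agree. In my logs the `volume` entry is positive for most intervals.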
Example Volume Data (HAL25MAYFUT - 14647554 on 2025-04-28):
ws_volume: Calculated as EndVol - StartVol, where StartVol is the cumulative volume_traded from the last tick before the interval started, and EndVol is the cumulative volume_traded from the last tick before the next interval started.
Volume: The ws_volume (calculated from tick deltas) is almost always higher than the api_volume. This is the most significant and consistent discrepancy.
It should be the same. We use the same logic (summation of tick volume, which is the same as end volume_traded - start volume_traded for the minute) in our Kite charts/Historical data APIs. Make sure you are using the correct field, volume_traded, with the correct exchange_timestamp and the correct token.
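To illustrate why the timestamp matters, here is a small sketch with made-up ticks (not real market data). It computes EndVol - StartVol for one interval, choosing the boundary ticks either by `exchange_timestamp` or by a hypothetical `received_at` field holding the local arrival time. Summing per-tick volume deltas gives the same number as EndVol - StartVol, but only when both sides pick the same boundary ticks.

```python
from datetime import datetime


def interval_volume(ticks, start, end, key="exchange_timestamp"):
    """EndVol - StartVol for [start, end), with boundary ticks chosen by `key`.

    Assumes `ticks` is sorted by `key`. Returns None if no tick precedes start.
    """
    start_vol = end_vol = None
    for tick in ticks:
        if tick[key] < start:
            start_vol = tick["volume_traded"]  # last tick before the interval
        if tick[key] < end:
            end_vol = tick["volume_traded"]    # last tick before the next one
    if start_vol is None or end_vol is None:
        return None
    return end_vol - start_vol


def t(h, m, s):
    return datetime(2025, 4, 28, h, m, s)


# Illustrative ticks: two of them arrive one second after their exchange time.
ticks = [
    {"exchange_timestamp": t(11, 14, 58), "received_at": t(11, 14, 58), "volume_traded": 1000},
    {"exchange_timestamp": t(11, 14, 59), "received_at": t(11, 15, 0), "volume_traded": 1020},
    {"exchange_timestamp": t(11, 20, 0), "received_at": t(11, 20, 0), "volume_traded": 1050},
    {"exchange_timestamp": t(11, 24, 59), "received_at": t(11, 25, 0), "volume_traded": 1090},
    {"exchange_timestamp": t(11, 25, 1), "received_at": t(11, 25, 1), "volume_traded": 1100},
]

by_exchange = interval_volume(ticks, t(11, 15, 0), t(11, 25, 0))
by_arrival = interval_volume(ticks, t(11, 15, 0), t(11, 25, 0), key="received_at")
```

With these ticks the two bucketing choices give different volumes for the same 11:15 candle, so a boundary mismatch is one way a difference could arise. It is not necessarily the cause in every case.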
ws_open: ltp of the first tick in the interval. ws_high/ws_low: Max/Min ltp seen during the interval. ws_close: ltp of the last tick processed before the next interval begins.
OHLC: Differences are also observed in OHLC values. Often ws_open differs slightly from api_open, and ws_close differs from api_close. High and Low match more frequently but can also differ occasionally.
Logic is correct for the OHLC formation. Make sure you are parsing/consuming all ticks properly in WS. Maybe log all errors and disconnections, and inspect. Go through this thread to learn more about tick formation. You can go through this open-source library to get more ideas.
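One possible way to log errors and disconnections is sketched below. It assumes the KiteTicker callbacks `on_close(ws, code, reason)`, `on_error(ws, code, reason)` and `on_reconnect(ws, attempts_count)`, which is my understanding of the pykiteconnect client. `attach_diagnostics` and `candle_is_suspect` are hypothetical helpers, and events are stamped with the local clock, so the local clock needs to be roughly in sync with exchange time.

```python
import logging
from datetime import datetime

log = logging.getLogger("ticker_diagnostics")


def attach_diagnostics(kws, events: list) -> None:
    """Attach logging callbacks to a KiteTicker-like object."""

    def record(kind, detail):
        events.append({"at": datetime.now(), "kind": kind, "detail": detail})
        log.warning("%s: %s", kind, detail)

    def on_close(ws, code, reason):
        record("close", f"{code} {reason}")

    def on_error(ws, code, reason):
        record("error", f"{code} {reason}")

    def on_reconnect(ws, attempts_count):
        record("reconnect", f"attempt {attempts_count}")

    kws.on_close = on_close
    kws.on_error = on_error
    kws.on_reconnect = on_reconnect


def candle_is_suspect(events, start, end) -> bool:
    """True if any connection event was recorded inside [start, end)."""
    return any(start <= e["at"] < end for e in events)
```

Any candle flagged by `candle_is_suspect` may have missed ticks, so a mismatch against the historical candle for that interval would not be surprising.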
@rakeshr Thanks for checking and replying.
I have made some changes to my code as per your recommendations, and there has been an improvement.
OHLC now matches for the same intervals when I compare the historical_data API against the WebSocket.
However, I still see a slight difference in volume.
I would like to observe the data logs for today, and maybe tomorrow as well, and will then update on whether the issue has really been fixed.
Regards
Subhro
@rakeshr As mentioned earlier, I have:
Ensured my WebSocket candle aggregation logic uses the volume_traded field and calculates interval volume as EndVol - StartVol.
Confirmed my OHLC derivation from ticks matches standard methodology (first/last LTP, min/max LTP).
Corrected my timestamp handling to properly process exchange_timestamp and align intervals.
These changes have resolved the previous issues with incorrect timestamps and major OHLC discrepancies. The OHLC data from my WebSocket aggregation now matches the historical_data API almost perfectly for NFO Futures contracts (e.g., TRENT25MAYFUT or HAL25MAYFUT).
However, a consistent, though usually small, discrepancy in the volume persists. Typically, the volume calculated from WebSocket ticks is slightly higher than the volume reported by the historical_data API for the same interval.
Please see the attached file for a comparison of 10-minute candles for TRENT25MAYFUT on May 6, 2025.
As you can see, while OHLC is largely identical, the volume from WebSocket aggregation is mostly a bit higher. Interestingly, for the 13:05 and 13:15 intervals, the volumes matched perfectly, suggesting the calculation methodology can align.
Could you please provide any further technical insight into why this small, persistent volume difference might occur, even when the aggregation logic is consistent with what was suggested for historical data? Understanding this is crucial for the accuracy of our systems.
Thank you for your continued support.
Regards
Subhro