Need the logic for resampling volume.

ks_unni
I am resampling OHLC for 5min candle from the tick data, and its working fine.
however, I don't understand the logic for adding volume to each 5 min candle.
I tried to get the maximum of "volume" from ticks during the 5 mins - but its a very large value..
the I tried to get the sum of "last_quantity" from the ticks of each 5 mins - but its too small value.

I'm comparing the 5min volume of my candle with the same period candle data from historical data.. its is nowhere near..
So, can anyone tell me the logic for calculating volume for a 5 min candle from the streaming ticks?
  • RED
    Correct me if I am wrong,
    Your case: You are resampling a 1-min OHLC data to 5-min interval.
    Solution: Convert timestamp to minutes format, add all volumes in between two timeframes, where minutes is divisible by 5.
    Example:
    9:15 -> start counter and store volume data from 15+16+17+18+19 to a var.
    9:20 -> clear the counter and start fresh.

    The colume which u get @ 9:20 is the volume for 5-minute candle starting @9:15.
    ---------------------------------------------------------------------------------------------------------------------------------
    For websocket stream you get cumulative volumes for every tick, meaning every second. In this scenario follow the example given below.
    @ 9:15 -> volume = 5000
    @ 9:16 -> volume = 10000
    so volume for @9:15 candle becomes = 10000 - 5000 = 5000

    do this for every minute and sum it up for 5 candles to form the volume, use this in conjunction with the minutes%5 provided for ohlc.

    Sorry if I misunderstood your question.
  • ks_unni
    ks_unni edited May 2021
    I'm not resampling 1min to 5min.
    I'm resampling ticks to 5min, I'm taking maximum of "volume" from ticks during each 5 min.
    example:
    for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59
    and its a too big value than it is supposed to be,
    like - 9:15 -> 10000 (max of all ticks during the minute)
    9:16: -> 12000
    9:17-> 9000
    9:18-> 8500
    9:19 -> 11000
    in this scenario I'm taking maximum i.e 12000..

    and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.

    and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
    My question is how to achieve the same volume or any acceptable value near to the historical data api using tick data?

  • ks_unni
    ks_unni edited May 2021
    I have python class that represents each instruments..
    I'm separating ticks for each instruments and pass to this class.. and here is the code that makes my ohlc..
    everything is working fine in this code except volume.. please help me..


    def insert_tick(self,tick):
    if tick['timestamp'].date() == now.date():
    ts = tick['timestamp'] #local_tz.localize(tick['timestamp'])
    price = tick['last_price']
    volume = tick['volume']
    cursor = self.cursor
    cursor.execute(f"SELECT * FROM tbl_{self.token} ORDER BY ROWID desc limit 1")
    last_row = cursor.fetchone()
    if last_row != None:
    prev_ts = dt.datetime.fromtimestamp(last_row[0])
    if ts.day != prev_ts.day or (ts.minute % 5 == 0 and ts.minute != prev_ts.minute):
    ts = dt.datetime.timestamp(ts.replace(second = 0, microsecond =0))
    cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")
    else:
    new_close = price
    if price > last_row[2]:
    new_high = price
    else:
    new_high = last_row[2]
    if price < last_row[3]:
    new_low = price
    else:
    new_low = last_row[3]
    if volume > last_row[5]:
    new_vol = volume
    else:
    new_vol = last_row[5]
    cursor.execute(f"UPDATE tbl_{self.token} SET high = {new_high}, low = {new_low}, close = {new_close}, vol = {new_vol} where ts = {last_row[0]}")
    else:
    ts = dt.datetime.timestamp(ts.replace(minute = ts.minute - (ts.minute % 5),second = 0, microsecond =0))
    cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")

    print(f"{self.symbol}:{self.token} tick inserted")

    else:
    print("it might be a holiday")
  • ks_unni
    Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
  • RED
    @ks_unni
    Code changes:
    1. Check this link to arrange tick values-> https://kite.trade/forum/discussion/2604/convert-ticks-to-candle#latest
    2. Never access database in main loop, seperate reading and updating database activities from the main code, dump them in a queue and execute them using multithreading. Modern day systems are capable of handling large amounts of data, so you can either use pandas or dictionaries to salvage time.
    3. You can find youtube videos made by @Imran (aka Tradehull) on how to process tick data
    Volume ambiguity:
    example:
    for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59
    and its a too big value than it is supposed to be,
    like - 9:15 -> 10000 (max of all ticks during the minute)
    9:16: -> 12000
    9:17-> 9000
    9:18-> 8500
    9:19 -> 11000
    in this scenario I'm taking maximum i.e 12000..

    and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.

    and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
    You are getting data from websockets, in which volumes are cumulative(except some random cases, negligible count). you need to calculate the volume_difference and add them to respective minute candles.

    check the image below.

  • RED
    RED edited May 2021
    @ks_unni
    Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
    Every day starts fresh, you dont get previous day volume to todays volume count.

    But significant amount of people tend to trade in pre-market using AMO orders(Check this in order types), meaning orders are placed before 9:15 using AMO orders. So these get executed @ 9:07 and you will get this volume @ 9:15

    Ref Links: https://zerodha.com/z-connect/queries/stock-and-fo-queries/pre-marketpost-marketafter-market-orders
  • ks_unni
    Thanks @RED ,
    This code is being executed from a worker thread.
    From the websockets, I simply puts the tick to a queue and worker thread separates the tick for each instruments and calls the function of each instrument object.. So, it works fine with 150 instruments simultaneously without any delay or 1006 error. (may be I can add even more instruments ).

    And thanks for the clarification about cumulative volume, that makes sense.
    as you said, volume difference is to be calculated, I was taking the max of the volume as it is..
    Thank you very much.
  • ks_unni
    two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?

    if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
  • RED
    RED edited May 2021
    @ks_unni
    two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?
    Try to omit pre-market volume, set your volume at market open as your pivot and compute from that for every order. Pre-market orders only make sense if they are of huge volumes. You cant surmise the direction from them.
    if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
    In this case nothing much can be done, try to fetch data from historical for the day and continue your websocket stream for the remaining hours. Better to stay away from market if your websocket starts after 11AM
  • ks_unni
    ks_unni edited May 2021
    I'll comment after analysing the initial ticks tomorrow. unfortunately today is a holiday.
    I also think someone from zerodha , should look into this discussion, because there must be some straight forward way to fetch the volume at any time of the day.. an api should be like that..
  • ks_unni
    Here is the first minute tick data for RELIANCE..


    and the same minutes, 1 minute ohlc retried through historical data.


    you can see that the total of volume differnces during 9:15 is 96138
    and the volume got for 9:15 through historical data is 132887

    the figures are significantly different, my question is how to get the 132887 or any acceptable figure nearby
    using the tick data?


  • sujith
    Are you trying to capture the volume of each tick?
  • ks_unni
    ks_unni edited May 2021
    not capturing all ticks. just to ask this question.. I dumped it.
    I just need the logic to calculate volume of a 1 minute, or 5 minute which is same or similar to the historical data of same period..
    logic for 1 minute is enough, since 5 min volume is just sum of the five 1 minutes..
  • sujith
    sujith edited May 2021
    Volume of the minute = volume of last tick - volume of first tick.
  • ks_unni
    Volume of the minute = volume of last tick - volume of first tick.
    As per this calculation, first 2 minutes of the day are significantly different. rest of the day it is almost similar.
    please see the image below

    Since my strategy is based on vwap, this initial difference will affect the indicator values..
    So, how can I avoid this?
  • nithishkailas
    nithishkailas edited August 2021
    Why adding up 'last_quantity' of each ticks in that minute don't sum up to that minute's total volume?
  • sujith
    There can be hundreds of ticks in a second at the exchange any trading platform over the internet can capture only 1- 2 ticks over the internet. So the candle data you see is only a trend and not the accurate data.
    For accurate data, you will need tick by tick data that is possible to capture in the setup inside the exchange premises wherein all the systems are connected via LAN.

    We can help you get the colo setup done at the exchange, the cost starts from 18lakh onwards per annum. You can write to kiteconnect(at)zerodha.com for the same.
Sign In or Register to comment.