# Need the logic for resampling volume.

edited May 2021
I am resampling OHLC for 5min candle from the tick data, and its working fine.
however, I don't understand the logic for adding volume to each 5 min candle.
I tried to get the maximum of "volume" from ticks during the 5 mins - but its a very large value..
the I tried to get the sum of "last_quantity" from the ticks of each 5 mins - but its too small value.

I'm comparing the 5min volume of my candle with the same period candle data from historical data.. its is nowhere near..
So, can anyone tell me the logic for calculating volume for a 5 min candle from the streaming ticks?
Tagged:
• Correct me if I am wrong,
Your case: You are resampling a 1-min OHLC data to 5-min interval.
Solution: Convert timestamp to minutes format, add all volumes in between two timeframes, where minutes is divisible by 5.
Example:
9:15 -> start counter and store volume data from 15+16+17+18+19 to a var.
9:20 -> clear the counter and start fresh.

The colume which u get @ 9:20 is the volume for 5-minute candle starting @9:15.
---------------------------------------------------------------------------------------------------------------------------------
For websocket stream you get cumulative volumes for every tick, meaning every second. In this scenario follow the example given below.
@ 9:15 -> volume = 5000
@ 9:16 -> volume = 10000
so volume for @9:15 candle becomes = 10000 - 5000 = 5000

do this for every minute and sum it up for 5 candles to form the volume, use this in conjunction with the minutes%5 provided for ohlc.

Sorry if I misunderstood your question.
• edited May 2021
I'm not resampling 1min to 5min.
I'm resampling ticks to 5min, I'm taking maximum of "volume" from ticks during each 5 min.
example:
for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59
and its a too big value than it is supposed to be,
like - 9:15 -> 10000 (max of all ticks during the minute)
9:16: -> 12000
9:17-> 9000
9:18-> 8500
9:19 -> 11000
in this scenario I'm taking maximum i.e 12000..

and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.

and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
My question is how to achieve the same volume or any acceptable value near to the historical data api using tick data?

• edited May 2021
I have python class that represents each instruments..
I'm separating ticks for each instruments and pass to this class.. and here is the code that makes my ohlc..

def insert_tick(self,tick):
if tick['timestamp'].date() == now.date():
ts = tick['timestamp'] #local_tz.localize(tick['timestamp'])
price = tick['last_price']
volume = tick['volume']
cursor = self.cursor
cursor.execute(f"SELECT * FROM tbl_{self.token} ORDER BY ROWID desc limit 1")
last_row = cursor.fetchone()
if last_row != None:
prev_ts = dt.datetime.fromtimestamp(last_row[0])
if ts.day != prev_ts.day or (ts.minute % 5 == 0 and ts.minute != prev_ts.minute):
ts = dt.datetime.timestamp(ts.replace(second = 0, microsecond =0))
cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")
else:
new_close = price
if price > last_row[2]:
new_high = price
else:
new_high = last_row[2]
if price < last_row[3]:
new_low = price
else:
new_low = last_row[3]
if volume > last_row[5]:
new_vol = volume
else:
new_vol = last_row[5]
cursor.execute(f"UPDATE tbl_{self.token} SET high = {new_high}, low = {new_low}, close = {new_close}, vol = {new_vol} where ts = {last_row[0]}")
else:
ts = dt.datetime.timestamp(ts.replace(minute = ts.minute - (ts.minute % 5),second = 0, microsecond =0))
cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")

print(f"{self.symbol}:{self.token} tick inserted")

else:
print("it might be a holiday")
• Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
• @ks_unni
Code changes:
2. Never access database in main loop, seperate reading and updating database activities from the main code, dump them in a queue and execute them using multithreading. Modern day systems are capable of handling large amounts of data, so you can either use pandas or dictionaries to salvage time.
3. You can find youtube videos made by @Imran (aka Tradehull) on how to process tick data
Volume ambiguity:
example:
for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59
and its a too big value than it is supposed to be,
like - 9:15 -> 10000 (max of all ticks during the minute)
9:16: -> 12000
9:17-> 9000
9:18-> 8500
9:19 -> 11000
in this scenario I'm taking maximum i.e 12000..

and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.

and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
You are getting data from websockets, in which volumes are cumulative(except some random cases, negligible count). you need to calculate the volume_difference and add them to respective minute candles.

check the image below.

• edited May 2021
@ks_unni
Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
Every day starts fresh, you dont get previous day volume to todays volume count.

But significant amount of people tend to trade in pre-market using AMO orders(Check this in order types), meaning orders are placed before 9:15 using AMO orders. So these get executed @ 9:07 and you will get this volume @ 9:15

• Thanks @RED ,
This code is being executed from a worker thread.
From the websockets, I simply puts the tick to a queue and worker thread separates the tick for each instruments and calls the function of each instrument object.. So, it works fine with 150 instruments simultaneously without any delay or 1006 error. (may be I can add even more instruments ).

And thanks for the clarification about cumulative volume, that makes sense.
as you said, volume difference is to be calculated, I was taking the max of the volume as it is..
Thank you very much.
• two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?

if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
• edited May 2021
@ks_unni
two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?
Try to omit pre-market volume, set your volume at market open as your pivot and compute from that for every order. Pre-market orders only make sense if they are of huge volumes. You cant surmise the direction from them.
if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
In this case nothing much can be done, try to fetch data from historical for the day and continue your websocket stream for the remaining hours. Better to stay away from market if your websocket starts after 11AM
• edited May 2021
I'll comment after analysing the initial ticks tomorrow. unfortunately today is a holiday.
I also think someone from zerodha , should look into this discussion, because there must be some straight forward way to fetch the volume at any time of the day.. an api should be like that..
• Here is the first minute tick data for RELIANCE..

and the same minutes, 1 minute ohlc retried through historical data.

you can see that the total of volume differnces during 9:15 is 96138
and the volume got for 9:15 through historical data is 132887

the figures are significantly different, my question is how to get the 132887 or any acceptable figure nearby
using the tick data?

• Are you trying to capture the volume of each tick?
• edited May 2021
not capturing all ticks. just to ask this question.. I dumped it.
I just need the logic to calculate volume of a 1 minute, or 5 minute which is same or similar to the historical data of same period..
logic for 1 minute is enough, since 5 min volume is just sum of the five 1 minutes..
• edited May 2021
Volume of the minute = volume of last tick - volume of first tick.
• Volume of the minute = volume of last tick - volume of first tick.
As per this calculation, first 2 minutes of the day are significantly different. rest of the day it is almost similar.