I am resampling OHLC for 5min candle from the tick data, and its working fine. however, I don't understand the logic for adding volume to each 5 min candle. I tried to get the maximum of "volume" from ticks during the 5 mins - but its a very large value.. the I tried to get the sum of "last_quantity" from the ticks of each 5 mins - but its too small value.
I'm comparing the 5min volume of my candle with the same period candle data from historical data.. its is nowhere near.. So, can anyone tell me the logic for calculating volume for a 5 min candle from the streaming ticks?
Correct me if I am wrong, Your case: You are resampling a 1-min OHLC data to 5-min interval. Solution: Convert timestamp to minutes format, add all volumes in between two timeframes, where minutes is divisible by 5. Example: 9:15 -> start counter and store volume data from 15+16+17+18+19 to a var. 9:20 -> clear the counter and start fresh.
The colume which u get @ 9:20 is the volume for 5-minute candle starting @9:15. --------------------------------------------------------------------------------------------------------------------------------- For websocket stream you get cumulative volumes for every tick, meaning every second. In this scenario follow the example given below. @ 9:15 -> volume = 5000 @ 9:16 -> volume = 10000 so volume for @9:15 candle becomes = 10000 - 5000 = 5000
do this for every minute and sum it up for 5 candles to form the volume, use this in conjunction with the minutes%5 provided for ohlc.
I'm not resampling 1min to 5min. I'm resampling ticks to 5min, I'm taking maximum of "volume" from ticks during each 5 min. example: for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59 and its a too big value than it is supposed to be, like - 9:15 -> 10000 (max of all ticks during the minute) 9:16: -> 12000 9:17-> 9000 9:18-> 8500 9:19 -> 11000 in this scenario I'm taking maximum i.e 12000..
and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.
and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that. My question is how to achieve the same volume or any acceptable value near to the historical data api using tick data?
I have python class that represents each instruments.. I'm separating ticks for each instruments and pass to this class.. and here is the code that makes my ohlc.. everything is working fine in this code except volume.. please help me..
def insert_tick(self,tick): if tick['timestamp'].date() == now.date(): ts = tick['timestamp'] #local_tz.localize(tick['timestamp']) price = tick['last_price'] volume = tick['volume'] cursor = self.cursor cursor.execute(f"SELECT * FROM tbl_{self.token} ORDER BY ROWID desc limit 1") last_row = cursor.fetchone() if last_row != None: prev_ts = dt.datetime.fromtimestamp(last_row[0]) if ts.day != prev_ts.day or (ts.minute % 5 == 0 and ts.minute != prev_ts.minute): ts = dt.datetime.timestamp(ts.replace(second = 0, microsecond =0)) cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})") else: new_close = price if price > last_row[2]: new_high = price else: new_high = last_row[2] if price < last_row[3]: new_low = price else: new_low = last_row[3] if volume > last_row[5]: new_vol = volume else: new_vol = last_row[5] cursor.execute(f"UPDATE tbl_{self.token} SET high = {new_high}, low = {new_low}, close = {new_close}, vol = {new_vol} where ts = {last_row[0]}") else: ts = dt.datetime.timestamp(ts.replace(minute = ts.minute - (ts.minute % 5),second = 0, microsecond =0)) cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")
Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
Never access database in main loop, seperate reading and updating database activities from the main code, dump them in a queue and execute them using multithreading. Modern day systems are capable of handling large amounts of data, so you can either use pandas or dictionaries to salvage time.
You can find youtube videos made by @Imran (aka Tradehull) on how to process tick data
Volume ambiguity:
example: for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59 and its a too big value than it is supposed to be, like - 9:15 -> 10000 (max of all ticks during the minute) 9:16: -> 12000 9:17-> 9000 9:18-> 8500 9:19 -> 11000 in this scenario I'm taking maximum i.e 12000..
and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.
and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
You are getting data from websockets, in which volumes are cumulative(except some random cases, negligible count). you need to calculate the volume_difference and add them to respective minute candles.
Ok @RED, Thanks.. I read your answer again and I think I understood about the cumulative volume, so what about the very first tick get at 9:15:00 Does that contain the previous days last tick volume or is it just the volume of that second?
Every day starts fresh, you dont get previous day volume to todays volume count.
But significant amount of people tend to trade in pre-market using AMO orders(Check this in order types), meaning orders are placed before 9:15 using AMO orders. So these get executed @ 9:07 and you will get this volume @ 9:15
Thanks @RED , This code is being executed from a worker thread. From the websockets, I simply puts the tick to a queue and worker thread separates the tick for each instruments and calls the function of each instrument object.. So, it works fine with 150 instruments simultaneously without any delay or 1006 error. (may be I can add even more instruments ).
And thanks for the clarification about cumulative volume, that makes sense. as you said, volume difference is to be calculated, I was taking the max of the volume as it is.. Thank you very much.
two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?
if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
two more questions, if the 9:15:00 tick contains cumulative volume of pre-market minutes, how do we extract that particular ticks volume? what do we subtract?
Try to omit pre-market volume, set your volume at market open as your pivot and compute from that for every order. Pre-market orders only make sense if they are of huge volumes. You cant surmise the direction from them.
if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
In this case nothing much can be done, try to fetch data from historical for the day and continue your websocket stream for the remaining hours. Better to stay away from market if your websocket starts after 11AM
I'll comment after analysing the initial ticks tomorrow. unfortunately today is a holiday. I also think someone from zerodha , should look into this discussion, because there must be some straight forward way to fetch the volume at any time of the day.. an api should be like that..
not capturing all ticks. just to ask this question.. I dumped it. I just need the logic to calculate volume of a 1 minute, or 5 minute which is same or similar to the historical data of same period.. logic for 1 minute is enough, since 5 min volume is just sum of the five 1 minutes..
There can be hundreds of ticks in a second at the exchange any trading platform over the internet can capture only 1- 2 ticks over the internet. So the candle data you see is only a trend and not the accurate data. For accurate data, you will need tick by tick data that is possible to capture in the setup inside the exchange premises wherein all the systems are connected via LAN.
We can help you get the colo setup done at the exchange, the cost starts from 18lakh onwards per annum. You can write to kiteconnect(at)zerodha.com for the same.
Your case: You are resampling a 1-min OHLC data to 5-min interval.
Solution: Convert timestamp to minutes format, add all volumes in between two timeframes, where minutes is divisible by 5.
Example:
9:15 -> start counter and store volume data from 15+16+17+18+19 to a var.
9:20 -> clear the counter and start fresh.
The colume which u get @ 9:20 is the volume for 5-minute candle starting @9:15.
---------------------------------------------------------------------------------------------------------------------------------
For websocket stream you get cumulative volumes for every tick, meaning every second. In this scenario follow the example given below.
@ 9:15 -> volume = 5000
@ 9:16 -> volume = 10000
so volume for @9:15 candle becomes = 10000 - 5000 = 5000
do this for every minute and sum it up for 5 candles to form the volume, use this in conjunction with the minutes%5 provided for ohlc.
Sorry if I misunderstood your question.
I'm resampling ticks to 5min, I'm taking maximum of "volume" from ticks during each 5 min.
example:
for 9:15 candle -> max of 9:15:00, 9:15:01, 9:15:02... 9:19:58, 9:19:59
and its a too big value than it is supposed to be,
like - 9:15 -> 10000 (max of all ticks during the minute)
9:16: -> 12000
9:17-> 9000
9:18-> 8500
9:19 -> 11000
in this scenario I'm taking maximum i.e 12000..
and if I take sum like 15+16+17+18+19, then it will be almost 5 times the value that I'm currently getting.
and if I take the same 9:15 candle with historical data api, the volume may be around 1500 or 2000 or somewhere near that.
My question is how to achieve the same volume or any acceptable value near to the historical data api using tick data?
I'm separating ticks for each instruments and pass to this class.. and here is the code that makes my ohlc..
everything is working fine in this code except volume.. please help me..
def insert_tick(self,tick):
if tick['timestamp'].date() == now.date():
ts = tick['timestamp'] #local_tz.localize(tick['timestamp'])
price = tick['last_price']
volume = tick['volume']
cursor = self.cursor
cursor.execute(f"SELECT * FROM tbl_{self.token} ORDER BY ROWID desc limit 1")
last_row = cursor.fetchone()
if last_row != None:
prev_ts = dt.datetime.fromtimestamp(last_row[0])
if ts.day != prev_ts.day or (ts.minute % 5 == 0 and ts.minute != prev_ts.minute):
ts = dt.datetime.timestamp(ts.replace(second = 0, microsecond =0))
cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")
else:
new_close = price
if price > last_row[2]:
new_high = price
else:
new_high = last_row[2]
if price < last_row[3]:
new_low = price
else:
new_low = last_row[3]
if volume > last_row[5]:
new_vol = volume
else:
new_vol = last_row[5]
cursor.execute(f"UPDATE tbl_{self.token} SET high = {new_high}, low = {new_low}, close = {new_close}, vol = {new_vol} where ts = {last_row[0]}")
else:
ts = dt.datetime.timestamp(ts.replace(minute = ts.minute - (ts.minute % 5),second = 0, microsecond =0))
cursor.execute(f"INSERT INTO tbl_{self.token} VALUES ({ts},{price},{price},{price},{price},{volume})")
print(f"{self.symbol}:{self.token} tick inserted")
else:
print("it might be a holiday")
Code changes:
check the image below.
But significant amount of people tend to trade in pre-market using AMO orders(Check this in order types), meaning orders are placed before 9:15 using AMO orders. So these get executed @ 9:07 and you will get this volume @ 9:15
Ref Links: https://zerodha.com/z-connect/queries/stock-and-fo-queries/pre-marketpost-marketafter-market-orders
This code is being executed from a worker thread.
From the websockets, I simply puts the tick to a queue and worker thread separates the tick for each instruments and calls the function of each instrument object.. So, it works fine with 150 instruments simultaneously without any delay or 1006 error. (may be I can add even more instruments ).
And thanks for the clarification about cumulative volume, that makes sense.
as you said, volume difference is to be calculated, I was taking the max of the volume as it is..
Thank you very much.
if I start my script for web socket streaming by the middle of the day, say at 11am or after that, in my very first tick I get a cumulative volume till that time, So, how do I extract volume for the first tick? I don't have any previous cumulation to subtract from it..
I also think someone from zerodha , should look into this discussion, because there must be some straight forward way to fetch the volume at any time of the day.. an api should be like that..
and the same minutes, 1 minute ohlc retried through historical data.
you can see that the total of volume differnces during 9:15 is 96138
and the volume got for 9:15 through historical data is 132887
the figures are significantly different, my question is how to get the 132887 or any acceptable figure nearby
using the tick data?
I just need the logic to calculate volume of a 1 minute, or 5 minute which is same or similar to the historical data of same period..
logic for 1 minute is enough, since 5 min volume is just sum of the five 1 minutes..
please see the image below
Since my strategy is based on vwap, this initial difference will affect the indicator values..
So, how can I avoid this?
For accurate data, you will need tick by tick data that is possible to capture in the setup inside the exchange premises wherein all the systems are connected via LAN.
We can help you get the colo setup done at the exchange, the cost starts from 18lakh onwards per annum. You can write to kiteconnect(at)zerodha.com for the same.