I am a new user of the Kite API. I have successfully developed a module that reads live streaming data via the WebSocket class in the official Python client of Kite Connect. Yesterday was the first day I tested it, but only for a few minutes.
Today was the first day I let this code run for an extended period. It ran fine, but roughly 1.5 hours into the morning it gave a fatal error and stopped. I re-attempted to run it in the afternoon, but it crashed again after an hour with the same error.
Brief description of the code: it subscribes to about 1000 instruments in 'full' mode. The on_tick() method reads the tick dictionary and transforms it into two pandas DataFrames, trade and depth. I later save these DataFrames to disk as CSV files. The intention is to store the entire tick history in a database for future analysis.
Error log here:
ERROR:websocket:error from callback >: maximum recursion depth exceeded
Fatal Python error: Cannot recover from stack overflow.
Thread 0x000005f8 (most recent call first):
  File "C:\Users\siddh\Anaconda3\lib\threading.py", line 299 in wait
  File "C:\Users\siddh\Anaconda3\lib\threading.py", line 551 in wait
  File "C:\Users\siddh\Anaconda3\lib\threading.py", line 1180 in run
  File "C:\Users\siddh\Anaconda3\lib\threading.py", line 916 in _bootstrap_inner
  File "C:\Users\siddh\Anaconda3\lib\threading.py", line 884 in _bootstrap
Current thread 0x00002114 (most recent call first):
  File "C:\Users\siddh\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 729 in require
  ...
Process finished with exit code -1073740791 (0xC0000409)
Hi @sidverm, I would suggest using multi-threading. Your main thread should only receive data and nothing else. Once you get the data, pass it to a worker thread to build the pandas DataFrame and store it in the database.
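A minimal sketch of this handoff, assuming the standard KiteTicker callbacks from the official Python client (the API key, access token, instrument tokens, and file name are placeholders):

    import queue
    import threading

    import pandas as pd
    from kiteconnect import KiteTicker

    kws = KiteTicker("your_api_key", "your_access_token")  # placeholders
    tokens = [408065, 884737]  # placeholder instrument tokens
    tick_queue = queue.Queue()

    def on_connect(ws, response):
        ws.subscribe(tokens)
        ws.set_mode(ws.MODE_FULL, tokens)

    def on_ticks(ws, ticks):
        # The receiving thread does nothing but hand the batch off.
        tick_queue.put(ticks)

    def worker():
        while True:
            ticks = tick_queue.get()
            # All heavy work (DataFrame construction, disk writes)
            # happens here, off the websocket thread.
            pd.DataFrame(ticks).to_csv("ticks.csv", mode="a",
                                       header=False, index=False)
            tick_queue.task_done()

    threading.Thread(target=worker, daemon=True).start()
    kws.on_connect = on_connect
    kws.on_ticks = on_ticks
    kws.connect()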
A late update: I was able to solve this issue back in September. It was caused by my Python loop taking more than one second to process each batch, so data arriving every second was piling up in a buffer waiting for its turn. When the buffer exceeded its capacity, the program crashed.
I improved my Python code by avoiding DataFrame.append in every loop, along with other minor tweaks. My processing now takes about 10 ms, so the code no longer crashes.
@sidverm, I would like to know more details on the '10 ms'. Are you able to convert your tick data into a pandas DataFrame in 10 ms? Or are you now achieving 10 ms by looping without pandas?
I have experimented with these and here are my observations: pandas is designed for reading and manipulating large but static data. It is very good if you have a large dataset that you need to read once and manipulate. In other words, adding or deleting rows in pandas is a costly operation, and even converting other data structures to pandas is costly. In our use case the data arrives as a stream, and converting each incoming JSON/dictionary into pandas is expensive; I think it takes 30 ms or so. What I found is that for such data it is better and much faster to loop through the dictionary than to use vectorised operations in pandas.
So I would avoid DataFrame.append and adding or dropping rows when latency is critical. For array-like structures in this case, it is better to use collections.deque (https://pymotw.com/2/collections/deque.html), which is designed for exactly this use case; see the sketch below.
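For illustration, a minimal sketch of what I mean, assuming each tick batch arrives as a list of plain dictionaries (the maxlen and the buffered fields are placeholders):

    from collections import deque

    # Bounded buffer of recent ticks: appends and pops at either end are
    # O(1), with none of the row-insertion cost of a DataFrame.
    recent_ticks = deque(maxlen=10000)

    def on_ticks(ws, ticks):
        for tick in ticks:
            # Plain dictionary lookups in a loop; no pandas conversion
            # on the hot path.
            recent_ticks.append((tick["instrument_token"],
                                 tick["last_price"]))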
I would be very interested in your comments on these.
@sauravkedia actually my solution discarded pandas in the loop altogether. I still use pandas, but only at the end, to convert a list into a DataFrame, because the file-write and database-insert operations after the loop are more convenient with DataFrames; see the sketch below. I will explore deque for the more latency-sensitive operations I do later. Thanks a lot for sharing!
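Roughly, the pattern looks like this simplified sketch; the field names, CSV file, and SQLite table are illustrative stand-ins for my actual storage code:

    import sqlite3

    import pandas as pd

    rows = []  # plain-list appends are cheap inside the tick loop

    def on_ticks(ws, ticks):
        for tick in ticks:
            rows.append((tick["instrument_token"], tick["last_price"]))

    def flush():
        # Convert once at the end, where pandas is convenient for the
        # file write and the database insert.
        df = pd.DataFrame(rows, columns=["instrument_token", "last_price"])
        df.to_csv("trade.csv", index=False)
        conn = sqlite3.connect("ticks.db")
        df.to_sql("trade", conn, if_exists="append", index=False)
        conn.close()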