Disconnection with Fatal error after 1 hr of live streaming using websocket in Python client

sidverm · August 2017

Hi all,

I am a new user of Kite API. I have been able to successfully develop a module which reads live streaming data via the websocket class in the official Python client of Kite Connect. Yesterday was the first day I tested it, but only for a few minutes.

Today was the first day I let this code run for longer. It ran fine, but about 1.5 hours into the morning session it raised a fatal error and stopped. I ran it again in the afternoon, and it crashed again after about 1 hour with the same error.

Brief description of the code: it subscribes to about 1000 instruments in 'full' mode. The on_tick() method reads the 'tick' dictionary and transforms it into two pandas DataFrames, trade and depth. I later save these DataFrames to disk as CSV files. The intention is to store the entire tick history in a database for future analysis.

Error log here:

ERROR:websocket:error from callback >: maximum recursion depth exceeded
Fatal Python error: Cannot recover from stack overflow.

Thread 0x000005f8 (most recent call first):
File "C:\Users\siddh\Anaconda3\lib\threading.py", line 299 in wait
File "C:\Users\siddh\Anaconda3\lib\threading.py", line 551 in wait
File "C:\Users\siddh\Anaconda3\lib\threading.py", line 1180 in run
File "C:\Users\siddh\Anaconda3\lib\threading.py", line 916 in _bootstrap_inner
File "C:\Users\siddh\Anaconda3\lib\threading.py", line 884 in _bootstrap

Current thread 0x00002114 (most recent call first):
File "C:\Users\siddh\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 729 in require
...

Process finished with exit code -1073740791 (0xC0000409)

soumyadeep · August 2017

It happens to me too. Did you get help or is it still happening to you?

ZE4040 · October 2017

Have you guys gotten past this issue?

Shaha · October 2017

It works fine for me every day. Use the reconnect option mentioned in the documentation.

sujith · October 2017

Hi @sidverm,
I would suggest using multi-threading. Your main thread should only receive data and nothing else.
Once you get the data, pass it to a worker thread to generate the pandas DataFrame and store it in the database.
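
A minimal sketch of that hand-off, using only the standard library. The callback name `on_ticks`, its `(ws, ticks)` signature, and the tick fields `instrument_token` and `last_price` are assumptions made for illustration; check them against the client version you actually run. The ticks here are simulated, not received from Kite.

```python
import queue
import threading

tick_queue = queue.Queue()   # hand-off buffer between receiver and worker
processed = []               # stand-in for the DataFrame / database step
_STOP = object()             # sentinel that tells the worker to exit


def on_ticks(ws, ticks):
    """Hypothetical websocket callback: only enqueue, never process here."""
    tick_queue.put(ticks)


def worker():
    """Drain the queue and do the slow work off the receiving thread."""
    while True:
        ticks = tick_queue.get()
        if ticks is _STOP:
            break
        for tick in ticks:
            # Slow work (building frames, writing to disk) would go here.
            processed.append((tick["instrument_token"], tick["last_price"]))


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

# Simulate the websocket delivering two batches of ticks.
on_ticks(None, [{"instrument_token": 1, "last_price": 100.5}])
on_ticks(None, [{"instrument_token": 2, "last_price": 42.0},
                {"instrument_token": 1, "last_price": 100.6}])

tick_queue.put(_STOP)   # ask the worker to finish
worker_thread.join()    # wait until everything queued has been handled
```

Because the callback does nothing but `put`, it returns almost immediately, and a single worker preserves the order in which batches arrived.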

pinkpanther · October 2017

@sidverm could you open-source the data you are capturing?

sidverm · November 2017

A late update: I was able to solve this issue back in September. The cause was that my Python loop took more than 1 second to process each batch, so the data arriving every second piled up in a buffer waiting for its turn. When the buffer exceeded its capacity, the program crashed.

I improved my Python code by avoiding DataFrame.append in every loop iteration, plus some minor tweaks. Now my processing takes about 10 ms, and my code no longer crashes.

sauravkedia · November 2017

@sidverm, I would like to know more about the '10 ms'. Are you able to convert your tick data into a pandas DataFrame in 10 ms? Or are you achieving 10 ms with plain loops, without pandas?

I have experimented with these, and here are my observations. Pandas is designed for reading and manipulating large but static data. It is very good if you have large data which you need to read once and then manipulate. In other words, adding or deleting rows in pandas is a costly operation. Even converting other data structures to pandas is costly. In our use case the data comes as a stream, and every conversion of the incoming JSON/dictionary into pandas is costly; I think it takes 30 ms or so. What I found is that for such data it is better and much faster to loop through the dictionary than to use vectorised operations in pandas.

So I would avoid DataFrame.append and adding or dropping rows when latency is critical. For array-like structures in this case, it's better to use collections.deque (https://pymotw.com/2/collections/deque.html), which is designed for such use cases.
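
As a small illustration of the deque suggestion, here is a sketch of a fixed-size rolling window over incoming prices. The window size and the price values are made up for the example; `deque(maxlen=...)` itself is standard library behaviour and drops the oldest item automatically once the window is full.

```python
from collections import deque

# Keep only the 3 most recent prices; older ones fall off the left end.
last_prices = deque(maxlen=3)

for price in [100.0, 100.5, 101.0, 101.5]:   # illustrative values only
    last_prices.append(price)                # O(1), no copying of old data

window = list(last_prices)
rolling_mean = sum(last_prices) / len(last_prices)
```

After the loop the window holds the last three prices, and the first one has been discarded without any explicit row deletion.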

I would be very interested in your comments on these.

sidverm · November 2017

@sauravkedia, actually my solution stopped using pandas in the loop altogether. I do use pandas, but only at the end, to convert a list to a DataFrame, because the file-write and database-insert operations after the loop are more convenient with DataFrames. I will explore deque for the more latency-sensitive operations I do later. Thanks a lot for sharing!
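
The pattern described above, collecting plain Python rows inside the loop and building the DataFrame once at the end, might look roughly like this. It is a sketch and not the poster's actual code: the tick layout (`instrument_token`, `last_price`, and a `depth` dictionary with `buy` and `sell` lists) and the helper names `flatten_tick` and `to_frames` are assumptions for illustration.

```python
trade_rows = []   # one flat dict per tick
depth_rows = []   # one flat dict per depth level


def flatten_tick(tick):
    """Append plain dicts for one tick; no pandas objects are touched here."""
    token = tick["instrument_token"]
    trade_rows.append({"instrument_token": token,
                       "last_price": tick["last_price"]})
    for side in ("buy", "sell"):
        for level, entry in enumerate(tick["depth"][side]):
            depth_rows.append({"instrument_token": token,
                               "side": side,
                               "level": level,
                               "price": entry["price"],
                               "quantity": entry["quantity"]})


def to_frames():
    """Build both DataFrames in one go, e.g. just before writing to disk."""
    import pandas as pd   # imported here so the hot path does not need pandas
    return pd.DataFrame(trade_rows), pd.DataFrame(depth_rows)


# Simulated tick with one depth level per side (illustrative values only).
sample_tick = {
    "instrument_token": 1,
    "last_price": 100.5,
    "depth": {"buy": [{"price": 100.4, "quantity": 10}],
              "sell": [{"price": 100.6, "quantity": 7}]},
}
flatten_tick(sample_tick)
```

Appending to a list is cheap, whereas DataFrame.append copies the whole frame on every call, so the per-tick cost of that approach grows as the frame gets larger. Calling `to_frames()` once per flush keeps that cost out of the tick loop.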
