How to store tick data into a dataframe without losing data due to overwriting before appending

puneethmeister
Hi @sujith ,

I'm storing Kite tick data in a pandas DataFrame, using the code below:
import datetime as dt
import pandas as pd

def on_tick(tick, ws):
    TICK = pd.DataFrame(tick)
    TICK['timestamp'] = dt.datetime.today().strftime('%Y-%m-%dT%H:%M:%S%z')
    TICK['timestamptime'] = dt.datetime.today()
    TICK['random'] = 1
    # print(TICK)
    global websocketdatatest
    websocketdatatest = websocketdatatest.append(TICK)

# Callback for successful connection.
def on_connect(ws):
    # Subscribe to a list of instrument_tokens (RELIANCE and ACC here).
    ws.subscribe([53453831, 53440775])
    # Set the subscribed instruments to tick in LTP mode.
    ws.set_mode(ws.MODE_LTP, [53453831, 53440775])

# Assign the callbacks.
kws.on_tick = on_tick
kws.on_connect = on_connect

def on_tick1(tick, ws):
    TICK1 = pd.DataFrame(tick)
    TICK1['timestamp'] = dt.datetime.today().strftime('%Y-%m-%dT%H:%M:%S%z')
    TICK1['timestamptime'] = dt.datetime.today()
    TICK1['random'] = 2
    # print(TICK1)
    global websocketdatatest
    websocketdatatest = websocketdatatest.append(TICK1)

def on_connect1(ws):
    # Subscribe to a list of instrument_tokens (RELIANCE and ACC here).
    ws.subscribe([53453831, 53440775])
    # Set the subscribed instruments to tick in LTP mode.
    ws.set_mode(ws.MODE_LTP, [53453831, 53440775])

# Assign the callbacks.
kws1.on_tick = on_tick1
kws1.on_connect = on_connect1

kws.connect(threaded=True)
kws1.connect(threaded=True)


I subscribe to around 431 instruments across 3 sockets. Will the code above work?
The problem I'm facing is that the data in websocketdatatest is not being stored properly.
I believe this is because multiple ticks are received at once, and the tick data is changed by a new tick before it has been appended to websocketdatatest.

Can you help me with this?
What is the best way to save this data?

Using the above code today, the first tick I was able to store for one of the instruments was at 9:15:12. I believe there is something wrong with the way I'm storing the information.
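
One way to test the hypothesis above is to remove the unsynchronised read-modify-write on the shared global. The following is a minimal sketch, assuming the loss comes from several callback threads each reading `websocketdatatest`, building a new frame, and assigning it back, so that one thread's result overwrites another's. The names `rows`, `rows_lock` and `store_ticks` are hypothetical, and plain dicts stand in for the DataFrame so that each callback stays cheap.

```python
import datetime as dt
import threading

rows = []                      # shared store; hypothetical stand-in for websocketdatatest
rows_lock = threading.Lock()   # guards every write to rows


def store_ticks(ticks, source):
    """Append one row per tick; safe to call from several threads."""
    received_at = dt.datetime.now()
    # Build the new rows outside the lock so the lock is held only briefly.
    new_rows = [dict(tick, timestamptime=received_at, random=source)
                for tick in ticks]
    with rows_lock:
        rows.extend(new_rows)


def on_tick(ticks, ws):
    store_ticks(ticks, source=1)


def on_tick1(ticks, ws):
    store_ticks(ticks, source=2)
```

When the data is needed, the list can be copied under the same lock and converted in one step with `pd.DataFrame(snapshot)`, which avoids re-copying the whole frame on every tick.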
  • gautam_s60
    gautam_s60 edited July 2017
    Does losing a couple of ticks really matter?
  • gautam_s60
    Better to run it multithreaded.
  • DS4769
    Hi Punith,

    Have you tried using multiprocessing (not multiple threads)? You could run the websockets each in its own separate process where all you do is save the ticks to a shared memory dict. You can create a separate process to process these ticks and delete the dict entries after saving them to a db if required.

    Regards
  • puneethmeister
    @DS4769 Can you suggest any tutorials that could help me do that, or provide some sample code? Your comments are much appreciated!
  • DS4769
    @puneethmeister, there are many tutorials on multiprocessing with Python on the web, and the Python documentation is good as well.


    from multiprocessing import Process, Value, Manager
    ..
    ..
    # Start separate process for quotes
    p_ws = Process(target=initWebSocket, args=(token, quote, subs))
    p_ws.start()
    ..
    ..
    # as per standard example given, only change is under on_connect
    def initWebSocket(token, quote, subs):
        print("Initializing web socket...", my_api_key, token[1], my_user_id)
        # Initialise.
        kws = WebSocket(my_api_key, token[1], my_user_id)

        # Callback for tick reception.
        def on_tick(ticks, ws):
            #print(ticks)
            for tick in ticks:
                quote[tick["instrument_token"]] = tick["last_price"]
                #print("TICK===>", tick["instrument_token"], tick["last_price"])

        # Callback for successful connection.
        def on_connect(ws):
            print("On connect...", subs)
            print("Is connected...", ws.is_connected())
            # copying the list, otherwise gives a "Object of type 'ListProxy' is not JSON serializable"
            subcopy = []
            for k, v in subs.items():
                subcopy.append(v)
                #print(k, v)

            # Subscribe to a list of instrument_tokens
            ws.subscribe(subcopy)
            ws.set_mode(ws.MODE_LTP, subcopy)

        # Assign the callbacks.
        kws.on_tick = on_tick
        kws.on_connect = on_connect

        kws.enable_reconnect(reconnect_interval=5, reconnect_tries=50)

        # Infinite loop on the main thread. Nothing after this will run.
        # You have to use the pre-defined callbacks to manage subscriptions.
        kws.connect()

    ..
    if __name__ == "__main__": # put all the code of the master process under here, rest will be common
    ..
    ..

  • DS4769

    ..
    if __name__ == "__main__":

        print("======================================================================================")
        print("Login and initialization ...")

        mgr = Manager()

        subs = mgr.dict()
        quote = mgr.dict()
        token = mgr.list()

        ..
  • puneethmeister
    @DS4769 Thank you so much for the help. It gave me a new direction to think in.
    I solved the problem by queuing the tick data. Now I don't have any data loss.

    If others are facing the same problem, kindly read about Queue and threading.
    Queue: https://docs.python.org/2/library/queue.html
    Queue lets you take all the tick data as it arrives and put it into a queue.

    Threading: https://tutorialspoint.com/python/python_multithreading.htm

    Threading lets you do the other work, such as storing the data in a DataFrame and manipulating it, on a separate thread.
This discussion has been closed.