How to store WebSocket streaming data in a pandas DataFrame

CuriousTrader
Hi,

I am new to Python and trying to understand how I can store the data received in the on_tick function in a pandas DataFrame. For example, can I pass a DataFrame as an argument to the on_tick function?

Or is there any other way to store the data so that I can calculate technical indicators like Bollinger Bands, MACD and RSI?
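Once prices are in a pandas Series, the three indicators named above can be computed with plain pandas. The following is a minimal sketch, assuming a Series of closing prices (for example candle closes built from stored ticks). The parameter defaults (20/2 for Bollinger Bands, 12/26/9 for MACD, 14 for RSI) are common conventions; Bollinger Bands use the population standard deviation here and RSI uses Wilder's smoothing, but other platforms may use slightly different variants.

```python
import pandas as pd


def bollinger(close: pd.Series, window: int = 20, k: float = 2.0) -> pd.DataFrame:
    """Middle band = rolling mean; upper/lower = mean +/- k * rolling std."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)  # population std over the window
    return pd.DataFrame({"mid": mid, "upper": mid + k * std, "lower": mid - k * std})


def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """MACD line = EMA(fast) - EMA(slow); signal line = EMA of the MACD line."""
    line = (close.ewm(span=fast, adjust=False).mean()
            - close.ewm(span=slow, adjust=False).mean())
    sig = line.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame({"macd": line, "signal": sig, "hist": line - sig})


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI using Wilder's smoothing, i.e. an EMA with alpha = 1 / period."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # upward moves, 0 otherwise
    loss = (-delta).clip(lower=0)     # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)
```

Each function returns values aligned with the input index, with NaN until enough data has accumulated, so the results can be joined back onto a candle DataFrame.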
  • Vivek
    @CuriousTrader You can't pass extra arguments to the on_tick function, but you can call your own functions from inside it. Similarly, you can add logic in on_tick to write the data to a file or a database.
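A minimal sketch of calling your own function from on_tick to persist ticks to a CSV file. The names `save_ticks`, `CSV_PATH` and `FIELDS` are hypothetical, and the tick keys `instrument_token` and `last_price` are an assumed subset of what the client sends; the callback uses the `on_tick(tick, ws)` order from the code later in this thread.

```python
import csv
import os

CSV_PATH = "ticks.csv"                         # hypothetical output file
FIELDS = ["instrument_token", "last_price"]    # assumed subset of tick keys


def save_ticks(ticks, path=CSV_PATH):
    """Append each tick dict as a CSV row; write the header for a new file."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        # extrasaction="ignore" drops any tick keys not listed in FIELDS
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if new_file:
            writer.writeheader()
        writer.writerows(ticks)


def on_tick(tick, ws):
    # The client decides which arguments on_tick receives, so instead of
    # passing extra objects in, call your own function from inside it.
    save_ticks(tick)
```

Opening the file in append mode on every call keeps the sketch simple; for high tick rates, keeping the file open or batching writes would be cheaper.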
  • sabyasm
    Add the line below to the GitHub example:

    df = pd.DataFrame(tick)
  • CuriousTrader
    @sabyasm Will it append the tick data to the DataFrame?
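On the pandas side, `pd.DataFrame(tick)` does not append: it builds a brand-new DataFrame from only the current batch, so assigning it on every call replaces the previous one. A minimal sketch of one way to accumulate ticks, assuming you keep the raw dicts in a list and build a DataFrame only when you need one; `rows`, `ticks_frame` and the `received_at` column are hypothetical names, and the receive time is recorded because a tick may not carry its own timestamp.

```python
from datetime import datetime

import pandas as pd

rows = []  # every tick dict received so far


def on_tick(tick, ws):
    # Keep the raw dicts; a list append is cheap compared with
    # growing a DataFrame one row at a time.
    received = datetime.now()
    for t in tick:
        rows.append({**t, "received_at": received})


def ticks_frame():
    """Build one DataFrame from everything received so far."""
    return pd.DataFrame(rows)
```

Growing a DataFrame with `pd.concat` on every tick copies all existing data each time, which is why collecting rows in a list and converting once is the usual pattern.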
  • AnkitDoshi
    First you will have to convert the tick data to a pandas DataFrame.
    You can do something like this:
    import pandas as pd
    Tick=pd.DataFrame(tick)

    You can also export this data to Excel:
    Tick.to_excel('Tick.xlsx', sheet_name='Tick', index=False)
  • nithishkailas
    from kiteconnect import WebSocket as wb
    kws = wb(api_key, public_token, user_id)
    import pandas
    def on_tick(tick, ws) : print(tick,"\n")
    def on_tick(tick,ws) : new = pandas.DataFrame(tick,ws)
    def on_connect(ws): ws.subscribe[408065]
    def on_connect(ws):ws.set_mode(ws.MODE_LTP,[408065])
    kws.on_tick=on_tick
    kws.on_connect=on_connect
    kws.connect()

    This code gives me the following error:

    ERROR:websocket:error from callback >: 'WebSocket' object is not iterable
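The error most likely comes from `pandas.DataFrame(tick, ws)`: the second positional parameter of `DataFrame` is `index`, so pandas tries to use the WebSocket object as the row index. Two other problems in that code: `ws.subscribe[408065]` indexes the method instead of calling it, and the second `def on_tick` / `def on_connect` silently replace the first ones. Below is a corrected sketch of the same callbacks, assuming the older `kiteconnect.WebSocket` client used in the post; `TOKEN`, `frames` and `start` are hypothetical names. Later kiteconnect releases replaced this client with `KiteTicker`, whose callback is `on_ticks(ws, ticks)`, so check the docs for your installed version.

```python
import pandas as pd

TOKEN = 408065  # instrument token from the post above
frames = []     # one DataFrame per tick message received


def on_tick(tick, ws):
    # Pass only the tick data: DataFrame(tick, ws) would treat the
    # socket object as the `index` argument.
    frames.append(pd.DataFrame(tick))


def on_connect(ws):
    # Keep a single on_connect; a second `def` replaces the first.
    ws.subscribe([TOKEN])              # call the method with a list of tokens
    ws.set_mode(ws.MODE_LTP, [TOKEN])


def start(api_key, public_token, user_id):
    """Connect using the older kiteconnect WebSocket client from this thread."""
    from kiteconnect import WebSocket  # deferred: only needed when connecting
    kws = WebSocket(api_key, public_token, user_id)
    kws.on_tick = on_tick
    kws.on_connect = on_connect
    kws.connect()
```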
  • RP3436
    Hi
    I am new to Python and API applications, but technical analysis is my domain. I am looking for help and guidance with receiving and managing data. I have read most of the discussions in the WebSocket and Python client categories and have also gained some understanding of NumPy, pandas and sqlite3. I have a few basic (possibly silly) questions, if someone can take the trouble to answer:
    1. I understand that multithreading should be used so that the main thread stays unblocked and keeps receiving the continuous data feed. At which point in the main WebSocket code should the new thread be started? Should assigning the data to a pandas DataFrame, and writing it to a database before or after that, be done in the new thread? Could someone share a snippet?
    2. If I only need to track 5-6 instruments, will SQLite pose any challenge, or will it be OK? I am comfortable with SQLite.
    3. Are both pandas DataFrames and SQLite (or another database) required, or can either one do the job?
    4. Which module and function are recommended for building candles of different time frames from tick data? I hope there are some built-in features that do this efficiently.
    5. When using historical data, should there be two databases, one for historical and one for live streamed data? A strategy may need both historical and live data. Is appending live data to the historical database a solution, or is there another way to manage data from these two sources?

    I will be grateful for any hints or references.
    Pinaki Paul
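For question 4, one built-in option is pandas' time-based `resample(...).ohlc()`, which turns a time-indexed price Series into open/high/low/close candles; changing the frequency string gives other time frames (e.g. "5min", "15min"). A minimal sketch, assuming a tick DataFrame with a datetime column `timestamp` and a `last_price` column (hypothetical column names; if your ticks carry no timestamp, record the time of receipt yourself). `ticks_to_candles` is a hypothetical helper name.

```python
import pandas as pd


def ticks_to_candles(ticks: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Aggregate ticks into OHLC candles of the given pandas frequency.

    Assumes columns `timestamp` (datetime64) and `last_price` (float).
    """
    prices = ticks.set_index("timestamp").sort_index()["last_price"]
    candles = prices.resample(freq).ohlc()
    # resample creates empty bins for intervals with no ticks; drop them
    return candles.dropna(subset=["open"])
```

Example: `ticks_to_candles(df, "5min")` gives 5-minute candles from the same ticks. Bins are aligned to midnight by default, so 09:15 starts a fresh 1-, 5- or 15-minute bar.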

  • ramatius
    You should run two processes: (1) one that streams data into your system and (2) the algo that consumes the data.

    For (1), you should be using an RDBMS such as MySQL. Use in-memory tables if you need speed. This way, you can run multiple algos in parallel, all consuming the same real-time data. AFAIK, this is a robust way of handling trade data.
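A minimal sketch of the two roles described above: the streaming side only inserts rows, and each algo reads what it needs into pandas with `pd.read_sql_query`. Both roles live in one script here for brevity; in practice they would be separate processes connected to the same database. `sqlite3` with an in-memory database is swapped in only so the sketch runs without a server; the post recommends a server RDBMS such as MySQL, and SQLite handles concurrent writers poorly. The table layout and the names `store_ticks` / `load_prices` are hypothetical.

```python
import sqlite3

import pandas as pd

# Stand-in database; a real setup would connect to a shared MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticks (ts TEXT, instrument_token INTEGER, last_price REAL)")


def store_ticks(conn, ticks):
    """Role (1): the streaming process only inserts rows."""
    conn.executemany(
        "INSERT INTO ticks (ts, instrument_token, last_price) VALUES (?, ?, ?)",
        [(t["ts"], t["instrument_token"], t["last_price"]) for t in ticks],
    )
    conn.commit()


def load_prices(conn, token):
    """Role (2): an algo pulls the data it needs into a DataFrame."""
    return pd.read_sql_query(
        "SELECT ts, last_price FROM ticks WHERE instrument_token = ? ORDER BY ts",
        conn,
        params=(token,),
        parse_dates=["ts"],
    )
```

Because every algo reads from the same table, adding another strategy means adding another reader, without touching the process that receives the feed.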
  • RP3436
    @Vivek
    Could you kindly reply to my questions? My problem areas are creating a new thread and deciding which function to call in it: on_tick or on_connect? Or on_tick in t1 and on_connect in t2? Could you give a direct code example using threading.Thread, where one thread receives ticks and the other builds DataFrames or writes to a database and does the analysis? Just a few lines or words as a hint. It will be useful for the hundreds of newcomers who are struggling to make use of the APIs.

    Second, I can process the captured data either in pandas DataFrames or in a database like MySQL. For my requirements, where I focus on only 3 to 4 instruments, which approach is better? Do pandas DataFrames count as in-memory data tables?
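A minimal sketch of the two-thread layout asked about above, using the standard library's `queue.Queue` and `threading.Thread` (a swapped-in technique; the reply below uses Celery instead). You do not choose a thread for on_tick or on_connect: the client calls both on the thread that drives the WebSocket. The callback only puts ticks on a queue, and your own worker thread builds the DataFrame and does the slow work. `tick_queue`, `state`, `worker` and `start_worker` are hypothetical names.

```python
import queue
import threading

import pandas as pd

tick_queue = queue.Queue()       # thread-safe hand-off between the two threads
state = {"df": pd.DataFrame()}   # latest DataFrame, written only by the worker


def on_tick(tick, ws):
    # Runs on the WebSocket thread: keep it fast and just hand
    # the batch of ticks over to the worker thread.
    tick_queue.put(tick)


def worker():
    """Build DataFrames from queued ticks and do the slow work here."""
    rows = []
    while True:
        tick = tick_queue.get()
        if tick is None:          # sentinel value: stop the worker
            break
        rows.extend(tick)
        state["df"] = pd.DataFrame(rows)
        # ... compute indicators or write to a database using state["df"] ...


def start_worker():
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

Call `start_worker()` before `kws.connect()` (which typically blocks), register `kws.on_tick = on_tick` as in the earlier posts, and put `None` on `tick_queue` to stop the worker. Rebuilding the DataFrame on every batch is fine for a handful of instruments; for heavier CPU work, Python threads share the GIL, so a separate process or task queue is the more scalable option.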
  • Vivek
    @RP3436 It's a bad idea to use SQLite, since it doesn't handle concurrency well. You can read more about it here: https://stackoverflow.com/a/26864360/973508

    Here is a sample project you can clone and use: https://github.com/vividvilla/kite-connect-python-example
    This is a simple example that uses the Python Kite Connect client to receive ticks and save them to a PostgreSQL database. Celery is used as a task queue manager to insert into the database without blocking the main Kite Connect WebSocket thread.

    The Kite ticker subscribes to the tokens specified in stream.py with a 5-second delay. Received ticks are sent to the Celery task queue, where they are inserted into the database.
  • RP3436
    @Vivek Thanks a lot. Unlike the previous reply I got, this is really helpful. If the goal is to gain popularity, the issues faced by programming newcomers need to be addressed, both through the forum and through user-friendly documentation. Thanks again.
This discussion has been closed.