Usage model of historical data

sudhirshettyk
Hi @Sujith,
I would like your feedback on how our usage model compares with your expected usage model for historical data. Please help us align with your expected usage model.
I am trying to understand your rationale for limiting the access rate to 2 to 3 requests per second, and whether I can align with that.
What I believe is your perceived usage scenario for customers, or at least the most one can do given the rate limit:
1) Fetch historical data for a stock.
2) Derive technical indicators and plot them.
3) Analyse manually.
4) Repeat the above steps for the next stock.
This way, you expect your customers to need no more than 2 to 3 historical data requests per second. A rough sketch of that flow follows.
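For illustration, a rough sketch of that flow in Python (assuming an authenticated pykiteconnect kite client and an instrument token; the 20-period SMA is just a stand-in for whatever indicator one derives):

    import pandas as pd

    # Step 1: fetch historical candles for one stock
    candles = kite.historical_data(token, "2017-09-01", "2017-09-15", "30minute")

    # Step 2: derive an indicator and plot it
    df = pd.DataFrame(candles)
    df["sma20"] = df["close"].rolling(20).mean()
    df.plot(x="date", y=["close", "sma20"])

    # Steps 3 and 4: inspect manually, then repeat for the next stock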

My usage scenario, and I believe this is how most would like to use it, going by the number of requests from your customers to increase the historical data rate limit:
1) Every fixed interval, say every 30 minutes, i.e. at 9.30, 10.00, 10.30... (some may use 1-minute/5-minute intervals).
2) Fetch historical data for the missing candles.
Say the current time is 10.30 and my database has records up to 9.30; I would then request the 30-minute candles from 9.30 to 10.00 and from 10.00 to 10.30. (Note that it is not always just the latest candle, since my system might go down due to, say, a network interruption.)
3) Save these candles to the database.
4) Repeat (1), (2) and (3) for N stocks, say 1000 stocks.
5) Do technical analysis of these N stocks and save the results to the database.
6) Apply filters to determine buy/sell signals for the stocks.
I would expect that in this scenario steps 1 to 5 complete very fast, say within 3 to 5 minutes, which I assume is reasonable for the generated trade signals to stay relevant. A minimal sketch of the fetch-and-save loop follows.
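A minimal sketch of steps 1 to 4, assuming pykiteconnect's kite.historical_data() plus hypothetical last_candle_time() and save_candles() helpers around one's own database:

    import time
    from datetime import datetime

    def backfill(kite, tokens, interval="30minute"):
        now = datetime.now()
        for token in tokens:                     # step 4: repeat for N stocks
            frm = last_candle_time(token)        # e.g. 9.30 if the DB stops there
            candles = kite.historical_data(token, frm, now, interval)  # step 2
            save_candles(token, candles)         # step 3: persist to the database
            time.sleep(1.0)                      # stay within the rate limit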
Given your rate limit of 2 to 3 requests per second, this is impractical. In practice I can make only 1 request per second, as I have to keep aside request bandwidth for simultaneous trade transactions (on another process/thread); at that rate, 1000 stocks take roughly 17 minutes, well outside the 3 to 5 minute window.
I hope you are able to see the problem. You definitely have a valid reason to rate limit, but you need to help us align our usage model with your expectations, perhaps by architecting your APIs so that they let you differentiate a real usage scenario from abuse of your system.
One thing you could do is allow multiple stocks in a single historical data request (like you added in revision 3.0 of the market live quote API).
Your feedback and a solution to the problem would be highly appreciated.
  • sauravkedia
    Great post @sudhirshettyk. I would like to add some additional insights on the issue. Zerodha would claim, rightly so, that for real-time trading needs websockets are the way to go. However, in practice they are not very easy for retail traders, for the reasons below:

    1. Websocket data is tick data, while technical analysts and the majority of other traders use candles. When they need a tick, say to feed a limit price, they simply take the last price in the current candle. This conversion of ticks to candles, while simple in theory, is onerous for most traders (see the sketch after this list). That, along with the inherent backfill in the historical API (see point 4), is probably the reason for the continuing popularity of the historical API for live trading needs over the obvious choice, websockets.

    2. For someone tracking 200 or more symbols, the data flow over websockets is very fast. Receiving the data, converting the JSON into a native data structure, isolating the fields of interest and persisting it all is not an easy job. Whatever you do introduces latency, however tiny, and that is always annoying. It is especially frustrating if you want to place market orders. I would recommend that Zerodha come up with some demos around this. Perhaps they could implement a demo which automatically downloads data into a local database and then fetches it out of that database using a set of commands.

    3. A typical retail trader has graduated from Amibroker or similar software, where data handling is done automatically for them. Handling your implementation of websockets calls for significant computing expertise. Thanks to your tutorial, getting data from websockets is a breeze, but from then on it is really painful.

    4. Websockets have no backfill option, so they cannot be relied upon as a true and trustworthy data feed. There are thousands of things which can prevent one from capturing all the data, so peace of mind is never there. One has to fall back on the historical API for such missed instances, but the lack of clarity around the handling of corporate actions makes that a suspect choice, as @sudhirshettyk has pointed out. Can't you introduce backfill in websockets?

    5. Your websocket doesn't provide the quantity traded between two received ticks. If the cumulative volume in the previous tick was 10000 and in the current tick it is 12000, then 2000 shares were traded in between. The last-traded quantity in your feed is the quantity of the last single trade, which could be just 10 shares, not the 2000 traded between the two received ticks. So to maintain a correct picture of the volume traded per tick, one has to compute the difference between the volume of the current tick and that of the last received tick the moment a tick arrives (see the sketch below). In my experience, after trying multiple approaches in Python, this always introduces latency. And even if you do it, what if you are disconnected for 10 minutes? You will see a massive volume on the first tick received after reconnection. My request is that you please add this field; it would save us a lot of work and latency.
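    To make points 1 and 5 concrete, here is a hypothetical sketch that aggregates ticks into one-minute candles and derives the per-tick traded volume from the cumulative volume field (field names follow pykiteconnect's tick dicts; verify them against your client version):

        from datetime import datetime

        candles = {}       # (token, minute) -> candle dict
        last_cum_vol = {}  # token -> last seen cumulative volume

        def on_ticks(ws, ticks):
            for tick in ticks:
                token = tick["instrument_token"]
                price = tick["last_price"]
                cum_vol = tick.get("volume_traded", 0)

                # Volume traded since the previous tick = difference of
                # cumulative volumes (the field being requested above).
                delta_vol = cum_vol - last_cum_vol.get(token, cum_vol)
                last_cum_vol[token] = cum_vol

                minute = datetime.now().replace(second=0, microsecond=0)
                c = candles.get((token, minute))
                if c is None:
                    candles[(token, minute)] = {"open": price, "high": price,
                                                "low": price, "close": price,
                                                "volume": delta_vol}
                else:
                    c["high"] = max(c["high"], price)
                    c["low"] = min(c["low"], price)
                    c["close"] = price
                    c["volume"] += delta_vol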

    To summarize, @sudhirshettyk has neatly explained the issues for a trader who does analysis at fixed points in time, say every 30 minutes. But for a more demanding use case, say a system tracking the market continuously, there are additional challenges even with websockets, as I have tried to explain.
  • sujith
    Hi @sudhirshettyk, @skk,
    We understand your concern.
    We are planning to increase the number of requests in Kite Connect 3.
    We have also provided ohlc and ltp endpoints that accept up to 200 scrips in one HTTP call.
    To support backfilling, the historical API now supports passing a time along with the date, which provides a way to fetch a particular candle.
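    For example (a rough sketch with pykiteconnect; the token and symbols are placeholders):

        # Quote endpoints accept up to 200 scrips in one call
        quotes = kite.ohlc(["NSE:INFY", "NSE:TCS"])

        # from/to accept a time component, so a single candle can be fetched
        candles = kite.historical_data(
            token,
            "2017-09-15 09:30:00",
            "2017-09-15 10:00:00",
            "30minute",
        )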

    We are also investigating other ways to serve historical data to users; one of them is streaming historical candles.

    You can check out more information about releases 1 and 2 of Kite Connect 3.
  • sudhirshettyk
    Thanks @Sujith.
    1) Using the ohlc and ltp endpoints over HTTP is not feasible for most, I guess, because if our system or connectivity has a problem locally, we cannot retrieve any missed candles other than the latest. Nevertheless, it is a useful feature for refreshing our portfolio, watchlist etc.

    2) Given that it is difficult for you to do away with the rate restriction or increase it significantly, the best solution would be to support querying candles for multiple stocks in a single historical data request. You could place additional constraints if you think the response data is going to blow up, but this is the best approach, I feel. Maybe a zipped CSV file as the response would also work. It would be nice if you could prioritise this at the top of your list and provide a quick solution within a week; that would be impressive :-).
    One more alternative, which I believe is how NSE handles many queries: every candle interval (say 30 minutes) you could generate a CSV file and place it on your server, and we could download it, since this data is going to be static. This, I believe, can free up significant compute bandwidth at your end and should let you set a higher rate limit for these queries.

    3) One more thing you should consider is isolating the rate limit for historical data from that for other types of trade requests. We will have historical data fetching and analysis running on a separate process from the front end which places trade orders, so it is very difficult to ensure full usage of the available rate bandwidth at our end, and we would have to be very pessimistic.

    4) Another point: any comment regarding consistency in data adjustment for corporate actions? Generally, I felt it would be easy for you to do the adjustment, since NSE and private data providers like cmots (http://www.cmots.com/services/market-data-provider) publish details of all forthcoming corporate actions well ahead of time, and it is not hard on compute resources either. In any case, you need to bring in consistency on whether the data is adjusted or not. Options, in order of preference:
    1) You have a better idea than those listed below :-)
    2) You guarantee always-adjusted data. The simplest option for you would be to switch to a data provider who always gives adjusted data (www.cmots.com and almost every data provider does that). Alternatively, fetch forthcoming corporate-action data from NSE and maintain a ratio to be applied to a particular day's stock data; this way your response data will always be dynamically adjusted (a rough sketch follows this list).
    3) Provide a way to know up to what date corporate actions are adjusted for a particular stock. This way we can at least deterministically adjust the rest of the data at our end, or at least discard a buy/sell signal derived from a price which is not fully adjusted.
    4) Provide an option to fetch data that is never adjusted; we can adjust it at our end.
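    A rough sketch of the ratio idea from option 2 (the ex-date and ratio are purely illustrative; e.g. a 1:1 bonus would mean ratio = 0.5 for all pre-ex-date prices):

        def adjust(candles, ex_date, ratio):
            # Scale every candle dated before the ex-date by the corporate-action
            # ratio; candles on or after the ex-date are already on the new base.
            out = []
            for c in candles:
                if c["date"].date() < ex_date:
                    c = dict(c, open=c["open"] * ratio, high=c["high"] * ratio,
                             low=c["low"] * ratio, close=c["close"] * ratio)
                out.append(c)
            return out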

    Thanks,
    Sudhir Shetty
  • sujith
    Hi @sudhirshettyk,
    I am afraid we can't give bulk CSV dumps as there are regulations around this.
    If our historical API doesn't serve your purpose, then I would suggest using an NSE authorised vendor.

    In future, we will provide a mechanism which denotes whether historical data is adjusted or not.
  • sudhirshettyk
    @Sujith, thanks for the suggestion of using other vendors; I will choose to ignore it until it comes from @Nithin.
    Since I am using your API for trading, I am trying to adapt to your usage model for historical data too.
    If you find customers suggesting and requesting features offensive, please let me know and I can refrain. I was hoping that a detailed explanation of our usage models, along with suggestions for what could be done, would help you improve the system for all of us.
    Thanks anyway for taking the time to answer the query while you are busy with the v3.x updates.