Is there a way to predict the number of documents returned through historics?


#1

Is there a way to predict the number of documents that will be returned by a Historics query? If my understanding is correct, the historics/prepare endpoint returns the percentage availability for the requested time period (e.g. 100% coverage for Twitter, 75% coverage for Facebook [even though Facebook isn't supported for Historics at this time]). And it appears that the historics/get endpoint returns the total number of documents for the time period under “volume_info” (i.e. the total number of Twitter interactions during the period, NOT the number that match the query). Is there a way to get just the number of documents that match the query, or is this not supported?

Thank you for your help.


#2

Unfortunately this is not currently supported. We do not know how many interactions will be returned until after the filtering stage of the Historics process, and by this point we will have extracted all of the available interactions for your query.

One thing we could suggest is to run a Historics query, but set the 'sample' parameter in the historics/prepare API call to 1.56% to return the smallest possible sample of data. Your DPU charge for a 1.56% sample should be approximately 1/20th the price of the full 100% query, and the licence fee should be approximately 1/64th the cost of running the full 100% query. From there you should be able to estimate the number of interactions you would receive when running the query in full.
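The extrapolation from a sampled run to a full run is just a scaling of the returned interaction count. As a minimal sketch (the helper name is hypothetical, not part of the DataSift API):

```python
def estimate_full_volume(sample_count, sample_percent=1.56):
    """Extrapolate the interaction count of a full (100%) Historics run
    from the count returned by a sampled run at `sample_percent`."""
    if not 0 < sample_percent <= 100:
        raise ValueError("sample_percent must be in (0, 100]")
    return round(sample_count * 100.0 / sample_percent)

# e.g. a 1.56% sample that delivered 3,120 interactions suggests
# roughly 200,000 interactions for the full query:
print(estimate_full_volume(3120))  # 200000
```

Bear in mind this is a statistical estimate; very low-volume queries may deviate noticeably from the extrapolated figure.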

Another suggestion would be to consume a sample of the stream in real time, using the interaction.sample target for a short time, to see roughly what volume the live stream returns. Obviously this approach only works if you are running a query that consistently generates roughly the same amount of traffic. If your query were looking at a presidential election, or the release of a new iPhone, traffic would be much higher around the time of the event than it might be today.
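Projecting a short live observation onto a longer Historics window is again a simple rate calculation, valid only under the constant-traffic assumption noted above. A sketch (the function name is illustrative, not a DataSift API):

```python
def project_volume(observed_count, observed_minutes, window_minutes):
    """Project the interaction volume over a longer window from a short
    real-time observation, assuming a roughly constant traffic rate."""
    if observed_minutes <= 0:
        raise ValueError("observed_minutes must be positive")
    rate_per_minute = observed_count / observed_minutes
    return round(rate_per_minute * window_minutes)

# 500 interactions seen in a 10-minute live sample project to about
# 2.16 million interactions over a 30-day Historics window:
print(project_volume(500, 10, 30 * 24 * 60))  # 2160000
```

For spiky traffic (elections, product launches), sample at several different times of day and treat the spread as your uncertainty rather than relying on a single observation.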