Historic sample and CSDL sample


Is running a Historic query with a 100% sample rate and a CSDL condition of interaction.sample < 5 equivalent to running a Historic query with a 10% sample rate and interaction.sample < 50 in the CSDL?
Would I receive the same amount of data and spend the same number of DPUs?


There are some subtle differences.

Let's say a Historic with a 100% sample rate (not using interaction.sample in your CSDL) would return 100,000 interactions and cost 1000 DPUs.

10% sample Historics are cheaper and easier for us to run, so we charge only 40% of the DPUs you would be charged for the equivalent query run at a 100% sample rate. In this example, the 10% sample Historic would cost only 400 DPUs. You should receive ~10% (~10,000) of the total interactions available in the time period you are querying.

Running a Historic at a 100% sample rate would cost the full 1000 DPUs. You would then use the interaction.sample target to reduce the number of interactions you receive (and are charged for) to a smaller percentage. For example, interaction.sample <= 10 returns roughly 10% of the available interactions, or ~10,000 here.
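To make the comparison concrete, here is a rough Python sketch of the arithmetic using the example figures above. The function name `estimate` and the `DPU_FACTOR` table are made up for illustration and are not part of any DataSift API. It also assumes that an interaction.sample condition applies on top of the Historic sample rate, so that the two percentages multiply; treat that as an assumption rather than documented behaviour.

```python
# DPU multipliers per Historic sample rate, taken from the figures quoted
# in this answer (100% = full price, 10% = 40% of full price).
DPU_FACTOR = {100: 1.0, 10: 0.4}


def estimate(total_interactions, full_dpus, historic_sample_pct, csdl_sample_pct=100.0):
    """Return (approx. interactions, DPUs) for a Historic query.

    Hypothetical helper. Assumes the CSDL interaction.sample threshold is
    applied on top of the Historic sample rate (the percentages multiply).
    """
    if historic_sample_pct not in DPU_FACTOR:
        raise ValueError("only the 100% and 10% sample rates are covered here")
    dpus = full_dpus * DPU_FACTOR[historic_sample_pct]
    interactions = (
        total_interactions
        * (historic_sample_pct / 100)
        * (csdl_sample_pct / 100)
    )
    return round(interactions), round(dpus)


# The two methods described above: similar volume, different DPU cost.
print(estimate(100_000, 1000, 10))        # 10% sample Historic
print(estimate(100_000, 1000, 100, 10))   # 100% Historic, interaction.sample <= 10

# The two queries from the question.
print(estimate(100_000, 1000, 100, 5))    # 100% Historic, interaction.sample < 5
print(estimate(100_000, 1000, 10, 50))    # 10% Historic, interaction.sample < 50
```

Under those assumptions, both queries from the question would return around 5,000 interactions, but the 10% sample Historic would cost 400 DPUs rather than 1000. The real counts are approximate, so expect some variation around these numbers.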

In summary, you can use either method to return a smaller sample of interactions. These sample sizes are approximate, so a 10% sample Historic and a 100% Historic with interaction.sample <= 10 may not return exactly the same number of interactions, but the counts should be similar. If you want to save DPUs when looking for a sample of interactions, we recommend using 10% sample Historics where possible. You could also look at Historics Previews, which return a statistical analysis of a 1% Historic sample rather than the interactions themselves. A Preview can cover a period of up to 30 days and costs no more than 70 DPUs for a 30-day period.