I got an HTTP 413 response from your nginx server when creating a stream through the REST API for a CSDL query of the form twitter.user.id in [123, 456, 789, …], where the array of IDs is very large (around 700K IDs). Is this kind of query realistic, or should I find some other way to acquire tweets from a specific group of users?
The only limit we currently impose on CSDL is that a single CSDL query must not exceed 1 MB; this is documented on our Advanced Features page.
The best way to work around this limit is to use sub-streams within a single master stream, via the stream keyword.
We are planning improvements that should allow us to remove this limit in the future.
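As a sketch of the sub-stream approach: each sub-stream is compiled separately (staying under the limit), and the master stream then references the sub-streams by their compiled hashes. The hashes below are placeholders, not real stream IDs:

```
// Master stream combining two previously compiled sub-streams,
// each of which holds part of the user-id list.
stream "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6" or stream "f6e5d4c3b2a1f0e9d8c7b6a5f4e3d2c1"
```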
I just ran into the same problem with a 990 KB CSDL. You say the limit is 1 MB; how do you actually enter a CSDL that large into the system?
I tried through the Web UI. While the keyword-list field accepts the list, it is severely truncated before being added to the CSDL. I then tried the REST API and got the 413 Request Entity Too Large error.
How do I add a CSDL that is close to the 1 MB limit?
When compiling CSDL through the API, the CSDL may be URL encoded, which adds additional characters and can push a query that is under 1 MB raw over the limit.
If you have trouble compiling a large CSDL query, you could try splitting it into smaller queries using the stream keyword. This lets you create a master stream and several sub-streams, each smaller than 1 MB.
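To see how much URL encoding inflates a query, you can measure the encoded form directly. This is a minimal sketch (the CSDL fragment is a hypothetical example, not a real stream definition); the point is that brackets, commas, and spaces in an id-list filter all grow under encoding, so the encoded size is what you should budget against the limit:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CsdlSize {
    // The size that counts against the request limit is the
    // URL-encoded form, not the raw CSDL text.
    static int encodedLength(String csdl) {
        return URLEncoder.encode(csdl, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        // Hypothetical fragment of a large id-list filter.
        String csdl = "twitter.user.id in [123, 456, 789]";
        System.out.println("raw chars:     " + csdl.length());       // 34
        System.out.println("encoded chars: " + encodedLength(csdl)); // 42
    }
}
```

Here a 34-character fragment grows to 42 characters once `[`, `]`, and each `,` become three-character percent escapes, so a raw query near 990 KB can easily exceed 1 MB after encoding.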
The URL encoding was the problem. Once I accounted for that (and in particular the stricter-than-spec URL encoding performed by Java, and by the DataSift Java client), I could send up to 1 MiB of URL-encoded CSDL.
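The "stricter than spec" behaviour can be demonstrated with Java's standard URLEncoder: it performs application/x-www-form-urlencoded encoding, so it escapes some characters that RFC 3986 treats as unreserved and uses + for spaces. A small illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingQuirks {
    public static void main(String[] args) {
        // RFC 3986 lists '~' as unreserved, but URLEncoder still escapes it,
        // and it emits '+' for spaces (form encoding) rather than %20.
        System.out.println(URLEncoder.encode("~", StandardCharsets.UTF_8)); // %7E
        System.out.println(URLEncoder.encode(" ", StandardCharsets.UTF_8)); // +
    }
}
```

Both quirks add or change characters relative to a minimal RFC 3986 encoder, which matters when you are sizing a query right up against the 1 MB limit.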