PYLON 1.7 Release Notes



Overview

This document outlines functional changes in PYLON 1.7, slated for release in February 2016.

### Functional Changes

Filter Hotswaps. Customers have asked to be able to change the CSDL definition of an existing recording. The main use case is adding new filtering terms as they become relevant to an advertiser over time, for example when campaigns expand to cover new keywords, phrases and hashtags. In 1.7, DataSift will allow customers to change the CSDL definition associated with a recording after the recording has started. With API version 1.3, starting a new recording returns a recording ID, which is distinct from the recording's CSDL hash. Customers can then call the new /pylon/update endpoint with that recording ID and a new CSDL hash; the new hash replaces the previous CSDL as the recording's definition.

Example Requests & Responses for Filter Hotswap:

PUT /v1.3/pylon/start
Request (same as previous versions)
{
    "hash": "<CSDL hash>",
    "name": "<string>"
}
Response: 200 OK (API v1.2 returns 204 No Content)
{
    "id": "<recording id>"
}

PUT /v1.3/pylon/update
Request
{
    "id": "<recording id>",
    "hash": "<CSDL hash>",
    "name": "<string>"
}
  – id: the recording whose definition you want to swap
  – hash: the hash of the new filter to swap in
  – name: optional; renames the filter
Response: 204 No Content (empty response body)
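The hotswap flow can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the base URL https://api.datasift.com/v1.3, the `Authorization: <username>:<api_key>` header format, and the helper names `build_request`, `start_recording` and `update_recording` are assumptions for illustration. The code only builds the PUT requests; nothing is sent unless you pass a request to `urllib.request.urlopen` yourself.

```python
import json
import urllib.request

# ASSUMPTION: base URL for API v1.3; check DataSift's API reference before use.
API_BASE = "https://api.datasift.com/v1.3"


def build_request(path, body, username, api_key):
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: auth header format "<username>:<api_key>".
            "Authorization": f"{username}:{api_key}",
        },
        method="PUT",
    )


def start_recording(csdl_hash, name, username, api_key):
    # In v1.3 the response body is {"id": "<recording id>"}.
    return build_request("/pylon/start", {"hash": csdl_hash, "name": name},
                         username, api_key)


def update_recording(recording_id, new_csdl_hash, username, api_key, name=None):
    # Swap the recording's CSDL definition; "name" is optional.
    body = {"id": recording_id, "hash": new_csdl_hash}
    if name is not None:
        body["name"] = name
    return build_request("/pylon/update", body, username, api_key)


if __name__ == "__main__":
    req = update_recording("REC_ID", "NEW_HASH", "user", "key", name="Campaign Q2")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(req), expecting 204 No Content.
```

Keeping request construction separate from sending makes the payloads easy to inspect before any call reaches a live recording.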

Additional Tokenized Fields. During “interaction filtering” (the primary CSDL filtering that customers use to filter data into their indexes), DataSift splits text into word tokens at punctuation and whitespace. In 1.6 and earlier, DataSift also tokenizes text for “query filtering” (using CSDL in an analysis query to restrict the query to a subset of the filtered interactions), but only for the main body text of posts. Because other CSDL targets are not tokenized, the contains and contains_any operators cannot be used on them in query filtering. In 1.7, DataSift will also tokenize the fields listed below for query filtering. This will enable customers to write a query filter like:

links.url contains "gillette"

This filter would match a URL such as http://gawker.com/tags/gillette.

The following fields are tokenized for query filtering in 1.7:

fb.og_object
fb.topics.about
fb.topics.company_overview
fb.link
fb.parent.link
fb.parent.og_object
fb.parent.topics.company_overview
fb.parent.topics.name
links.normalized_url
links.url
fb.parent.topics.about
fb.parent.topics.website
fb.topics.website
fb.topics.name
fb.topics.category
fb.parent.topics.category
fb.topics.category_name
fb.parent.topics.category_name
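To see why tokenizing links.url lets the query filter above match, here is a rough Python model of punctuation-and-whitespace tokenization with a token-based contains check. It is a hypothetical approximation, not DataSift's tokenizer: `tokenize` and `contains` are illustrative names, only ASCII letters and digits are handled, and it assumes contains is case-insensitive and matches a contiguous run of tokens.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters/digits form one token, each
# punctuation character is its own token, and whitespace is discarded.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def contains(field_value, phrase):
    """Approximate CSDL `contains`: the phrase's tokens must appear as a
    contiguous run in the field's tokens (case-insensitive, assumed)."""
    haystack = [t.lower() for t in tokenize(field_value)]
    needle = [t.lower() for t in tokenize(phrase)]
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


url = "http://gawker.com/tags/gillette"
print(tokenize(url))
print(contains(url, "gillette"))
print(contains(url, "gill"))
```

Because "gillette" is a whole token after the final "/", the filter matches, while a partial word such as "gill" does not.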

CSDL Tokenization Parity. Two changes will be made to tokenization in query filtering for parity with how tokenization is handled in primary interaction filtering. These changes are being made purely for user experience consistency.

  1. Decimal numbers are now split into tokens in query filtering:
    e.g. “565.46” is now tokenized into three tokens: “565”, “.” and “46”

  2. Number and letter boundaries are no longer used for tokenization in query filtering, so each of the strings below is now a single token. Previously:
    – “3am” was tokenized into two tokens: “3” and “am”
    – “400th” was tokenized as “400” and “th”
    – “l8ter” was tokenized as “l”, “8” and “ter”
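The two parity changes can be compared with a pair of regular-expression tokenizers. This is an illustrative sketch, not DataSift's implementation: `tokenize_old` reproduces the letter/digit-boundary examples listed above and assumes, as the wording of change 1 suggests, that decimal numbers were previously kept whole; `tokenize_new` models the 1.7 behavior. Only ASCII letters and digits are handled.

```python
import re

# Pre-1.7 query filtering: letter/digit boundaries split tokens.
# ASSUMPTION: decimal numbers such as "565.46" were kept as one token.
OLD_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?|[A-Za-z]+|[^A-Za-z0-9\s]")

# 1.7 query filtering: alphanumeric runs stay together, and the decimal
# point becomes its own punctuation token.
NEW_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def tokenize_old(text):
    return OLD_RE.findall(text)


def tokenize_new(text):
    return NEW_RE.findall(text)


for sample in ["565.46", "3am", "400th", "l8ter"]:
    print(f"{sample!r}: {tokenize_old(sample)} -> {tokenize_new(sample)}")
```

In the 1.7 model, punctuation is the only thing that splits a run of characters, which is the behavior the parity changes describe for primary interaction filtering.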

New Target: links.title. In PYLON 1.6.1, we introduced 2 new targets for filtering on the titles of web pages for links shared on Facebook: fb.link_title (for stories) and fb.parent.link_title (for engagements). In 1.7, you can filter link titles across both stories and engagements with a new, single target: links.title. All 3 targets are available for both primary interaction filtering and secondary query filtering, but are not available for use as analysis targets.
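The relationship between the three targets can be modeled in a few lines of Python. This sketch is hypothetical: interactions are represented as flat dicts keyed by target name, the helpers `resolve_target` and `title_contains` are illustrative, and it assumes links.title resolves to fb.link_title for stories and to fb.parent.link_title for engagements. The word check splits on whitespace only, a simplification of real CSDL tokenization.

```python
# Hypothetical model: each interaction is a flat dict keyed by CSDL target.
# ASSUMPTION: links.title carries the same title as fb.link_title (stories)
# or fb.parent.link_title (engagements), whichever is present.

def resolve_target(interaction, target):
    if target == "links.title":
        return interaction.get("fb.link_title") or interaction.get("fb.parent.link_title")
    return interaction.get(target)


def title_contains(interaction, target, word):
    # Simplified case-insensitive word check on whitespace-split tokens.
    value = resolve_target(interaction, target)
    return value is not None and word.lower() in value.lower().split()


story = {"fb.link_title": "Gillette launches new razor"}
engagement = {"fb.parent.link_title": "Gillette launches new razor"}

for target in ("fb.link_title", "fb.parent.link_title", "links.title"):
    print(target,
          title_contains(story, target, "gillette"),
          title_contains(engagement, target, "gillette"))
```

Under these assumptions, one links.title condition covers what would otherwise need an OR over the two Facebook-specific targets.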


Release Notes thru March 2nd, 2016