Links Filtering Questions


#1

I have a few questions about filtering by links. I want to consume Facebook and Twitter content that meets certain criteria, but only where none of the links in the content have certain substrings in the domain. I have 300 or so of these blacklist substrings.

  1. Am I right that the links augmentation and links.domain is the way to go?

  2. In the content, does a substring have to start with http:// or https:// to be identified as a link? Will it find and filter things like “blah.com”, “www.blah.com”, “www.blah.com/foo/bar?param=baz”, without the protocol part?

  3. I do not need to filter on or process the landing page content for links. Will it be included in the output if I use the links augmentation, and if so, can that be disabled?

  4. Is it possible to filter on expanded links (e.g. on whatever a bitly expands to) but NOT the actual final link if redirects are taken into account?

Thank you!


#2

1. The Links Augmentation, with a combination of links.domain and links.url or links.normalized_url, will be the best way to filter. If I shorten a link using something like bit.ly and then post that short link in my Tweet, the twitter.links field returned in the interaction will contain the bit.ly link, not the page that it resolves to. Resolving the short link to its destination is what the Links Augmentation does.

2. Some of this processing is actually done on Twitter's side. If you enter 'blah.com' into a Tweet (without any protocol data), Twitter will identify this as a link, and pass it on to us as a link. This is another reason why our links augmentation can be so useful: In cases where people include strings which can be incorrectly identified as a link, for example 'will.i.am', we use our Links Augmentation to let you know that although this may be a link, we could not resolve it. In a case where we failed to resolve a link, links.code would be populated with a 500 or 404, rather than the standard 200 HTTP response code.

3. With the regular links augmentation, we return only basic information about the link. Unfortunately, if a data source is enabled, you cannot currently prevent it from returning this information to you. You can, however, ignore it when you receive it.

4. Take a look at the links.hops target. That's exactly what this is for. It returns every link in the redirect chain excluding the final resolved URL.
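
Since point 3 means the link data arrives whether you want it or not, one option is to apply the blacklist on your side once interactions are received. Below is a minimal Python sketch of that idea, not an official client. It assumes each interaction is a dict whose "links" entry holds a "domain" list and a "hops" list of redirect chains (lists of full URLs); that shape and the helper names link_hosts and has_blacklisted_link are assumptions made for illustration, so check them against the output you actually receive.

```python
from urllib.parse import urlparse


def link_hosts(interaction):
    """Collect host names from resolved domains and from redirect hops.

    Assumed shape (hypothetical): interaction["links"]["domain"] is a list of
    domain strings and interaction["links"]["hops"] is a list of URL lists.
    """
    links = interaction.get("links", {})
    hosts = [d.lower() for d in links.get("domain", [])]
    for chain in links.get("hops", []):
        for url in chain:
            host = urlparse(url).hostname  # None when the URL has no host part
            if host:
                hosts.append(host)
    return hosts


def has_blacklisted_link(interaction, blacklist):
    """Return True if any link host contains any blacklist substring."""
    hosts = link_hosts(interaction)
    return any(term.lower() in host for host in hosts for term in blacklist)
```

With a blacklist of a few hundred substrings this is a simple nested scan per interaction; you would keep only the interactions for which has_blacklisted_link returns False.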


#3

Thanks, Jason, for replying to this and another of my questions so quickly and thoroughly. Tell your boss that your customers appreciate the great support you provide!


#4

Jason, does that mean that using the links augmentation and links.domain would be the best option to filter posts pointing to a specific domain, including links posted as shortened URLs? Will this cover all posts, including Facebook, Twitter, etc.?

thanks,


#5

Yes, the Links Augmentation is perfect for tracking any links that resolve to a certain link or domain. The links augmentation does cover links from both Twitter and Facebook.


#6

Can the links augmentation resolve links to every site?

get fit


#7

The links augmentation can resolve links to any site. Searching for something like: links.domain in "getfitgal.com" will track any links shared to your site.
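
If you need to track more than one domain, the filter string can be generated rather than typed by hand. Here is a small Python sketch that builds a clause in the same form as the example above. The helper name domain_filter is made up for illustration, and passing several domains as a comma-separated list to the in operator is an assumption worth confirming against the CSDL documentation before relying on it.

```python
def domain_filter(domains):
    """Build a filter clause of the form: links.domain in "a.com,b.com".

    Hypothetical helper; the comma-separated list form is an assumption.
    """
    cleaned = [d.strip().lower() for d in domains if d.strip()]
    if not cleaned:
        raise ValueError("at least one domain is required")
    return 'links.domain in "{}"'.format(",".join(cleaned))


if __name__ == "__main__":
    # Prints the single-domain clause used in the post above.
    print(domain_filter(["getfitgal.com"]))
```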