I have a few questions about filtering by links. I want to consume Facebook and Twitter content that meets certain criteria, but only where none of the links in the content have certain substrings in their domain. I have 300 or so of these blacklisted substrings.
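To make the criterion concrete, here is a rough Python sketch of the check I have in mind. It only illustrates the matching rule and is not how the platform implements it. The names `BLACKLIST`, `domain_of` and `is_allowed` are hypothetical, the two blacklist entries are placeholders, and prepending `http://` to protocol-less links is my own assumption.

```python
from urllib.parse import urlsplit

# Placeholder entries; the real list has around 300 substrings.
BLACKLIST = ["spam", "casino"]


def domain_of(link):
    """Return the lowercased host of a link, tolerating a missing protocol."""
    if "://" not in link:
        # Assumption: treat protocol-less links such as "www.blah.com" as http.
        link = "http://" + link
    return (urlsplit(link).hostname or "").lower()


def is_allowed(links):
    """True only if no link's domain contains any blacklisted substring."""
    return not any(
        bad in domain_of(link) for link in links for bad in BLACKLIST
    )
```

Only the domain is checked here, so a blacklisted substring that appears in the path or query string would not cause a rejection.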
Am I right that the links augmentation and links.domain are the way to go?
In the content, to be identified as a link, does a substring have to start with http:// or https://? Will it find and filter things like “blah.com”, “www.blah.com”, “www.blah.com/foo/bar?param=baz”, without the protocol part?
I do not need to filter on or process the landing page content for links. Will it be included in the output if I use the links augmentation, and if so, can that be disabled?
Is it possible to filter on expanded links (e.g. on whatever a bitly expands to) but NOT the actual final link if redirects are taken into account?