Monitoring pastebin.com with Scumblr

I have been experimenting with Scumblr and Sketchy - two open source products released in August 2014 by Netflix.

Broadly speaking, Scumblr is a tool that runs external searches, then aggregates and tracks the results it finds. It also provides workflow, status and tag features, so it doubles as a management tool for taking action on those results, or 'events'.

Sketchy is a tool for taking screenshots of a page. When the two work together, Scumblr can have Sketchy screenshot each search result it finds. This is useful if the result is later deleted from the third-party site, because you still have the screenshot as evidence.

You can read more about these tools in Netflix's announcement here.

In terms of a specific use case, Scumblr is useful if you have a product (or any sort of keyword) and you want to monitor various sites for 'chatter' about that term. For example, you might offer a unique digital service and want to monitor Twitter for trends in what people are saying about it. Alternatively, and especially if it's a front-facing digital service that you need to protect, you (or your boss, or your ISO 27001 auditor) may be concerned about people identifying vulnerabilities in your service or compromising it, and leaking that information on various nefarious websites. In short: it's useful as a threat monitoring and intelligence tool.

As we know, Pastebin.com is a popular site for dropping payloads of compromised database credentials and the like. I wanted to monitor Pastebin.com for a specific term to catch any such pastes that might correlate to a compromised database of a digital service.

I already had my Google Search engine set up as per Scumblr's documentation, whereby I am 'crawling the entire web' via Google's Custom Search for a specific term. I tried to edit my Pastebin search to use this Google Search but limit the results to site:pastebin.com. Sadly this returned nothing, either in Scumblr or in the CSE console at Google.

So I visited pastebin.com and noticed that they themselves use Google Custom Search. After performing a search on Pastebin.com, you can see that both the URL and the page source show the cx id 013305635491195529773:0ufpuq-fpt0:

http://pastebin.com/search?cx=013305635491195529773%3A0ufpuq-fpt0&cof=FO...
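
If you would rather pull the cx id out of such a URL programmatically than read it by eye, here is a minimal Python sketch. It uses the URL exactly as shown above, truncation included. The function name extract_cx is my own, purely for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def extract_cx(search_url):
    """Return the decoded 'cx' query parameter from a search URL, or None."""
    query = urlsplit(search_url).query
    values = parse_qs(query).get("cx")
    return values[0] if values else None

# The URL as it appears above (everything after 'cof=FO' is truncated).
url = "http://pastebin.com/search?cx=013305635491195529773%3A0ufpuq-fpt0&cof=FO..."
print(extract_cx(url))  # prints 013305635491195529773:0ufpuq-fpt0
```

Note that the %3A in the URL is just a percent-encoded colon, so the decoded value is the id you paste into Scumblr.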

As it happens, the Google Search engine in Scumblr allows for setting a 'Custom search id (cx)'. On a hunch, I set this to 013305635491195529773:0ufpuq-fpt0 and also set 'Limit to site' to pastebin.com (this might not even be needed if the cx already restricts results to that site; I'm not sure). And voilà! The search works as expected, presumably via Pastebin.com's own Custom Search engine rather than my own.
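
For anyone wanting to sanity-check the cx outside Scumblr, the sketch below builds a request URL for Google's Custom Search JSON API with the same two settings. This is an assumption on my part about what such a query looks like, not a description of what Scumblr actually sends. The API key and search term are placeholders, and build_search_url is a hypothetical helper name. It only constructs the URL and makes no network call.

```python
from urllib.parse import urlencode

# Google Custom Search JSON API endpoint.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(api_key, cx, term, site=None):
    """Build a Custom Search request URL; 'site' maps to the siteSearch parameter."""
    params = {"key": api_key, "cx": cx, "q": term}
    if site:
        # Optional restriction, comparable to Scumblr's 'Limit to site' field (assumption).
        params["siteSearch"] = site
    return API_ENDPOINT + "?" + urlencode(params)

request_url = build_search_url(
    "YOUR_API_KEY",                       # placeholder, supply your own key
    "013305635491195529773:0ufpuq-fpt0",  # the cx id found on Pastebin.com
    "example-term",                       # placeholder search term
    site="pastebin.com",
)
print(request_url)
```

Leaving site as None drops the siteSearch parameter, which is a quick way to test whether the cx alone already limits results to pastebin.com.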

Finally, props to Mike at Hardwater Information Security for his post which helped me overcome some installation hurdles with Sketchy.
