Every week, Sprocket CEO and Founder Casey Cammilleri interviews an expert leading the charge on empowering security experts and practitioners with the knowledge and insights needed to excel in the future of cybersecurity.
We recently spoke with Andrew Morris, Founder & Chief Architect at GreyNoise Intelligence. Here are the top takeaways from the interview.
#1: Filter Internet Background Noise Like Noise-Canceling Headphones
“Do you know how noise-canceling headphones work? You just capture a little bit of ambient noise and then you functionally play the precise opposite of that to filter it out of what the listener's hearing. So people have a tendency to think that internet background noise is this natural weather phenomenon, or it's created by God, or it's static electricity or something like that, but it's not. It's an enumerable set of data. At any given point in time, there is somebody scanning the internet from a place, for a reason, times a million.
“So let's say I were to ask you, ‘What is the normal amount of internet scan and attack traffic for any random host on the internet to see right now?’ I don't think you could tell me. It is a very hard question to answer, but it's an answerable question. You just have to say, ‘All right, well, the way that I would figure that out is I would set up maybe 10 servers in every region of AWS, 10 servers in every region of Azure, Google Cloud,’ and then I would do the same with all the other ones: DigitalOcean, OVH, all these other ones.
“And then you would just keep doing that iteratively until you had what was functionally a definitive list, at a point in time, of all the IPs that were scanning or attacking the internet at that exact moment. And then if you had a log file or a SIEM or something like that, and you said, ‘Hey Andrew, filter out all the internet background noise from this,’ it's like grep: you just take those out and that's it. And what you're left with is a nice clean signal of what's hitting you, like noise-canceling headphones.”
Actionable Takeaway: Internet scanning isn't random noise; it's deliberate activity from identifiable sources. Build a comprehensive baseline by deploying sensors across every major cloud provider and regional network, expanding iteratively until your coverage is close to complete. The result is a list of the IPs scanning the internet at a given point in time, which you can filter out of your logs with something as simple as grep, leaving a clean signal of the traffic actually aimed at your infrastructure.
#2: Distinguish Between Good Guy and Bad Guy Internet Scanning
“There are legitimate, totally non-malicious reasons to scan and crawl the internet. There are totally legitimate reasons to support scanner bots. Sysadmins do it all the time. Network admins do it all the time. There's nothing wrong with getting scanned on the internet. Google built their business on port scanning and then crawling and indexing content on other people's servers on the internet. They've been doing it for years. There's nothing wrong with that. At the same time, bad guys might be crawling the internet to find who's running vulnerable software so they can try to compromise it. And that's bad, even though the traffic might be the same.
“So yeah, there are a ton of different reasons to do that. There are little tricks and indicators that you can use to try to map who is who, what they're up to, and why they're scanning the internet. Because I think at the end of the day, what you're getting at is: who are the good guys and who are the bad guys doing it? What does that list look like? There are ways that you can scan the internet and not be an asshole about it. You can use rDNS (reverse DNS), you can use user agents, an opt-out list. You can respect all that kind of stuff.”
Actionable Takeaway: Not all internet scanning indicates malicious intent. Google built its business on port scanning followed by crawling and indexing content, and sysadmins and network admins scan routinely. Attackers can generate identical traffic, so the packets alone won't reveal intent; what differs is whether the scanner follows the conventions well-behaved scanners observe. Look for good guy indicators such as reverse DNS records that identify the scanner, user agents that name the tool and its operator, adherence to robots.txt files, and honoring opt-out lists to separate legitimate research from threats.
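Of these indicators, reverse DNS is the easiest to check automatically. A common technique is forward-confirmed reverse DNS (FCrDNS): look up the PTR name for the IP, then confirm that the name resolves back to the same IP. Google documents this procedure for verifying its own crawlers. The sketch below is a hypothetical helper, not a GreyNoise API: the function names, the trusted-suffix allowlist, and the user-agent heuristic are illustrative assumptions. The resolvers are injectable so the logic can be tested without network access. Robots.txt adherence and opt-out compliance are behavioral and can't be judged from a single request, so they are left out.

```python
import socket
from typing import Callable, Iterable, List, Optional, Tuple

# Both resolvers return (hostname, aliaslist, ipaddrlist), matching
# socket.gethostbyaddr and socket.gethostbyname_ex from the standard library.
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def forward_confirmed_rdns(
    ip: str,
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the PTR hostname for `ip` only if that hostname resolves back to `ip`.

    The forward check matters because whoever controls an IP's reverse zone
    can put any name in the PTR record. Note: gethostbyname_ex is IPv4-only.
    """
    try:
        hostname, _, _ = reverse(ip)
        _, _, addresses = forward(hostname)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return None
    return hostname if ip in addresses else None


def good_guy_indicators(
    ip: str,
    user_agent: Optional[str],
    trusted_suffixes: Iterable[str],
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,
) -> List[str]:
    """List the 'good guy' indicators a scanning IP shows (empty list means none).

    These are hints, not proof of intent.
    """
    indicators: List[str] = []
    hostname = forward_confirmed_rdns(ip, reverse, forward)
    if hostname:
        indicators.append("fcrdns")
        host = hostname.lower().rstrip(".")
        if any(host == s or host.endswith("." + s) for s in trusted_suffixes):
            indicators.append("trusted_domain")
    # Hypothetical heuristic: a well-behaved scanner's user agent often links
    # to a page explaining the project and how to opt out.
    if user_agent and ("http://" in user_agent or "https://" in user_agent):
        indicators.append("self_identifying_user_agent")
    return indicators
```

Called with only an IP, user agent, and suffix list, the helpers fall back to the standard-library resolvers and perform real DNS lookups.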
#3: Plan for Infrastructure Burnout When Building Sensor Networks
“I would say it's easier to do when you have 50 sensors, harder to do when you have 100,000. For most of our customers, we don't deploy sensors. We just give them a read pipe of the data, or access to the API, or whatever.
“The other thing: building an internet-wide distributed sensor network starts with getting something running in every region of AWS, Google Cloud, the tier ones. You can do that in a weekend, no problem. Then you get down to the ones below that, the slightly smaller, more regional ones, and that is going to take you a couple of weeks, a couple of months. Then as soon as you start getting below that, to some Joe Smith host that's ultra-local to Ukraine, say, because AWS isn't in Ukraine, you don't get the same robust APIs and stuff like that. That's when it starts to get really tricky.
“Then you've got to assume that you're going to be burned. People are going to find out that something's a honeypot network. You're going to accidentally leak your IPs to somebody, something like that. Somebody's going to fingerprint you. Someone's going to get into an internet dick-measuring contest with you and try to fingerprint your stuff. Maybe they manage it, maybe they don't. So you're going to have to burn the sensors and bring them all back up. You can't just stand something up once and leave it there all precious. This isn't the 1990s: you're gonna have to burn it not once, not twice, but again and again.
“So being able to handle that kind of stuff: those are some of the bigger technical challenges operationally. I mean, like starting any business, you've got to get the right team, stuff like that. That makes everything easier. You've got to have a stomach for it.”
Actionable Takeaway: Building a global sensor network means accepting that your infrastructure will be discovered and burned regularly, and the work gets harder as you scale from 50 sensors toward 100,000. Start with the tier-one cloud providers, where you can cover every region in a weekend, then expand to smaller regional and ultra-local hosts, which take weeks or months and often lack robust APIs. Assume your sensors will be exposed, whether through leaked IPs or deliberate fingerprinting, and design systems that can be torn down and rebuilt repeatedly rather than treating sensors as permanent installations to protect.
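One way to act on "burn it again and again" is to make replacement routine rather than exceptional. The sketch below is hypothetical: the Sensor fields, the max-age policy (rotating sensors on a schedule even when nothing looks wrong), and the redeploy stub are illustrative assumptions, not something described in the interview. It only plans the rotation; the actual teardown and relaunch would go through each hosting provider's API, which, as Morris notes, ranges from robust at the tier-one clouds to thin or missing at small regional hosts.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sensor:
    provider: str          # e.g. "aws", or a small regional host
    region: str
    deployed_at: datetime
    burned: bool = False   # set when a leak or fingerprinting is suspected
    sensor_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])


def plan_rotation(
    sensors: Sequence[Sensor], now: datetime, max_age: timedelta
) -> Tuple[List[Sensor], List[Sensor]]:
    """Split the fleet into (keep, stale): burned or too-old sensors are stale."""
    keep: List[Sensor] = []
    stale: List[Sensor] = []
    for s in sensors:
        if s.burned or now - s.deployed_at >= max_age:
            stale.append(s)
        else:
            keep.append(s)
    return keep, stale


def redeploy(old: Sensor, now: datetime) -> Sensor:
    """Return a fresh sensor record in the same provider and region as `old`.

    Hypothetical stub: a real system would call the hosting provider's API here
    to destroy the old instance and launch a new one, ideally on a new IP.
    """
    return Sensor(provider=old.provider, region=old.region, deployed_at=now)


def rotate(sensors: Sequence[Sensor], now: datetime, max_age: timedelta) -> List[Sensor]:
    """Return the new fleet: surviving sensors plus fresh replacements."""
    keep, stale = plan_rotation(sensors, now, max_age)
    return keep + [redeploy(s, now) for s in stale]
```

Making Sensor immutable means a burned sensor is never patched in place; it is replaced by a new record with a new ID, which mirrors the "stand it back up" workflow.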