Have you heard of Semgrep, or Semantic Grep (For Code)? If not, it should be a part of your pentesting toolkit, and I will convince you of that in this blog post.

Introduction to Semgrep & Web Application Penetration Testing

In a nutshell, Semgrep is an open-source, lightweight tool that parses source code and finds vulnerabilities.

Expanding on this from a technical perspective, Semgrep is an open-source static analysis tool, in the same family as a linter, for finding vulnerabilities and entire bug classes in your code. It all runs locally; no wacky out-of-band code uploading. It spares you the pain of building abstract syntax trees (ASTs) or learning a custom DSL, and you don't need to wrestle with ugly language-specific regexes, because rules look like the code you're searching for. Just write a rule, run Semgrep against your code, and you're done!

If you've ever found yourself in front of anything from a simple C program to a million-line code base and said to yourself, "I wish I could find all the exposed API secrets, dangerous functions that could lead to code execution, or unsafe memory allocations in this program," then Semgrep is for you.

Or if you've used Project Discovery’s Nuclei before, it's sort of like that. But instead of scanning network devices and applications for vulnerabilities, it scans the code responsible for them. Not a perfect analogy, I know, but I personally find it helpful to think of the framework like that.

Semgrep started as a Facebook open-source project but was abandoned in 2015. If you're curious, that fork lives here.

For a quick look, here's a Semgrep rule to identify AWS access tokens:

CODE

# source: https://github.com/returntocorp/semgrep-rules/blob/develop/python/boto3/security/hardcoded-token.yaml
rules:
- id: hardcoded-token
  message: >-
    Hardcoded AWS access token detected. Use environment variables
    to access tokens (e.g., os.environ.get(...)) or use non version-controlled
    configuration files.
  metadata:
    cwe: 'CWE-798: Use of Hard-coded Credentials'
    owasp: 'A2: Broken Authentication'
    source-rule-url: https://pypi.org/project/flake8-boto3/
    references:
    - https://bento.dev/checks/boto3/hardcoded-access-token/
    - https://aws.amazon.com/blogs/security/what-to-do-if-you-inadvertently-expose-an-aws-access-key/
    category: security
    technology:
    - boto3
  languages: [python]
  severity: WARNING
  pattern-either:
  - pattern: $W(..., aws_secret_access_key="=~/^[A-Za-z0-9/+=]+$/", ...)
  - pattern: $W(..., aws_access_key_id="=~/^AKI/", ...)
  - pattern: $W(..., aws_session_token="...", ...)
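
To make the three patterns a little more concrete, here's a rough Python sketch of what each one is checking for. This is not how Semgrep works internally (Semgrep matches against the parsed code, not raw strings); it only reuses the two regexes from the rule. The would_flag helper is a name I made up for illustration, and the sample values in the usage note are the placeholder credentials from AWS's own documentation.

```python
import re

# Regexes copied from the rule's pattern-either clauses.
SECRET_KEY_RE = re.compile(r"^[A-Za-z0-9/+=]+$")
ACCESS_KEY_ID_RE = re.compile(r"^AKI")


def would_flag(keyword: str, literal: str) -> bool:
    """Approximate the rule: would a string literal passed as `keyword` match?

    Hypothetical helper for illustration only. The real rule applies to
    string literals in a call such as client(..., keyword="literal", ...).
    """
    if keyword == "aws_secret_access_key":
        return SECRET_KEY_RE.search(literal) is not None
    if keyword == "aws_access_key_id":
        return ACCESS_KEY_ID_RE.search(literal) is not None
    if keyword == "aws_session_token":
        # The rule's "..." matches any string literal at all.
        return True
    # Any other keyword argument is ignored by this rule.
    return False
```

So would_flag("aws_access_key_id", "AKIAIOSFODNN7EXAMPLE") is True, while a value pulled from os.environ.get(...) would never reach this check in the first place, because the rule only matches hardcoded string literals.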

Why Semgrep?

Because you love to hack and find obscure bugs in web applications! But seriously, Semgrep can help immensely when you're pentesting web apps.

Do you want to find SQL injection vulnerabilities against a target application? What about XSS or SSRF? Deserialization that leads to code execution? XXE? Stick around, and I'll walk you through the basics of Semgrep and send you on your path to finding more web application vulnerabilities.

Bonus: For developers and DevOps folks, you can include it in your CI/CD pipeline to get those shift-left reps in. It pairs nicely with the MITRE ATT&CK-like CI/CD Pipeline Threat Matrix.

Setting The Stage

Alright, let's get to hacking.

Picture this: You've got your target web app in front of you, Burp is creeping up in memory usage, and you've got a handful of medium-severity findings (you know which ones I'm talking about), and you need a win.

For this tutorial, we will set up a local OWASP Juice Shop environment. This gives us a modern target with modern vulnerabilities, so what you learn should be immediately applicable to your own testing. I won't cover the setup process in this blog post, but an excellent reference for doing so can be found here.

Now, you may be thinking, "With a local installation, we'll have access to the source code. What if I'm performing a black box test with no access to the source?" Have no fear; we'll cover that in Part 2. We'll walk through how to scrape a single page application (SPA) to access the contents locally, essentially turning a black box test into a white box(ish) one.

In this post, we'll identify a SQL injection vulnerability and weaponize it. Later, I'll cover additional vulnerabilities, but SQLi is a common enough place to start. So get Juice Shop set up, download Semgrep, and let's get going.

The easiest way to run Semgrep is from the command line, pointing it at the directory containing your source code. But you can also dockerize Semgrep or use the live playground here.

Authentication Bypass via SQL Injection (SQLi)

Let's start with the following command, which runs Semgrep with the OWASP Top 10 ruleset (--config "p/owasp-top-ten") and reports only high-severity findings (--severity ERROR). Lucky for us, Semgrep provides almost 500 rules in its OWASP Top 10 ruleset. Make sure you run it from the Juice Shop installation directory, since Semgrep scans the current directory by default:

semgrep --config "p/owasp-top-ten" --severity ERROR

Let's review the output with some annotations below:

  1. The Semgrep command, which uses the top-10 ruleset.
  2. Count of files and number of rules within the ruleset.
  3. Further breakdown on analyzed languages, the files, and rules applied to each.
  4. Source of the rules.
  5. Total number of findings.

A bit further down in the output, you’ll see things like:

Each finding gives us 1) the full path of the file that contains a potential vulnerability and 2) the problematic code in question. Semgrep is nice enough to tell us, “Hey, there might be SQL injection here. Double-check this code.”
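
If the terminal output gets long, one option is to add Semgrep's --json flag to the command, save the output to a file, and boil it down with a few lines of Python. Here's a minimal sketch that pulls the path, line number, and rule ID out of each finding. The function names and the sample report are my own inventions for illustration (the paths, line numbers, and rule IDs are made up, not real Juice Shop output); the only thing I'm relying on is the results / check_id / path / start.line layout of Semgrep's JSON report.

```python
import json
from collections import Counter


def summarize_findings(semgrep_json: str) -> list[tuple[str, int, str]]:
    """Return (path, start line, rule id) per finding, sorted by path and line."""
    report = json.loads(semgrep_json)
    findings = [
        (result["path"], result["start"]["line"], result["check_id"])
        for result in report.get("results", [])
    ]
    return sorted(findings)


def count_by_rule(semgrep_json: str) -> Counter:
    """Count how many findings each rule produced."""
    report = json.loads(semgrep_json)
    return Counter(result["check_id"] for result in report.get("results", []))


# Made-up sample report: hypothetical paths, lines, and rule IDs.
SAMPLE_REPORT = json.dumps({
    "results": [
        {"check_id": "example.sqli-rule", "path": "routes/b.ts", "start": {"line": 12}},
        {"check_id": "example.sqli-rule", "path": "routes/a.ts", "start": {"line": 40}},
        {"check_id": "example.xss-rule", "path": "routes/a.ts", "start": {"line": 7}},
    ],
    "errors": [],
})

if __name__ == "__main__":
    for path, line, rule_id in summarize_findings(SAMPLE_REPORT):
        print(f"{path}:{line}  {rule_id}")
```

Sorting by path and line makes it easier to walk through the findings file by file, which is how I tend to triage them anyway.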

Okay great! Let’s move down to where I want to focus: the login page. When doing a web application penetration test, this is a common landing zone, right? You’ll almost always have one of these, so let’s start there.

The SQL query that Semgrep flagged as vulnerable:

SELECT * FROM Users WHERE email = '${req.body.email || ''}' AND password = '${security.hash(req.body.password || '')}' AND deletedAt IS NULL

This is about as basic a SQL injection vulnerability as you will ever come across.

The vulnerability stems from the way the SQL query is built: request values are interpolated straight into the query string.

The req.body.email value is inserted into the query without any sanitization or parameterization. (The password is hashed before it goes in, so the email field is the one we can abuse.) Since we, as attackers, control this value, we can influence the behavior of the SQL query.

Even with client-side email validation enabled, we could provide a common SQLi payload as the email address: ' OR 1=1 --.

Since -- starts a SQL comment, everything after it is ignored, which removes the password check from the query. And since 1=1 is always true, the WHERE clause matches every row, so the query returns the first user it finds in the Users table (in this case, the administrator account), bypassing authentication and logging us in as the admin user.
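
If you'd like to see the mechanics without Juice Shop in the way, here's a small, self-contained Python sketch that rebuilds the same query shape against an in-memory SQLite database. Everything in it is a stand-in: the table only mimics the columns from the flagged query, the accounts and passwords are made up, and hash_password is my placeholder for security.hash (the post doesn't show what that function does). It also includes a parameterized version of the query for comparison, which is the usual fix for this bug class.

```python
import hashlib
import sqlite3


def hash_password(password: str) -> str:
    # Placeholder for Juice Shop's security.hash(); the real function may differ.
    return hashlib.sha256(password.encode()).hexdigest()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory Users table with two made-up accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users (id INTEGER PRIMARY KEY, email TEXT, "
        "password TEXT, deletedAt TEXT)"
    )
    conn.executemany(
        "INSERT INTO Users (email, password) VALUES (?, ?)",
        [
            ("admin@example.test", hash_password("admin-secret")),
            ("user@example.test", hash_password("user-secret")),
        ],
    )
    return conn


def login_vulnerable(conn: sqlite3.Connection, email: str, password: str):
    # Same shape as the flagged query: user input is pasted into the SQL text.
    query = (
        f"SELECT * FROM Users WHERE email = '{email}' "
        f"AND password = '{hash_password(password)}' AND deletedAt IS NULL"
    )
    return conn.execute(query).fetchone()


def login_parameterized(conn: sqlite3.Connection, email: str, password: str):
    # Placeholders keep the input as data, so it can never change the query.
    query = (
        "SELECT * FROM Users WHERE email = ? AND password = ? "
        "AND deletedAt IS NULL"
    )
    return conn.execute(query, (email, hash_password(password))).fetchone()


PAYLOAD = "' OR 1=1 --"

if __name__ == "__main__":
    db = make_demo_db()
    print("vulnerable:   ", login_vulnerable(db, PAYLOAD, "anything"))
    print("parameterized:", login_parameterized(db, PAYLOAD, "anything"))
```

With the payload as the email, the vulnerable function hands back the first row in the table (the made-up admin account) no matter what password is supplied, while the parameterized version finds no matching user and returns None.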

That was a nice, quick, and easy demonstration of using Semgrep to find vulnerabilities. I’ve been thinking of different ways to incorporate Semgrep on a larger scale. More on that later…

In The Wild Integrations & Real World Use Cases

To give you an idea of who is running Semgrep and perhaps some inspiration, here’s a handy GitHub code search:

https://github.com/search?q=.semgrep.yml&type=code

Below are a few more examples that I think highlight the power and uniqueness of Semgrep:

HashiCorp rules on their terraform-provider-aws: https://github.com/hashicorp/t...
Zulip (Python team chat) rules: https://github.com/zulip/zulip...

Or even better, browse a ton of examples on their site: https://semgrep.dev/docs/writing-rules/pattern-examples

Wrap Up & Resources

While the SQLi example was rather basic, I would like to focus a bit more on real-world cases in the next installment. Specifically, I want to introduce the actual scraping of single-page applications to get as much of the application locally as possible so that Semgrep can really work with us.

I’ll also walk through building some Semgrep rules, and we can look for more exciting vulnerabilities and bug classes.

I really think that pulling these modern web applications down into something we can work with locally, and then running Semgrep on them, can turn up plenty of vulnerabilities.

In the meantime, if you’d like a comparison of Semgrep and CodeQL, the brilliant folks at Doyensec published their insightful analysis here.

Additionally, the equally impressive humans at Trail of Bits have published an excellent introduction on bringing Semgrep into your organization.

Thanks for reading, and happy hacking!