An evaluation of Github code scanning

Published 6 Oct 2020 - 5 min read

Last week Github released their first official version of Code Scanning. It allows users to quickly scan their source code to identify vulnerabilities. But I have a question.

How reliable is it?

In this post, I’ll try to give an answer to that based on a sample C code.

I will use CodeQl to scan a deliberately bugged code and then we will analyze the results to check the performance of the tool.

CodeQl code scanning

Github code scanning uses CodeQl, a semantic code analysis engine. It runs queries on your code to identify potential threats and bad patterns. It supports a huge variety of languages (C/C++, Go, Java, JavaScript, Python, ecc.) and it has a cool action called autobuild that you can use to build your code.

How to set it up

To enable CodeQl in your GitHub repository, you can follow the official documentation.

If you are a VS code user there’s also an extension for you.

Reference Environment

For this evaluation I start from the standard CodeQl configuration, then I specify the language of the source code for the analysis: language: ['cpp'] plus an extended set of queries: queries: +security-extended, security-and-quality.

name: "CodeQL"

on:
  push:
    branches: [main]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [main]
  schedule:
    - cron: '0 13 * * 3'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        language: ['cpp']
          
    steps:
    - name: Checkout repository
      uses: actions/checkout@v2
      with:
        fetch-depth: 2

    - run: git checkout HEAD^2
      if: ${{ github.event_name == 'pull_request' }}

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      with:
        languages: ${{ matrix.language }}
        queries: +security-extended, security-and-quality

    - run: |
        make

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1

Reference code

I’m using a C code that contains, on purpose, some traits, and errors. The reference code is the fuzzgoat repository from fuzzstation and it contains 4 main vulnerabilities:

  • Use of memory after a free:
/******************************************************************************
WARNING: Fuzzgoat Vulnerability

The line of code below frees the memory block referenced by *top if
the length of a JSON array is 0. The program attempts to use that memory
block later in the program.
Diff       - Added: free(*top);
Payload    - An empty JSON array: []
Input File - emptyArray
Triggers   - Use after free in json_value_free()
******************************************************************************/

               free(*top);
/****** END vulnerable code **************************************************/
  • 2 invalid frees:
   switch (value->type)
      {
         case json_object:

            if (!value->u.object.length)
            {
               settings->mem_free (value->u.object.values, settings->user_data);
               break;
            }

/******************************************************************************
  WARNING: Fuzzgoat Vulnerability
  
  The line of code below incorrectly decrements the value of 
  value->u.object.length, causing an invalid read when attempting to free the 
  memory space in the if-statement above.
  Diff       - [--value->u.object.length] --> [value->u.object.length--]
  Payload    - Any valid JSON object : {"":0}
  Input File - validObject
  Triggers   - Invalid free in the above if-statement
******************************************************************************/

            value = value->u.object.values [value->u.object.length--].value;
/****** END vulnerable code **************************************************/

            continue;

         case json_string:

/******************************************************************************
  WARNING: Fuzzgoat Vulnerability
  
  The code below decrements the pointer to the JSON string if the string
  is empty. After decrementing, the program tries to call mem_free on the
  pointer, which no longer references the JSON string.
  Diff       - Added: if (!value->u.string.length) value->u.string.ptr--;
  Payload    - An empty JSON string : ""
  Input File - emptyString
  Triggers   - Invalid free on decremented value->u.string.ptr
******************************************************************************/

            if (!value->u.string.length){
              value->u.string.ptr--;
            }
/****** END vulnerable code **************************************************/
  • Null pointer dereference:
  /******************************************************************************
  WARNING: Fuzzgoat Vulnerability
  The code below creates and dereferences a NULL pointer if the string
  is of length one.
  Diff       - Check for one byte string - create and dereference a NULL pointer
  Payload    - A JSON string of length one : "A"
  Input File - oneByteString
  Triggers   - NULL pointer dereference
******************************************************************************/

            if (value->u.string.length == 1) {
              char *null_pointer = NULL;
              printf ("%d", *null_pointer);
            }
/****** END vulnerable code **************************************************/

Code analysis results

These are the results from CodeQl. CodeQl source code analysis results code quality

Known vulnerabilities

As you can see in the analysis above, none of the vulnerability that were introduced on purpose on fuzzgoat are discovered by CodeQl.

free errors are difficult to spot

but this is not true for a NULL pointer dereference! It can easily be caught by other tools like clang:

$ scan-build-8 make all
scan-build: Using '/usr/lib/llvm-8/bin/clang' for static analysis
/usr/share/clang/scan-build-8/bin/../libexec/ccc-analyzer -o fuzzgoat -I. main.c fuzzgoat.c -lm
fuzzgoat.c:298:29: warning: Dereference of null pointer (loaded from variable 'null_pointer')
              printf ("%d", *null_pointer);
                            ^~~~~~~~~~~~~

New findings

On the other side, let’s analyze the actual results.

CodeQl was able to locate a potential buffer overflow source code quality analysis buffer overflow scanning evaluation which is a crucial vulnerability. That one is probably a false positive since the size of the buffer is fixed, but in general, it’s a good practice to be cautious when performing a buffer write operation.

Apart from that, the other findings are:

  • Missing enum case in the switch;
  • Poorly documented large function;
  • Time-of-check time-of-use filesystem race condition;
  • Uncontrolled data used in path expression;
  • Potentially uninitialized local variable.

This is it!

After this evaluation, I think that CodeQl is a good tool for the daily use to get insight on the code quality: it’s easy to use and configure, it can be quickly integrated into your GitHub repository and it provides small actionable results.

The downside is that you can have a high number of false positives to deal with.

Reach me on Twitter @gasparevitta and let me know your thoughts!

You can find the code snippets on Github

Get emails about new articles!


I write about Continuous Integration, Continuous Deployment, testing, and other cool stuff.
Gaspare Vitta on Twitter