Skip links
solidity-static-analyzers

Security Auditing Solidity Codebases with GitHub CodeQL

We researched the state of the art of Solidity static analyzers like Slither from Trail of Bits and Mythril from Consensys. One common issue with static analysis tools is the output, which is normally filled with false positives that should be filtered by an expert and it is time consuming. As a cybersecurity company we have been following CodeQL from the beginning, and we are happy to share these insights for the community. Our project is active and will be available soon. If you are a crypto skeptic jump to the end.

CodeQL is a powerful semantic code analysis engine that allows you to query codebases for vulnerabilities and bugs. It is available for free for both research purposes and open source projects. CodeQL is a sibling of Datalog, and was created by Semmle who was acquired by GitHub in 2019. GitHub was previously acquired by Microsoft who keep it as an independent entity.

The CodeQL approach to Static Analysis is very promising for security tooling, as detectors are modular, easy to write and maintain, and edge cases found may be quickly incorporated into the original queries to improve accuracy and reduce false positives without the need to rethink the logic from scratch. Also, they are surprisingly fast.

This is how we built support for Solidity in codeQL:

  1. We used this tree-sitter grammar for Solidity (credit to Joran Honig!), as it is well maintained and mature.
  2. We adapted CodeQL helper generation scripts (written by the Github team for the Ruby language, as explained in this blog entry) to work with this grammar, creating a language specific database scheme and an extractor for Solidity codebases.
  3. To take advantage of the rich expressiveness of CodeQL and be able to write more insightful queries, we construct the following layers on top of the AST auto-generated from the Treesitter grammar:
    • A user-friendly layer of abstraction to expose AST elements (e.g. function definitions, function calls, operators, etc.) with a clean, declarative interface.
    • Create a CFG (Control Flow Graph) and its primitives to allow for reasoning at the control flow level.
    • Support for Dataflow Analysis and Taint Analysis.

We currently have more than eight active detectors and are continuously expanding this number. We’re also actively encouraging the security community to contribute, aiming to enhance the security of the entire blockchain ecosystem.

Examples of Issues with Slither

When attempting to utilize the intermediate representation, SlithIR to do dataflow analysis we encountered the following bug:

Suppose we have this Solidity snippet for a contract called PoC:

// SPDX-License-Identifier: MIT 
pragma solidity ^0.8.0;

contract PoC {
    constructor() {}
    function bug(int256 num) external pure returns (uint256) {
        uint256 res = 2;
        if (num > 0){ 
            res = 1;
        } else if (num < 0) {
            res = 0;
        }
        return res;
    }
}

Running Slither on top, and outputing the SlithIR representation yields:

INFO:Printers:Contract PoC
        Function PoC.constructor()
        Function PoC.bug(int256)
                Expression: res = 2
                IRs:
                        res_1(uint256) := 2(uint256)
                Expression: num > 0
                IRs:
                        TMP_0(bool) = num_1 > 0
                        CONDITION TMP_0
                Expression: res = 1
                IRs:
                        res_2(uint256) := 1(uint256)
                Expression: num < 0
                IRs:
                        TMP_1(bool) = num_1 < 0
                        CONDITION TMP_1
                Expression: res = 0
                IRs:
                        res_3(uint256) := 0(uint256)
                IRs:
                        res_4(uint256) := ϕ(['res_3', 'res_1'])
                IRs:
                        res_5(uint256) := ϕ(['res_2', 'res_1'])
                Expression: res
                IRs:
                        RETURN res_5

Note the orphaned phi node assigned to res_4.

In simple terms, when there is any assignment to a variable declared outside of a conditional in both an IF and an ELSE block, the SSA form of the intermediate representation breaks and we get orphaned phi nodes at the end, thus making it impossible to analyze flow in this scenario. For more information, refer to this issue in the official repo.

Why Blockchain Stack Security Matters, Even for Crypto Skeptics

The particular thing about crypto cybersecurity is the closeness of money to hackers, the relatively small smart contract codebases compared to traditional software. One simple example is that if you hack a bank through their web page (without impersonating or scamming someone) you don’t access the treasury directly. In DeFi apps most hacking events means getting the TVL inside the system. A good list of hacks with their amount is available at https://rekt.news/es/leaderboard/. We can have the position of completely dismiss the Web3 world or take into account that Web3 is a good playground, specially for extrapolating the research in this space to practices in the traditional software world.