Why the Fuzz About Fuzzing Compilers?

Compilers translate human-readable code into machine code, and any errors in this process can lead to software bugs or security vulnerabilities. One effective way to test compilers is through fuzzing, a method that involves providing seemingly valid random data as input to find unexpected issues.

Fuzzing for Blockchain Compilers: An Uncharted Territory

Fuzzing blockchain compilers is still a new frontier. As Virtual Machines (VMs) become more complex and technologies like WASM VMs emerge, applying advanced testing techniques to compilers is increasingly important. Proactive testing of this kind matters most in ecosystems where fixing flawed or harmful code is both costly and difficult.

Case Study: The Clarity Compiler

The Clarity compiler processes the Clarity smart contract language, translating its Abstract Syntax Tree (AST) directly into WebAssembly (WASM). The translation is straightforward and bypasses optimization passes. For further details on Clarity’s language specifics, refer to the Clarity documentation.

Elevating Fuzzer Efficiency

Zest stands out by embracing structure-aware fuzzing, which is crucial when the inputs are highly structured; in this context, the inputs are programs. Rather than stopping at parser errors, Zest ensures that inputs are syntactically valid and have a high likelihood of being semantically valid. It is also coverage-guided: inputs that reach new coverage are retained in a corpus and serve as the basis for subsequent mutations. Semantic validity feedback is integrated as well, so the fuzzer can tell inputs that pass semantic checks apart from those that are rejected early. libFuzzer generates the byte streams, and a Clarity generator deterministically converts each stream into a syntactically and semantically valid Clarity program, filtering out predictable errors such as type mismatches.
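The feedback loop described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not our harness: in the real setup libFuzzer owns mutation and coverage tracking, and the names used here (`mutate`, `fuzz`, `toy_target`, the branch labels) are hypothetical. The sketch keeps an input when it reaches new coverage, or when it is valid and reaches coverage that no earlier valid input reached.

```python
import random
from typing import Callable, FrozenSet, List, Sequence, Tuple

# One run yields (covered branch labels, whether the input was semantically valid).
RunResult = Tuple[FrozenSet[str], bool]


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite, insert, or delete one byte (a stand-in for libFuzzer's mutators)."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif choice == 1 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(run: Callable[[bytes], RunResult], seeds: Sequence[bytes],
         iterations: int, seed: int = 0) -> List[bytes]:
    """Return the corpus after `iterations` rounds of mutate-and-run."""
    rng = random.Random(seed)
    corpus = list(seeds)
    seen: set = set()        # coverage reached by any input
    valid_seen: set = set()  # coverage reached by valid inputs only
    for s in corpus:
        cov, valid = run(s)
        seen |= cov
        if valid:
            valid_seen |= cov
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov, valid = run(candidate)
        new_cov = not cov <= seen
        new_valid_cov = valid and not cov <= valid_seen
        if new_cov or new_valid_cov:
            corpus.append(candidate)  # retained for later mutations
        seen |= cov
        if valid:
            valid_seen |= cov
    return corpus


def toy_target(data: bytes) -> RunResult:
    """Hypothetical target: pretend even-length inputs pass semantic checks."""
    cov = {"entry"}
    valid = len(data) % 2 == 0
    if data and data[0] < 128:
        cov.add("low-first-byte")
    if len(data) > 2:
        cov.add("long-input")
    if valid:
        cov.add("codegen")  # only valid inputs reach code generation
    return frozenset(cov), valid


corpus = fuzz(toy_target, [b"\x00\x00"], iterations=200)
```

Tracking coverage of valid inputs separately is what keeps the corpus from filling up with inputs that only exercise early rejection paths.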

Our Solution: Architectural Insights and Coverage Reporting

Schematic view of our mutation-based fuzzing strategy.

We devised a Clarity code generator that accepts a random byte string and outputs valid Clarity programs. Although this initial iteration supports only a limited set of opcodes and language features, it proved effective in uncovering noteworthy findings. We relied on Rust’s implementation of libFuzzer for byte string generation, corpus handling, and coverage information retrieval.
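To make the idea of a byte-driven generator concrete, here is a minimal sketch. It is not our generator: the names (`ByteSource`, `gen_expr`, `generate_program`, `fuzz-entry`) are hypothetical, and it covers only a handful of integer and boolean operators. It shows the two properties that matter: the same bytes always produce the same program, and expressions are built by requested type, so type mismatches cannot occur by construction.

```python
class ByteSource:
    """Reads fuzzer bytes one at a time; yields 0 once the input is exhausted."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def next(self) -> int:
        if self._pos >= len(self._data):
            return 0
        value = self._data[self._pos]
        self._pos += 1
        return value


def gen_expr(src: ByteSource, ty: str, depth: int) -> str:
    """Build an expression of type `ty` ("int" or "bool") from the byte stream."""
    choice = src.next() % 4
    if depth == 0 or choice == 0:
        # Leaf: a literal of the requested type.
        value = src.next()
        if ty == "int":
            return str(value)
        return "true" if value % 2 else "false"
    if ty == "int":
        if choice == 3:
            cond = gen_expr(src, "bool", depth - 1)
            then = gen_expr(src, "int", depth - 1)
            other = gen_expr(src, "int", depth - 1)
            return f"(if {cond} {then} {other})"
        op = "+-*"[src.next() % 3]
        left = gen_expr(src, "int", depth - 1)
        right = gen_expr(src, "int", depth - 1)
        return f"({op} {left} {right})"
    # ty == "bool"
    if choice == 1:
        inner = gen_expr(src, "bool", depth - 1)
        return f"(not {inner})"
    if choice == 2:
        op = ("and", "or")[src.next() % 2]
        left = gen_expr(src, "bool", depth - 1)
        right = gen_expr(src, "bool", depth - 1)
        return f"({op} {left} {right})"
    left = gen_expr(src, "int", depth - 1)
    right = gen_expr(src, "int", depth - 1)
    return f"(< {left} {right})"


def generate_program(data: bytes, max_depth: int = 3) -> str:
    """Wrap one generated int expression in a read-only function."""
    body = gen_expr(ByteSource(data), "int", max_depth)
    return f"(define-read-only (fuzz-entry) {body})"


print(generate_program(bytes([1, 0, 0, 5, 0, 7])))
# (define-read-only (fuzz-entry) (+ 5 7))
```

Because an exhausted byte stream yields zeros, and zero always selects a leaf, generation terminates for any input, including the empty one.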

After each run, we analyzed the compiler’s output and filtered out trivial findings. Whenever compilation succeeded, we also ran the generated WASM through a WASM parser to check that it was valid.
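The triage step can be sketched roughly as follows. Everything here is illustrative: the `Outcome` record, the error labels in `EXPECTED_ERRORS`, and the category names are assumptions rather than the compiler's actual interface, and the header check is only a stand-in for validation with a full WASM parser.

```python
from dataclasses import dataclass
from typing import Optional

# Every WASM 1.0 binary starts with the magic bytes "\0asm" and version 1.
WASM_HEADER = b"\x00asm\x01\x00\x00\x00"

# Hypothetical labels for rejections the generator is expected to trigger.
EXPECTED_ERRORS = {"type-mismatch", "undefined-variable"}


@dataclass
class Outcome:
    """Hypothetical record of one compiler run."""
    error: Optional[str] = None   # compiler error label, if compilation failed
    panicked: bool = False        # the compiler itself crashed
    wasm: Optional[bytes] = None  # emitted module, if compilation succeeded


def looks_like_wasm(blob: bytes) -> bool:
    """Cheap stand-in for a real parser: check magic bytes and version only."""
    return blob[:8] == WASM_HEADER


def triage(outcome: Outcome) -> str:
    """Sort a run into a bucket so trivial findings can be filtered out."""
    if outcome.panicked:
        return "crash"
    if outcome.error is not None:
        if outcome.error in EXPECTED_ERRORS:
            return "expected-rejection"  # trivial: not worth a report
        return "unexpected-error"
    if outcome.wasm is None or not looks_like_wasm(outcome.wasm):
        return "invalid-wasm"            # compiled "successfully" but output is bad
    return "ok"


print(triage(Outcome(error="type-mismatch")))  # expected-rejection
print(triage(Outcome(wasm=b"not wasm")))       # invalid-wasm
```

The interesting buckets are the ones where the compiler reports success but the output fails validation, since nothing in the compiler's own exit status flags them.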

Findings and Insights

The compiler allows variables and functions to be named after types. From that point on in the contract, the type name is replaced by the variable or function of the same name. There might be attack vectors that could exploit this.
The compiler does not appear to limit the number of locals in the WASM it generates. Garbage lines of literals (e.g., a bare integer that is never used and is functionally equivalent to a nop) seem to generate locals in the WASM, and the effect is particularly insidious with lists. As a result, a “bloated” program might compile without problems, yet a WASM parser validating the output warns that the limit on locals has been exceeded.
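The second finding can be checked mechanically by counting the locals each function declares in the emitted module. The sketch below walks the WASM binary format by hand; it is not the parser we used. The 50,000 threshold in `MAX_LOCALS` is an assumption based on the per-function limit commonly enforced by WASM implementations, and `module_with_locals` is a hypothetical helper that builds a tiny test module.

```python
from typing import List, Tuple

WASM_HEADER = b"\x00asm\x01\x00\x00\x00"
CODE_SECTION_ID = 10
MAX_LOCALS = 50_000  # assumed per-function limit; adjust to the target runtime


def read_u32(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def locals_per_function(module: bytes) -> List[int]:
    """Return the number of declared locals for each function body."""
    if module[:8] != WASM_HEADER:
        raise ValueError("not a WASM 1.0 binary")
    pos, counts = 8, []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32(module, pos + 1)
        end = pos + size
        if section_id == CODE_SECTION_ID:
            n_funcs, pos = read_u32(module, pos)
            for _ in range(n_funcs):
                body_size, pos = read_u32(module, pos)
                body_end = pos + body_size
                n_groups, pos = read_u32(module, pos)
                total = 0
                for _ in range(n_groups):
                    n, pos = read_u32(module, pos)  # locals in this group
                    pos += 1                        # skip the value type byte
                    total += n
                counts.append(total)
                pos = body_end
        pos = end  # skip to the next section
    return counts


def encode_u32(value: int) -> bytes:
    """Encode an unsigned integer as LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def module_with_locals(n: int) -> bytes:
    """Hypothetical test module: one empty function declaring n i64 locals."""
    def section(sid: int, payload: bytes) -> bytes:
        return bytes([sid]) + encode_u32(len(payload)) + payload
    body = encode_u32(1) + encode_u32(n) + b"\x7e" + b"\x0b"
    code = encode_u32(1) + encode_u32(len(body)) + body
    return (WASM_HEADER + section(1, b"\x01\x60\x00\x00")
            + section(3, b"\x01\x00") + section(CODE_SECTION_ID, code))


bloated = [c for c in locals_per_function(module_with_locals(60_000)) if c > MAX_LOCALS]
print(bloated)  # [60000]
```

A check like this can run as part of triage, so that programs which compile cleanly but exceed the limit are flagged without waiting for a downstream parser to complain.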

Looking Ahead: Enhancements and Future Fuzzers

Our journey doesn’t end here. We envision extending the Clarity compiler’s capabilities, exploring additional fuzzing strategies, and delving deeper into the nuances of Stacks.

As we continue to unravel the complexities of blockchain compilers, we invite the community to join us in this pioneering endeavor. Together, we can enhance the robustness and reliability of blockchain technologies, paving the way for a more secure future.