LLVM Passes for Security: A Pass for Fuzzing – Coverage and Context-sensitivity (Part 4/4)
Abstract
This module doesn’t add too many novel concepts, compared to the previous ones, but still, it shows a trending application of LLVM for security, i.e., fuzzing. Our goal is to create a pass that instruments a target to log the code coverage. As a further exercise, we’ll then implement an additional instrumentation feature, i.e., context sensitivity (simply taken from AFL++).. Our refernce tool will be AFL++, and thus our pass will work with this fuzzer, even though in principle many of the ideas we’re going to exploare are equivalent with other fuzzers.
LLVM Concepts
- Add LLVM IR at compile time
- Use of external variables
What we want to implement
I assume that you already have an idea about how fuzzers work. Anyway, the role of a compiler pass in fuzzing, is to instrument a target according to one of the many existing approaches, in such a way that the instrumentation will collect some information about the status of a certain execution. Such type of information, a.k.a. feedback, drives the fuzzer to discover different program points. For instance, in the first part of this exercise, we develop an edge-coverage instrumentation AFL-style. This essentially logs when new edges are discovered and takes into account how many times these are executed. In the second part instead, we’ll implement a more sophisticated technique to take into account not just the edge-coverage but also the context, i.e., the routines/methods invoked before reaching a certain point.
Our implementation
For this implementation, I’ll refer to AFL++ commit 147654f8715d237fe45c1657c87b2fe36c4db22a. Let’s start.
First, let’s try to reproduce the edge-coverage with hitcounts, typical of AFL/AFL++. As usual, we’ll need to iterate over all basic blocks of the CFG. As you probably know, the core idea is to assign a random id for each block and then encode an adge in the following way:
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;
Thus, this means that we have to access two variables in the runtime, namely, a variable for the shared memory between the fuzzer and the target (__afl_are_ptr
) and a variable for the previous basic block (__afl_prev_loc
). Thus, we’ll start by creating two GlobalVariable objects:
GlobalVariable *AFLMapPtr = new GlobalVariable(M, Int8PTy, false, GlobalValue::ExternalLinkage, 0, "__afl_area_ptr");
GlobalVariable *AFLPrevLoc = new GlobalVariable(M, Int32Ty, false, GlobalValue::ExternalLinkage, 0, "__afl_prev_loc", 0, GlobalVariable::GeneralDynamicTLSModel, 0, false);
Then for each basic block, we need to represent the pseudocode mentioned above. For the current location, we can just create a random number modulo the size of the map. Then we’ll issue a Load instruction to read the value contained in __afl_prev_loc
:
cur_loc = random() % map_size;
Constant* CurLoc = ConstantInt::get(Int32Ty, cur_loc);
LoadInst* LoadPrevLoc = IRB.CreateLoad(AFLPrevLoc);
LoadPrevLoc->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
Value* PrevLoc = static_cast<Value*>(LoadPrevLoc);
At this point we can load the __afl_are_ptr
at the index cur_location ^ prev_location
and increase the counter by one:
LoadInst* MapPtr = IRB.CreateLoad(AFLMapPtr);
MapPtr->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
Value* MapPtrIdx = IRB.CreateGEP(MapPtr, IRB.CreateXor(PrevLoc, CurLoc));
LoadInst* NumberOfHits = IRB.CreateLoad(MapPtrIdx);
NumberOfHits->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
Value* Add = IRB.CreateAdd(NumberOfHits, One);
Finally, we store the result of the addition at the same position inside the bitmap and we shift right the current location by 1:
StoreInst* UpdateHits = IRB.CreateStore(Add, MapPtrIdx);
UpdateHits->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
StoreInst* UpdatePrevLoc = IRB.CreateStore(ConstantInt::get(Int32Ty, cur_loc >> 1), AFLPrevLoc);
UpdatePrevLoc->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
We’re done! Now, we can think of many improvements. For instance, one thing that was introduced in AFL++ is to check that the hitcount never reaches 0 (otherwise this would result in a wrong coverage evaluation when the hitcount overflows). To achieve this, you can use the following code:
Constant* ZeroConst = ConstantInt::get(Int8Ty, 0);
auto comparisonFlag = IRB.CreateICmpEQ(Add, ZeroConst);
auto carry = IRB.CreateZExt(comparisonFlag, Int8Ty);
Add = IRB.CreateAdd(Add, carry);
At this point there’s no limit to the potential improvements that can introduce. You can use two shared maps, you can create different feedback mechanisms, introduce a more lightweight instrumentation policy, etc. Here, just to report an example, I implemented an additional feedback based on context-sensitivity, i.e., when we reach a program point, we don’t just take into account the two basic blocks but also the context (function calls invoked so far essentially). We assign a second random ID for each entry block and then this ID represents the context for that function. Thus, we update a variable to store the current value of the context (i.e., the xor of the previous function IDs).
LoadInst* PrevCtxLoad = IRB.CreateLoad(AFLContext);
PrevCtx = static_cast<Value*>(PrevCtxLoad);
PrevCtxLoad->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
unsigned cur_ctx = random() % MAP_SIZE;
Value* CurCtx = ConstantInt::get(Int32Ty, cur_ctx);
StoreInst* StoreCtx = IRB.CreateStore(IRB.CreateXor(PrevCtx, CurCtx), AFLContext);
StoreCtx->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
Now, if context-sensitivity is enabled, we need to include the context whenever we encode an edge in the shared memory.
Thus, we can xor the PrevCtx
variable with the PrevLoc
:
PrevLoc = IRB.CreateZExt(IRB.CreateXor(PrevLoc, PrevCtx), Int32Ty);
Finally, we need some code to restore the context whenever we exit from the function, e.g., when we meet a ReturnInst
:
if (isa<ReturnInst>(I)) {
IRBuilder<> Restore_IRB(I);
StoreInst* Restore = Restore_IRB.CreateStore(PrevCtx, AFLContext);
Restore->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(*C, None));
}
Don’t forget that at compile-time you’ll have to link against the object file that contains the AFL++ runtime named afl-compiler-rt.o
(see the cc
compiler wrapper python script). Now you can compile a target with our wrapper cc.py
and let the fuzzer run.
As for the other passes that we developed in this course, you can find the code here.