Overview

There is a lot to break down in this project, and structuring something like this in a digestible format is a little difficult, so I am simply going to give an overview of what it is and why it exists, followed by a rough description of how the features work. If you are interested in a very technical description, the code is heavily commented and isn’t particularly long or daunting.

If you are writing a red teaming tool (read: malware), you need to call Windows and NT APIs to perform your malicious activities. This is a core part of essentially any Windows tooling you will build. There has been a very, very long chain of developments in bypassing detection measures while doing so. All of these tools and techniques, this one included, follow the same basic chain of events:

This project only makes changes to the last two steps. Graphing it out, this process would look like this:

The benefit of this is that it makes analyzing the caller of the syscall more difficult, because a detection mechanism monitoring execution needs to track control flow through both returns (ROP) and jumps (JOP), which is fairly nonstandard behaviour.

Despite some searching, I have not managed to find any open source Hell’s-whatever implementations that use this technique, though I am under the impression that AceLdr might do something similar for its WinAPI calls.

In any case, while this library is extremely unstable and I probably wouldn’t use it for any stability-critical application right now, hopefully you can learn something from me talking through its features and source code. If you find a use for it, or make something cool with it or inspired by it, please let me know.

Syscall with JOP/ROP Return Address Spoofing

Theory

As mentioned in the overview, the big change in this implementation is that both the path to the syscall instruction and the path that returns control flow to the caller are obfuscated through a user-provided ROP/JOP chain that can be calculated at runtime.

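This gadget chain cannot be longer than 5 gadgets because of the Windows x64 calling convention. The first gadget in the chain is the memory address of a jmp rcx instruction, and it must always be this. It also cannot clobber any other registers, because by the time this jump is made the registers are already prepared for the system call. The remaining (up to 4) entries are the memory addresses of gadgets which return control to the original return address through some obfuscated chain. They are pushed to the stack in left-to-right order, with the exception of the first gadget (since it jumps to the syscall). The limit appears to come from the stack layout: the kernel expects the fifth and later syscall arguments at [rsp+0x28], past the 8-byte return address slot and the 32 bytes of shadow space, so without moving the arguments there are only five 8-byte slots available, filled by up to four stack gadgets plus the caller’s original return address.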

For the example in this project, I constructed the following chain:

g1: jmp rcx
g2: pop rcx; ret
g3: ret
g4: jmp rcx

With that chain, when the first gadget (jmp rcx) is hit, the stack holds, from the top down: g2, g3, g4, and then the caller’s original return address.

As the execution begins:

  1. The library stores g1 in the r11 register and jumps to it, beginning the chain.
  2. g1 jumps to the syscall instruction whose address is stored in the rcx register (which is why this must be the first gadget, and why it must be a jmp rcx that neither misaligns any arguments nor clobbers any other registers).
  3. In ntdll’s syscall stubs, the syscall instruction is immediately followed by a ret, so once the syscall completes, execution passes to g2 at the top of the stack.
  4. g2 begins with a pop rcx instruction, so it pops the next item on the stack (g3) into the rcx register. It then returns, moving execution to g4.
  5. g4 jumps to whatever address is in rcx, which is now g3, a ret instruction.
  6. Control flow is now passed to g3. g3 is simply a ret instruction, and the last item left on the stack is the caller’s original return address.
  7. Control flow is returned to the calling function.
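To make the control flow above concrete, here is a toy simulation of the example chain in plain Rust. It is only a model under stated assumptions: the addresses are made-up constants, the syscall; ret tail of the stub is treated as a single step, and no machine code is executed. The stack is seeded with g2, g3, g4, and the caller’s return address, matching the walk-through.

```rust
// Toy model of the example gadget chain. Addresses are invented constants,
// not real ntdll offsets; nothing here executes machine code.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Insn {
    JmpRcx,     // jmp rcx
    PopRcxRet,  // pop rcx; ret
    Ret,        // ret
    SyscallRet, // syscall; ret (tail of an ntdll stub), modeled as one step
}

// Hypothetical addresses for each gadget, the syscall, and the caller.
const G1: u64 = 0x1000; // jmp rcx
const G2: u64 = 0x2000; // pop rcx; ret
const G3: u64 = 0x3000; // ret
const G4: u64 = 0x4000; // jmp rcx
const SYSCALL: u64 = 0x5000; // syscall; ret
const CALLER: u64 = 0x6000; // caller's original return address

fn decode(addr: u64) -> Insn {
    match addr {
        G1 | G4 => Insn::JmpRcx,
        G2 => Insn::PopRcxRet,
        G3 => Insn::Ret,
        SYSCALL => Insn::SyscallRet,
        _ => panic!("no gadget at {addr:#x}"),
    }
}

/// Runs the chain and returns every address control flow visits.
fn run_chain() -> Vec<u64> {
    // The Vec's last element is the top of the stack, so reading
    // top-down the layout is G2, G3, G4, CALLER.
    let mut stack = vec![CALLER, G4, G3, G2];
    let mut rcx = SYSCALL; // rcx holds the syscall address before the jump
    let mut rip = G1; // the library jumps to g1 (held in r11)
    let mut trace = vec![rip];

    while rip != CALLER {
        rip = match decode(rip) {
            Insn::JmpRcx => rcx,
            Insn::SyscallRet | Insn::Ret => stack.pop().expect("stack underflow"),
            Insn::PopRcxRet => {
                rcx = stack.pop().expect("stack underflow");
                stack.pop().expect("stack underflow")
            }
        };
        trace.push(rip);
    }
    // Every slot the chain used has been consumed by the time we return.
    assert!(stack.is_empty());
    trace
}

fn main() {
    for addr in run_chain() {
        println!("{addr:#x}");
    }
}
```

The recorded trace visits g1, the syscall, g2, g4, g3, and finally the caller, which is the order of steps 1 through 7.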

I understand that this process is convoluted; to some degree that is on purpose, in an attempt to break detection mechanisms. This is an example of what it looks like in WinDbg:

Practice

The steps of constructing this chain loosely look like this:

The user calls a macro like this:

let poprcx = get_gadgets!(hash!("ntdll.dll"), &[0x59, 0xc3], 16, 12);

This macro expands into a few function calls, but in short they walk the sections of ntdll.dll (in this case) and search every executable section for the provided byte sequence. This is fairly simple because x86 instructions have no alignment requirement, so any occurrence of those bytes (0x59 0xc3) in executable memory can be used as a pop rcx; ret gadget. The other two arguments are just constraints on the arrays returned by the macro.
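The search itself can be sketched as a plain byte-window scan. This is a minimal, allocation-free sketch, not the crate’s code: find_gadgets is a hypothetical name, the section bytes are made up rather than read from ntdll.dll, and the cap N is an illustrative limit rather than a mapping of the macro’s numeric arguments. The first match in the sample buffer sits inside the encoding of mov [rcx-0x3d], rbx (48 89 59 c3), which shows why unaligned matches count.

```rust
/// Collects up to N offsets at which `needle` occurs in `haystack`.
/// Every byte offset is checked, because x86 instructions have no
/// alignment requirement. A fixed-size array stands in for a Vec so
/// no allocator is needed.
fn find_gadgets<const N: usize>(haystack: &[u8], needle: &[u8]) -> ([usize; N], usize) {
    let mut hits = [0usize; N];
    let mut count = 0;
    if needle.is_empty() {
        return (hits, 0); // windows(0) would panic
    }
    for (offset, window) in haystack.windows(needle.len()).enumerate() {
        if count == N {
            break; // output array is full
        }
        if window == needle {
            hits[count] = offset;
            count += 1;
        }
    }
    (hits, count)
}

fn main() {
    // Invented bytes standing in for an executable section.
    // 48 89 59 c3 decodes as mov [rcx-0x3d], rbx, yet bytes 2..4 are 59 c3.
    let section: [u8; 10] = [0x48, 0x89, 0x59, 0xc3, 0x90, 0x59, 0xc3, 0xcc, 0x59, 0x90];
    let (hits, count) = find_gadgets::<16>(&section, &[0x59, 0xc3]);
    println!("pop rcx; ret found at offsets {:?}", &hits[..count]);
}
```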

They then construct a chain of pointers from these values like this:

let gadgetchain: [*const c_void; 4] = [pick_random(&jmprcx), pick_random(&poprcx), pick_random(&retgad), pick_random(&jmprcx)];

The pick_random function lets the user pick a different gadget on every execution if they wish, which may help avoid detection.
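A helper like pick_random only needs a small, allocation-free source of randomness. The post does not say how pick_random generates its randomness, so the sketch below swaps in Marsaglia’s xorshift64 with a caller-supplied seed. pick_random_with is a hypothetical name, and the candidate addresses are invented stand-ins for the gadget lists returned by get_gadgets!.

```rust
/// One xorshift64 step (Marsaglia). `state` must be non-zero; a non-zero
/// state never maps to zero, so the generator cannot get stuck.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Hypothetical stand-in for pick_random: returns one element of
/// `candidates`, or None if the slice is empty. The small modulo bias
/// does not matter for choosing among a handful of gadgets.
fn pick_random_with<T: Copy>(candidates: &[T], state: &mut u64) -> Option<T> {
    if candidates.is_empty() {
        return None;
    }
    let idx = (xorshift64(state) % candidates.len() as u64) as usize;
    Some(candidates[idx])
}

fn main() {
    // Invented gadget addresses; in the library these would be pointers.
    let candidates: [u64; 3] = [0x7ffa_0000_1000, 0x7ffa_0000_2000, 0x7ffa_0000_3000];
    // Any non-zero seed works; real code might derive it from a timestamp.
    let mut state: u64 = 0x2545_f491_4f6c_dd1d;
    for _ in 0..3 {
        let gadget = pick_random_with(&candidates, &mut state).unwrap();
        println!("{gadget:#x}");
    }
}
```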

The user then needs to resolve the syscall with the get_syscall! macro. It looks like this:

let NtWriteFile = get_syscall!(hash!("ntdll.dll"), hash!("NtWriteFile")).unwrap();

On the backend, this calls a few functions. First, it finds the base address of ntdll.dll. It then locates the NtWriteFile export of ntdll.dll. That function address is passed to a parsing function, which extracts the system service number (SSN) and checks the syscall stub for hooks (such as a jmp instruction that isn’t meant to be there). Provided everything is found, it returns a struct that tells the user the SSN, the address, and whether or not the call is hooked.

Calling a syscall is then fairly straightforward. The following macro call looks a little bit disgusting because it takes a lot of arguments, but I’ll note the important parts for your viewing pleasure:

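let ntstatus = jopcall!(
    &gadgetchain,                        // ROP/JOP chain built above
    NtWriteFile,                         // syscall resolved with get_syscall!
    stdouthandle as *const c_void,       // FileHandle
    null::<*const c_void>(),             // Event
    null::<*const c_void>(),             // ApcRoutine
    null::<*const c_void>(),             // ApcContext
    &mut iosb as *mut _ as *mut c_void,  // IoStatusBlock
    message.as_ptr() as *const c_void,   // Buffer
    message.len() as usize,              // Length
    null::<*const c_void>(),             // ByteOffset
    0 as usize                           // Key
);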

I know that’s miserable to look at, but most of it is just the legitimate arguments to the NtWriteFile NTAPI. Simplified, it looks more like this:

let ntstatus = jopcall!(&gadgetchain, NtWriteFile, ...);

As long as the user didn’t clobber the rax register in their gadget chain, ntstatus should hold the NTSTATUS value returned by the syscall, and if the arguments were correct the call should succeed. Here’s an example (ignore the calc.exe pop, that’s showing off another feature):

Other Features

This library offers a few other features which I will just briefly describe because they aren’t nearly as interesting.

no_std, no_alloc, no Dependency

For maximal flexibility, the entire library is written with zero heap allocations and without the standard library. The only dependencies are for the compile-time macro that powers the API hashing. Why? Simple: you can compile this to shellcode. Cool.

Dynamic WinAPI Calls & Non-ROP/JOP Syscalls

Macros are also exported to call normal WinAPI functions and syscalls in a similar manner. This keeps them out of the import address table (IAT) but doesn’t require a gadget chain to be passed. This is basically just how the various Hell’s Gate-style techniques work, though, so it’s nothing to write about. They look like this:

let result = functioncall!(hash!("kernel32.dll"), hash!("WinExec"), b"calc.exe\0".as_ptr() as *const i8, 1 as u32);
let ntstatus = syscall!(NtPowerInformation, 0 as u32, 0 as usize, 0 as u64, &mut outputbuffer, size_of::<[u64;32]>());

API Hashing

In the previous examples, you likely saw every &str wrapped in a hash!() macro. This is a default implementation of API hashing. To make this more flexible, though, the crate defines a static function pointer to do all of its internal hashing with. If you set the RUNTIME_HASHER variable to be a new function, like this:

unsafe fn new_hasher(input: &str) -> u128 {
    input.len() as u128
}

fn main() {
    unsafe {
        println!("Before we overwrite the built-in function: {}", RUNTIME_HASHER("Hello, world!"));
        // Swap in the replacement; internal lookups now use new_hasher.
        RUNTIME_HASHER = new_hasher;
        println!("After we overwrite the built-in function: {}", RUNTIME_HASHER("Hello, world!"));
    }
}

All of the internal library functions will then use your function instead, meaning you can write your own hash!() proc macro with a different hashing algorithm to avoid detection, as long as it produces the same values at compile time that your runtime hasher produces. If you don’t want to do that, a default is already provided.
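One way to keep the compile-time and runtime hashes in agreement is to write the hash as a const fn, so the same code runs in both places. The sketch below is an assumption-laden illustration, not the crate’s default algorithm: it swaps in a djb2 variant widened to u128, djb2_u128 and my_runtime_hasher are hypothetical names, and the wrapper assumes RUNTIME_HASHER has the unsafe fn(&str) -> u128 type suggested by new_hasher.

```rust
/// Hypothetical replacement hash: djb2 (hash * 33 + byte) widened to u128.
/// As a const fn it can be evaluated by the compiler or called at runtime.
const fn djb2_u128(input: &str) -> u128 {
    let bytes = input.as_bytes();
    let mut hash: u128 = 5381;
    let mut i = 0;
    while i < bytes.len() {
        hash = hash.wrapping_mul(33).wrapping_add(bytes[i] as u128);
        i += 1;
    }
    hash
}

// Evaluated at compile time, the way a hash!() macro's output would be.
const NTDLL_HASH: u128 = djb2_u128("ntdll.dll");

// Hypothetical wrapper with the assumed RUNTIME_HASHER signature; this is
// the kind of function you would assign to RUNTIME_HASHER.
#[allow(dead_code)]
unsafe fn my_runtime_hasher(input: &str) -> u128 {
    djb2_u128(input)
}

fn main() {
    // At runtime the library would hash names read from memory (e.g. an
    // export table); a literal stands in for one here.
    let runtime = unsafe { my_runtime_hasher("ntdll.dll") };
    assert_eq!(runtime, NTDLL_HASH);
    println!("{NTDLL_HASH:#x}");
}
```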

Zero Memory Safety

If anything goes wrong in this process, you get to experience the joy of trawling through WinDbg for hours. In retrospect, Rust was probably not the language to write this in: LLVM will attempt to optimize away the functionality if this is compiled with optimization passes turned on, and it’s extremely unstable. It’s probably broken in many places, but hopefully the theory is useful and someone much better at assembly than I am can take this and build something cooler with it.

Improvements

There are an enormous number of improvements that could be made to this library, and at some point I might get around to them.

  1. There is no support for x86 or for older versions of Windows with different PE headers.
  2. The assembly is probably unstable, and avoiding stack modification leads to a 5 gadget limit.
  3. There is an abundance of macros in the code because it is written in no_std Rust; performing these actions manually requires a lot of function calls.
