Understanding Pool Corruption: A Three-Part Series

The official MSDN articles on pool corruption.

  • Understanding Pool Corruption Part 1 (MSDN) - here
  • Understanding Pool Corruption Part 2 (MSDN) - here
  • Understanding Pool Corruption Part 3 (MSDN) - here

Understanding Pool Corruption Part 1 – Buffer Overflows

Before we can discuss pool corruption we must understand what pool is. Pool is kernel mode memory used as a storage space for drivers. Pool is organized in a similar way to how you might use a notepad when taking notes from a lecture or a book. Some notes may be 1 line, others may be many lines. Many different notes are on the same page.

Memory is also organized into pages; a page of memory is typically 4KB. The Windows memory manager breaks up this 4KB page into smaller blocks. One block may be as small as 8 bytes or possibly much larger. Each of these blocks exists side by side with other blocks.

The !pool command can be used to see the pool blocks stored in a page.

kd> !pool fffffa8003f42000
Pool page fffffa8003f42000 region is Nonpaged pool
*fffffa8003f42000 size: 410 previous size: 0 (Free) *Irp
Pooltag Irp : Io, IRP packets
fffffa8003f42410 size: 40 previous size: 410 (Allocated) MmSe
fffffa8003f42450 size: 150 previous size: 40 (Allocated) File
fffffa8003f425a0 size: 80 previous size: 150 (Allocated) Even
fffffa8003f42620 size: c0 previous size: 80 (Allocated) EtwR
fffffa8003f426e0 size: d0 previous size: c0 (Allocated) CcBc
fffffa8003f427b0 size: d0 previous size: d0 (Allocated) CcBc
fffffa8003f42880 size: 20 previous size: d0 (Free) Free
fffffa8003f428a0 size: d0 previous size: 20 (Allocated) Wait
fffffa8003f42970 size: 80 previous size: d0 (Allocated) CM44
fffffa8003f429f0 size: 80 previous size: 80 (Allocated) Even
fffffa8003f42a70 size: 80 previous size: 80 (Allocated) Even
fffffa8003f42af0 size: d0 previous size: 80 (Allocated) Wait
fffffa8003f42bc0 size: 80 previous size: d0 (Allocated) CM44
fffffa8003f42c40 size: d0 previous size: 80 (Allocated) Wait
fffffa8003f42d10 size: 230 previous size: d0 (Allocated) ALPC
fffffa8003f42f40 size: c0 previous size: 230 (Allocated) EtwR
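
The listing above can be checked by hand: each block starts where the previous one ends, each "previous size" equals the size of the block before it, and the sizes add up to one 4KB page. The following is a minimal sketch of that check, not the debugger's actual logic. The helper names `parse_blocks` and `chain_is_consistent` are made up for illustration, and the tag description line is left out of the sample text.

```python
import re

# Block lines copied from the !pool output above (hex sizes, as the debugger prints them).
POOL_LISTING = """\
*fffffa8003f42000 size: 410 previous size: 0 (Free) *Irp
fffffa8003f42410 size: 40 previous size: 410 (Allocated) MmSe
fffffa8003f42450 size: 150 previous size: 40 (Allocated) File
fffffa8003f425a0 size: 80 previous size: 150 (Allocated) Even
fffffa8003f42620 size: c0 previous size: 80 (Allocated) EtwR
fffffa8003f426e0 size: d0 previous size: c0 (Allocated) CcBc
fffffa8003f427b0 size: d0 previous size: d0 (Allocated) CcBc
fffffa8003f42880 size: 20 previous size: d0 (Free) Free
fffffa8003f428a0 size: d0 previous size: 20 (Allocated) Wait
fffffa8003f42970 size: 80 previous size: d0 (Allocated) CM44
fffffa8003f429f0 size: 80 previous size: 80 (Allocated) Even
fffffa8003f42a70 size: 80 previous size: 80 (Allocated) Even
fffffa8003f42af0 size: d0 previous size: 80 (Allocated) Wait
fffffa8003f42bc0 size: 80 previous size: d0 (Allocated) CM44
fffffa8003f42c40 size: d0 previous size: 80 (Allocated) Wait
fffffa8003f42d10 size: 230 previous size: d0 (Allocated) ALPC
fffffa8003f42f40 size: c0 previous size: 230 (Allocated) EtwR
"""

LINE = re.compile(
    r"^\*?(?P<addr>[0-9a-f]+) size:\s*(?P<size>[0-9a-f]+) "
    r"previous size:\s*(?P<prev>[0-9a-f]+)\s+\((?P<state>\w+)\)"
)

def parse_blocks(text):
    """Return (address, size, previous_size) tuples for every block line."""
    blocks = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            blocks.append((int(m["addr"], 16), int(m["size"], 16), int(m["prev"], 16)))
    return blocks

def chain_is_consistent(blocks, page_size=0x1000):
    """Walk the blocks in order and verify addresses, previous sizes and total size."""
    expected_addr = blocks[0][0]
    expected_prev = 0
    for addr, size, prev in blocks:
        if addr != expected_addr or prev != expected_prev:
            return False
        expected_addr = addr + size
        expected_prev = size
    return expected_addr == blocks[0][0] + page_size
```

Calling `chain_is_consistent(parse_blocks(POOL_LISTING))` on the healthy page above succeeds. A page whose headers have been overwritten fails the same walk, which is roughly why the debugger cannot list the blocks of a corrupted page.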

Because many pool allocations are stored in the same page, it is critical that every driver use only the space it has allocated. If DriverA uses more space than it allocated, it will write into the next driver’s space (DriverB) and corrupt DriverB’s data. This overwrite into the next driver’s space is called a buffer overflow. Later, either the memory manager or DriverB will attempt to use this corrupted memory and will encounter unexpected information. This unexpected information typically results in a blue screen.
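
The overflow described above can be modeled in user mode with a byte array standing in for a pool page. This is only a sketch under stated assumptions: the 8-byte header layout (`size`, `previous size`, `tag`) is a simplified, hypothetical stand-in for the real pool header, and `make_page`, `read_header` and `driver_write` are illustrative names.

```python
import struct

# Hypothetical, simplified header: 16-bit size, 16-bit previous size, 4-byte tag.
HEADER = struct.Struct("<HH4s")

def make_page(blocks):
    """Lay out (tag, data_size) blocks back to back; return the page and block offsets."""
    page = bytearray()
    offsets = {}
    prev = 0
    for tag, data_size in blocks:
        size = HEADER.size + data_size
        offsets[tag] = len(page)
        page += HEADER.pack(size, prev, tag) + bytes(data_size)
        prev = size
    return page, offsets

def read_header(page, offset):
    """Decode the header stored at the given offset."""
    return HEADER.unpack_from(page, offset)

def driver_write(page, offset, payload):
    """Copy payload into a block's data area with no bounds check, like a buggy driver."""
    start = offset + HEADER.size
    page[start:start + len(payload)] = payload

page, offsets = make_page([(b"DrvA", 16), (b"DrvB", 16)])
header_before = read_header(page, offsets[b"DrvB"])

# DriverA owns 16 bytes of data but writes 24, so 8 bytes land on DriverB's header.
driver_write(page, offsets[b"DrvA"], b"O" * 24)
header_after = read_header(page, offsets[b"DrvB"])
```

After the write, `header_after` no longer describes a valid block: the size, previous size and tag fields all contain the fill byte 0x4f. Nothing fails at the moment of the write, which mirrors why the real crash shows up later and in unrelated code.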

The NotMyFault application from Sysinternals has an option to force a buffer overflow. This can be used to demonstrate pool corruption. Choosing the “Buffer overflow” option and clicking “Crash” will cause a buffer overflow in pool. The system may not immediately blue screen after clicking the Crash button. The system will remain stable until something attempts to use the corrupted memory. Using the system will often eventually result in a blue screen.

Often pool corruption appears as a stop 0x19 BAD_POOL_HEADER or stop 0xC2 BAD_POOL_CALLER. These stop codes make it easy to determine that pool corruption is involved in the crash. However, the results of accessing unexpected memory can vary widely, so pool corruption can produce many different types of bugchecks.

As with any blue screen dump analysis the best place to start is with !analyze -v. This command will display the stop code and parameters, and do some basic interpretation of the crash.

kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
SYSTEM_SERVICE_EXCEPTION (3b)
An exception happened while executing a system service routine.
Arguments:
Arg1: 00000000c0000005, Exception code that caused the bugcheck
Arg2: fffff8009267244a, Address of the instruction which caused the bugcheck
Arg3: fffff88004763560, Address of the context record for the exception that caused the bugcheck
Arg4: 0000000000000000, zero.

In my example the bugcheck was a stop 0x3B SYSTEM_SERVICE_EXCEPTION. The first parameter of this stop code is c0000005, which is a status code for an access violation. An access violation is an attempt to access invalid memory (this error is not related to permissions). Status codes can be looked up in the WDK header ntstatus.h.
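
As a small aid for reading Arg1, here is a sketch that splits an NTSTATUS value into its documented bit fields (severity in the top two bits, then the customer bit, a 12-bit facility and a 16-bit code). The function name `decode_ntstatus` is made up for this example; the symbolic name for a given value still has to be looked up in ntstatus.h.

```python
def decode_ntstatus(status):
    """Split an NTSTATUS value into severity, customer flag, facility and code."""
    status &= 0xFFFFFFFF  # bugcheck arguments are printed as 64-bit values
    severity_names = ["SUCCESS", "INFORMATIONAL", "WARNING", "ERROR"]
    return {
        "severity": severity_names[(status >> 30) & 0x3],
        "customer": bool((status >> 29) & 1),
        "facility": (status >> 16) & 0xFFF,
        "code": status & 0xFFFF,
    }

# Arg1 from the bugcheck above.
fields = decode_ntstatus(0x00000000C0000005)
```

For c0000005 this yields severity ERROR, facility 0 and code 5, which is the value ntstatus.h defines as STATUS_ACCESS_VIOLATION.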

The !analyze -v command also provides a helpful shortcut to get into the context of the failure.

CONTEXT: fffff88004763560 -- (.cxr 0xfffff88004763560;r)

Running this command shows us the registers at the time of the crash.

kd> .cxr 0xfffff88004763560
rax=4f4f4f4f4f4f4f4f rbx=fffff80092690460 rcx=fffff800926fbc60
rdx=0000000000000000 rsi=0000000000001000 rdi=0000000000000000
rip=fffff8009267244a rsp=fffff88004763f60 rbp=fffff8009268fb40
r8=fffffa8001a1b820 r9=0000000000000001 r10=fffff800926fbc60
r11=0000000000000011 r12=0000000000000000 r13=fffff8009268fb48
r14=0000000000000012 r15=000000006374504d
iopl=0 nv up ei pl nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206
nt!ExAllocatePoolWithTag+0x442:
fffff800`9267244a 4c8b4808 mov r9,qword ptr [rax+8] ds:002b:4f4f4f4f`4f4f4f57=????????????????

From the above output we can see that the crash occurred in ExAllocatePoolWithTag, which is a good indication that the crash is due to pool corruption. Often an engineer looking at a dump will stop at this point and conclude that a crash was caused by corruption. However, we can go further.

The instruction that we failed on was dereferencing rax+8. The rax register contains 4f4f4f4f4f4f4f4f, which does not fit with the canonical form required for pointers on x64 systems. This tells us that the system crashed because the data in rax is expected to be a pointer but it is not one.
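
The canonical-form rule can be written down in a few lines. This sketch assumes 48-bit virtual addresses, which is an assumption about the system in the dump and not something the output states; the function name `is_canonical` is illustrative.

```python
def is_canonical(addr, va_bits=48):
    """True if bit (va_bits - 1) and every bit above it are all 0 or all 1."""
    top = addr >> (va_bits - 1)                   # bit 47 and upward for 48-bit addressing
    all_ones = (1 << (64 - va_bits + 1)) - 1      # 17 one-bits for 48-bit addressing
    return top == 0 or top == all_ones

rax = 0x4f4f4f4f4f4f4f4f   # value from the .cxr output
r8 = 0xfffffa8001a1b820    # a real kernel pointer from the same output
```

With these values, `is_canonical(rax)` and `is_canonical(rax + 8)` are both false, while `is_canonical(r8)` is true, so the faulting address could never have been a valid pointer.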

To determine why rax does not contain the expected data we must examine the instructions prior to where the failure occurred.

kd> ub .
nt!KzAcquireQueuedSpinLock [inlined in nt!ExAllocatePoolWithTag+0x421]:
fffff800`92672429 488d542440 lea rdx,[rsp+40h]
fffff800`9267242e 49875500 xchg rdx,qword ptr [r13]
fffff800`92672432 4885d2 test rdx,rdx
fffff800`92672435 0f85c3030000 jne nt!ExAllocatePoolWithTag+0x7ec (fffff800`926727fe)
fffff800`9267243b 48391b cmp qword ptr [rbx],rbx
fffff800`9267243e 0f8464060000 je nt!ExAllocatePoolWithTag+0xa94 (fffff800`92672aa8)
fffff800`92672444 4c8b03 mov r8,qword ptr [rbx]
fffff800`92672447 498b00 mov rax,qword ptr [r8]

The assembly shows that rax originated from the data pointed to by r8. The .cxr command we ran earlier shows that r8 is fffffa8001a1b820. If we examine the data at fffffa8001a1b820 we see that it matches the contents of rax, which confirms this memory is the source of the unexpected data in rax.

kd> dq fffffa8001a1b820 l1
fffffa80`01a1b820 4f4f4f4f`4f4f4f4f

To determine if this unexpected data is caused by pool corruption we can use the !pool command.

kd> !pool fffffa8001a1b820
Pool page fffffa8001a1b820 region is Nonpaged pool
fffffa8001a1b000 size: 810 previous size: 0 (Allocated) None
fffffa8001a1b810 doesn't look like a valid small pool allocation, checking to see
if the entire page is actually part of a large page allocation...
fffffa8001a1b810 is not a valid large pool allocation, checking large session pool...
fffffa8001a1b810 is freed (or corrupt) pool
Bad previous allocation size @fffffa8001a1b810, last size was 81
***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval fffffa8001a1b000 for more details.
Pool page [ fffffa8001a1b000 ] is __inVALID.
Analyzing linked list...
[ fffffa8001a1b000 --> fffffa8001a1b010 (size = 0x10 bytes)]: Corrupt region
Scanning for single bit errors...
None found

The above output does not look like the !pool command we used earlier. This output shows corruption to the pool header which prevented the command from walking the chain of allocations.

The above output shows that there is an allocation at fffffa8001a1b000 of size 810. If we look at this memory we should see a pool header. Instead what we see is a pattern of 4f4f4f4f`4f4f4f4f.

kd> dq fffffa8001a1b000 + 810
fffffa80`01a1b810 4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f
fffffa80`01a1b820 4f4f4f4f`4f4f4f4f 4f4f4f4f`4f4f4f4f
fffffa80`01a1b830 4f4f4f4f`4f4f4f4f 00574f4c`46524556
fffffa80`01a1b840 00000000`00000000 00000000`00000000
fffffa80`01a1b850 00000000`00000000 00000000`00000000
fffffa80`01a1b860 00000000`00000000 00000000`00000000
fffffa80`01a1b870 00000000`00000000 00000000`00000000
fffffa80`01a1b880 00000000`00000000 00000000`00000000
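
It can help to read the overwritten bytes as text. The sketch below converts the first six quadwords of the dump above into memory byte order (the `dq` command prints each 8-byte value as a number, so on a little-endian machine the bytes appear reversed) and decodes them as ASCII. The helper name `qwords_to_bytes` is made up for this example.

```python
def qwords_to_bytes(qwords):
    """Convert quadwords as printed by dq into the byte order they have in memory."""
    return b"".join(q.to_bytes(8, "little") for q in qwords)

# First six quadwords from the dq output above, with the backticks removed.
dump = [0x4f4f4f4f4f4f4f4f] * 5 + [0x00574f4c46524556]

raw = qwords_to_bytes(dump)
text = raw.rstrip(b"\x00").decode("ascii")
```

The result is a run of the letter O (0x4f) that ends in "OVERFLOW" followed by a terminating zero byte. That looks like a string written by the code that overran its buffer, rather than random damage.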

At this point we can be confident that the system crashed because of pool corruption.

Because the corruption occurred in the past, and a dump is a snapshot of the current state of the system, there is no concrete evidence to indicate how the memory came to be corrupted. It is possible that the driver that allocated the pool block immediately preceding the corruption is the one that wrote to the wrong location and caused this corruption. This pool block is marked with the tag “None”. We can search for this tag in memory to determine which drivers use it.

kd> !for_each_module s -a @#Base @#End "None"
fffff800`92411bc2 4e 6f 6e 65 e9 45 04 26-00 90 90 90 90 90 90 90 None.E.&........
kd> u fffff800`92411bc2-1
nt!ExAllocatePool+0x1:
fffff800`92411bc1 b84e6f6e65 mov eax,656E6F4Eh
fffff800`92411bc6 e945042600 jmp nt!ExAllocatePoolWithTag (fffff800`92672010)
fffff800`92411bcb 90 nop
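
The constant in the `mov eax,656E6F4Eh` instruction is the tag itself: a pool tag is four ASCII characters packed into a 32-bit value, stored little-endian. A short sketch of the conversion, with illustrative function names:

```python
def tag_to_dword(tag):
    """Pack a four-character pool tag into the 32-bit value seen in disassembly."""
    return int.from_bytes(tag.encode("ascii"), "little")

def dword_to_tag(value):
    """Recover the four-character tag from a 32-bit immediate."""
    return value.to_bytes(4, "little").decode("ascii")

none_dword = tag_to_dword("None")
```

`none_dword` equals 0x656E6F4E, matching the immediate loaded into eax just before the jump to ExAllocatePoolWithTag.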

The file Pooltag.txt lists the pool tags used for pool allocations by kernel-mode components and drivers supplied with Windows, the associated file or component (if known), and the name of the component. Pooltag.txt is installed with Debugging Tools for Windows (in the triage folder) and with the WDK (in \tools\other\<platform>\poolmon). Pooltag.txt shows the following for this tag:

None - - call to ExAllocatePool

Unfortunately what we find is that this tag is used when a driver calls ExAllocatePool, which does not specify a tag. This does not allow us to determine what driver allocated the block prior to the corruption. Even if we could tie the tag back to a driver it may not be sufficient to conclude that the driver using this tag is the one that corrupted the memory.

The next step should be to enable special pool and hope to catch the corruptor in the act. We will discuss special pool in our next article.

Understanding Pool Corruption Part 2 – Special Pool for Buffer Overruns

In our previous article we discussed pool corruption that occurs when a driver writes too much data in a buffer. In this article we will discuss how special pool can help identify the driver that writes too much data.

Pool is typically organized to allow multiple drivers to store data in the same page of memory, as shown in Figure 1. By allowing multiple drivers to share the same page, pool provides for an efficient use of the available kernel memory space. However, this sharing requires that each driver be careful in how it uses pool. Any bug where a driver uses pool improperly may corrupt the pool of other drivers and cause a crash.

[Figure 1: Uncorrupted Pool]

With pool organized as shown in Figure 1, if DriverA allocates 100 bytes but writes 120 bytes it will overwrite the pool header and data stored by DriverB. In Part 1 we demonstrated this type of buffer overflow using NotMyFault, but we were not able to identify which code had corrupted the pool.

[Figure: Corrupted Pool]

To catch the driver that corrupted pool we can use special pool. Special pool changes the organization of the pool so that each driver’s allocation is in a separate page of memory. This helps prevent drivers from accidentally writing to another driver’s memory. Special pool also configures the driver’s allocation at the end of the page and sets the next virtual page as a guard page by marking it as invalid. The guard page causes an attempt to write past the end of the allocation to result in an immediate bugcheck.

Special pool also fills the unused portion of the page with a repeating pattern, referred to as “slop bytes”. These slop bytes are checked when the page is freed. If any errors are found in the pattern, a bugcheck is generated to indicate that the memory was corrupted. This type of corruption is not a buffer overflow; it may be an underflow or some other form of corruption.
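
The two checks described above (the guard page for overruns and the slop bytes for everything else) can be illustrated with a toy model. This is a sketch, not how the memory manager is implemented: the class name `SpecialPoolPage`, the pattern value 0xA5 and the use of Python exceptions in place of bugchecks are all assumptions made for illustration, and alignment of the allocation is ignored.

```python
PAGE_SIZE = 0x1000

class SpecialPoolPage:
    """Toy model: one allocation at the end of a page, slop pattern in front of it."""

    def __init__(self, size, pattern=0xA5):  # pattern value is hypothetical
        self.pattern = pattern
        self.start = PAGE_SIZE - size         # allocation ends exactly at the page end
        self.mem = bytearray([pattern]) * self.start + bytearray(size)

    def write(self, offset, data):
        """Write relative to the allocation start; a negative offset is an underflow."""
        pos = self.start + offset
        if pos < 0 or pos + len(data) > PAGE_SIZE:
            # Stands in for the immediate page fault on the invalid neighbouring page.
            raise MemoryError("access outside the page (guard page hit)")
        self.mem[pos:pos + len(data)] = data

    def free(self):
        """Verify the slop bytes, as special pool does when the page is freed."""
        if any(b != self.pattern for b in self.mem[:self.start]):
            raise RuntimeError("slop bytes modified (corruption before the buffer)")
```

In this model, writing one byte past a 0x800 byte allocation fails at the moment of the write, while writing a few bytes before the allocation succeeds silently and is only reported by `free()`. That matches the distinction the text draws between overruns and other corruption.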

[Figure: Special Pool]

Because special pool stores each pool allocation in its own 4KB page, it causes an increase in memory usage. When special pool is enabled, the memory manager configures a limit on how much special pool may be allocated on the system. When this limit is reached, the normal pools are used instead. This limitation may be especially pronounced on 32-bit systems, which have less kernel space than 64-bit systems.

Now that we have explained how special pool works, we should use it.

There are two methods to enable special pool. Driver verifier allows special pool to be enabled on specific drivers. The PoolTag registry value described in KB188831 allows special pool to be enabled for a particular pool tag. Starting in Windows Vista and Windows Server 2008, driver verifier captures additional information for special pool allocations so this is typically the recommended method.

To enable special pool using driver verifier, use the following command line or choose the option from the verifier GUI. Use the /driver flag to specify the drivers you want to verify; this is the place to list drivers you suspect as the cause of the problem. You may want to verify drivers you have written and want to test, or drivers you have recently updated on the system. In the command line below I am only verifying myfault.sys. A reboot is required to enable special pool.

verifier /flags 1 /driver myfault.sys

After enabling verifier and rebooting the system, repeat the activity that causes the crash. For some problems the activity may just be to wait for a period of time. For our demonstration we are running NotMyFault (see Part 1 for details).

The crash resulting from a buffer overflow in special pool will be a stop 0xD6, DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION.

kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)
N bytes of memory was allocated and more than N bytes are being referenced.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: fffff9800b5ff000, memory referenced
Arg2: 0000000000000001, value 0 = read operation, 1 = write operation
Arg3: fffff88004f834eb, if non-zero, the address which referenced memory.
Arg4: 0000000000000000, (reserved)

We can debug this crash and determine that myfault.sys, the driver used by NotMyFault, wrote beyond its pool buffer.

The call stack shows that myfault.sys accessed invalid memory and this generated a page fault.

kd> k
Child-SP RetAddr Call Site
fffff880`04822658 fffff803`721333f1 nt!KeBugCheckEx
fffff880`04822660 fffff803`720acacb nt! ?? ::FNODOBFM::`string'+0x33c2b
fffff880`04822700 fffff803`7206feee nt!MmAccessFault+0x55b
fffff880`04822840 fffff880`04f834eb nt!KiPageFault+0x16e
fffff880`048229d0 fffff880`04f83727 myfault+0x14eb
fffff880`04822b20 fffff803`72658a4a myfault+0x1727
fffff880`04822b80 fffff803`724476c7 nt!IovCallDriver+0xba
fffff880`04822bd0 fffff803`7245c8a6 nt!IopXxxControlFile+0x7e5
fffff880`04822d60 fffff803`72071453 nt!NtDeviceIoControlFile+0x56
fffff880`04822dd0 000007fc`4fe22c5a nt!KiSystemServiceCopyEnd+0x13
00000000`004debb8 00000000`00000000 0x000007fc`4fe22c5a

The !pool command shows that the address being referenced by myfault.sys is special pool.

kd> !pool fffff9800b5ff000
Pool page fffff9800b5ff000 region is Special pool
fffff9800b5ff000: Unable to get contents of special pool block

The page table entry shows that the address is not valid. This is the guard page used by special pool to catch overruns.

kd> !pte fffff9800b5ff000
VA fffff9800b5ff000
PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E6002D0 PTE at FFFFF6FCC005AFF8
contains 0000000001B8F863 contains 000000000138E863 contains 000000001A6A1863 contains 0000000000000000
pfn 1b8f ---DA--KWEV pfn 138e ---DA--KWEV pfn 1a6a1 ---DA--KWEV not valid
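
The "contains" values in the !pte output can be decoded by hand. On x64, bit 0 of a page table entry is the valid bit and, for a valid entry, the page frame number starts at bit 12. The sketch below applies just those two rules; `decode_pte` is an illustrative name, and the 40-bit PFN mask is an assumption that covers physical addresses up to 52 bits.

```python
def decode_pte(pte):
    """Extract the valid bit, write bit and page frame number from an x64 PTE value."""
    return {
        "valid": bool(pte & 0x1),
        "writable": bool(pte & 0x2),
        "pfn": (pte >> 12) & ((1 << 40) - 1),
    }

# The four "contains" values from the !pte output above: PXE, PPE, PDE, PTE.
entries = [0x0000000001B8F863, 0x000000000138E863, 0x000000001A6A1863, 0x0]
decoded = [decode_pte(e) for e in entries]
```

The first three entries decode as valid with page frame numbers 1b8f, 138e and 1a6a1, which matches the `pfn` column printed by the debugger. The final entry is zero, so its valid bit is clear and any access through it faults.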

The allocation prior to this memory is a 0x800 byte block of nonpaged pool tagged as “Wrap”. “Wrap” is the tag used by verifier when pool is allocated without a tag; it is the equivalent of the “None” tag we saw in Part 1.

kd> !pool fffff9800b5ff000-1000
Pool page fffff9800b5fe000 region is Special pool
*fffff9800b5fe000 size: 800 data: fffff9800b5fe800 (NonPaged) *Wrap
Owning component : Unknown (update pooltag.txt)

Special pool is an effective mechanism to track down buffer overflow pool corruption. It can also be used to catch other types of pool corruption which we will discuss in future articles.

Understanding Pool Corruption Part 3 – Special Pool for Double Frees

In Part 1 and Part 2 of this series we discussed pool corruption and how special pool can be used to identify the cause of such corruption. In today’s article we will use special pool to catch a double free of pool memory.

A double free of pool will cause a system to blue screen, however the resulting crash may vary. In the most obvious scenario a driver that frees a pool allocation twice will cause the system to immediately crash with a stop code of C2 BAD_POOL_CALLER, and the first parameter will be 7 to indicate “Attempt to free pool which was already freed”. If you experience such a crash, enabling special pool should be high on your list of troubleshooting steps.

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 00000000000011c1, (reserved)
Arg3: 0000000004810007, Memory contents of the pool block
Arg4: fffffa8001b10800, Address of the block of pool being deallocated

A less obvious crash would be if the pool has been reallocated. As we showed in Part 2, pool is structured so that multiple drivers share a page. When DriverA calls ExFreePool to free its pool block the block is made available for other drivers. If memory manager gives this memory to DriverF, and then DriverA frees it a second time, a crash may occur in DriverF when the pool allocation no longer contains the expected data. Such a problem may be difficult for the developer of DriverF to identify without special pool.

Special pool will place each driver’s allocation in a separate page of memory (as discussed in Part 2). When a driver frees a pool block in special pool the whole page will be freed, and any access to a free page will cause an immediate bugcheck. Additionally, special pool will place this page on the tail of the list of pages to be used again. This increases the likelihood that the page will still be free when it is freed a second time, decreasing the likelihood of the DriverA/DriverF scenario shown above.
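
The effect of putting freed pages at the tail of the list can be shown with a queue. This is a sketch of the idea only; the class `SpecialPoolPages` and its methods are hypothetical and say nothing about the memory manager's real data structures.

```python
from collections import deque

class SpecialPoolPages:
    """Toy free list: allocations come from the head, freed pages go to the tail."""

    def __init__(self, pages):
        self.free_pages = deque(pages)

    def allocate(self):
        return self.free_pages.popleft()

    def free(self, page):
        self.free_pages.append(page)

pool = SpecialPoolPages(range(4))
page = pool.allocate()   # page 0 is handed out
pool.free(page)          # and returned to the tail of the list
next_three = [pool.allocate() for _ in range(3)]
```

Here `next_three` is `[1, 2, 3]`: every other free page is handed out before page 0 is reused. During that window a second free of page 0 still hits a free page and is caught, instead of landing on another driver's new allocation.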

To demonstrate this failure we will once again use the Sysinternals tool NotMyFault. Choose the “Double free” option and click “Crash”. Most likely you will get the stop C2 bugcheck mentioned above. Enable special pool and reboot to get a more informative error.

verifier /flags 1 /driver myfault.sys

Choosing the “Double free” option with special pool enabled resulted in the following crash. The bugcheck code PAGE_FAULT_IN_NONPAGED_AREA means some driver tried to access memory that was not valid. This invalid memory was the freed special pool page.

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffff9800a7fe7f0, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff80060263888, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 0000000000000002, (reserved)

Looking at the call stack we can see that myfault.sys was freeing pool and ExFreePoolSanityChecks took a page fault that led to the crash.

kd> kn
# Child-SP RetAddr Call Site
00 fffff880`0419fe28 fffff800`5fd7e28a nt!DbgBreakPointWithStatus
01 fffff880`0419fe30 fffff800`5fd7d8de nt!KiBugCheckDebugBreak+0x12
02 fffff880`0419fe90 fffff800`5fc5b544 nt!KeBugCheck2+0x79f
03 fffff880`041a05b0 fffff800`5fd1c5bc nt!KeBugCheckEx+0x104
04 fffff880`041a05f0 fffff800`5fc95acb nt! ?? ::FNODOBFM::`string'+0x33e2a
05 fffff880`041a0690 fffff800`5fc58eee nt!MmAccessFault+0x55b
06 fffff880`041a07d0 fffff800`60263888 nt!KiPageFault+0x16e
07 fffff880`041a0960 fffff800`6024258c nt!ExFreePoolSanityChecks+0xe8
08 fffff880`041a09a0 fffff880`04c9b5d9 nt!VerifierExFreePoolWithTag+0x3c
09 fffff880`041a09d0 fffff880`04c9b727 myfault!MyfaultDeviceControl+0x2fd
0a fffff880`041a0b20 fffff800`60241a4a myfault!MyfaultDispatch+0xb7
0b fffff880`041a0b80 fffff800`600306c7 nt!IovCallDriver+0xba
0c fffff880`041a0bd0 fffff800`600458a6 nt!IopXxxControlFile+0x7e5
0d fffff880`041a0d60 fffff800`5fc5a453 nt!NtDeviceIoControlFile+0x56
0e fffff880`041a0dd0 000007fd`ea212c5a nt!KiSystemServiceCopyEnd+0x13

Using the address from the bugcheck code, we can verify that the memory is in fact not valid:

kd> dd fffff9800a7fe7f0
fffff980`0a7fe7f0 ???????? ???????? ???????? ????????
fffff980`0a7fe800 ???????? ???????? ???????? ????????
fffff980`0a7fe810 ???????? ???????? ???????? ????????
fffff980`0a7fe820 ???????? ???????? ???????? ????????
fffff980`0a7fe830 ???????? ???????? ???????? ????????
fffff980`0a7fe840 ???????? ???????? ???????? ????????
fffff980`0a7fe850 ???????? ???????? ???????? ????????
fffff980`0a7fe860 ???????? ???????? ???????? ????????
kd> !pte fffff9800a7fe7f0
VA fffff9800a7fe7f0
PXE at FFFFF6FB7DBEDF98 PPE at FFFFF6FB7DBF3000 PDE at FFFFF6FB7E600298 PTE at FFFFF6FCC0053FF0
contains 0000000002A91863 contains 0000000002A10863 contains 0000000000000000
pfn 2a91 ---DA--KWEV pfn 2a10 ---DA--KWEV not valid

So far we have enough evidence to prove that myfault.sys was freeing invalid memory, but how do we know this memory is being freed twice? If there was a double free, we need to determine whether the first or the second call to ExFreePool was incorrect. To do so, we need to determine what code freed the memory first.

Driver Verifier special pool keeps track of the last 0x10000 calls to allocate and free pool. You can dump this database with the !verifier 80 command. To limit the data output you can also pass this command the address of the memory you suspect was double freed.
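
A log that keeps only the most recent 0x10000 operations behaves like a fixed-size ring buffer: once it is full, each new entry pushes out the oldest one. The sketch below models that behaviour and the address filter. The class `PoolLog` and its fields are assumptions for illustration and do not reflect Driver Verifier's internal format.

```python
from collections import deque

LOG_ENTRIES = 0x10000

class PoolLog:
    """Toy model of a bounded log of pool allocate and free operations."""

    def __init__(self, capacity=LOG_ENTRIES):
        self.entries = deque(maxlen=capacity)   # oldest entries drop off automatically

    def record(self, op, address, caller):
        self.entries.append((op, address, caller))

    def history(self, address):
        """Entries for one address, newest first."""
        return [e for e in reversed(self.entries) if e[1] == address]

log = PoolLog(capacity=4)   # tiny capacity so the wrap-around is easy to see
log.record("alloc", 0xfffff9800a7fe800, "myfault")
log.record("free", 0xfffff9800a7fe800, "myfault")
recent = log.history(0xfffff9800a7fe800)
```

`recent` lists the free before the allocation because the newest entry comes first. The practical consequence of the bounded size is that on a busy system the entries of interest can be overwritten, so the dump should be captured as close to the failure as possible.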

Don’t assume the address in the bugcheck code is the address being freed. Get the address from the function that called VerifierExFreePoolWithTag.

In the above call stack the call below VerifierExFreePoolWithTag is frame 9 (start counting with 0, or use kn).

kd> .frame /r 9
09 fffff880`041a09d0 fffff880`04c9b727 myfault+0x15d9
rax=0000000000000000 rbx=fffff9800a7fe800 rcx=fffff9800a7fe800
rdx=fffffa8001a37fa0 rsi=fffffa80035975e0 rdi=fffffa8003597610
rip=fffff88004c9b5d9 rsp=fffff880041a09d0 rbp=fffffa80034568d0
r8=fffff9800a7fe801 r9=fffff9800a7fe7f0 r10=fffff9800a7fe800
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=fffff800600306c7 r15=fffffa8004381b80
iopl=0 nv up ei ng nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000286
myfault+0x15d9:
fffff880`04c9b5d9 eb7a jmp myfault+0x1655 (fffff880`04c9b655)

On x64 systems the first parameter is passed in rcx. The below assembly shows that rcx originated from rbx.

kd> ub fffff880`04c9b5d9
myfault+0x15ba:
fffff880`04c9b5ba ff15a80a0000 call qword ptr [myfault+0x2068 (fffff880`04c9c068)]
fffff880`04c9b5c0 33d2 xor edx,edx
fffff880`04c9b5c2 488bc8 mov rcx,rax
fffff880`04c9b5c5 488bd8 mov rbx,rax
fffff880`04c9b5c8 ff154a0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]
fffff880`04c9b5ce 33d2 xor edx,edx
fffff880`04c9b5d0 488bcb mov rcx,rbx
fffff880`04c9b5d3 ff153f0a0000 call qword ptr [myfault+0x2018 (fffff880`04c9c018)]

Run !verifier 80 using the address from rbx:

kd> !verifier 80 fffff9800a7fe800
Log of recent kernel pool Allocate and Free operations:
There are up to 0x10000 entries in the log.
Parsing 0x0000000000010000 log entries, searching for address 0xfffff9800a7fe800.
======================================================================
Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0
fffff80060251a32 nt!VfFreePoolNotification+0x4a
fffff8005fe736c9 nt!ExFreePool+0x595
fffff80060242597 nt!VerifierExFreePoolWithTag+0x47
fffff88004c9b5ce myfault!MyfaultDeviceControl+0x2f2
fffff88004c9b727 myfault!MyfaultDispatch+0xb7
fffff80060241a4a nt!IovCallDriver+0xba
fffff800600306c7 nt!IopXxxControlFile+0x7e5
fffff800600458a6 nt!NtDeviceIoControlFile+0x56
fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13
======================================================================
Pool block fffff9800a7fe800, Size 0000000000000800, Thread fffffa80046ce4c0
fffff80060242a5d nt!VeAllocatePoolWithTagPriority+0x2d1
fffff8006024b20e nt!XdvExAllocatePoolInternal+0x12
fffff80060242f69 nt!VerifierExAllocatePool+0x61
fffff88004c9b5c0 myfault!MyfaultDeviceControl+0x2e4
fffff88004c9b727 myfault!MyfaultDispatch+0xb7
fffff80060241a4a nt!IovCallDriver+0xba
fffff800600306c7 nt!IopXxxControlFile+0x7e5
fffff800600458a6 nt!NtDeviceIoControlFile+0x56
fffff8005fc5a453 nt!KiSystemServiceCopyEnd+0x13

The above output shows the pool block being allocated by myfault.sys and then freed by myfault.sys. If we combine this information with the call stack leading up to our bugcheck we can conclude that the pool was freed once in MyfaultDeviceControl at offset 0x2f2, then freed again in MyfaultDeviceControl at offset 0x2fd.
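
The addresses in the disassembly, the log and the call stack can be cross-checked with simple arithmetic. Each `call qword ptr [...]` in the `ub` listing is encoded in 6 bytes, so the return address recorded for a call is the call's address plus 6. The sketch below uses only numbers from the outputs above; the dictionary keys are labels chosen for this example.

```python
CALL_LEN = 6  # bytes in each "ff15 + 32-bit displacement" call instruction in the listing

# Addresses of the three call instructions from the ub output.
calls = {
    "allocate": 0xfffff88004c9b5ba,
    "first free": 0xfffff88004c9b5c8,
    "second free": 0xfffff88004c9b5d3,
}

# Offsets into MyfaultDeviceControl reported by !verifier 80 (first two) and kn (last).
reported_offsets = {"allocate": 0x2e4, "first free": 0x2f2, "second free": 0x2fd}

return_addrs = {name: addr + CALL_LEN for name, addr in calls.items()}
function_bases = {name: return_addrs[name] - reported_offsets[name] for name in calls}
```

All three entries in `function_bases` come out to the same address, fffff88004c9b2dc. That agreement indicates the allocation, the first free and the faulting second free are three consecutive calls inside the same function.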

Now we know which driver is causing the problem, and if this is our driver we know which area of the code to investigate.
