Hungry Mind , Blog about everything in IT - C#, Java, C++, .NET, Windows, WinAPI, ...

KiPageFault into BSOD when stepping over

I've been struggling long time with weird bug check during kernel driver debugging. Stack trace would look like this:

1: kd> k
Child-SP          RetAddr           Call Site
ffffd000`20463d78 fffff800`1aa610ea nt!DbgBreakPointWithStatus
ffffd000`20463d80 fffff800`1aa609fb nt!KiBugCheckDebugBreak+0x12
ffffd000`20463de0 fffff800`1a9d8da4 nt!KeBugCheck2+0x8ab
ffffd000`204644f0 fffff800`1aa00b1f nt!KeBugCheckEx+0x104
ffffd000`20464530 fffff800`1a8c75ad nt! ?? ::FNODOBFM::`string'+0x1797f
ffffd000`204645d0 fffff800`1a9e2f2f nt!MmAccessFault+0x7ed
ffffd000`20464710 fffff800`002b92e3 nt!KiPageFault+0x12f
ffffd000`204648a0 fffff800`0117b41f Wdf01000!imp_WdfFdoInitQueryProperty+0x28
ffffd000`204648f0 fffff800`0118117f MyVolFlt!WdfFdoInitQueryProperty+0x5f [c:\program files (x86)\windows kits\8.1\include\wdf\kmdf\1.13\wdffdo.h @ 217]
ffffd000`20464940 fffff800`0027f55b MyVolFlt!MyVolFltEvtDeviceAdd+0x9f [c:\development\projects\kernelmode\myvolflt\driver.c @ 116]
ffffd000`20464bd0 fffff800`1a9539d9 Wdf01000!FxDriver::AddDevice+0xab
ffffd000`20464ff0 fffff800`1ace18ab nt!PpvUtilCallAddDevice+0x35
ffffd000`20465030 fffff800`1acdff9e nt!PnpCallAddDevice+0x63
ffffd000`204650b0 fffff800`1acdf2db nt!PipCallDriverAddDevice+0x6e2
ffffd000`20465250 fffff800`1ad14b89 nt!PipProcessDevNodeTree+0x1cf
ffffd000`204654d0 fffff800`1a97d0b8 nt!PiProcessReenumeration+0x91
ffffd000`20465520 fffff800`1a97cf2e nt!PnpDeviceActionWorker+0x168
ffffd000`204655d0 fffff800`1af93382 nt!PnpRequestDeviceAction+0x1da
ffffd000`20465610 fffff800`1af89022 nt!IopInitializeBootDrivers+0x83e
ffffd000`204658b0 fffff800`1af7794d nt!IoInitSystem+0x91e
ffffd000`204659d0 fffff800`1ad7bd09 nt!Phase1InitializationDiscard+0xe61
ffffd000`20465bd0 fffff800`1a9182e4 nt!Phase1Initialization+0x9
ffffd000`20465c00 fffff800`1a9df2c6 nt!PspSystemThreadStartup+0x58
ffffd000`20465c60 00000000`00000000 nt!KiStartSystemThread+0x16

Bug Check description:

   1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: ffffe00020464c10, memory referenced.
...

Now lets see what is this address:

1: kd> !pool ffffe00020464c10
Pool page ffffe00020464c10 region is Nonpaged pool
ffffe00020464000 is not a valid large pool allocation, checking large session pool...
Unable to read large session pool table (Session data is not present in mini and kernel-only dumps)
ffffe00020464000 is not valid pool. Checking for freed (or corrupt) pool
Address ffffe00020464000 could not be read. It may be a freed, invalid or paged out page
1: kd> ? poi(DeviceInit)
Evaluate expression: -35183830610928 = ffffe000`20464c10

Wow, faulting memory references is DeviceInit actually! And it is located on stack (because of KMDF model).

Sure IRQL is at PASSIVE level:

1: kd> !irql
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)

The funniest thing so far is that if I set bp after the call to WdfFdoInitQueryProperty - it would run smoothly. So there is something wrong with the debugger interacting OS kernel.

Now I finally managed to figure out what was wrong. I would normally set my bp during initial break-in sequence:

Connected to Windows 8 9600 x64 target at (Thu Jan 16 00:54:33.435 2014 (UTC + 2:00)), ptr64 TRUE
Kernel Debugger connection established.
************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       cache*C:\Development\Tools\Symbols
Deferred                                       srv*http://msdl.microsoft.com/download/symbols
Symbol search path is: cache*C:\Development\Tools\Symbols;srv*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 8 Kernel Version 9600 MP (1 procs) Free x64
Built by: 9600.16452.amd64fre.winblue_gdr.131030-1505
Machine Name:
Kernel base = 0xfffff800`5547e000 PsLoadedModuleList = 0xfffff800`55742990
System Uptime: 0 days 0:00:00.102
nt!DebugService2+0x5:
fffff800`555d28e5 cc              int     3
kd> bp MyVolFltEvtDeviceAdd
kd> g

And here what happens after:

Unload module \SystemRoot\system32\mcupdate_GenuineIntel.dll at fffff800`1b200000
Unload module \SystemRoot\System32\drivers\werkernel.sys at fffff800`19ed5000
...
Unload module \SystemRoot\system32\DRIVERS\MyVolFlt.sys at fffff800`1b9ed000
nt!DebugService2+0x5:
fffff800`555d28e5 cc              int     3
kd> k
 # Child-SP          RetAddr           Call Site
00 fffff800`573991a8 fffff800`55544361 nt!DebugService2+0x5
01 fffff800`573991b0 fffff800`555442ff nt!DbgLoadImageSymbols+0x45
02 fffff800`57399200 fffff800`55b76fc4 nt!DbgLoadImageSymbolsUnicode+0x2b
03 fffff800`57399240 fffff800`55b7684b nt!MiReloadBootLoadedDrivers+0x300
04 fffff800`573993c0 fffff800`55b6c091 nt!MiInitializeDriverImages+0x163
05 fffff800`57399470 fffff800`55b67299 nt!MiInitSystem+0x3d9
06 fffff800`57399500 fffff800`557e84ea nt!InitBootProcessor+0x301
07 fffff800`57399740 fffff800`557de1a3 nt!KiInitializeKernel+0x5a2
08 fffff800`57399ad0 00000000`00000000 nt!KiSystemStartup+0x193

It is unloading boot time drivers! And reloading with different start addresses! So when I set my breakpoint at MyVolFltEvtDeviceAdd, WinDbg would insert int 3 instruction and during module relocation that instruction is copied as is. So my breakpoint actually hits, despite code relocation. But this is where the Windows and debugger fall apart - they don't know about this breakpoint.

In order to issue correct breakpoint address, you must break on module load:

kd> sxe ld MyVolFlt
kd> sxe ud MyVolFlt
kd> sx
  ct - Create thread - ignore
  et - Exit thread - ignore
 cpr - Create process - ignore
 epr - Exit process - ignore
  ld - Load module - break
       (only break for myvolflt)
  ud - Unload module - break
       (only break for MyVolFlt)

And issue bp command after kernel reloads boot loaded drivers.

Metal Gear Rising: Revengeance

Metal Gear Rising: Revengeance has just released on PC!

Amazing OST:

ReaderWriterLockSlim fails on dual-socket environments

This is yet another story of orphaned ReaderWriterLockSlim.

Another dump, the same problem - ReaderWriterLockSlim object state is corrupted:

0:173> !do 0x0000000001c679f8
Name:        System.Threading.ReaderWriterLockSlim
MethodTable: 000007f87ec7c1d8
EEClass:     000007f87e999448
Size:        96(0x60) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007f880dbcbf0  4000755       50       System.Boolean  1 instance                1 fIsReentrant
000007f880dbe0c8  4000756       30         System.Int32  1 instance                0 myLock
000007f880db2308  4000757       34        System.UInt32  1 instance                1 numWriteWaiters
000007f880db2308  4000758       38        System.UInt32  1 instance               28 numReadWaiters
000007f880db2308  4000759       3c        System.UInt32  1 instance                0 numWriteUpgradeWaiters
000007f880db2308  400075a       40        System.UInt32  1 instance                0 numUpgradeWaiters
000007f880dbcbf0  400075b       51       System.Boolean  1 instance                0 fNoWaiters
000007f880dbe0c8  400075c       44         System.Int32  1 instance               -1 upgradeLockOwnerId
000007f880dbe0c8  400075d       48         System.Int32  1 instance               -1 writeLockOwnerId
000007f880db9138  400075e        8 ...g.EventWaitHandle  0 instance 000000000381eb38 writeEvent
000007f880db9138  400075f       10 ...g.EventWaitHandle  0 instance 00000000035a32e0 readEvent
000007f880db9138  4000760       18 ...g.EventWaitHandle  0 instance 0000000000000000 upgradeEvent
000007f880db9138  4000761       20 ...g.EventWaitHandle  0 instance 0000000000000000 waitUpgradeEvent
000007f880dd0398  4000763       28         System.Int64  1 instance 9 lockID
000007f880dbcbf0  4000765       52       System.Boolean  1 instance                0 fUpgradeThreadHoldingRead
000007f880db2308  4000766       4c        System.UInt32  1 instance       1073741824 owners
000007f880dbcbf0  4000767       53       System.Boolean  1 instance                0 fDisposed
000007f880dd0398  4000762      408         System.Int64  1   static 14118 s_nextLockID
000007f87ec99a20  4000764        8 ...ReaderWriterCount  0 TLstatic  t_rwc
0:173> .formats 0n1073741824 
Evaluate expression:
  Hex:     00000000`40000000

EnterReadLock, EnterWriteLock and other Enter operations waiting for an event which never goes off. Deadlock.

I must say that I checked possibilities of thread aborts in this code and found no signs of such scenarios happening. This made me desperately searching for another root cause of the problem.

So started searching ReaderWriterLockSlim.cs file for potential problems. I immediately became suspicious when I realyzed there is lack of synchronization when, for example, TryEnterUpgradeableReadLockCore method modified one of object fields:

uint owners;
...
private bool TryEnterReadLockCore(TimeoutTracker timeout)
{ 
...
       owners++;
}

Fields are not declared volatile, nor are they modified via interlocked operations. The only exception is the myLock field, which is used as a spin lock ang modified via Interlocked.CompareExchange:

[MethodImpl(MethodImplOptions.AggressiveInlining)] 
private void EnterMyLock()
{
   if (Interlocked.CompareExchange(ref myLock, 1, 0) != 0)
      EnterMyLockSpin(); 
}

Note, however, spin lock release method doesn't use Interlocked operation:

private void ExitMyLock()
{ 
   Debug.Assert(myLock != 0, "Exiting spin lock that is not held");
   myLock = 0; 
} 

This looks to be a mistake, possibly the root cause on one of root causes.

OK, lets go back to the problem - ReaderWriterLockSlim gets locked forever on 24-core dual socket Intel hardware. Threads are not aborted, the code is perfect. So what the hell is going on?

Well, the problem looks to be bad software (ReaderWriterLockSlim) on expensive hardware. Dell PowerEdge R720 has two psysical CPUs - 2x Intel Xeon E5-2620, 1200 MHz (12 x 100), 6 cores and 12 threads each. 24 logical cores total. And the problem is experienced only on such configurations.

I made a program that creates 24 (= Environment.ProcessorCount) threads with highest priority acquiring and releasing the lock in a tight loop:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
namespace RWLSTest
{
   internal class Program
   {
      private static readonly ReaderWriterLockSlim slim = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
      private static readonly List<object> objects = new List<object>();
      private static readonly Int32 processorCount = Environment.ProcessorCount;
      private static Int32 threadsCount;
      private static Int64 reads;
      private static Int64 writes;
      private static volatile Object[] threads = new Object[processorCount];
      private static Action loopAction;
      static Program()
      {
         // Let it JIT those methods
         using (var temp = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion)) {
            Thread.Yield();
            temp.EnterReadLock();
            temp.ExitReadLock();
         }
         var thread = new Thread(() =>
         {
            try {
               Thread.Sleep(Timeout.Infinite);
            }
            catch {
               return;
            }
            throw new InvalidOperationException();
         });
         thread.Start();
         try {
            thread.Abort();
         }
         catch (Exception e) {
            Console.WriteLine(e.Message);
         }
      }
      private static void LoopWithEmptryTryBlocks()
      {
         var random = new Random(Environment.TickCount);
         for (;;) {
            if (random.Next(processorCount) <= (processorCount / 4)) {
               Interlocked.Increment(ref writes);
               try {}
               finally {
                  slim.EnterWriteLock();
               }
               try {
                  ExclusiveLoop(random);
               }
               finally {
                  slim.ExitWriteLock();
               }
            }
            else {
               Interlocked.Increment(ref reads);
               try {}
               finally {
                  slim.EnterReadLock();
               }
               try {
                  SharedLoop(random);
               }
               finally {
                  slim.ExitReadLock();
               }
            }
         }
      }
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      private static void SharedLoop(Random random)
      {
         foreach (var o in objects) {
            var i = (Int32)o;
            if ((i % processorCount) == (random.Next() % processorCount) && random.Next(37) == 3) {
               break;
            }
         }
      }
      [MethodImpl(MethodImplOptions.AggressiveInlining)]
      private static void ExclusiveLoop(Random random)
      {
         if (objects.Count < 10240) {
            for (var i = 0; i < 19; ++i) {
               if (random.Next(13) == 7) {
                  objects.Add(random.Next());
               }
            }
         }
         for (var i = 0; i < 13; ++i) {
            if (objects.Count > 0 && random.Next(19) == 13) {
               objects.Remove(random.Next() % objects.Count);
            }
         }
      }
      private static void Loop()
      {
         var random = new Random(Environment.TickCount);
         for (;;) {
            if (random.Next(processorCount) <= (processorCount / 4)) {
               slim.EnterWriteLock();
               try {
                  ExclusiveLoop(random);
               }
               finally {
                  slim.ExitWriteLock();
               }
            }
            else {
               slim.EnterReadLock();
               try {
                  SharedLoop(random);
               }
               finally {
                  slim.ExitReadLock();
               }
            }
         }
      }
      private static void StartOneThread(Object state)
      {
         var thread = new Thread(() =>
         {
            try {
               Interlocked.Increment(ref threadsCount);
               loopAction();
            }
            catch (ThreadAbortException) {}
            finally {
               Interlocked.Decrement(ref threadsCount);
               ThreadPool.UnsafeQueueUserWorkItem(StartOneThread, state);
            }
         }) { Priority = ThreadPriority.Highest };
         thread.Start();
         Thread.VolatileWrite(ref threads[(Int32)state], thread);
      }
      private static void Main(string[] args)
      {
         var random = new Random(Environment.TickCount);
         var abortCycle = 0;
         if (args.Length > 0) {
            abortCycle = Int32.Parse(args[0]);
            loopAction = LoopWithEmptryTryBlocks;
         }
         else {
            loopAction = Loop;
         }
         for (var i = 0; i < processorCount; ++i) {
            StartOneThread(i);
         }
         for (var i = 0U;; ++i) {
            Thread.Sleep(1);
            if (abortCycle > 0 && i % abortCycle == 0) {
               var ti = random.Next(111) % processorCount;
               var thread = (Thread)Thread.VolatileRead(ref threads[ti]);
               if (thread != null) {
                  Console.WriteLine("Aborting thread #" + ti);
                  try {
                     thread.Abort();
                  }
                  catch (Exception e) {
                     Console.WriteLine(e.Message);
                  }
               }
            }
         }
      }
   }
}

I ran it several times and after about 1 hour all threads ended up waiting for lock event to fire. Voila! Have a look at the state of ReaderWriterLockSlim object:

0:000> !do 000000b343bd2860 
Name:        System.Threading.ReaderWriterLockSlim
MethodTable: 000007fbf887c1a8
EEClass:     000007fbf8599448
Size:        96(0x60) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fbfe6ac7b8  4000755       50       System.Boolean  1 instance                1 fIsReentrant
000007fbfe6adc90  4000756       30         System.Int32  1 instance                0 myLock
000007fbfe6a1ed0  4000757       34        System.UInt32  1 instance                1 numWriteWaiters
000007fbfe6a1ed0  4000758       38        System.UInt32  1 instance               23 numReadWaiters
000007fbfe6a1ed0  4000759       3c        System.UInt32  1 instance                0 numWriteUpgradeWaiters
000007fbfe6a1ed0  400075a       40        System.UInt32  1 instance                0 numUpgradeWaiters
000007fbfe6ac7b8  400075b       51       System.Boolean  1 instance                0 fNoWaiters
000007fbfe6adc90  400075c       44         System.Int32  1 instance               -1 upgradeLockOwnerId
000007fbfe6adc90  400075d       48         System.Int32  1 instance               -1 writeLockOwnerId
000007fbfe6a8d00  400075e        8 ...g.EventWaitHandle  0 instance 000000b343beb448 writeEvent
000007fbfe6a8d00  400075f       10 ...g.EventWaitHandle  0 instance 000000b343be2fd0 readEvent
000007fbfe6a8d00  4000760       18 ...g.EventWaitHandle  0 instance 0000000000000000 upgradeEvent
000007fbfe6a8d00  4000761       20 ...g.EventWaitHandle  0 instance 0000000000000000 waitUpgradeEvent
000007fbfe6bff60  4000763       28         System.Int64  1 instance 1 lockID
000007fbfe6ac7b8  4000765       52       System.Boolean  1 instance                0 fUpgradeThreadHoldingRead
000007fbfe6a1ed0  4000766       4c        System.UInt32  1 instance       1073741824 owners
000007fbfe6ac7b8  4000767       53       System.Boolean  1 instance                0 fDisposed
000007fbfe6bff60  4000762      408         System.Int64  1   static 2 s_nextLockID
000007fbf88999f0  4000764        8 ...ReaderWriterCount  0 TLstatic  t_rwc

There are 23 reader waiters, 1 writer waiter and owners field is 0x40000000 once again. All of 24 threads look like the following:

0:000> ~22e !CLRStack
OS Thread Id: 0xf28 (22)
        Child SP               IP Call Site
000000b361a1df78 000007fc137b315b [HelperMethodFrame_1OBJ: 000000b361a1df78] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
000000b361a1e0a0 000007fbfe5195c4 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
000000b361a1e0e0 000007fbf8af4c25 System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, TimeoutTracker)
000000b361a1e150 000007fbf8dd4c48 System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker)
000000b361a1e1b0 000007fbf8804d4a System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker)
000000b361a1e200 000007fbf8af55ad System.Threading.ReaderWriterLockSlim.TryEnterReadLock(Int32)
000000b361a1e250 000007fba0010a45 RWLSTest.Program.Loop()
000000b361a1e2c0 000007fba00106f7 RWLSTest.Program+<>c__DisplayClass4.<startonethread>b__3()
000000b361a1e330 000007fbfe4ff8a5 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000b361a1e490 000007fbfe4ff609 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000b361a1e4c0 000007fbfe4ff5c7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000b361a1e510 000007fbfe512d21 System.Threading.ThreadHelper.ThreadStart()
000000b361a1e828 000007fbff6bf713 [GCFrame: 000000b361a1e828] 
000000b361a1eb58 000007fbff6bf713 [DebuggerU2MCatchHandlerFrame: 000000b361a1eb58] 

They all WaitOnEvent, but who and when will fire the event? This never happens. Deadlock.

Now lets get back to ExitMyLock. ReaderWriterLockSlim contains many fields with at least 88 bytes storage space required (without extra aligning, if needed). Modern Intel CPUs have cache lines of 64 bytes which is too small to entirely hold ReaderWriterLockSlim object instance. So each one requires at least two cache lines to hold its data. Since the distance between myLock and owners fields is more than 64 bytes (both x86 and x64), releasing myLock without a memory barrier (or interlocked instruction) causes only a portion of object's storage invalidated on demand between CPU cores and/or CPUs. Invalidation is forced by EnterMyLock's interlocked instruction. But only 64 bytes of aligned memory where myLock resides. Other cache line's changes might not be visible at that point. So the core acquiring the lock may see inconsistent object state.

Very important note: ReaderWriterLockSlim.cs is a part of 4.5Update1 reference source. Vanilla .NET 4.5 and probably several updates following it has this code, for example 4.0.30319.17929, 4.0.30319.18408. Recent versions, for example 4.0.30319.33440, has fixed this:

private void ExitMyLock()
{ 
   Volatile.Write(ref myLock, 0);
}

Volatile write inserts explicit memory barriers and makes any changes visible to other cores and CPUs.

Conclusion: do not use ReaderWriterLockSlim class without .NET Framework updated to at least 4.0.30319.33440. Its will eventually fail, at least on dual-socket Intel system.

Windows 8.1 and Windows Server 2012 R2 have this issue fixed. Windows Server 2012 (nor R2) seems to stuck with buggy implementation of ReaderWriterLockSlim class. After installing all available updates, ExitMyLock looks the same (no volatile write operation).

Injustice: Gods Among Us

Игроки в онлайне делятся на два типа: одни спамят, а другие занимались сексом с их матерью

Visual Studio 2012 Update 4

Copyright 2007-2011 Chabster