ReaderWriterLockSlim fails on dual-socket environments
This is yet another story of an orphaned ReaderWriterLockSlim.
Another dump, same problem. The ReaderWriterLockSlim object state is corrupted:
    0:173> !do 0x0000000001c679f8
    Name:        System.Threading.ReaderWriterLockSlim
    MethodTable: 000007f87ec7c1d8
    EEClass:     000007f87e999448
    Size:        96(0x60) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
    Fields:
                  MT    Field  Offset                  Type  VT      Attr             Value  Name
    000007f880dbcbf0  4000755      50        System.Boolean   1  instance                 1  fIsReentrant
    000007f880dbe0c8  4000756      30          System.Int32   1  instance                 0  myLock
    000007f880db2308  4000757      34         System.UInt32   1  instance                 1  numWriteWaiters
    000007f880db2308  4000758      38         System.UInt32   1  instance                28  numReadWaiters
    000007f880db2308  4000759      3c         System.UInt32   1  instance                 0  numWriteUpgradeWaiters
    000007f880db2308  400075a      40         System.UInt32   1  instance                 0  numUpgradeWaiters
    000007f880dbcbf0  400075b      51        System.Boolean   1  instance                 0  fNoWaiters
    000007f880dbe0c8  400075c      44          System.Int32   1  instance                -1  upgradeLockOwnerId
    000007f880dbe0c8  400075d      48          System.Int32   1  instance                -1  writeLockOwnerId
    000007f880db9138  400075e       8  ...g.EventWaitHandle   0  instance  000000000381eb38  writeEvent
    000007f880db9138  400075f      10  ...g.EventWaitHandle   0  instance  00000000035a32e0  readEvent
    000007f880db9138  4000760      18  ...g.EventWaitHandle   0  instance  0000000000000000  upgradeEvent
    000007f880db9138  4000761      20  ...g.EventWaitHandle   0  instance  0000000000000000  waitUpgradeEvent
    000007f880dd0398  4000763      28          System.Int64   1  instance                 9  lockID
    000007f880dbcbf0  4000765      52        System.Boolean   1  instance                 0  fUpgradeThreadHoldingRead
    000007f880db2308  4000766      4c         System.UInt32   1  instance        1073741824  owners
    000007f880dbcbf0  4000767      53        System.Boolean   1  instance                 0  fDisposed
    000007f880dd0398  4000762     408          System.Int64   1    static             14118  s_nextLockID
    000007f87ec99a20  4000764       8  ...ReaderWriterCount   0  TLstatic                    t_rwc

    0:173> .formats 0n1073741824
    Evaluate expression:
      Hex:     00000000`40000000
EnterReadLock, EnterWriteLock and the other Enter operations are all waiting for an event that never fires. Deadlock.
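To read the owners value, it helps to split it into its bit fields. The sketch below assumes the layout I recall from the reference source (bit 31 = writer held, bit 30 = waiting writers, bit 29 = waiting upgrader, low 28 bits = reader count); the class and method names are hypothetical, and the masks should be checked against the ReaderWriterLockSlim.cs of the build you are debugging.

```csharp
using System;

// Hypothetical helper, not part of the framework: decodes the 'owners' field
// of ReaderWriterLockSlim. The bit layout below is an assumption taken from
// the reference source and may differ between framework builds.
public static class OwnersDecoder
{
    private const uint WriterHeld      = 0x80000000; // bit 31
    private const uint WaitingWriters  = 0x40000000; // bit 30
    private const uint WaitingUpgrader = 0x20000000; // bit 29
    private const uint ReaderMask      = 0x10000000 - 1; // low 28 bits

    public static string Describe(uint owners)
    {
        return string.Format(
            "writerHeld={0}, waitingWriters={1}, waitingUpgrader={2}, readers={3}",
            (owners & WriterHeld) != 0,
            (owners & WaitingWriters) != 0,
            (owners & WaitingUpgrader) != 0,
            owners & ReaderMask);
    }
}
```

Under that assumed layout, OwnersDecoder.Describe(0x40000000) reports no writer holding the lock, the waiting-writers flag set, and zero readers: nobody appears to own the lock, yet 28 readers and one writer are queued behind it.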
I should say that I checked this code for possible thread aborts and found no sign of any. That left me desperately searching for another root cause of the problem.
So I started searching the ReaderWriterLockSlim.cs file for potential problems. I immediately became suspicious when I realized how little synchronization there is when, for example, the TryEnterReadLockCore method modifies one of the object's fields:
    uint owners;
    ...
    private bool TryEnterReadLockCore(TimeoutTracker timeout)
    {
        ...
        owners++;
    }
The fields are not declared volatile, nor are they modified via interlocked operations; they rely on being touched only while myLock is held. The only exception is the myLock field itself, which is used as a spin lock and is modified via Interlocked.CompareExchange:
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private void EnterMyLock()
    {
        if (Interlocked.CompareExchange(ref myLock, 1, 0) != 0)
            EnterMyLockSpin();
    }
Note, however, that the spin lock release method doesn't use an Interlocked operation:
    private void ExitMyLock()
    {
        Debug.Assert(myLock != 0, "Exiting spin lock that is not held");
        myLock = 0;
    }
This looks like a mistake, and possibly the root cause or one of the root causes.
OK, let's go back to the problem: ReaderWriterLockSlim gets locked forever on 24-core dual-socket Intel hardware. Threads are not aborted, the code is perfect. So what the hell is going on?
Well, the problem looks to be bad software (ReaderWriterLockSlim) on expensive hardware. The Dell PowerEdge R720 has two physical CPUs: 2x Intel Xeon E5-2620, 1200 MHz (12 x 100), with 6 cores and 12 threads each, 24 logical cores in total. The problem has been seen only on such configurations.
I made a program that creates 24 (= Environment.ProcessorCount) threads with the highest priority, each acquiring and releasing the lock in a tight loop:
    using System;
    using System.Collections.Generic;
    using System.Runtime.CompilerServices;
    using System.Threading;

    namespace RWLSTest
    {
        internal class Program
        {
            private static readonly ReaderWriterLockSlim slim = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
            private static readonly List<object> objects = new List<object>();
            private static readonly Int32 processorCount = Environment.ProcessorCount;
            private static Int32 threadsCount;
            private static Int64 reads;
            private static Int64 writes;
            private static volatile Object[] threads = new Object[processorCount];
            private static Action loopAction;

            static Program()
            {
                // Let it JIT those methods
                using (var temp = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion))
                {
                    Thread.Yield();
                    temp.EnterReadLock();
                    temp.ExitReadLock();
                }

                var thread = new Thread(() =>
                {
                    try { Thread.Sleep(Timeout.Infinite); }
                    catch { return; }
                    throw new InvalidOperationException();
                });
                thread.Start();
                try { thread.Abort(); }
                catch (Exception e) { Console.WriteLine(e.Message); }
            }

            private static void LoopWithEmptryTryBlocks()
            {
                var random = new Random(Environment.TickCount);
                for (;;)
                {
                    if (random.Next(processorCount) <= (processorCount / 4))
                    {
                        Interlocked.Increment(ref writes);
                        try {} finally { slim.EnterWriteLock(); }
                        try { ExclusiveLoop(random); }
                        finally { slim.ExitWriteLock(); }
                    }
                    else
                    {
                        Interlocked.Increment(ref reads);
                        try {} finally { slim.EnterReadLock(); }
                        try { SharedLoop(random); }
                        finally { slim.ExitReadLock(); }
                    }
                }
            }

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            private static void SharedLoop(Random random)
            {
                foreach (var o in objects)
                {
                    var i = (Int32)o;
                    if ((i % processorCount) == (random.Next() % processorCount) && random.Next(37) == 3)
                    {
                        break;
                    }
                }
            }

            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            private static void ExclusiveLoop(Random random)
            {
                if (objects.Count < 10240)
                {
                    for (var i = 0; i < 19; ++i)
                    {
                        if (random.Next(13) == 7)
                        {
                            objects.Add(random.Next());
                        }
                    }
                }
                for (var i = 0; i < 13; ++i)
                {
                    if (objects.Count > 0 && random.Next(19) == 13)
                    {
                        objects.Remove(random.Next() % objects.Count);
                    }
                }
            }

            private static void Loop()
            {
                var random = new Random(Environment.TickCount);
                for (;;)
                {
                    if (random.Next(processorCount) <= (processorCount / 4))
                    {
                        slim.EnterWriteLock();
                        try { ExclusiveLoop(random); }
                        finally { slim.ExitWriteLock(); }
                    }
                    else
                    {
                        slim.EnterReadLock();
                        try { SharedLoop(random); }
                        finally { slim.ExitReadLock(); }
                    }
                }
            }

            private static void StartOneThread(Object state)
            {
                var thread = new Thread(() =>
                {
                    try
                    {
                        Interlocked.Increment(ref threadsCount);
                        loopAction();
                    }
                    catch (ThreadAbortException) {}
                    finally
                    {
                        Interlocked.Decrement(ref threadsCount);
                        ThreadPool.UnsafeQueueUserWorkItem(StartOneThread, state);
                    }
                })
                { Priority = ThreadPriority.Highest };
                thread.Start();
                Thread.VolatileWrite(ref threads[(Int32)state], thread);
            }

            private static void Main(string[] args)
            {
                var random = new Random(Environment.TickCount);
                var abortCycle = 0;
                if (args.Length > 0)
                {
                    abortCycle = Int32.Parse(args[0]);
                    loopAction = LoopWithEmptryTryBlocks;
                }
                else
                {
                    loopAction = Loop;
                }

                for (var i = 0; i < processorCount; ++i)
                {
                    StartOneThread(i);
                }

                for (var i = 0U;; ++i)
                {
                    Thread.Sleep(1);
                    if (abortCycle > 0 && i % abortCycle == 0)
                    {
                        var ti = random.Next(111) % processorCount;
                        var thread = (Thread)Thread.VolatileRead(ref threads[ti]);
                        if (thread != null)
                        {
                            Console.WriteLine("Aborting thread #" + ti);
                            try { thread.Abort(); }
                            catch (Exception e) { Console.WriteLine(e.Message); }
                        }
                    }
                }
            }
        }
    }
I ran it several times, and after about 1 hour all threads ended up waiting for the lock event to fire. Voila! Have a look at the state of the ReaderWriterLockSlim object:
    0:000> !do 000000b343bd2860
    Name:        System.Threading.ReaderWriterLockSlim
    MethodTable: 000007fbf887c1a8
    EEClass:     000007fbf8599448
    Size:        96(0x60) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll
    Fields:
                  MT    Field  Offset                  Type  VT      Attr             Value  Name
    000007fbfe6ac7b8  4000755      50        System.Boolean   1  instance                 1  fIsReentrant
    000007fbfe6adc90  4000756      30          System.Int32   1  instance                 0  myLock
    000007fbfe6a1ed0  4000757      34         System.UInt32   1  instance                 1  numWriteWaiters
    000007fbfe6a1ed0  4000758      38         System.UInt32   1  instance                23  numReadWaiters
    000007fbfe6a1ed0  4000759      3c         System.UInt32   1  instance                 0  numWriteUpgradeWaiters
    000007fbfe6a1ed0  400075a      40         System.UInt32   1  instance                 0  numUpgradeWaiters
    000007fbfe6ac7b8  400075b      51        System.Boolean   1  instance                 0  fNoWaiters
    000007fbfe6adc90  400075c      44          System.Int32   1  instance                -1  upgradeLockOwnerId
    000007fbfe6adc90  400075d      48          System.Int32   1  instance                -1  writeLockOwnerId
    000007fbfe6a8d00  400075e       8  ...g.EventWaitHandle   0  instance  000000b343beb448  writeEvent
    000007fbfe6a8d00  400075f      10  ...g.EventWaitHandle   0  instance  000000b343be2fd0  readEvent
    000007fbfe6a8d00  4000760      18  ...g.EventWaitHandle   0  instance  0000000000000000  upgradeEvent
    000007fbfe6a8d00  4000761      20  ...g.EventWaitHandle   0  instance  0000000000000000  waitUpgradeEvent
    000007fbfe6bff60  4000763      28          System.Int64   1  instance                 1  lockID
    000007fbfe6ac7b8  4000765      52        System.Boolean   1  instance                 0  fUpgradeThreadHoldingRead
    000007fbfe6a1ed0  4000766      4c         System.UInt32   1  instance        1073741824  owners
    000007fbfe6ac7b8  4000767      53        System.Boolean   1  instance                 0  fDisposed
    000007fbfe6bff60  4000762     408          System.Int64   1    static                 2  s_nextLockID
    000007fbf88999f0  4000764       8  ...ReaderWriterCount   0  TLstatic                    t_rwc
There are 23 waiting readers, 1 waiting writer, and the owners field is 0x40000000 once again. All 24 threads look like the following:
    0:000> ~22e !CLRStack
    OS Thread Id: 0xf28 (22)
    Child SP         IP               Call Site
    000000b361a1df78 000007fc137b315b [HelperMethodFrame_1OBJ: 000000b361a1df78] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
    000000b361a1e0a0 000007fbfe5195c4 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
    000000b361a1e0e0 000007fbf8af4c25 System.Threading.ReaderWriterLockSlim.WaitOnEvent(System.Threading.EventWaitHandle, UInt32 ByRef, TimeoutTracker)
    000000b361a1e150 000007fbf8dd4c48 System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(TimeoutTracker)
    000000b361a1e1b0 000007fbf8804d4a System.Threading.ReaderWriterLockSlim.TryEnterReadLock(TimeoutTracker)
    000000b361a1e200 000007fbf8af55ad System.Threading.ReaderWriterLockSlim.TryEnterReadLock(Int32)
    000000b361a1e250 000007fba0010a45 RWLSTest.Program.Loop()
    000000b361a1e2c0 000007fba00106f7 RWLSTest.Program+<>c__DisplayClass4.<StartOneThread>b__3()
    000000b361a1e330 000007fbfe4ff8a5 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
    000000b361a1e490 000007fbfe4ff609 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
    000000b361a1e4c0 000007fbfe4ff5c7 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
    000000b361a1e510 000007fbfe512d21 System.Threading.ThreadHelper.ThreadStart()
    000000b361a1e828 000007fbff6bf713 [GCFrame: 000000b361a1e828]
    000000b361a1eb58 000007fbff6bf713 [DebuggerU2MCatchHandlerFrame: 000000b361a1eb58]
They all sit in WaitOnEvent, but who will fire the event, and when? It never happens. Deadlock.
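One way to surface such a hang instead of blocking forever is to go through the timed TryEnterReadLock(Int32) overload, the same one visible in the stack above. This is only a sketch under my own assumptions: the helper name and the choice to throw TimeoutException are hypothetical, and the original test program does not do this.

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper: fails loudly when the read lock cannot be
// acquired in time, rather than waiting on an event that may never be set.
public static class LockDiagnostics
{
    public static void EnterReadLockOrThrow(ReaderWriterLockSlim rwls, int timeoutMs)
    {
        if (!rwls.TryEnterReadLock(timeoutMs))
        {
            throw new TimeoutException(
                "Read lock not acquired within " + timeoutMs + " ms; the lock may be orphaned.");
        }
    }
}
```

A timeout does not fix the corrupted state, but it turns a silent deadlock into an exception that can be logged together with a dump of the lock.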
Now let's get back to ExitMyLock.
ReaderWriterLockSlim contains many fields, requiring at least 88 bytes of storage (without extra alignment, if any is needed). Modern Intel CPUs have 64-byte cache lines, which is too small to hold an entire ReaderWriterLockSlim instance.
So each instance needs at least two cache lines to hold its data. In the dumps above myLock sits at offset 0x30 and owners at offset 0x4c, only 28 bytes apart, but depending on where the object starts the two fields can still land on different cache lines (on both x86 and x64). My theory is that releasing myLock without a memory barrier (or an interlocked instruction) then leaves only part of the object's storage synchronized between CPU cores and/or CPUs.
EnterMyLock's interlocked instruction forces the synchronization, but, on this theory, only for the 64 bytes of aligned memory where myLock resides. Changes to the other cache line might not be visible at that point.
So the core acquiring the lock may see inconsistent object state.
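Whether two fields share a cache line can be worked out from a dump. The sketch below assumes 64-byte lines and assumes that the Offset column printed by !do is relative to the object address; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper for reasoning about dumps. Assumes 64-byte cache lines
// and field offsets measured from the object address shown by !do.
public static class CacheLineMath
{
    private const ulong CacheLineSize = 64;

    // Index of the cache line that contains the given field.
    public static ulong LineIndex(ulong objectAddress, uint fieldOffset)
    {
        return (objectAddress + fieldOffset) / CacheLineSize;
    }

    // True when both fields of the same object fall on one cache line.
    public static bool ShareLine(ulong objectAddress, uint offsetA, uint offsetB)
    {
        return LineIndex(objectAddress, offsetA) == LineIndex(objectAddress, offsetB);
    }
}
```

With myLock at 0x30 and owners at 0x4c, ShareLine returns false for the object at 0x0000000001c679f8 from the first dump and true for the object at 0x000000b343bd2860 from the second, so the layout depends on where the GC happened to place (or move) the object.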
Very important note: ReaderWriterLockSlim.cs is part of the 4.5Update1 reference source. Vanilla .NET 4.5, and probably several updates following it, have this code, for example 4.0.30319.17929 and 4.0.30319.18408.
Recent versions, for example 4.0.30319.33440, have fixed this:
    private void ExitMyLock()
    {
        Volatile.Write(ref myLock, 0);
    }
A volatile write has release semantics: writes made before it cannot be reordered after it, so by the time another core or CPU sees myLock released, the changes made under the lock are visible too.
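The acquire/release pairing can be sketched in isolation. This is a minimal illustration and not the framework's implementation: TinySpinLock and TinySpinLockDemo are hypothetical names, and the lock protects a plain, non-volatile counter the same way myLock is meant to protect owners.

```csharp
using System.Threading;

// Hypothetical minimal spin lock: acquire with an interlocked
// compare-and-swap, release with a volatile write.
public sealed class TinySpinLock
{
    private int _state; // 0 = free, 1 = held

    public void Enter()
    {
        var spinner = new SpinWait();
        // Full fence on success; spin while another thread holds the lock.
        while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
        {
            spinner.SpinOnce();
        }
    }

    public void Exit()
    {
        // Release semantics: writes made under the lock are published first.
        Volatile.Write(ref _state, 0);
    }
}

public static class TinySpinLockDemo
{
    // Increments a plain counter from several threads under the lock.
    public static long Run(int threadCount, int iterations)
    {
        var gate = new TinySpinLock();
        long counter = 0;
        var threads = new Thread[threadCount];
        for (var t = 0; t < threadCount; ++t)
        {
            threads[t] = new Thread(() =>
            {
                for (var i = 0; i < iterations; ++i)
                {
                    gate.Enter();
                    counter++; // protected only by the spin lock
                    gate.Exit();
                }
            });
            threads[t].Start();
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return counter;
    }
}
```

If the lock is correct, TinySpinLockDemo.Run(4, 100000) returns 400000; a lost update would show up as a smaller total.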
Conclusion: do not use the ReaderWriterLockSlim class unless .NET Framework is updated to at least 4.0.30319.33440. It will eventually fail, at least on dual-socket Intel systems.
Windows 8.1 and Windows Server 2012 R2 have this issue fixed. Windows Server 2012 (not R2) seems to be stuck with the buggy implementation of the ReaderWriterLockSlim class. After installing all available updates, ExitMyLock looks the same (no volatile write operation).
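A quick way to see which build is actually loaded is to read the file version of the assembly that contains ReaderWriterLockSlim. The sketch below is hypothetical and makes two assumptions: that 4.0.30319.33440 (the version named above) is the first fixed build, and that a plain numeric comparison of file versions is meaningful. The second assumption is shaky, because .NET Framework 4.x revision numbers differ between Windows branches, so treat the result as a hint and inspect ExitMyLock itself when in doubt.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical check for .NET Framework 4.x only.
public static class RwlsVersionCheck
{
    // Assumption: the first build the article reports as fixed.
    private static readonly Version FirstFixedBuild = new Version(4, 0, 30319, 33440);

    public static bool IsAtLeastFixedBuild(Version fileVersion)
    {
        return fileVersion >= FirstFixedBuild;
    }

    // File version of the assembly that defines ReaderWriterLockSlim
    // (System.Core.dll on .NET Framework 4.x).
    public static Version GetRwlsAssemblyFileVersion()
    {
        string path = typeof(ReaderWriterLockSlim).Assembly.Location;
        FileVersionInfo info = FileVersionInfo.GetVersionInfo(path);
        return new Version(info.FileMajorPart, info.FileMinorPart,
                           info.FileBuildPart, info.FilePrivatePart);
    }
}
```

At startup an application could log RwlsVersionCheck.GetRwlsAssemblyFileVersion() and warn when IsAtLeastFixedBuild returns false.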
2 comments:
Thank you for your article - this was a very enlightening read.
Hi Andrii
Thank you for your article. It helped us troubleshoot an incident seen on a client's server.
However, we have observed the problem on .NET 4.5.2. We have reproduced the problem using your code example and verified the corrupted ReaderWriterLockSlim state using WinDbg. I have checked the loaded native version of System.Core, decompiled it, and verified that Volatile.Write is used in ExitMyLock. So apparently there are other circumstances under which this problem can occur? Any comment you may have is highly appreciated.
Best regards
Henrik