System.Threading.Thread.StartCallback() - System.NullReferenceException #109697

Open
tub5 opened this issue Nov 11, 2024 · 7 comments
Labels
area-System.Threading untriaged New issue has not been triaged by the area owner

Comments

@tub5
Contributor

tub5 commented Nov 11, 2024

Runtime: 8.0.10
OS: Ubuntu Server 22.04

We've had multiple fatal exceptions on our production server that caused Docker containers to fail, each due to a System.NullReferenceException in System.Threading.Thread.StartCallback(). Looking at the line in question, /_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 103, I can see the code is as follows:

// Called from the runtime
private void StartCallback()
{
    StartHelper? startHelper = _startHelper;
    Debug.Assert(startHelper != null);
    _startHelper = null;

    startHelper.Run();
}

If startHelper is nullable, should the final line be startHelper?.Run(); to prevent the null reference exception?
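
To make the trade-off in this question concrete, here is a minimal sketch of the two variants. `Debug.Assert` is compiled out of builds that do not define `DEBUG`, so in a release build a null helper reaches `Run()` and throws. The `StartHelper` class and the `StartCallbackSketch` type below are hypothetical stand-ins, not the runtime's private types, and the null argument simulates the bad state rather than reproducing how it arises.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the runtime's private StartHelper type.
sealed class StartHelper
{
    public bool Ran { get; private set; }
    public void Run() => Ran = true;
}

static class StartCallbackSketch
{
    // Same shape as the last line of StartCallback in a build without DEBUG,
    // where the Debug.Assert call has been compiled out.
    public static void Strict(StartHelper? startHelper) => startHelper!.Run();

    // The proposed variant: a null helper is skipped without any error.
    public static void Lenient(StartHelper? startHelper) => startHelper?.Run();

    public static void Main()
    {
        try
        {
            Strict(null);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Strict: NullReferenceException exposes the bad state");
        }

        Lenient(null);
        Console.WriteLine("Lenient: returned normally, the thread body never ran");
    }
}
```

With the lenient form the process keeps running, but the thread does none of its work and nothing records why.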

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 11, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Nov 11, 2024
@jkotas
Member

jkotas commented Nov 11, 2024

The check at runtime/src/coreclr/vm/comsynchronizable.cpp (lines 227 to 231 in 7ee5b7d)

// Is the thread already started? You can't restart a thread.
if (!ThreadNotStarted(pNewThread))
{
    COMPlusThrow(kThreadStateException, W("ThreadState_AlreadyStarted"));
}

should guarantee that _startHelper is non-null.

If you are getting into this method with _startHelper that is null, something else went wrong earlier. We would want to report a more appropriate error earlier. We would not want to ignore the problem silently.
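
The effect of that guard can be observed from managed code through the public `Thread` API. The sketch below shows only the observable behavior (a second `Start` call throws `ThreadStateException`); it does not exercise the native check directly, and `RestartSketch` is a hypothetical name used for illustration.

```csharp
using System;
using System.Threading;

static class RestartSketch
{
    // Returns true if starting the same Thread object twice is rejected.
    public static bool SecondStartThrows()
    {
        var thread = new Thread(() => { });
        thread.Start();
        try
        {
            thread.Start(); // a thread cannot be restarted
            return false;
        }
        catch (ThreadStateException)
        {
            return true;
        }
        finally
        {
            thread.Join();
        }
    }

    public static void Main()
    {
        Console.WriteLine(SecondStartThrows());
    }
}
```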

@jkotas jkotas added area-System.Threading and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Nov 11, 2024
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@teo-tsirpanis
Contributor

Does this occur on ARM architecture by any chance? Maybe _startHelper needs to be volatile.

@tub5
Contributor Author

tub5 commented Nov 11, 2024

> Does this occur on ARM architecture by any chance? Maybe _startHelper needs to be volatile.
>
> runtime/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs
> Line 34 in 7ee5b7d
>
>     private StartHelper? _startHelper;

We don't currently have any servers running ARM.

@tub5
Contributor Author

tub5 commented Nov 11, 2024

> The check at
>
> runtime/src/coreclr/vm/comsynchronizable.cpp
> Lines 227 to 231 in 7ee5b7d
>
>     // Is the thread already started? You can't restart a thread.
>     if (!ThreadNotStarted(pNewThread))
>     {
>         COMPlusThrow(kThreadStateException, W("ThreadState_AlreadyStarted"));
>     }
>
> should guarantee that _startHelper is non-null.
>
> If you are getting into this method with _startHelper that is null, something else went wrong earlier. We would want to report a more appropriate error earlier. We would not want to ignore the problem silently.

Looking at the dump more closely, I'm seeing exactly the same thing as in issue #103129.
Having looked through our code base, I can't find any uses of GCHandle on our side, but we do rely on Confluent.Kafka, which does use it.

@tub5
Contributor Author

tub5 commented Nov 11, 2024

(lldb) pe
Exception object: 00007f4e1f141ad8
Exception type:   System.ArgumentException
Message:          Value does not fall within the expected range.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F4BFF7FDA40 00007F8DF08BC164 System.Private.CoreLib.dll!System.Threading.Thread.SetThreadPoolWorkerThreadName()+0x114
    00007F4BFF7FDAE0 00007F8DF08BBD67 System.Private.CoreLib.dll!System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()+0x87

StackTraceString: <none>
HResult: 80070057

(lldb) clrthreads
...
  88   43      107 00007F4CF84897D0  1021220 Preemptive  00007F4E2A094090:00007F4E2A094950 000055F6451ADBE0 -00001 Ukn (Threadpool Worker) System.ArgumentException 00007f4e1f141ad8
...

(lldb) clrstack -a
OS Thread Id: 0x107 (88)
        Child SP               IP Call Site
00007F4BFF7FD920 00007f8e662ceb57 [HelperMethodFrame: 00007f4bff7fd920] 
00007F4BFF7FDA40 00007F8DF08BC165 System.Threading.Thread.SetThreadPoolWorkerThreadName() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Thread.cs @ 402]
    PARAMETERS:
        this (0x00007F4BFF7FDA90) = 0x00007f4e264be3a8
    LOCALS:
        <no data>
        <no data>

00007F4BFF7FDAE0 00007F8DF08BBD68 System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() [/_/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.NonBrowser.cs @ 58]
    LOCALS:
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>

00007F4BFF7FDCD0 00007f8e65f05df7 [DebuggerU2MCatchHandlerFrame: 00007f4bff7fdcd0] 
(lldb) dumpobj 0x00007f4e264be3a8
Name:        MySqlConnector.MySqlConnection
MethodTable: 00007f8deba335e0
EEClass:     00007f8deba26cb8
Tracked Type: false
Size:        176(0xb0) bytes
File:        /app/MySqlConnector.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f8dec9b4cc8  4000019        8 ...ponentModel.ISite  0 instance 0000000000000000 _site
00007f8dec9b4598  400001a       10 ....EventHandlerList  0 instance 0000000000000000 _events
00007f8de77b5ad0  4000018       98        System.Object  0   static 0000000000000000 s_eventDisposed
00007f8de77bcb00  40005b8       20       System.Boolean  1 instance                0 _suppressStateChangeForReconnection
00007f8df009e028  40005b9       18 ...hangeEventHandler  0 instance 00007f8e5f97e778 StateChange
0000000000000000  4000068       28                       0 instance 0000000000000000 <ProvideClientCertificatesCallback>k__BackingField
00007f8dedb941f8  4000069       30 ...Private.CoreLib]]  0 instance 0000000000000000 <ProvidePasswordCallback>k__BackingField
00007f8de8a17e58  400006a       38 ...alidationCallback  0 instance 0000000000000000 <RemoteCertificateValidationCallback>k__BackingField
00007f8dee98d8c8  400006b       40 ...ssageEventHandler  0 instance 0000000000000000 InfoMessage
00007f8dedb9a100  400006c       48 ....MySqlTransaction  0 instance 0000000000000000 <CurrentTransaction>k__BackingField
00007f8dedaae360  400006d       50 ...gingConfiguration  0 instance 00007f4df0c0bbe8 <LoggingConfiguration>k__BackingField
00007f8dedad2020  4000073       58 ...r.MySqlDataSource  0 instance 0000000000000000 m_dataSource
00007f8de96b02e8  4000074       60 ...s.Logging.ILogger  0 instance 00007f4df4802268 m_logger
00007f8de786d7c8  4000075       68        System.String  0 instance 00007f4e264be1e8 m_connectionString
00007f8dedad1268  4000076       70 ...onnectionSettings  0 instance 00007f4dfb454580 m_connectionSettings
00007f8dedaaf830  4000077       78 ...ore.ServerSession  0 instance 00007f4df3045a90 m_session
00007f8deba33518  4000078       a0         System.Int32  1 instance                1 m_connectionState
00007f8de77bcb00  4000079       21       System.Boolean  1 instance                1 m_hasBeenOpened
00007f8de77bcb00  400007a       22       System.Boolean  1 instance                0 m_isDisposed
00007f8dedb9a480  400007b       80 ... MySqlConnector]]  0 instance 0000000000000000 m_cachedProcedures
0000000000000000  400007c       88 ...re.SchemaProvider  0 instance 0000000000000000 m_schemaProvider
00007f8dedaf1490  400007d       90 ...r.MySqlDataReader  0 instance 0000000000000000 m_activeReader
00007f8dedad3330  400007e       98 ...edTransactionBase  0 instance 0000000000000000 m_enlistedTransaction
00007f8dedad2fe8  400006e       40 ...teChangeEventArgs  0   static 00007f4df0c0bdd8 s_stateChangeClosedConnecting
00007f8dedad2fe8  400006f       48 ...teChangeEventArgs  0   static 00007f4df0c0bdf0 s_stateChangeConnectingOpen
00007f8dedad2fe8  4000070       50 ...teChangeEventArgs  0   static 00007f4df0c0be08 s_stateChangeOpenClosed
00007f8de77b5ad0  4000071       58        System.Object  0   static 00007f4df0c0be20 s_lock
00007f8dedad3958  4000072       60 ...Private.CoreLib]]  0   static 00007f4df0c0be38 s_transactionConnections
ThinLock owner 2b (00007F4CF84897D0), Recursive 0

@jkotas
Member

jkotas commented Nov 11, 2024

Yes, it looks exactly like #103129. I think the best way to get to the bottom of this is to find all places where GCHandles are set to point to MySqlConnector.MySqlConnection (under a debugger or using traces) and then review the lifetime management of those GCHandles.
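
As an illustration of what reviewing GCHandle lifetime management can look for, here is a minimal sketch of one defensive pattern: a single owner that frees its handle at most once. It assumes the kind of bug in question is a handle being freed twice or used after `Free`, which the thread does not confirm. `OwnedHandle` is a hypothetical type, not part of MySqlConnector, Confluent.Kafka, or the runtime.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical wrapper that owns exactly one GCHandle.
sealed class OwnedHandle : IDisposable
{
    private IntPtr _handle;

    public OwnedHandle(object target)
    {
        _handle = GCHandle.ToIntPtr(GCHandle.Alloc(target));
    }

    public bool IsAllocated => Volatile.Read(ref _handle) != IntPtr.Zero;

    // Returns null once the handle has been released.
    // Note: this does not protect a reader that races with Dispose on
    // another thread; it only prevents the handle from being freed twice.
    public object? Target
    {
        get
        {
            IntPtr h = Volatile.Read(ref _handle);
            return h == IntPtr.Zero ? null : GCHandle.FromIntPtr(h).Target;
        }
    }

    public void Dispose()
    {
        // Atomically take the handle value so only one caller ever frees it.
        IntPtr h = Interlocked.Exchange(ref _handle, IntPtr.Zero);
        if (h != IntPtr.Zero)
        {
            GCHandle.FromIntPtr(h).Free();
        }
    }
}

static class GCHandleSketch
{
    public static void Main()
    {
        var owner = new OwnedHandle(new object());
        Console.WriteLine(owner.IsAllocated);
        owner.Dispose();
        owner.Dispose(); // second call is a no-op instead of a double free
        Console.WriteLine(owner.IsAllocated);
    }
}
```

Raw `GCHandle` values are structs that can be copied freely, so nothing stops two copies from each calling `Free`. Funnelling every release through one owner makes that class of mistake easier to rule out during a review.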
