We have enabled memory-optimized tempdb metadata (ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON), and the tempdb metadata is now taking over 400 GB of memory on one of our servers and continues to grow. There are occasional drops in memory usage, but overall it keeps growing. A couple of times the server has actually crashed because there was not enough memory left for the other system processes that modify tempdb, bringing down the whole server.
Question I am asking
How do I keep SQL Server's memory-optimized tempdb metadata from growing continually and crashing my server? Failing that, what other information can I look into to find out what is consuming so much memory?
Data about the problem
The following query currently returns 438 GB.
SELECT SUM(domc.pages_kb / 1024.0 / 1024.0) AS pages_gb
FROM sys.dm_os_memory_clerks AS domc
WHERE domc.type LIKE 'MEMORYCLERK_XTP'
The following query provides the data that the biggest usage of memory (290 GB) is memory_consumer_id of 113 – ‘LOB Page Allocator’. It has no object_id or xtp_object_id, so I’m guessing that it’s a database wide object.
SELECT ddxmc.memory_consumer_id ,
       ddxmc.memory_consumer_type_desc ,
       ddxmc.memory_consumer_desc ,
       ddxmc.object_id ,
       ddxmc.xtp_object_id ,
       ddxmc.used_bytes / 1024.0 / 1024.0 / 1024.0 AS used_gb
FROM sys.dm_db_xtp_memory_consumers AS ddxmc
ORDER BY ddxmc.allocated_bytes DESC
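Since the 'LOB Page Allocator' consumer has no object_id, it may also help to look at the instance-level XTP consumers that are not tied to any single object. A sketch using sys.dm_xtp_system_memory_consumers (comparing allocated against used bytes can show whether memory is genuinely in use or allocated but not yet reclaimed by garbage collection):

```sql
-- Instance-level in-memory OLTP consumers (not tied to a specific object)
SELECT xsmc.memory_consumer_type_desc ,
       xsmc.memory_consumer_desc ,
       xsmc.allocated_bytes / 1024.0 / 1024.0 / 1024.0 AS allocated_gb ,
       xsmc.used_bytes / 1024.0 / 1024.0 / 1024.0 AS used_gb
FROM sys.dm_xtp_system_memory_consumers AS xsmc
ORDER BY xsmc.allocated_bytes DESC;
```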
Version: SQL Server 2019 CU9 – Enterprise
Memory on box: 3 TB
Instance Type: Fail-Over Clustered Instance
OS: Windows Server 2019 Standard
Number of CPU Cores: 80 Physical Cores (we’ve had to increase this number lately)
Number Of Tempdb Files: 64
Replication: This server is a publisher and subscriber to transactional replication.
We are heavy users of tempdb. Our stored procedures continually create and drop temp tables and table variables. We work with bulk amounts of data, passing in lists that we turn into tables so we can operate on result sets instead of one row at a time. Because of this heavy churn on tempdb, we had to enable memory-optimized tempdb metadata to relieve the PAGELATCH_* waits we were experiencing.
I am pretty sure there are NOT any user tables that are memory optimized. When I run the following query I only see SYSTEM_TABLE type objects:
SELECT *
FROM sys.dm_db_xtp_index_stats AS ddxis
JOIN sys.objects AS o
    ON o.object_id = ddxis.object_id
Things I’m aware of and what I’ve tried
In-memory garbage collection can only clean up items older than the oldest transaction.
I'm quite aware that garbage collection can only happen on rows that are older than the oldest transaction, so I have stopped all of our long-running SQL Agent jobs and any other processes running longer than 5 minutes. Unfortunately our memory usage didn't come back down. We do have old sessions, but none of them have any transactions open according to the following query.
SELECT * FROM sys.dm_exec_sessions AS des WHERE des.open_transaction_count > 0
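sys.dm_exec_sessions only surfaces user transactions; in-memory OLTP garbage collection can also be held back by internal XTP transactions. As a sketch, the per-database DMV sys.dm_db_xtp_transactions (run in the context of tempdb) lists the active XTP transactions, oldest first:

```sql
-- Run in tempdb; lists active in-memory OLTP transactions, oldest first
USE tempdb;
SELECT xt.xtp_transaction_id ,
       xt.session_id ,
       xt.state_desc ,
       xt.begin_tsn
FROM sys.dm_db_xtp_transactions AS xt
ORDER BY xt.begin_tsn;
```

An old begin_tsn with a live state here would point at a transaction pinning rows that garbage collection cannot yet reclaim.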
To rule out replication I did temporarily stop the publishing and subscription agents for this server. No change in memory usage.
I have run ‘CHECKPOINT’ in tempdb. Memory usage didn’t come down.
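Beyond CHECKPOINT, it may be worth trying to trigger in-memory garbage collection explicitly. The procedure sys.sp_xtp_force_gc is only documented for newer versions and Azure SQL, so treat this as a hedged sketch that may be a no-op on SQL Server 2019 CU9:

```sql
-- Documented for newer versions / Azure SQL; may not help on SQL Server 2019 CU9
EXEC sys.sp_xtp_force_gc 'tempdb';  -- database-level garbage collection
EXEC sys.sp_xtp_force_gc;           -- instance-level garbage collection
```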
DBCC FREEPROCCACHE – Getting rid of old temp tables
To drop old temp tables that are being cached, I cleared the query plan cache and then verified that no cached temp tables from before the clear remained, using the following query. This did not drop memory usage by any substantial amount.
SELECT * FROM sys.tables AS t WHERE t.name LIKE '#%' AND t.create_date < 'TimeOfClearingPlanCache'
Other Possible Solutions
Restarting SQL Server
We can restart SQL Server, and doing so mitigates the problem for a while, but memory usage eventually climbs again. This is a workaround, not a solution, and we are not fond of it.
Turning Off ‘In-Memory Optimized Tempdb’
We could turn off memory-optimized tempdb metadata, but we would then be subject to the heavy PAGELATCH_* waits we were experiencing before. We have 64 tempdb files to reduce that contention, but even that wasn't enough during our busiest periods. This could be an option, but it would be best to find out why the memory usage keeps growing.
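A related mitigation exists in SQL Server 2022 (not available on 2019 CU9): binding the memory-optimized tempdb metadata to a Resource Governor pool, so that runaway growth hits a cap instead of exhausting the server. A sketch, with a hypothetical pool name:

```sql
-- SQL Server 2022+ only; pool name and 20% cap are illustrative assumptions
CREATE RESOURCE POOL tempdb_metadata_pool WITH (MAX_MEMORY_PERCENT = 20);
ALTER RESOURCE GOVERNOR RECONFIGURE;
ALTER SERVER CONFIGURATION
    SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON (RESOURCE_POOL = 'tempdb_metadata_pool');
-- An instance restart is required for the binding to take effect.
```

With a cap in place, hitting the pool limit produces out-of-memory errors for tempdb metadata operations rather than a server-wide crash, which is a gentler failure mode.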
Answer
Pam Lahoud, Principal Program Manager on the SQL Server team at Microsoft, responded on Twitter and contributed the following via comment:
There are a few known issues that could be coming into play here. It’s best to open a case with support if you can, the more data we have on these issues the more likely we’ll be able to find a fix.