Does a SQL Server update nvarchar statement overwrite the same address on disk if the new value is the same size?


Is it possible to overwrite the same disk address in a T-SQL update statement? A use case would be preventing recovery of the original text on disk for security or anonymity reasons.

CREATE TABLE Test(
  Id int, 
  Message nvarchar(1000))
GO

INSERT INTO Test (Id, Message) VALUES (1, 'Hello!')
GO

So at this point, ‘Hello!’ is written to disk. If an update statement is run, can it be guaranteed (or at least highly likely) to overwrite the same location on disk?

UPDATE Test SET Message = '000000' WHERE Id = 1
GO

In the SQL Server implementation, what is the probability that the original ‘Hello!’ value will no longer be on disk?

I’m assuming the new value would be the same size. My guess is that if the new value were of a different size, then SQL Server would write the updated value to a new location, leaving the ‘Hello!’ in the original location, but now marked as free. I also assume that a delete does nothing to the original value on disk, but just marks the location as free.

If this is not the case, or not guaranteed, or at least not highly likely, is there another way to remove the value from disk?

The requirement here is not to the level of keeping government secrets safe. It’s more of a marketing claim that deleted items are truly gone and can’t be recovered.

I understand there are a lot of caveats here. I know the SQL language does not address this. I’m asking specifically about SQL Server’s implementation. The answer could be that it’s unknowable. Just wondering if there’s documentation, or common knowledge, or if someone has tested implementation specifics.

We’re using a hosting service that doesn’t encrypt the SQL Server disk. We do encrypt the value before writing to the db. Just looking into a ‘belt and braces’ solution.

How to solve:


Method 1

There are many aspects to this, but the short answer is that you have to expect old values to remain on disk (which, as I understand it, is what the question boils down to).

So this falls into the same category as the question below, with the same answer:

Q: Can we prevent a Windows admin from doing anything inside SQL Server and seeing all of its data?

A: No. But you can try to make sure that what they read appears as garbage to them, i.e., with a suitable encryption implementation.

If you are truly interested in this, I suggest you first read up on the storage architecture of SQL Server. Without that, you won’t be able to make heads or tails of the answers. I.e., make sure you understand database files, pages, transaction logging, heaps, B-trees etc.

The update can be done in place, on the same page. That goes for an index as well. If you modify a key column of an index, however, the row has to move (internally a DELETE followed by an INSERT). If the updated row no longer fits on the page, you get a forwarded record (in a heap) or a page split (in a B-tree). Another factor is that after a delete (and remember, an update can be a delete/insert), the old row initially stays on the page as a ghost record. Even after ghost cleanup, the page isn’t "compacted on delete", i.e., the "free" space is only reclaimed when it is needed at some later time.
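As a rough way to observe this on a test system, the sketch below compares the physical location of the row before and after the update, then looks for ghost and forwarded records. It assumes the Test table from the question lives in the dbo schema. Note that %%physloc%% and sys.fn_PhysLocFormatter are undocumented, so treat this as an exploration aid rather than a guarantee of anything.

```sql
-- Physical location (file:page:slot) of the row before the update.
-- %%physloc%% and sys.fn_PhysLocFormatter are undocumented features.
SELECT Id, sys.fn_PhysLocFormatter(%%physloc%%) AS FilePageSlot
FROM dbo.Test
WHERE Id = 1;

UPDATE dbo.Test SET Message = N'000000' WHERE Id = 1;

-- Same query after the update: an unchanged file:page:slot suggests the
-- row was modified in place, but says nothing about the log or backups.
SELECT Id, sys.fn_PhysLocFormatter(%%physloc%%) AS FilePageSlot
FROM dbo.Test
WHERE Id = 1;

-- Ghost and forwarded records for the table (documented DMF).
-- forwarded_record_count is only populated for heaps.
SELECT index_type_desc,
       page_count,
       record_count,
       ghost_record_count,
       version_ghost_record_count,
       forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Test'), NULL, NULL, 'DETAILED');
```

Even if the location is unchanged, this only describes the data page. The other copies discussed in this answer are not covered by it.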

And then you have transaction logging.

And backups.
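To illustrate the transaction log point, the following sketch lists the log records generated for the Test table. It relies on sys.fn_dblog, which is undocumented and should only be run against a test database. The dbo schema and the filter on AllocUnitName are assumptions for this example.

```sql
-- Log records touching dbo.Test in the active portion of the log.
-- For a modify operation, the RowLog Contents columns typically carry
-- the before and after images of the changed bytes.
SELECT [Current LSN],
       Operation,
       Context,
       AllocUnitName,
       [RowLog Contents 0],
       [RowLog Contents 1]
FROM sys.fn_dblog(NULL, NULL)
WHERE AllocUnitName LIKE N'dbo.Test%'
  AND Operation IN (N'LOP_MODIFY_ROW', N'LOP_DELETE_ROWS', N'LOP_INSERT_ROWS');
```

Whatever appears here also ends up in any log backup taken before the log is truncated, which is why overwriting the data page alone is not enough.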

If you have row versioning enabled, then old row versions are kept in tempdb and cleaned up some time after they are no longer needed.
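A quick check of whether row versioning is in play for the current database might look like this (a minimal sketch using documented sys.databases columns):

```sql
-- Either setting being on means old row versions are being generated.
SELECT name,
       snapshot_isolation_state_desc,
       is_read_committed_snapshot_on
FROM sys.databases
WHERE name = DB_NAME();
```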

And then there is the added complexity of how other, perhaps less basic, technologies affect the picture. I’m thinking of things such as:

  • Columnstore indexes
  • Memory optimized tables
  • Change Data Capture
  • Replication
  • HA solutions such as Availability Groups
  • And whatever more I could come up with given some more time to think about this
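Some of the features in the list above can be detected from the catalog views. The sketch below assumes the dbo.Test table from the question and a version of SQL Server recent enough to have the is_memory_optimized column (SQL Server 2014 or later). It does not cover Availability Groups or every replication topology.

```sql
-- Database-level features that keep extra copies of changed data.
SELECT name, is_cdc_enabled, is_published, is_subscribed, is_merge_published
FROM sys.databases
WHERE name = DB_NAME();

-- Table-level flags for the table in question.
SELECT name, is_tracked_by_cdc, is_replicated, is_memory_optimized
FROM sys.tables
WHERE object_id = OBJECT_ID(N'dbo.Test');

-- Columnstore indexes on the table (type 5 = clustered, 6 = nonclustered).
SELECT name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Test')
  AND type IN (5, 6);
```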

I think you get the picture by now. No, you can’t claim that the old data is gone. Encryption might be an important component to this, depending on your particular situation.

Method 2

Based on my knowledge of data pages and indexing, I would say it is actually highly unlikely that the exact same memory address or place on disk is overwritten by a subsequent update, even if a string of the same length is supplied for the NVARCHAR.

If you think about how indexes work for a second, their underlying data structure is a B-Tree, which logically sorts the data. Therefore an index containing your Message field, even just the clustered index itself, adds a layer of nondeterminism as to which memory address a particular Message lives at, especially as that B-Tree is re-sorted and whenever page splits are incurred. (And on that note, if your field is contained in multiple indexes, the data is literally copied to multiple memory addresses, so as a security measure there is no reliability.)
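To see how many index structures hold a copy of the column, something like the following could be used. It is a sketch that assumes the dbo.Test table and Message column from the question. The heap or clustered index always stores every column, so it holds a copy even when it is not listed here.

```sql
-- Indexes in which Message appears as a key or included column.
SELECT i.name AS index_name,
       i.type_desc,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.Test')
  AND c.name = N'Message';
```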

Now, your example schema doesn’t include any indexes, so the table itself is actually stored as a heap. But even there, the assignment of the row identifier (RID) and the lack of any ordering of the data give the underlying memory address a level of nondeterminism as well, especially when a row that no longer fits on its page is moved and leaves a forwarding record behind (heaps don’t split pages the way B-Trees do).

Finally, keep in mind that SQL Server isn’t necessarily the only thing using that disk, so a memory address that gets deallocated from a value in a SQL Server instance could be reallocated by the operating system to something completely outside of, and unrelated to, that same SQL Server instance.

Method 3

There is very little or no official documentation on this particular topic that I could find. However, there is a system stored procedure called sp_clean_db_free_space which indirectly hints at these operations and what happens afterwards. As per the documentation:

Delete operations from a table or update operations that cause a row
to move can immediately free up space on a page by removing references
to the row. However, under certain circumstances, the row can
physically remain on the data page as a ghost record. Ghost records
are periodically removed by a background process. This residual data
is not returned by the Database Engine in response to queries.
However, in environments in which the physical security of the data or
backup files is at risk, you can use sp_clean_db_free_space to clean
these ghost records.
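A call to the procedure quoted above might look like the sketch below. YourDatabase is a placeholder name, not something from the question. The procedure can generate a lot of I/O, so it is probably best tried on a test copy or outside busy hours.

```sql
-- Clean ghost records and residual data from all files of the database.
EXEC sp_clean_db_free_space @dbname = N'YourDatabase';

-- Or clean a single data file only (file id 1 is assumed here).
EXEC sp_clean_db_file_free_space @dbname = N'YourDatabase', @fileid = 1;
```

This addresses free space inside the database files only. It does not touch the transaction log, existing backups, or copies held by other features.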

There are a few more guidelines on this, as below:

Whether Microsoft SQL Server will delete or overwrite data strongly depends on the following circumstances:

  • Is the table a heap or a clustered index?
  • In a clustered index, Microsoft SQL Server must force the insert onto the same page (because of the key)
  • If there is enough space for the record, the old range may be overwritten
  • If there is NOT enough space, it will force a page split

To get rid of the "old" entries, REBUILD your index(es).
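For the example table, a rebuild could be sketched as follows, assuming the table is dbo.Test. A rebuild writes the rows to newly allocated pages, and as far as I know the old pages are deallocated rather than wiped, so it may be worth combining this with sp_clean_db_free_space if residual data is the concern.

```sql
-- The example table has no indexes, so it is a heap: rebuild the table itself.
ALTER TABLE dbo.Test REBUILD;

-- If the table has a clustered or nonclustered index, rebuild those instead.
ALTER INDEX ALL ON dbo.Test REBUILD;
```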

I hope this adds some more value to the answer provided by J.D.

You may also browse through more articles written by Mr. Paul S Randal here.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
