Paul White @SQL_Kiwi, Twitter Profile

Paul White @SQL_Kiwi

2 years ago

The latest SQL Server cumulative updates (2022 CU 10, 2019 CU23) contain an interesting bug fix, which @josh_the_coder noticed earlier today: Before this update, in some rare cases, if IDENTITY_INSERT is set to ON, the identity value for the target table is set to the highest value in the source table when joining tables, even if the highest value doesn't satisfy the join condition or the predicate. This update improves the current design and fixes this issue. Note: To apply this update, you need to set the QUERY_OPTIMIZER_HOTFIXES database scoped configuration to ON. To turn off this update, you can enable trace flag 13193. The cases turn out not to be so rare: SQL Server assigns the new identity value when IDENTITY_INSERT is ON by calling an internal function setidentity() in a Compute Scalar plan operator. It is the placement of this operator in the plan that is the problem. If it appears before a filtering condition, it might be called more times than necessary, using the wrong source values. In other words, the destination will behave as if identity values had been assigned when they ultimately weren't, due to the later filtering. In many cases, SQL Server is saved by deferred evaluation of expressions in a Compute Scalar. This is where SQL Server waits until another operator needs the result of the computation before performing it, despite the position of the Compute Scalar operator in the plan. (This optimisation was added in 2005 and only applies to row mode Compute Scalar operators.) Now, the only operator that will need the result of the setidentity() call is the Insert operator adding rows to the target table with IDENTITY_INSERT. This means any filtering of rows will occur before the insert, and no unnecessary setting of identity values occurs. Yay! This is all well and good, unless there is a blocking operator in the plan after the setidentity() Compute Scalar but before the filtering. A blocking operator has to materialize the data it receives in one way or another - in a hash table or sort memory for example - and this materialization necessarily needs the result of the setidentity() call. Oops. When this happens, setidentity() is called when it shouldn't be. The target table then behaves as if identity values have been used when they were not. The fix is to ensure the Compute Scalar appears on the 'write cursor' (left) side of the insert plan, where SQL Server has full control, and any user-coded filtering of rows is bound to have already happened. Without the fix, the Compute Scalar can appear surprisingly early on the 'read cursor' side of the plan where rows are read and processed to see which ones qualify for the update. The optimizer tends to push Compute Scalars as far to the right as possible for several reasons, including computed column matching. This will never apply to the special setidentity() Compute Scalar of course, but general rules apply generally. I wrote a toy demo that materializes the setidentity() call in the build-side hash table of a hash join: dbfiddle.uk/v7Ci0_pU Unfortunately, db<>fiddle is still at 2022 RTM so I can't show the fix with query optimizer hotfixes enabled there, but as you can imagine it ensures the Compute Scalar stays in a safe place on the left side of the plan. For more about deferred expression evaluation see my 2012 article, Compute Scalars, Expressions and Execution Plan Performance sql.kiwi/2012/09/comput…

3 10 40 4K 10

DataDrivenCosmos📊🛸 @jmans89

2 years ago

@SQL_Kiwi @josh_the_coder So prior to the fix, could this bug have resulted in duplicate identity keys being inserted into a table? Because that's what happened to us this spring and MSFT basically told us it was a known issue but they had no fix at the time.

1 0 3 242 0

Daniel Maenle @DanTheSQLMan

2 years ago

@SQL_Kiwi @josh_the_coder

0 0 1 173 0

Download Gif

Brandon La Sage @BrandonLaSage

2 years ago

@SQL_Kiwi @josh_the_coder Ah, joins aren't important. Cartesian forever.../s

0 0 0 65 0