To explain this a bit more to those who have asked: The documentation for CAST & CONVERT says: Starting with SQL Server 2012 (11.x), when using supplementary character (SC) collations, a CAST operation from nchar or nvarchar to an nchar or nvarchar type of smaller length won't truncate inside a surrogate pair. Instead, the operation truncates before the supplementary character. Under an SC collation, each character has a variable byte length. For example, N'X' takes 2 bytes, while N'😎' uses 4 bytes. The string N'X😎' (both characters together) requires 2 + 4 = 6 bytes of storage, corresponding to nvarchar(3). The number in brackets is the number of byte-pairs, not the number of characters. Three byte-pairs = 2 * 3 = 6 bytes. Yes, it's confusing but that's how it is. -- Returns 6 (bytes) and 2 (characters) DECLARE @y nvarchar(3) = N'X😎'; SELECT DATALENGTH(@y), LEN(@y); If we convert the string to a smaller type, truncation occurs: -- Returns "X", 2 (bytes), and 1 (character) DECLARE @y nvarchar(3) = N'X😎'; SELECT CONVERT(nvarchar(2), @y), DATALENGTH(CONVERT(nvarchar(2), @y)), LEN(CONVERT(nvarchar(2), @y)); This is the correct behaviour. Truncation stores as many whole characters from the string as will fit in the destination type. nvarchar(2) offers 2 * 2 = 4 bytes, which is only enough to store the first character, N'X'. There is not enough space left to store the 4 bytes of N'😎'. The bug occurs when the source is a LOB type, the destination is nvarchar (or nchar), truncation is needed, and the truncation point occurs within a (variable length) character. Using the same example with a LOB source: -- Returns "X�", 4 (bytes), and 2 (characters) DECLARE @y nvarchar(max) = N'X😎'; SELECT CONVERT(nvarchar(2), @y), DATALENGTH(CONVERT(nvarchar(2), @y)), LEN(CONVERT(nvarchar(2), @y)); The funny-looking second character is the first 2 bytes of N'😎' (which requires 4 bytes). Conversion has split this character in half! This is the bug.
To explain this a bit more to those who have asked: The documentation for CAST & CONVERT says: Starting with SQL Server 2012 (11.x), when using supplementary character (SC) collations, a CAST operation from nchar or nvarchar to an nchar or nvarchar type of smaller length won't truncate inside a surrogate pair. Instead, the operation truncates before the supplementary character. Under an SC collation, each character has a variable byte length. For example, N'X' takes 2 bytes, while N'😎' uses 4 bytes. The string N'X😎' (both characters together) requires 2 + 4 = 6 bytes of storage, corresponding to nvarchar(3). The number in brackets is the number of byte-pairs, not the number of characters. Three byte-pairs = 2 * 3 = 6 bytes. Yes, it's confusing but that's how it is. -- Returns 6 (bytes) and 2 (characters) DECLARE @y nvarchar(3) = N'X😎'; SELECT DATALENGTH(@y), LEN(@y); If we convert the string to a smaller type, truncation occurs: -- Returns "X", 2 (bytes), and 1 (character) DECLARE @y nvarchar(3) = N'X😎'; SELECT CONVERT(nvarchar(2), @y), DATALENGTH(CONVERT(nvarchar(2), @y)), LEN(CONVERT(nvarchar(2), @y)); This is the correct behaviour. Truncation stores as many whole characters from the string as will fit in the destination type. nvarchar(2) offers 2 * 2 = 4 bytes, which is only enough to store the first character, N'X'. There is not enough space left to store the 4 bytes of N'😎'. The bug occurs when the source is a LOB type, the destination is nvarchar (or nchar), truncation is needed, and the truncation point occurs within a (variable length) character. Using the same example with a LOB source: -- Returns "X�", 4 (bytes), and 2 (characters) DECLARE @y nvarchar(max) = N'X😎'; SELECT CONVERT(nvarchar(2), @y), DATALENGTH(CONVERT(nvarchar(2), @y)), LEN(CONVERT(nvarchar(2), @y)); The funny-looking second character is the first 2 bytes of N'😎' (which requires 4 bytes). Conversion has split this character in half! This is the bug.
@SQL_Kiwi If I could be half as smart as you, I'd be twice as smart as I am now.