SQL Server declare value on Merge condition Problem

前言

假如要判断资料是否存在于资料表中,存在就更新,不存在就新增.

这时我们可以使用Merge来帮助我们完成.

当两个资料表有複杂的比对的特性时,MERGE陈述式的条件式行为表现最佳。

有了Merge我们就不用使用IF EXISTS.

一切都是这么完美...

直到到有一天Merge在Prod撞到一个问题..

此文章同步发布在Blog SQL Server Merge condition on declare value problem

问题描述

使用语法user defined table type & Table如下

CREATE TABLE [dbo].[PriceLimitation]([CategoryID] [int] NOT NULL,[ProdcutGroupID] [smallint] NOT NULL,[UserID] [int] NOT NULL,[StakeAmount] [numeric](18, 4) NOT NULL,[ProductID] [smallint] NOT NULL, CONSTRAINT [PK_PriceLimitation] PRIMARY KEY CLUSTERED (    [UserID] ASC,[CategoryID] ASC,[ProductID] ASC,[ProdcutGroupID] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]) ON [PRIMARY]GOCREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE([UserID] [int] NOT NULL,[StakeAmount] [numeric](18, 4) NOT NULL,PRIMARY KEY CLUSTERED ([UserID] ASC)WITH (IGNORE_DUP_KEY = OFF))

呼叫执行SP[dbo].[CalculateLimitation]

CREATE OR ALTER PROC [dbo].[CalculateStake]@CategoryID int,@ProductID smallint ,@ProdcutGroupID smallint,@PriceLimit [uftt_PriceLimit] readonlyASBEGINSET NOCOUNT ON;MERGE INTO [dbo].[PriceLimitation] t1USING @PriceLimit t2ON t1.UserID = t2.UserID             AND t1.ProdcutGroupID= @ProdcutGroupID            AND t1.CategoryID=@CategoryID             AND t1.ProductID = @ProductIDWHEN MATCHED THENUPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmountWHEN NOT MATCHED THENINSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID ,t2.StakeAmount, @ProductID);END

主要传入参数判断更新或新增[dbo].[PriceLimitation]

我们在ELK发现在执行SP时很少概率会发生PRIMARY KEY重複问题.

Violation of PRIMARY KEY constraint 'PK_PriceLimitation'. Cannot insert duplicate key in object 'dbo.PriceLimitation'.

但这张表PK使用栏位都有正确在Merge条件上,所以当下我们尝是在DEV重现此问题,但一直无法成功

后来发现此问题在高併发时才会发生,所以我们使用Query Stress来帮助我们模拟高併发请求时的状态.

Query Stress重现问题

撰写了模拟SQL并利用Query Stress重现问题.

模拟SQL脚本

DECLARE @CategoryID int,@ProductID smallint ,@ProdcutGroupID smallint,      @PriceLimit [uftt_PriceLimit] declare @from intSELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500,0)SELECT @CategoryID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 4 +1SELECT @ProductID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 5 +1SELECT @ProdcutGroupID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731,0) as int) % 20 +1;with cte as(select @from as numunion allselect num + 1 as num from cte where num < @from+500) insert into @PriceLimit ([UserID],[StakeAmount],CategoryID,[ProductID],[ProdcutGroupID])select num,100,@CategoryID,@ProductID,@ProdcutGroupIDfrom cteoption(MAXRECURSION 0);exec  [dbo].[CalculateStake] @CategoryID,@ProductID,@ProdcutGroupID,@PriceLimit

我使用模拟参数是

Iterator:30Thread:100

来模拟高併发时资料库请求状况,就能发现这时已经会出现Prod的Exception.

找到问题寻求解法

对于目前Prod问题已经迈出一大步了,因为现在问题可以重现,在网路上找了许多文章还是没找到解法....

后面在MSDN时看到关键一段话,关于merge-transact-sql.

请务必只从目标资料表指定用于比对用途的资料行。 也就是说,从目标资料表中指定要与来源资料表的对应资料行进行比较的资料行。 请勿尝试在 ON 子句中筛选出目标资料表的资料列 (例如指定 AND NOT target_table.column_x = value) 来改善查询效能。 这样做可能会传回非预期且不正确的结果。

后面有看到有篇文章在介绍use-where-clause-with-merge

我就尝试把sp写法改成只利用两个Table可以JOIN栏位当作条件,发现Duplicate PK问题就可以解决了....但发现另一个更麻烦问题.

SP改写后

CREATE OR ALTER PROC [dbo].[CalculateStake]@CategoryID int,@ProductID smallint ,@ProdcutGroupID smallint,@PriceLimit [uftt_PriceLimit] readonlyASBEGINSET NOCOUNT ON;MERGE INTO [dbo].[PriceLimitation] t1USING @PriceLimit t2ON t1.UserID = t2.UserID WHEN MATCHED               AND t1.ProdcutGroupID= @ProdcutGroupID            AND t1.CategoryID=@CategoryID             AND t1.ProductID = @ProductID             THENUPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmountWHEN NOT MATCHED THENINSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID ,t2.StakeAmount, @ProductID);END

新写法的执行计画在对于大资料表时会很没效率....

改写后遇到的问题(不好的执行计画)

一般SP在执行过后都会把使用的执行计画快取起来,所以我们可以透过DMV来查看执行执行计画.

SELECT Cacheobjtype, Objtype, TEXT, query_planFROM sys.dm_exec_cached_plans t1CROSS APPLY sys.dm_exec_sql_text(plan_handle) t2CROSS APPLY sys.dm_exec_query_plan(plan_handle) t3where t2.objectid = object_id('dbo.CalculateStake', 'p')

新和旧SP写法执行计画如下图.

旧写法

新写法

造成上面差异原因,因为新写法透过统计资讯使用效能较差的执行计画(能看到上面使用Merge Join明明传入结果集资料并不多)且在WHEN MATCHED进行第二次判断...

所以效能就变很差,现在已经找到此问题点了,我就在思考那有没有办法兼具效能又可解决此问题呢?

最终版SP写法

最后我就思考何不如把传入参数全部加入user defined table type

这样就可以利用传入参数当作ON条件也可以得到精準执行计画.user defined table type所有栏位可以跟Table的Clustered Index Match.
drop proc[dbo].[CalculateStake]drop type [dbo].[uftt_PriceLimit] CREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE([CategoryID] [int] NOT NULL,[ProdcutGroupID] [smallint] NOT NULL,[UserID] [int] NOT NULL,[StakeAmount] [numeric](18, 4) NOT NULL,[ProductID] [smallint] NOT NULL,PRIMARY KEY CLUSTERED ([UserID] ASC,[CategoryID] ASC,[ProductID] ASC,[ProdcutGroupID] ASC)WITH (IGNORE_DUP_KEY = OFF))CREATE OR ALTER PROC [dbo].[CalculateStake]@PriceLimit [uftt_PriceLimit] readonlyASBEGINSET NOCOUNT ON;MERGE INTO [dbo].[PriceLimitation] t1USING @PriceLimit t2ON t1.UserID = t2.UserID             AND t1.ProdcutGroupID= t2.ProdcutGroupID            AND t1.CategoryID=t2.CategoryID             AND t1.ProductID =t2.ProductIDWHEN MATCHED THENUPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmountWHEN NOT MATCHED THENINSERT VALUES(t2.CategoryID, t2.ProdcutGroupID, t2.UserID ,t2.StakeAmount, t2.ProductID);END

测试脚本改成把参数透过uftt_PriceLimit传入

DECLARE @CategoryID int,@ProductID smallint ,@ProdcutGroupID smallint,      @PriceLimit [uftt_PriceLimit] declare @from intSELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500,0)SELECT @CategoryID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 4 +1SELECT @ProductID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123,0) as int) % 5 +1SELECT @ProdcutGroupID =  CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731,0) as int) % 20 +1;with cte as(select @from as numunion allselect num + 1 as num from cte where num < @from+500) insert into @PriceLimit ([UserID],[StakeAmount],CategoryID,[ProductID],[ProdcutGroupID])select num,100,@CategoryID,@ProductID,@ProdcutGroupIDfrom cteoption(MAXRECURSION 0);exec  [dbo].[CalculateStake] @PriceLimit

请在跑修改后的SP前记得把Table先Truncate掉,这样可以更精準模拟

使用QueryStress模拟参数

Iterator:30Thread:100

执行结果如下

利用Query Stress工具压测发现问题解决且效能不会变差:)

小结:

没想到MergeOn条件有些隐藏限制(对于On写value condition官方只有说会有想不到的问题发生,并没解释原因为何...),但经过这次经验我日后在使用Merge时不会直接在On使用value condition会在中间多垫一层Table这样就可以使用ONJOIN.

另外QueryStress真是一个对于DB压测找问题的好工具,推荐大家去了解使用


关于作者: 网站小编

码农网专注IT技术教程资源分享平台,学习资源下载网站,58码农网包含计算机技术、网站程序源码下载、编程技术论坛、互联网资源下载等产品服务,提供原创、优质、完整内容的专业码农交流分享平台。

热门文章