Preface
Suppose we need to check whether a row already exists in a table: update it if it does, insert it if it doesn't. MERGE lets us do this in a single statement. When two tables have complex matching characteristics, the conditional behavior of the MERGE statement performs best. With MERGE, we no longer need the classic IF EXISTS pattern.
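For context, here is a minimal sketch of the IF EXISTS upsert that MERGE replaces. The table dbo.T, its columns Id and Amount, and the parameters are hypothetical, for illustration only:

-- Hypothetical two-statement upsert that MERGE collapses into one statement.
IF EXISTS (SELECT 1 FROM dbo.T WHERE Id = @Id)
    UPDATE dbo.T SET Amount = Amount + @Amount WHERE Id = @Id;
ELSE
    INSERT INTO dbo.T (Id, Amount) VALUES (@Id, @Amount);

Note that even this pattern is not concurrency-safe without extra locking, which foreshadows what follows.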
Everything was perfect... until one day MERGE hit a problem in Prod.
This article is also published on my blog: SQL Server Merge condition on declare value problem
Problem description
The user defined table type and the table are defined as follows:
CREATE TABLE [dbo].[PriceLimitation](
    [CategoryID] [int] NOT NULL,
    [ProdcutGroupID] [smallint] NOT NULL,
    [UserID] [int] NOT NULL,
    [StakeAmount] [numeric](18, 4) NOT NULL,
    [ProductID] [smallint] NOT NULL,
    CONSTRAINT [PK_PriceLimitation] PRIMARY KEY CLUSTERED (
        [UserID] ASC,
        [CategoryID] ASC,
        [ProductID] ASC,
        [ProdcutGroupID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE(
    [UserID] [int] NOT NULL,
    [StakeAmount] [numeric](18, 4) NOT NULL,
    PRIMARY KEY CLUSTERED ([UserID] ASC) WITH (IGNORE_DUP_KEY = OFF)
)
We then call the stored procedure [dbo].[CalculateStake]:
CREATE OR ALTER PROC [dbo].[CalculateStake]
    @CategoryID int,
    @ProductID smallint,
    @ProdcutGroupID smallint,
    @PriceLimit [uftt_PriceLimit] READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE INTO [dbo].[PriceLimitation] t1
    USING @PriceLimit t2
        ON  t1.UserID = t2.UserID
        AND t1.ProdcutGroupID = @ProdcutGroupID
        AND t1.CategoryID = @CategoryID
        AND t1.ProductID = @ProductID
    WHEN MATCHED THEN
        UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
    WHEN NOT MATCHED THEN
        INSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID, t2.StakeAmount, @ProductID);
END
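A call to the SP could look like this, a minimal sketch with made-up parameter values:

-- Hypothetical call: fill the table-valued parameter, then pass it with the scalar keys.
DECLARE @p [uftt_PriceLimit];
INSERT INTO @p ([UserID], [StakeAmount]) VALUES (1, 100), (2, 250);

EXEC [dbo].[CalculateStake]
    @CategoryID = 1,
    @ProductID = 1,
    @ProdcutGroupID = 1,
    @PriceLimit = @p;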
The SP uses the passed-in parameters to decide whether to update or insert rows in the [dbo].[PriceLimitation] table.
In ELK we noticed that, with a low probability, executing the SP raised a PRIMARY KEY violation:

Violation of PRIMARY KEY constraint 'PK_PriceLimitation'. Cannot insert duplicate key in object 'dbo.PriceLimitation'.

Yet every PK column was correctly included in the MERGE condition, so we tried to reproduce the problem in DEV, but kept failing.
Later we discovered that the problem only occurs under high concurrency, so we used Query Stress to help us simulate concurrent requests against the database.
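As an aside: this symptom, two concurrent sessions both taking the NOT MATCHED branch for the same key, is a well-known MERGE upsert race. A commonly cited mitigation, though not the route this article ends up taking, is to request a serializable range lock on the target:

-- For reference only; not the fix adopted later in this article.
MERGE INTO [dbo].[PriceLimitation] WITH (HOLDLOCK) t1
USING @PriceLimit t2
    ON ...  -- remainder of the statement unchanged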
Reproducing the problem with Query Stress
I wrote a simulation script and used Query Stress to reproduce the problem.

Simulation SQL script:
DECLARE @CategoryID int, @ProductID smallint, @ProdcutGroupID smallint, @PriceLimit [uftt_PriceLimit]
DECLARE @from int

-- Pick a random starting UserID and random scalar keys for each run.
SELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500, 0)
SELECT @CategoryID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123, 0) as int) % 4 + 1
SELECT @ProductID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123, 0) as int) % 5 + 1
SELECT @ProdcutGroupID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731, 0) as int) % 20 + 1;

-- Fill the table-valued parameter with consecutive UserIDs starting at the random offset.
with cte as (
    select @from as num
    union all
    select num + 1 as num from cte where num < @from + 500
)
insert into @PriceLimit ([UserID], [StakeAmount])
select num, 100
from cte
option(MAXRECURSION 0);

exec [dbo].[CalculateStake] @CategoryID, @ProductID, @ProdcutGroupID, @PriceLimit
The simulation parameters I used were Iterations: 30 and Threads: 100 to model the database load under high concurrency; with these settings, the Prod exception appears.
Pinpointing the problem and looking for a fix
This was a big step forward on the Prod issue, because the problem could finally be reproduced. Still, after reading many articles online I had found no solution.... Then I ran into the key passage in the MSDN documentation on merge-transact-sql:
"Specify only columns from the target table that are used for matching purposes. That is, specify columns from the target table that are compared to the corresponding column of the source table. Do not attempt to improve query performance by filtering out rows in the target table in the ON clause, such as by specifying AND NOT target_table.column_x = value. Doing so may return unexpected and incorrect results."
Later I found an article introducing use-where-clause-with-merge, so I tried rewriting the SP to keep only the columns the two tables can actually JOIN on in the ON condition. That made the duplicate PK problem go away.... but it surfaced another, more troublesome problem.
Rewritten SP:
CREATE OR ALTER PROC [dbo].[CalculateStake]
    @CategoryID int,
    @ProductID smallint,
    @ProdcutGroupID smallint,
    @PriceLimit [uftt_PriceLimit] READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE INTO [dbo].[PriceLimitation] t1
    USING @PriceLimit t2
        ON t1.UserID = t2.UserID
    WHEN MATCHED AND t1.ProdcutGroupID = @ProdcutGroupID
                 AND t1.CategoryID = @CategoryID
                 AND t1.ProductID = @ProductID THEN
        UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
    WHEN NOT MATCHED THEN
        INSERT VALUES(@CategoryID, @ProdcutGroupID, t2.UserID, t2.StakeAmount, @ProductID);
END
The rewritten version's execution plan is very inefficient against a large table....
Problem after the rewrite (a bad execution plan)
Once an SP has executed, its execution plan is cached, so we can inspect that plan through the DMVs:
SELECT Cacheobjtype, Objtype, TEXT, query_plan
FROM sys.dm_exec_cached_plans t1
CROSS APPLY sys.dm_exec_sql_text(plan_handle) t2
CROSS APPLY sys.dm_exec_query_plan(plan_handle) t3
WHERE t2.objectid = object_id('dbo.CalculateStake', 'p')
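When comparing the two versions, it can also help to evict the procedure's cached plan first so each version compiles fresh. A minimal sketch, reusing the same lookup as the query above:

-- Evict only this procedure's cached plan before re-testing.
DECLARE @handle varbinary(64);
SELECT @handle = plan_handle
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle) t
WHERE t.objectid = object_id('dbo.CalculateStake', 'p');

IF @handle IS NOT NULL
    DBCC FREEPROCCACHE (@handle);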
The execution plans of the old and the new SP are shown below.

Old version:

New version:
The cause of the difference: guided by the available statistics, the new version picks a much worse execution plan (note the Merge Join above, even though the incoming result set is small; a table-valued parameter exposes no column statistics, so the optimizer can easily misestimate it), and it then has to filter a second time in WHEN MATCHED... so performance drops sharply. Having pinpointed the cause, I started to wonder: is there a way to keep the performance and still fix the duplicate-key problem?
Final version of the SP
In the end I thought: why not move all the scalar parameters into the user defined table type? That way the ON condition also gets an accurate execution plan, and every column of the user defined table type can be matched against the table's clustered index.

DROP PROC [dbo].[CalculateStake]
DROP TYPE [dbo].[uftt_PriceLimit]
GO

CREATE TYPE [dbo].[uftt_PriceLimit] AS TABLE(
    [CategoryID] [int] NOT NULL,
    [ProdcutGroupID] [smallint] NOT NULL,
    [UserID] [int] NOT NULL,
    [StakeAmount] [numeric](18, 4) NOT NULL,
    [ProductID] [smallint] NOT NULL,
    PRIMARY KEY CLUSTERED (
        [UserID] ASC,
        [CategoryID] ASC,
        [ProductID] ASC,
        [ProdcutGroupID] ASC
    ) WITH (IGNORE_DUP_KEY = OFF)
)
GO

CREATE OR ALTER PROC [dbo].[CalculateStake]
    @PriceLimit [uftt_PriceLimit] READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE INTO [dbo].[PriceLimitation] t1
    USING @PriceLimit t2
        ON  t1.UserID = t2.UserID
        AND t1.ProdcutGroupID = t2.ProdcutGroupID
        AND t1.CategoryID = t2.CategoryID
        AND t1.ProductID = t2.ProductID
    WHEN MATCHED THEN
        UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
    WHEN NOT MATCHED THEN
        INSERT VALUES(t2.CategoryID, t2.ProdcutGroupID, t2.UserID, t2.StakeAmount, t2.ProductID);
END
The test script is changed so that all parameters are passed in through uftt_PriceLimit:
DECLARE @CategoryID int, @ProductID smallint, @ProdcutGroupID smallint, @PriceLimit [uftt_PriceLimit]
DECLARE @from int

SELECT @from = ROUND(RAND(CAST(NEWID() as varbinary)) * 500, 0)
SELECT @CategoryID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123, 0) as int) % 4 + 1
SELECT @ProductID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 123, 0) as int) % 5 + 1
SELECT @ProdcutGroupID = CAST(ROUND(RAND(CAST(NEWID() as varbinary)) * 731, 0) as int) % 20 + 1;

with cte as (
    select @from as num
    union all
    select num + 1 as num from cte where num < @from + 500
)
insert into @PriceLimit ([UserID], [StakeAmount], [CategoryID], [ProductID], [ProdcutGroupID])
select num, 100, @CategoryID, @ProductID, @ProdcutGroupID
from cte
option(MAXRECURSION 0);

exec [dbo].[CalculateStake] @PriceLimit
Before running the modified SP, remember to truncate the table first so the comparison is more accurate.
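That is, before each comparison run:

-- Reset the target table so every test starts from the same empty state.
TRUNCATE TABLE [dbo].[PriceLimitation];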
Using QueryStress with the same simulation parameters, Iterations: 30 and Threads: 100, the results are as follows:
Stress testing with the Query Stress tool confirms the problem is resolved and performance does not regress :)
Summary:
I hadn't expected MERGE to have these hidden restrictions on the ON condition (for value conditions in ON, the official docs only say that unexpected results may occur, without explaining why...). After this experience, when I use MERGE I will no longer put value conditions directly in ON; instead I will add an extra table layer in between, so that ON is used purely to JOIN the two tables.
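For reference, here is a minimal sketch of the same "extra layer" idea that keeps the original two-column type: project the scalar parameters into the source rowset with a derived table, so the ON clause only ever compares target columns against source columns. This variant was not tested in this article; the verified fix is the TVP redesign above.

-- Sketch only: push the scalar parameters into the source as columns.
MERGE INTO [dbo].[PriceLimitation] t1
USING (
    SELECT UserID, StakeAmount,
           @CategoryID AS CategoryID,
           @ProductID AS ProductID,
           @ProdcutGroupID AS ProdcutGroupID
    FROM @PriceLimit
) t2
    ON  t1.UserID = t2.UserID
    AND t1.CategoryID = t2.CategoryID
    AND t1.ProductID = t2.ProductID
    AND t1.ProdcutGroupID = t2.ProdcutGroupID
WHEN MATCHED THEN
    UPDATE SET t1.StakeAmount = t1.StakeAmount + t2.StakeAmount
WHEN NOT MATCHED THEN
    INSERT VALUES (t2.CategoryID, t2.ProdcutGroupID, t2.UserID, t2.StakeAmount, t2.ProductID);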
Also, QueryStress really is a great tool for stress-testing a database and flushing out problems. I recommend giving it a try.