以Postgresql为主,再聊聊资料库 PostgreSQL Array 的应用

PostgreSQL 有 Array, 让我们在处理资料或是分析资料时,十分地便利.情境:某电商有10万种商品,上个月有一千万笔销售纪录,每笔销售有1~10种商品.我们使用 Array,可以很方便的建立以下的 table.create table it201012a (  id int generated always as identity, items int[] not null);产生测试资料的函数create or replace function ranitemarr()returns int[]as $$  select array_agg(ceil(random() * 100000)::int)    from generate_series(1, ceil(random() * 10)::int);$$language sql;insert into it201012a(items)select ranitemarr()  from generate_series(1,1e7);commit;接下来我们可以开发一个array内元素所有可能组合的函数.在 stackoverflow 中 已经有一个这样的範例https://stackoverflow.com/questions/30515990/postgresql-find-all-possible-combinations-permutations-in-recursive-query递迴啊~~~ 递迴的方式,虽然已经有开发完成,但是颇佔篇幅,今天先不介绍.Python 的 itertools 里面也有 combinations 可以利用https://docs.python.org/zh-cn/3/library/itertools.html#itertools.combinations我们将利用 combinations 来开发三个PlPython函数.create or replace function pyarrcombin (lst int[])returns setof int[]as $$  from itertools import combinations  return [list(t) for r in range(1, len(lst)+1) for t in combinations(lst,r)]$$ language plpython3u;create or replace function pyarrcombin (lst int[], n int)returns setof int[]as $$  from itertools import combinations  return [list(t) for t in combinations(lst,n)]$$ language plpython3u;create or replace function pyarrcombin (lst int[], n int, f int)returns setof int[]as $$  from itertools import combinations  return [list(t) for t in combinations(lst,n) if f in t]$$ language plpython3u;-------看一下使用案例产生全部的组合select pyarrcombin(array[1,2,3]);+-------------+| pyarrcombin |+-------------+| {1}         || {2}         || {3}         || {1,2}       || {1,3}       || {2,3}       || {1,2,3}     |+-------------+(7 rows)产生符合指定的数目元素select pyarrcombin(array[1,2,3],2);+-------------+| pyarrcombin |+-------------+| {1,2}       || {1,3}       || {2,3}       |+-------------+(3 rows)产生符合指定的数目元素,且包含指定元素select pyarrcombin(array[1,2,3],2,3);+-------------+| pyarrcombin |+-------------+| {1,3}       || {2,3}       |+-------------+(2 rows)--------使用产生组合的函数,来看应用.先来帮测试table 建立 index 吧.create index on it201012a using gin(items);analyze it201012a;commit;-------select *  from it201012a fetch first 3 rows only;+----+---------------------------------------------------------------+| id |                             items                             |+----+---------------------------------------------------------------+|  1 | {77367,65384,94905,83558,89606,66216,13380,89461,73248,65171} ||  2 | {30193,61007,48154}                                           ||  3 | {42786}                                                       |+----+---------------------------------------------------------------+(3 rows)假设以 61007 ,查询有哪些纪录有包含.select *  from it201012a where items @> array[61007];+---------+---------------------------------------------------------------+|   id    |                             items                             |+---------+---------------------------------------------------------------+|       2 | {30193,61007,48154}                                           ||    2706 | {61007,39231,7845,41394,22950,57433,41700,84929,90262}        |....| 9997914 | {61007}                                                       |+---------+---------------------------------------------------------------+(537 rows)查询与 61007 搭配的品项,出现次数依序降幂. select count(*)     , pyarrcombin(items, 2, 61007)  from it201012a where items @> array[61007] group by 2 order by 1 desc;+-------+---------------+| count |  pyarrcombin  |+-------+---------------+|     2 | {61007,9629}  ||     2 | {42904,61007} ||     2 | {61007,80684} ||     2 | {20941,61007} ||     2 | {61007,10792} ||     2 | {61007,70903} |...|     1 | {1439,61007}  |+-------+---------------+(3245 rows)Time: 62.261 ms可见得 61007 与 其他品项一起购买的情况是有不少,但没有特别明显的组合.----接下来找 61007 的哼哈二将select count(*)     , pyarrcombin(items, 3, 61007)  from it201012a where items @> array[61007] group by 2 order by 1 desc;+-------+---------------------+| count |     pyarrcombin     |+-------+---------------------+|     1 | {97,15451,61007}    ||     1 | {97,16040,61007}    |...|     1 | {99987,61007,68303} |+-------+---------------------+(9911 rows)Time: 77.938 ms速度都还不错.---------销量前三名的品项select a.item     , count(*)   from (select unnest(items) as item          from it201012a) a group by a.item order by 2 desc fetch first 3 rows only;+-------+-------+| item  | count |+-------+-------+| 78311 |   662 || 13034 |   652 || 42643 |   646 |+-------+-------+(3 rows)以78311 来查询select count(*)     , pyarrcombin(items, 2, 78311)  from it201012a where items @> array[78311] group by 2 order by 1 desc;+-------+---------------+| count |  pyarrcombin  |+-------+---------------+|     3 | {76373,78311} ||     2 | {4643,78311}  ||     2 | {65731,78311} ||     2 | {49658,78311} ||     2 | {87291,78311} |...|     1 | {1429,78311}  |+-------+---------------+(4082 rows)Time: 32.002 ms-----------因为产生的测试资料为乱数产生,会与现实的可能有较高相关性的资料不同.今天介绍的应用方式,不只可以用在电商,应用的範围很广,希望能带给大家在资料分析时有新的方式可以应用.

关于作者: 网站小编

码农网专注IT技术教程资源分享平台,学习资源下载网站,58码农网包含计算机技术、网站程序源码下载、编程技术论坛、互联网资源下载等产品服务,提供原创、优质、完整内容的专业码农交流分享平台。

热门文章