浅谈数据库开发（二）——从宽表到存储过程（之一）

roych · 发表于 2019-11-16 02:47:13

前面说了宽表的建立，很多人可能觉得没什么实战，听起来比较枯燥。这次我决定从一个简单的例子讲起。——七天留存率。我们的数据源是这样的：

甲方爸爸要求的结果是这样的：

为了便于后面工作的开展，我们先来解释下这张表的数值来源。左边第一列的日期是第一次登录日期，其它列的日期表示，从第一次登陆后，这些用户是否再次登陆。如果有，则统计它。以第二行为例：
第一个交叉的位置（即1月1日与1月1日的相交处）中的3，表示的是是1月1日有3个用户登录（用户ID分别是1、2、3），1月2日时这三个人又登录了。1月3日时只有2个人（ID2没登录），1月4日三个人又登录，到了1月5日时又是2个人（ID3没登录）……如此类推。
再如第三行：
1月2日第一次登录的只有ID4（1、2和3第一次登录是在1月1日），因此，此后ID4分别在1月5日和1月6日登录了，因此1月2日（行）与1月5日、1月6日（列）交叉处均为1……其它如此类推。

可能有人觉得，这个表那么复杂，到底能干吗呢？其实很简单。它就是用来统计新用户留存率的。由左上角到右下角方向的对角线上的数字表示的是当天的新用户数（第一次登录为新用户）。该对角线上方的数字则表示老用户数。由此可见，1月8-9日没有新增用户。

说完了这个矩阵图的作用后，接下来，我们就开始考虑如何去实现它了。这时候可能有人会不屑地想着：这有啥难的？咱select，然后union一下不就得了？然后你写出了以下语句：

select a.登录时间
,count(distinct case when a.登录时间='2019-1-1' then a.用户ID else null end) as day0
,count(distinct case when a.登录时间='2019-1-2' then a.用户ID else null end) as day1
,count(distinct case when a.登录时间='2019-1-3' then a.用户ID else null end) as day2
,count(distinct case when a.登录时间='2019-1-4' then a.用户ID else null end) as day3
,count(distinct case when a.登录时间='2019-1-5' then a.用户ID else null end) as day4
,count(distinct case when a.登录时间='2019-1-6' then a.用户ID else null end) as day5
,count(distinct case when a.登录时间='2019-1-7' then a.用户ID else null end) as day6
,count(distinct case when a.登录时间='2019-1-8' then a.用户ID else null end) as day7
,count(distinct case when a.登录时间='2019-1-9' then a.用户ID else null end) as day8
from log_info a,(select distinct 用户ID from log_info where 登录时间='2019-1-1') b
where a.用户ID=b.用户ID and a.登录时间>='2019-1-1'
group by a.登录时间
union all
select a.登录时间
,0 as day0
,count(distinct case when a.登录时间='2019-1-2' then a.用户ID else null end) as day1
,count(distinct case when a.登录时间='2019-1-3' then a.用户ID else null end) as day2
,count(distinct case when a.登录时间='2019-1-4' then a.用户ID else null end) as day3
,count(distinct case when a.登录时间='2019-1-5' then a.用户ID else null end) as day4
,count(distinct case when a.登录时间='2019-1-6' then a.用户ID else null end) as day5
,count(distinct case when a.登录时间='2019-1-7' then a.用户ID else null end) as day6
,count(distinct case when a.登录时间='2019-1-8' then a.用户ID else null end) as day7
,count(distinct case when a.登录时间='2019-1-9' then a.用户ID else null end) as day8
from log_info a,(select distinct 用户ID from log_info where 登录时间='2019-1-2') b
where a.用户ID=b.用户ID
--新用户
and a.用户ID not in(select 用户ID from log_info c where c.登录时间='2019-1-1')
group by a.登录时间
/*
这里仅用2个表进行union all
以下从略。
union all
…………
*/

复制代码

从脚本上看似乎没什么问题，运行一下：

你的结果是这样的：

而甲方爸爸要求的第一行是这样的：

换句话说，你把一行的数据拆分为7行了。当然，也不是无可饶恕的问题。我们再sum一下不就得了？然后你会发现，似乎sum不了。原来是第一列的日期得修改下，于是你改成这样：

select 登录时间
,SUM(day0) as day0
,SUM(day1) as day1
,SUM(day2) as day2
,SUM(day3) as day3
,SUM(day4) as day4
,SUM(day5) as day5
,SUM(day6) as day6
,SUM(day7) as day7
,SUM(day8) as day8
from(
select '2019-1-1' as 登录时间
,count(distinct case when a.登录时间='2019-1-1' then a.用户ID else null end) as day0
,count(distinct case when a.登录时间='2019-1-2' then a.用户ID else null end) as day1
,count(distinct case when a.登录时间='2019-1-3' then a.用户ID else null end) as day2
,count(distinct case when a.登录时间='2019-1-4' then a.用户ID else null end) as day3
,count(distinct case when a.登录时间='2019-1-5' then a.用户ID else null end) as day4
,count(distinct case when a.登录时间='2019-1-6' then a.用户ID else null end) as day5
,count(distinct case when a.登录时间='2019-1-7' then a.用户ID else null end) as day6
,count(distinct case when a.登录时间='2019-1-8' then a.用户ID else null end) as day7
,count(distinct case when a.登录时间='2019-1-9' then a.用户ID else null end) as day8
from log_info a,(select distinct 用户ID from log_info where 登录时间='2019-1-1') b
where a.用户ID=b.用户ID and a.登录时间>='2019-1-1'
group by a.登录时间
union all
select '2019-1-2' as 登录时间
,0 as day0
,count(distinct case when a.登录时间='2019-1-2' then a.用户ID else null end) as day1
,count(distinct case when a.登录时间='2019-1-3' then a.用户ID else null end) as day2
,count(distinct case when a.登录时间='2019-1-4' then a.用户ID else null end) as day3
,count(distinct case when a.登录时间='2019-1-5' then a.用户ID else null end) as day4
,count(distinct case when a.登录时间='2019-1-6' then a.用户ID else null end) as day5
,count(distinct case when a.登录时间='2019-1-7' then a.用户ID else null end) as day6
,count(distinct case when a.登录时间='2019-1-8' then a.用户ID else null end) as day7
,count(distinct case when a.登录时间='2019-1-9' then a.用户ID else null end) as day8
from log_info a,(select distinct 用户ID from log_info where 登录时间='2019-1-2') b
where a.用户ID=b.用户ID
--新用户
and a.用户ID not in(select 用户ID from log_info c where c.登录时间='2019-1-1')
group by a.登录时间
/*
这里仅用2个表进行union all
以下从略。
union all
…………
*/
) a
group by 登录时间

复制代码

这次结果总算出来了：

于是，你脸上露出老母亲一样疲惫而欣慰的笑容。你留意到这里只是2行数据，心里想到后面还要复制粘贴7个union all，心里又莫名泛起了一阵悲凉：原来SQL server是一种体力活啊。于是你想，到底有没有别的办法呢？
欲知后事如何，且看下回分解。
附上数据源和模拟结果供大家参考。

相关文章：
浅谈数据库开发（一）：关于宽表
浅谈数据库开发（二）——从宽表到存储过程（之一）
浅谈数据库开发（三）——从宽表到存储过程（之二）
浅谈数据库开发（四）——从宽表到存储过程（之三）
浅谈数据库开发（五）——从宽表到存储过程（之四）
浅谈数据库开发（六）——从宽表到存储过程（之五）

chenwine · 发表于 2021-1-8 15:12:07

技术一流，言语风趣！

GOODWIN · 发表于 2021-10-11 10:19:57

学习

帐号		自动登录	找回密码
密码			注册

主题相关群

地区相关群[您还没设置居住地]

浅谈数据库开发（二）——从宽表到存储过程（之一）

本帖子中包含更多资源

群介绍
主题无相 (随机显示版	同地区群块其他地区)

群介绍
Access免费培训群2 (792054000)