Chapter 6 System Design

系统设计面试题精选: https://www.gitbook.com/book/soulmachine/system-design/details

http://yuanhsh.iteye.com/blog/2211983

http://www.1point3acres.com/bbs/thread-160526-1-1.html

https://github.com/checkcheckzz/system-design-interview

https://www.jiuzhang.com/qa/?channel=2

6.1 Google Calendar

  1. create event, invite user

  2. notifer users at specific time or periodically

Follow up:

if you have a lot of users with the same calendar, how to implment create event, invite user and notify users

https://hellosmallworld123.wordpress.com/2014/04/22/design-a-google-calendar/

精读: https://developers.google.com/google-apps/calendar/concepts/

Example:

The following diagram shows the conceptual relationship between calendars, events, and other related elements:

每个具体事件是一个Instance,重复循环发生的事件由很多Instance组成.

Event propagation


API参考: https://developers.google.com/google-apps/calendar/v3/reference/

https://www.ietf.org/rfc/rfc2445.txt

https://www.ibm.com/developerworks/cn/java/j-lo-ical4j/

农历节日:

https://github.com/infinet/lunar-calendar

Event又分one off event和recurring event两种.

可以参考:
- 谷歌的API: https://developers.google.com/google-apps/calendar/v3/reference/events/insert
- IBM的介绍iCalendar的文章的下面部分: 创建日历/事件/会议, 循环事件.
- python的iCalendar module有详细的说明如何添加event: https://icalendar.readthedocs.io/en/latest/usage.html#overview

这个程序可以把多个google calendar加到一个地方然后按starttime顺序显示出来: https://github.com/gjedeer/gcal-plasmoid/blob/master/contents/code/main.py

这里还有比较详细的讨论: http://www.1point3acres.com/bbs/forum.php?mod=viewthread&tid=204207

系统设计:
设计日历(不用考虑重复事件,用户少). 
如何存储事件EVENT??
实现显示日历创建事件邀请他人(不考虑重复事件),只说使用人数很少,
一直在讨论数据库该怎么设计,没有扩展(这个和面试官关系比较大,大
部分扩展都是根据你说的面试官现问的)祝你好运

https://github.com/UmassJin/Leetcode/blob/master/Design/Google_%E9%9D%A2%E7%BB%8F%E6%80%BB%E7%BB%93.py#L135

比如google calendar,我们要能create personal calendar;我们要能book time slot,并且time slot要可以设置不同的权限,对不同的group显示不同的内容;time slot要能解决event conflict的问题;如果一个用户可以create多个canlander,这些calendar能够confict吗?用户可以tag calendar吗? 怎么做calendar third party integration? 有了这些想法, 剩下的时间就是和面试官讨论其中的一个或者多个功能的实现.讨论requirement的过程就是一个交流的过程,你也可以从交流中得到面试官的喜好,然后实现他想要看到的功能.面试开始的头几分钟,其实就是一个头脑风暴的过程,从点到面,再收回到点. 很多candidate缺少这一个过程,很容易留下communication不足,缺少detail的印象.

https://github.com/UmassJin/Leetcode/blob/master/Design/Google_%E9%9D%A2%E7%BB%8F%E6%80%BB%E7%BB%93.py#L135

(4)阿三. design google calendar .  要求分析如何存data, 如何invoke user 
created events, 如何handle 100000events per second, 然后要写了一部分thread 
safe 的code 实现如何invoke event.

http://www.mitbbs.com/article_t/JobHunting/32919403.html

发信人: beefcurtain5 (beefcurtain5), 信区: JobHunting
标  题: Re: Linkedin 店面和oniste面经
发信站: BBS 未名空间站 (Fri Mar 27 16:44:58 2015, 美东)

而且我不觉得web based calendar system要考你什么HBase, cassandra的.  我觉得
这种题, 更加看看你OOD design...

我会先问一下这calendar system主要有什么functionality? 这是给个人的? 还是那
种像meetup嘻嘻的东西?

然后再解释一下主要有哪些base class, 比如 events class, user class, date 
class(这可有可无)

然后在说说你会provide哪些API, 比如说, addEvent(String jsonString), addUser
(String jsonString), registerForEvent(int userId, int eventId)

event和user之间的关系, 你是另外存一个table, 还是存在event和user object里?

像你这种什么HBase, cassandra, 都只是一个data store的方式, 根本不是重点

--------------------------------------------------------------------------
event和user之间的关系, 你是另外存一个table, 还是存在event和user object里?
这个你说怎么做比较好?

要返回一个user在某天的所有event,你说怎么做比较好?
--------------------------------------------------------------------------
难点就是在这里.  和你什么hbase没什么关系, 而且, 人家linkedin好像用什么
voldermont什么的

the tricky part is if the event is changed/user registed/unregisted certain 
events, how would u reflect the change on the webpage...

what i would probably say: if data is stored in, say mysql, then u can 
create views (u can create views for certain date, certain month...) and the
API would just query the view. 

nosql么, 我就不大知道了.  你当时是怎么回答的?

https://github.com/voldemort/voldemort

http://www.1point3acres.com/bbs/thread-201354-1-1.html
第二轮, 国人妹子, 先是OOD设计,让设计一个日历.截止算法题,给一堆人的日历记录,找出可以开会的时间段,就是merge interval.

http://nemofq.com/hackathon/
Cal(也就是.ics格式文件)输出,这种格式是日历地一种标准格式,被绝大多数地日历应用支持(例如Google Calendar/iCloud Calendar/Outlook Calendar),以一个指向.ics文件地链接添加后,日历应用会自动每隔一段时间去抓取更新(Google是大约每隔8小时,iCloud可以自定义),这样只要在这些日历应用内订阅再设置提醒就能得到完整地体验了,大大简化了我们的工作量.

这篇文章写得也不错:http://www.cnblogs.com/jcli/p/calendar_recur_rule.html
他提到了两个难点我深表赞同:

设计实现一个「日历」服务产品,我觉得有两个难点,一是「重复事件规则计算」,二是「到点事件准时提醒」.

其实日历设计就像你说的, 就是个周期事件, 应该很简单. 但是在做做的过程中有几个难点:

  1. 用户在打开应用时, 当页面定位在某个月(周, 日)视图上时, 怎么样从这个用户中所有的周期事件中快速计算出所有的当前视图事件? 不可能每次请求一次, 全部事件计算一次.

  2. 日历的主要功能是提醒. 怎么样准时, 无误的把每分钟要提醒的短信, 邮件发出去, 这又是个量变到质变的难点, 如果当前数据库有几千万的周期事件?

https://github.com/UmassJin/Leetcode/blob/master/Experience/google%E9%9D%A2%E7%BB%8F%E9%9B%86%E5%90%88.py

终于找到了个人版的终极解决方案: https://github.com/shadowwalker2718/Extensible

From: http://ext.ensible.com/

数据库的设计:
https://github.com/shadowwalker2718/Extensible/blob/master/examples/server/setup.sql

https://github.com/bmoeskau/Extensible/blob/master/recurrence-overview.md

Storing recurring events as individual rows is a recipe for disaster.

I will always persist a recurrence pattern, not individual recurrence instances. Event instances will be calculated at runtime.

mysql -u root -p123456

mysql> show tables;
+--------------------+
| Tables_in_calendar |
+--------------------+
| calendars          |
| events             |
| exceptions         |
+--------------------+
3 rows in set (0.00 sec)
mysql> select * from exceptions;
Empty set (0.00 sec)

mysql> select * from calendars;
+----+--------+-------+--------+
| id | title  | color | hidden |
+----+--------+-------+--------+
|  1 | Home   |     2 |      0 |
|  2 | Work   |    22 |      0 |
|  3 | School |     7 |      0 |
|  4 | Sports |    26 |      0 |
+----+--------+-------+--------+
4 rows in set (0.00 sec)

mysql> select * from events;
+----+--------------------------+---------------------+---------------------+---------+-------------+------------+----------+--------------------+------------------+----------+----------------------------------+----------+
| id | title                    | start               | end                 | all_day | calendar_id | app_id     | location | notes              | url              | reminder | rrule                            | duration |
+----+--------------------------+---------------------+---------------------+---------+-------------+------------+----------+--------------------+------------------+----------+----------------------------------+----------+
| 44 | Vacation                 | 2017-01-14 00:00:00 | 2017-01-24 00:00:00 |       0 |           1 | remote     |          | Have fun           |                  |          | NULL                             |     NULL |
| 45 | Lunch with Matt          | 2017-02-03 11:30:00 | 2017-02-03 13:00:00 |       0 |           2 | remote     | Chuy's   | Order the queso    | http://chuys.com | 15       | NULL                             |     NULL |
| 46 | Project due              | 2017-02-03 00:15:00 | 2017-02-03 00:15:00 |       0 |           3 | remote     |          |                    |                  |          | NULL                             |     NULL |
| 47 | Sarah's birthday         | 2017-02-03 00:00:00 | 2017-02-03 00:00:00 |       1 |           1 | remote     |          | Need to get a gift |                  |          | NULL                             |     NULL |
| 48 | A long one...            | 2017-01-22 00:00:00 | 2017-02-13 00:00:00 |       1 |           2 | remote     |          |                    |                  |          | NULL                             |     NULL |
| 49 | Daily 10 times           | 2017-02-03 08:00:00 | 2017-02-13 00:00:00 |       0 |           1 | recurrence |          |                    |                  |          | FREQ=DAILY;COUNT=10              |      120 |
| 50 | Weekly 8 times           | 2017-02-03 01:00:00 | 2017-03-31 00:00:00 |       1 |           2 | recurrence |          |                    |                  |          | FREQ=WEEKLY;COUNT=8              |        0 |
| 51 | Weekdays                 | 2017-01-13 13:00:00 | 9999-12-31 00:00:00 |       0 |           3 | recurrence |          |                    |                  |          | FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR |       60 |
| 52 | Last Friday of month     | 2016-12-30 00:00:00 | 9999-12-31 00:00:00 |       1 |           4 | recurrence |          |                    |                  |          | FREQ=MONTHLY;BYDAY=-1FR          |        0 |
| 53 | Weekend days             | 2017-01-13 00:00:00 | 9999-12-31 00:00:00 |       1 |           1 | recurrence |          |                    |                  |          | FREQ=WEEKLY;BYDAY=SU,SA          |        0 |
| 54 | Multi-day, every Tuesday | 2016-12-30 00:00:00 | 9999-12-31 00:00:00 |       1 |           4 | recurrence |          |                    |                  |          | FREQ=WEEKLY;BYDAY=TU             |     2879 |
| 55 | First day of each month  | 2016-12-30 00:00:00 | 9999-12-31 00:00:00 |       1 |           2 | recurrence |          |                    |                  |          | FREQ=MONTHLY;BYMONTHDAY=1        |       60 |
| 56 | Every third Wednesday    | 2016-12-30 00:00:00 | 9999-12-31 00:00:00 |       1 |           3 | recurrence |          |                    |                  |          | FREQ=MONTHLY;BYDAY=3WE           |        0 |
+----+--------------------------+---------------------+---------------------+---------+-------------+------------+----------+--------------------+------------------+----------+----------------------------------+----------+
13 rows in set (0.00 sec)

mysql>

Outlook Calendar里面涉及到event collision的设置:

Google Calendar里面涉及到event collision的设置:

Google Calendar的calendar list是集成在一个view上面显示. 但是Outlook里的calendar list默认是单独显示的.

只有在选择schedule view的时候才集成起来显示:

http://stackoverflow.com/questions/29890631/how-to-make-sure-no-two-recurring-events-will-overlap
http://stackoverflow.com/questions/2904087/algorithm-how-to-check-intersections-of-recurring-events-definitions

相似:

Calendar和crontab还不大一样,calendar的事件一般不能有重叠.

第一题是给你个startTime,endTime和frequency, 你设计一个让病人吃药的schedule

http://www.1point3acres.com//bbs/forum.php?mod=viewthread&tid=193838


这道题主要考察OO Design, Database Modelling. 关于recurring events collision的问题,首先google calendar和outlook calendar都没有禁止event overlap,还有这个算法上也很难做到. 下面就面试会提到的一些问题做出解答.

首先问清楚是个人使用的calendar还是公司用的.如果是个人版的,attendee就自己,主要功能就是提醒,不存在什么user group问题,权限问题,priviate calendar问题,事件数量也很少.但是可以考考数据库设计模式.

web calendar的query有个特点就是window不会太大,页面的查询时间不会超过1个月,所以可以动态计算rrule.

6.1.1 How to create event?

  1. add a new row in events table
    参见: https://github.com/shadowwalker2718/Extensible/blob/master/examples/server/php/api/events-recurrence.php#L318

  2. add 1/many new row in attendee_user, attendee_group

  3. send email to ask attendee users and groups for acceptance

本着存储rrule的原则,create event其实不难,难点在查询,因为查询的时候如果遇到rrule要动态解析.

上面的设计是企业版的,如果是个人版,其实第一步就够了.

6.1.2 How to make sure there is no collision when creating an event?

这个通常是个人版才有的问题. 首先如果是公司的话,event overlap应该是允许的.比如你自己有一个appointment,然后老板又发了一个appointment来,你可以不用删除先前那个自己的appointment,但是你肯定要去参加老板的那个apppointment.

但是,如果真的需要这个功能,判断是否和现有时间段有冲突,那么要先看这个event是single event还是recurring event.

6.1.2.1 Single Event

需要用到的函数:

  • is_overlap(SEVENT, SEVENT)

  • is_overlap(SEVENT, REVENT)

如果是single event,那么令其start和end datetime是s1,e1, 先去数据库查:

select * from events where start between (s1, e1) or end betwen (s1, e1)

如果结果有single event,那么就肯定有冲突.
如果结果没有single event,但是有recurring event,那么需要调用一个类似function generateInstances($event, $viewStartDate, $viewEndDate)的函数.这个generateInstances函数是查询用的,返回所有的instances.如果只需要判断是否overlap,就在loop里面检查是否有instance的window的和[s1,e1]重叠.通常不需要loop所有的instance就能返回结果.

参见代码: https://github.com/shadowwalker2718/Extensible/blob/master/examples/server/php/api/events-recurrence.php#L105

主要的逻辑在这个while loop里面: https://github.com/shadowwalker2718/Extensible/blob/master/examples/server/php/api/events-recurrence.php#L158

6.1.2.2 Recurring Event

需要用到的函数:

  • is_overlap(REVENT, REVENT)

如果是recurring event,同样令其start和end datetime是s1,e1, 同样去数据库查

select * from events where start between (s1, e1) or end betwen (s1, e1)

如果结果里面有single event,用上面的方法is_overlap(SEVENT, REVENT)先找single event和recurring event是否冲突.
如果结果里面有recurring event,用下面的方法:

如果FREQ相同(同为daily, weekly, monthly, yearly),其实就是步长相同,那么比较第一个instance是否overlap就可以了.
如果FREQ不同,就要按这个说的来check: http://stackoverflow.com/questions/741687/detecting-conflicts-in-recurring-events

First of all, you can improve this by just causing an early function-return:

def conflicts?(other)
  conflicts = 0
  recurrences.each do |my_rec|
    other.recurrences.each do |other_rec|
      start, finish = other_rec.first, other_rec.last
      return true if my_rec.include?(start) || my_rec.include?(finish)
    end
  end
  false
end

This will however not improve the average performance of your algorithm but will only cause one comparision if there is a conflict. The only option you got is to detect “simple” collisions early. So like

  • Store the type of recurrence (weekly, daily, monthly) into the recurrence object.
  • If both are daily recurrences, find the first day where the might be a potential conflict. Example: daily, a: January-July, b: May-October should only check May,1st for a time-conflict. If there doesn’t happen one, you do not need to check for any other conflicts.
  • Do the same for different constellations (week-week, day-week, day-year).
  • Avoid to write day-week and week-day - week_day(x,y) is the same as day_week(y,x).
  • If you don’t find a matching method, you will have to use the method given above as a fallback.

As you can see the latter is much more work - AND the worst-case execution time might be the same (since it uses the original algorithm as a fallback). Worst-case might be caused by an “irregular” reccurence (“each day one hour later”) for example.

6.1.3 How to represent and store recurring events?

如何表示和存储重复事件?

RRULE是表示重复的时间点,我们再添加一个duration,就可以表示重复事件了.

比如用RRULE表示美国总统大选日.Every four years, the first Tuesday after a Monday in November, 3 occurrences (U.S. Presidential Election day):

>>> pprint(list(rrule(YEARLY, interval=4, count=10,
            bymonth=11,byweekday=TU,
            bymonthday=(2,3,4,5,6,7,8),
            dtstart=parse("19961105T090000"))))
[datetime.datetime(1996, 11, 5, 9, 0),
 datetime.datetime(2000, 11, 7, 9, 0),
 datetime.datetime(2004, 11, 2, 9, 0),
 datetime.datetime(2008, 11, 4, 9, 0),
 datetime.datetime(2012, 11, 6, 9, 0),
 datetime.datetime(2016, 11, 8, 9, 0),
 datetime.datetime(2020, 11, 3, 9, 0),
 datetime.datetime(2024, 11, 5, 9, 0),
 datetime.datetime(2028, 11, 7, 9, 0),
 datetime.datetime(2032, 11, 2, 9, 0)]

上面的时间是大选日9点,如果表示时间段还要加一个duration,比如2个小时,所以在数据库里面一定要把duration存下来.

6.1.4 How to handle 10K events per second?

这里是如何通知用户. 基本概念请看:https://developers.google.com/google-apps/calendar/concepts/reminders

关键是有两种通知消息,第一种叫reminder,是通知attendee事件快要发生了,第二种是notification,通知事件的变更(改时间,改地址或者cancel等),还有attendee是否接受appointment.

6.1.4.1 Reminder

如果是桌面软件,比如Outlook, BBG terminal,就每隔一段时间(比如10分钟)提醒一次.这样需要记录上次reminder的时间和总共提醒的次数.

如果是email reminder一般就提醒一次,不然会被当成spam. 这样只需要记录提供过没有(Boolean).

  • 每个用户有一个私有的default reminder setting. 没有其他的reminder setting的时候就使用这个.比如对于subscribe的calendar就按照用户私有的这个reminder setting来.在google calendar里,可能有一个系统默认的reminder setting for ALL users.

  • calendar可以设置一个reminder setting.对自己create的calendar是有reminder setting的,但是对subscribe的calendar是没有的.见下图:

为此我们可以建立一个单独的表,存储calendar和reminder的对应关系.

Reminders can be specified for whole calendars and for individual events. Users can set default reminders for each of their calendars; these defaults apply to all events within that calendar. However, users can also override these defaults for individual events, replacing them with a different set of reminders.

我们需要一个单独的daemon负责发送reminder的事情. 因为reminder的最小粒度是一分钟,所以这个循环至少每分钟check一次. 类似于这样一个: https://github.com/shadowwalker2718/webcal/blob/master/tools/send_reminders.php

其中有个变量$DAYS_IN_ADVANCE很重要,表示At most, how many days in advance can a reminder be sent (max)? 这个是reminder的最大粒度.

在Google Calendar里面,这个最大值是4 weeks. 如果太大了性能肯定有问题.

如果你试图改变4为更大的值,是会失败的.

此外reminder发送之后,发送的时间也是要存起来,不然会重复发送.

这个程序的workflow是:

  1. 先查询用户setting table,看是否有用户设置不接受通知的.只找那些接受通知的用户.这样过滤掉一部分events.
  2. 从今天开始循环$DAYS_IN_ADVANCE天,找出每天的instance,根据它的起始时间,reminder setting的时间,last reminder sent time来决定是否发送reminder.
  3. 发送reminder并更新last reminder sent time

这是reminders table的schema:

CREATE TABLE webcal_reminders (
  cal_id INT  NOT NULL default '0',
  /* timestamp that specifies send datetime. */
  /* Use this or cal_offset, but not both */
  cal_date INT NOT NULL default '0',
  /* offset in minutes from the selected edge */
  cal_offset INT NOT NULL default '0',
  /* S=Start, E=End. Specifies which edge of entry this reminder applies to */
  cal_related CHAR(1) NOT NULL default 'S',
  /* specifies whether reminder is sent before or after selected edge */
  cal_before CHAR(1) NOT NULL default 'Y',
  /* timestamp of last sent reminder */
  cal_last_sent INT NOT NULL default '0',
  /* number of times to repeat in addition to original occurance */
  cal_repeats INT NOT NULL default '0',
  /* time in ISO 8601 format that specifies time between repeated reminders */
  cal_duration INT NOT NULL default '0',
  /* number of times this reminder has been sent */
  cal_times_sent INT NOT NULL default '0',
  /* action as imported, may be used in the future */
  cal_action VARCHAR(12) NOT NULL default 'EMAIL',
  PRIMARY KEY ( cal_id )
);

完整的schema请看: https://github.com/shadowwalker2718/webcal/blob/master/install/sql/tables-mysql.sql

上面的设计适用小系统,如果是google级别的大系统,可能要用到 Delay Scheduler (参见:concurrency.)

6.1.4.2 Notification

6.1.5 How to Delete an event one instance or all instances?

6.2 Hangman

http://inventwithpython.com/chapter9.html

http://inventwithpython.com/hangman.py


http://www.1point3acres.com/bbs/forum.php?mod=redirect&goto=findpost&ptid=212481&pid=2648712&fromuid=91137

http://www.jiuzhang.com/qa/2655/

面系统设计被问到这个题应该如何回答呢?

这整个看起来就像一个OOD Design, 看到面经上有同学提到面试官的要求:
1. 要求用户可以输钱,赢钱
2. 用户增加到5M +

整个题目看起来非常像OOD design, 没有很多系统设计的考点. 弱的问一下这个题应该从哪些方面去考虑呢? 然后有哪些需要scale的问题呢?

自己想了一些,希望有idea的同学补充一下. (里面有很多没有展开说,只是想在这里有个基本设计的idea)

Scenario:
用户登录玩 hangman, 5M user
QPS :来源于用户登录/充值,lookup 操作,start game, store game.

service?
user service : register/login/look up(check balance, game history)/ update profile/buy god(充值)

word dict (根据难度分组,easy, medium, hard, 或者level1-10)

load game (different style game ?)
根据用户选择的难度,load game. 然后开始.

如果有5个Million的game,要根据用户的情况生成不同的session?
如果用户过多,我就多产生一些page, 例如
hangman.com/game1-session1
hangman.com/game1-session2
hangman.com/game2-session12
hangman.com/game1-session13

如何存储?
实际上就是一个serialize 和 deserialize 的过程,存储当前游戏的状态,到了哪一步,还有多少次猜的机会,还有什么character available.

如果用户掉线?
如果游戏还没完,检测到用户掉线,就把当前游戏存进DB.(因为涉及到输钱赢钱的问题,不可能用户看快输了就拔线= =),以及想到此处可以和面试官讨论并且用memcache优化.

这里有一个简单的hangman http://www.webhangman.com/hangman-highscores.php
其实就是猜词语,例如给你5次机会,你每次可以猜一个字母,比如Google, 你每次可以猜a-z之间的一个. 你如果猜错(猜的word不在google里,机会-1)
如果你在5次之内才对,就算赢.
为什么叫hangman 是因为,这个上吊的man, 可以5次画完,每猜错就画一笔,如果画完这个上吊的小儿就算输.

https://www.codecademy.com/courses/javascript-intermediate-tpoPb/0/1

https://www.careercup.com/question?id=14964682

https://www.glassdoor.com/Interview/Design-a-hangman-game-QTN_178702.htm

https://www.glassdoor.com/Interview/Design-a-scalable-server-for-the-hangman-game-QTN_208360.htm

http://users.csc.calpoly.edu/~jdalbey/308/ProjectReqs/Hangman/SRS_Doc.html

https://github.com/ankjai/fullstack-nanodegree-design-a-game

http://lessons.livecode.com/m/2592/l/19175-hangman

https://github.com/h1994st/Interview-Result

https://zh.wikipedia.org/wiki/%E7%8C%9C%E5%96%AE%E8%A9%9E%E9%81%8A%E6%88%B2

6.2.4 User System Design