Standard for evaluation
Workable Solution 25%
Special Case 20%
Analysis 25%
Tradeoff 15%
Knowledge Base 15%
4S analysis
Scenario
- Features: list all the features and sort them by priority
- DAU -> QPS: from daily active users (DAU), estimate queries per second (QPS; read QPS and write QPS), including average QPS and peak QPS (usually about 2 to 3 times the average QPS)
- DAU: assume ~100M
- QPS: DAU * estimated queries per user per day / 86400 (note: 86400 s = 24 h)
QPS = 100M * 5 / 86400 ≈ 5.8k (assuming 5 queries per user per day)
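The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `estimate_qps` and the `peak_factor` parameter are made up for illustration, and the factor of 3 is just the upper end of the "2 or 3 times" rule of thumb.

```python
SECONDS_PER_DAY = 86_400  # 24 h * 60 min * 60 s


def estimate_qps(dau, queries_per_user_per_day, peak_factor=3.0):
    """Return (average QPS, peak QPS) for a rough capacity estimate."""
    avg_qps = dau * queries_per_user_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor  # assumed multiplier, not a measured value
    return avg_qps, peak_qps


avg, peak = estimate_qps(dau=100_000_000, queries_per_user_per_day=5)
print(f"avg QPS ~ {avg:,.0f}, peak QPS ~ {peak:,.0f}")
# prints: avg QPS ~ 5,787, peak QPS ~ 17,361
```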
- Interfaces
Service: design the service based on QPS
- Split the system into several subsystems
Storage
- How to store/access data: SQL / NoSQL / File System
- SQL: transaction, complex table with multiple columns
- NoSQL: graph relationship (e.g., friendship)
- File System: picture, video, etc.
- Schema: detail the table structure
Scale
- Sharding / Optimize / Special Case
Design Twitter
- Scenario:
- Register / Login
- User Profile Display / Edit
- Upload Image / Video
- Search
- Post / Share a tweet*
- Timeline / News Feed*
- Follow / Unfollow a user*
- Service:
- User service: register/login/user profile display/edit
- Search service: search
- Media service: upload image/video
- Tweet service: post/share/timeline/newsfeed
- Friendship service: follow/unfollow
- Storage:
Decide where to store:
- User service: SQL
- Search service: index -> File System
- Media service: File System
- Tweet service: NoSQL (tweet content -> NoSQL)
- Friendship service: NoSQL (it is also OK to put it in SQL, but NoSQL is better)
Detail the table schema
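The notes do not spell out the tables, so the following is only one possible sketch of what they might look like, written as Python dataclasses. Every field name here (`password_hash`, `created_at`, `from_user_id`, `to_user_id`, and so on) is an assumption for illustration, not part of the original design.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class User:             # User service (SQL)
    id: int
    username: str
    email: str
    password_hash: str  # hypothetical field; store a hash, never the raw password


@dataclass
class Tweet:            # Tweet service (NoSQL)
    id: int
    user_id: int        # author; refers to User.id
    content: str
    created_at: datetime


@dataclass
class Friendship:       # Friendship service
    from_user_id: int   # the follower
    to_user_id: int     # the user being followed
```

A follow is then one `Friendship` row, and a user's timeline is every `Tweet` with a matching `user_id`, ordered by `created_at`.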
Read more: when to use SQL vs. NoSQL?
Detail the services
Tweet service: post/share/timeline/newsfeed
Friendship service: follow/unfollow
1. How to store and get the "News Feed"
1) Pull Model
[Idea] When a user checks the news feed, pull the timelines of everyone he/she follows and merge them.
[Process]
[Problem] This takes n DB reads (n = number of followings), which is too slow to be acceptable.
[Solution] Cache every user's timeline (e.g., in memcached).
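A minimal sketch of the pull model with a timeline cache might look like the following. The dictionaries `FOLLOWINGS`, `TIMELINE_DB` and `timeline_cache` are hypothetical in-memory stand-ins for the friendship store, the tweet DB and memcached, and the sketch assumes each timeline is already sorted newest first.

```python
import heapq
from itertools import islice

# Hypothetical stand-ins: friendship store, tweet DB, and memcached.
FOLLOWINGS = {"alice": ["bob", "carol"]}
TIMELINE_DB = {
    "bob": [(105, "bob: lunch"), (101, "bob: hello")],       # (timestamp, text), newest first
    "carol": [(110, "carol: shipping"), (102, "carol: hi")],
}
timeline_cache = {}
stats = {"db_reads": 0}


def get_timeline(user):
    """Try the cache first; fall back to one DB read on a miss and fill the cache."""
    if user not in timeline_cache:
        stats["db_reads"] += 1
        timeline_cache[user] = TIMELINE_DB.get(user, [])
    return timeline_cache[user]


def get_news_feed(user, limit=20):
    """Pull model: merge the timelines of everyone `user` follows, newest first."""
    timelines = [get_timeline(f) for f in FOLLOWINGS.get(user, [])]
    merged = heapq.merge(*timelines, key=lambda tweet: tweet[0], reverse=True)
    return list(islice(merged, limit))


print(get_news_feed("alice", limit=3))
```

The first call costs one DB read per following; repeated calls are served from the cache, which is the point of the [Solution] above.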
[Follow Up]: Thundering herd problem: when millions of requests come in at once, the DB and servers cannot handle them.
[Solution] https://code.facebook.com/posts/1653074404941839/under-the-hood-broadcasting-live-video-to-millions/
[Follow Up]: Likes, retweets, and comments all modify a tweet's basic information. How should it be updated?
[Solution] Keywords: write through, write back, look aside
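To make two of these keywords concrete, here is a rough sketch of a like counter updated with look-aside invalidation versus write-through. The dictionaries `db` and `cache` are hypothetical stand-ins for the tweet database and memcached. Write-back (update the cache first and flush to the DB later) is not shown.

```python
db = {}      # hypothetical stand-in for the tweet database
cache = {}   # hypothetical stand-in for memcached


def read_look_aside(tweet_id):
    """Look-aside read: check the cache, then the DB, then fill the cache."""
    if tweet_id in cache:
        return cache[tweet_id]
    tweet = db[tweet_id]
    cache[tweet_id] = dict(tweet)
    return cache[tweet_id]


def like_look_aside(tweet_id):
    """Look-aside write: update the DB, then invalidate the cached copy."""
    db[tweet_id]["likes"] += 1
    cache.pop(tweet_id, None)


def like_write_through(tweet_id):
    """Write-through: update the DB and the cache together on every write."""
    db[tweet_id]["likes"] += 1
    cache[tweet_id] = dict(db[tweet_id])
```

With look-aside, the next read after a like pays a cache miss; with write-through, the cache stays warm at the cost of an extra cache write on every update.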
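[Follow Up]: How to deal with cache invalidation? If hot data is evicted from the cache because memory runs short or the cache makes a bad eviction decision, what happens?
A stampede: the DB is instantly hit by a flood of requests for that data and goes down.
[Solution] Facebook lease-get (from the Facebook paper). Read more: http://bit.ly/1jDzKZK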
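The idea behind a lease can be sketched in a single process as follows. This is a simplified toy, not Facebook's implementation: the class `LeaseCache` and its methods are hypothetical names, and real leases also expire and are invalidated by deletes, which is omitted here. On a miss, only the first caller receives a token and is allowed to refill the key from the DB; everyone else is told to wait and retry.

```python
import itertools


class LeaseCache:
    """Toy cache where only the holder of a lease may refill a missing key."""

    def __init__(self):
        self._data = {}
        self._leases = {}                  # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def lease_get(self, key):
        """Return (value, token). On a miss, only the first caller gets a token."""
        if key in self._data:
            return self._data[key], None
        if key in self._leases:
            return None, None              # someone else is refilling: wait and retry
        token = next(self._tokens)
        self._leases[key] = token
        return None, token

    def lease_set(self, key, value, token):
        """Store the value only if `token` is the outstanding lease for `key`."""
        if token is None or self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        return True


def read_tweet(cache, key, load_from_db):
    """Read through the lease cache; returns None if the caller should retry."""
    value, token = cache.lease_get(key)
    if value is not None:
        return value
    if token is None:
        return None                        # back off briefly, then call again
    value = load_from_db(key)              # only the lease holder hits the DB
    cache.lease_set(key, value, token)
    return value
```

However many requests miss on the same hot key at once, only one of them reaches the DB.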
2) Push Model
[Idea] When a user posts a new tweet, push the tweet to the news feeds of all his/her followers.
[Process]
[Problem] The many-followers problem: if the followed user has hundreds of thousands of followers, the fan-out takes a very long time, and the post might not show up in news feeds until several days later.
[Solution] For a followed user with a huge number of followers, use the pull model: when a follower checks the news feed, pull that user's timeline and merge it in. Do not write that user's tweets into each follower's news feed; leave those fan-out writes to the push model for ordinary users.
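The push model and the pull fallback for heavily followed users can be combined roughly as below. All names are hypothetical, the stores are plain in-memory dictionaries, and `CELEBRITY_THRESHOLD` is an assumed cutoff set tiny so the example is easy to follow.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 2          # assumed cutoff; a real system would use a far larger number

followers = defaultdict(set)     # user -> users who follow them
followings = defaultdict(set)    # user -> users they follow
timelines = defaultdict(list)    # user -> own tweets as (timestamp, author, text)
news_feeds = defaultdict(list)   # user -> tweets pushed by ordinary followings


def follow(follower, followee):
    followers[followee].add(follower)
    followings[follower].add(followee)


def is_celebrity(user):
    return len(followers[user]) >= CELEBRITY_THRESHOLD


def post_tweet(user, timestamp, text):
    """Push model: fan out on write, except for users with too many followers."""
    tweet = (timestamp, user, text)
    timelines[user].append(tweet)
    if not is_celebrity(user):
        for follower in followers[user]:
            news_feeds[follower].append(tweet)


def get_news_feed(user, limit=20):
    """Read the pushed feed, then pull and merge celebrity timelines."""
    feed = set(news_feeds[user])             # a set avoids duplicates if a user was pushed earlier
    for followee in followings[user]:
        if is_celebrity(followee):
            feed.update(timelines[followee])
    return sorted(feed, reverse=True)[:limit]
```

Posting stays cheap for heavily followed users because nothing is fanned out, while ordinary users still get the fast read path of the push model.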
Similar problems
Design Facebook
Design Instagram
Design Friend Circle (朋友圈)
Design Google Reader (RSS Reader)
Homework: compare the QPS of MySQL and Memcached. Memcached QPS / MySQL QPS ~ 100
How to check mutual friends
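One simple way to approach this, assuming each user's friend list fits in memory as a set, is a set intersection. The `friends` dictionary and the function name are made up for illustration; this is a sketch, not necessarily the answer the exercise has in mind.

```python
# Hypothetical friendship data: user -> set of friends.
friends = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}


def mutual_friends(a, b):
    """Return the users who are friends with both `a` and `b`."""
    return friends.get(a, set()) & friends.get(b, set())


print(mutual_friends("alice", "bob"))  # prints: {'carol'}
```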
Related Reading:
Scaling Twitter: Making Twitter 10000 Percent Faster
How Twitter Stores 250 Million Tweets A Day Using MySQL
Compare NoSQL DB: