
Design the Twitter timeline and search

Note: This document links directly to relevant areas found in the system design topics to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.

Design the Facebook feed and Design Facebook search are similar questions.

Step 1: Outline use cases and constraints

Gather requirements and scope the problem. Ask questions to clarify use cases and constraints. Discuss assumptions.

Without an interviewer to address clarifying questions, we’ll define some use cases and constraints.

Use cases

We’ll scope the problem to handle only the following use cases:

- User posts a tweet
- User views the home timeline (activity from the people the user is following)
- User views the user timeline (activity from the user)
- User searches keywords
- Service has high availability

Out of scope

Constraints and assumptions

State assumptions

General

Timeline

Search

Calculate usage

Clarify with your interviewer if you should run back-of-the-envelope usage calculations.

Handy conversion guide:

- 2.5 million seconds per month
- 1 request per second = 2.5 million requests per month
- 40 requests per second = 100 million requests per month
- 400 requests per second = 1 billion requests per month
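
As a quick sanity check of these conversions, a small sketch; the 60 thousand per second figure matches the fanout rate cited later in this document:

SECONDS_PER_MONTH = 2.5e6  # rounded; 60 * 60 * 24 * 30 is ~2.6 million

def per_second_to_per_month(rate_per_second):
    return rate_per_second * SECONDS_PER_MONTH

# e.g. 60 thousand tweets delivered on fanout per second
print(per_second_to_per_month(60_000))  # 150 billion fanout deliveries per month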

Step 2: Create a high level design

Outline a high level design with all important components.

[High-level design diagram]

Step 3: Design core components

Dive into details for each core component.

Use case: User posts a tweet

We could store the user’s own tweets to populate the user timeline (activity from the user) in a relational database. We should discuss the use cases and tradeoffs between choosing SQL or NoSQL.
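
As a rough sketch, assuming we choose SQL for the user timeline, the tweets could live in a single table keyed by user_id. The table and column names below are assumptions for illustration, not part of the original design:

import sqlite3

# Illustrative schema only -- names, types, and index are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        tweet_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        created_at TEXT    NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_tweets_user ON tweets (user_id, created_at)")

# The user timeline (activity from the user) is then an indexed lookup.
rows = conn.execute(
    "SELECT tweet_id, status, created_at FROM tweets "
    "WHERE user_id = ? ORDER BY created_at DESC LIMIT 20",
    (123,),
).fetchall()

An index on (user_id, created_at) keeps the user timeline read a single range scan, which is why a relational store is a reasonable fit for this part of the design.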

Delivering tweets and building the home timeline (activity from people the user is following) is trickier. Fanning out tweets to all followers (60 thousand tweets delivered on fanout per second) will overload a traditional relational database. We’ll probably want to choose a data store with fast writes such as a NoSQL database or Memory Cache. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer (see Latency numbers every programmer should know: https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).

We could store media such as photos or videos on an Object Store.

Clarify with your interviewer how much code you are expected to write.

If our Memory Cache is Redis, we could use a native Redis list with the following structure:

           tweet n+2                   tweet n+1                   tweet n
| 8 bytes   8 bytes  1 byte | 8 bytes   8 bytes  1 byte | 8 bytes   8 bytes  1 byte |
| tweet_id  user_id  meta   | tweet_id  user_id  meta   | tweet_id  user_id  meta   |

The new tweet would be placed in the Memory Cache, which populates the user’s home timeline (activity from people the user is following).
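
A minimal sketch of how a fanout worker could append such packed entries to each follower's home timeline list, assuming the redis-py client and a hypothetical home_timeline:<user_id> key scheme:

import struct

import redis  # assumes the redis-py client

r = redis.Redis()

# Pack tweet_id and user_id as 8-byte integers plus 1 byte of metadata,
# matching the 17-byte entry layout sketched above.
def pack_entry(tweet_id, user_id, meta=0):
    return struct.pack(">QQB", tweet_id, user_id, meta)

def fan_out(tweet_id, author_id, follower_ids, max_entries=800):
    entry = pack_entry(tweet_id, author_id)
    for follower_id in follower_ids:
        key = f"home_timeline:{follower_id}"   # hypothetical key scheme
        pipe = r.pipeline()
        pipe.lpush(key, entry)                 # newest tweet first
        pipe.ltrim(key, 0, max_entries - 1)    # cap the list length
        pipe.execute()

Capping each list keeps the cache footprint bounded, since most users only page through the most recent tweets.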

We’ll use a public REST API:

$ curl -X POST --data '{ "user_id": "123", "auth_token": "ABC123", \
    "status": "hello world!", "media_ids": "ABC987" }' \
    https://twitter.com/api/v1/tweet

Response:

{
    "created_at": "Wed Sep 05 00:37:15 +0000 2012",
    "status": "hello world!",
    "tweet_id": "987",
    "user_id": "123",
    ...
}

For internal communications, we could use Remote Procedure Calls.

Use case: User views the home timeline

REST API:

$ curl https://twitter.com/api/v1/home_timeline?user_id=123

Response:

{
    "user_id": "456",
    "tweet_id": "123",
    "status": "foo"
},
{
    "user_id": "789",
    "tweet_id": "456",
    "status": "bar"
},
{
    "user_id": "789",
    "tweet_id": "579",
    "status": "baz"
},
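
A rough sketch of how the read path could assemble that response: pull packed entries from the Memory Cache, then hydrate each tweet id. The lrange key scheme and tweet_store helper are assumptions for illustration:

import json
import struct

ENTRY = struct.Struct(">QQB")  # tweet_id, user_id, meta -- layout from above

def get_home_timeline(cache, tweet_store, user_id, limit=20):
    # `cache` is assumed to expose lrange() like redis-py; `tweet_store`
    # is a hypothetical lookup from tweet_id to the stored tweet.
    raw_entries = cache.lrange(f"home_timeline:{user_id}", 0, limit - 1)
    timeline = []
    for raw in raw_entries:
        tweet_id, author_id, _meta = ENTRY.unpack(raw)
        tweet = tweet_store.get(tweet_id)  # e.g. SQL or a tweet info service
        timeline.append({
            "user_id": str(author_id),
            "tweet_id": str(tweet_id),
            "status": tweet["status"],
        })
    return json.dumps(timeline)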

Use case: User views the user timeline

The REST API would be similar to the home timeline, except all tweets would come from the user as opposed to the people the user is following.

Use case: User searches keywords

REST API:

$ curl https://twitter.com/api/v1/search?query=hello+world

The response would be similar to that of the home timeline, except it would contain tweets matching the given query.
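
As a toy illustration of the search path, the service could tokenize and normalize the query, look each term up in a reverse index, then intersect and rank the matches. The in-memory dict below stands in for a sharded search cluster and is purely an assumption:

import re
from collections import defaultdict

# Toy reverse index: term -> set of tweet ids. In the real design this
# would live in a search cluster, not a Python dict.
reverse_index = defaultdict(set)

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def index_tweet(tweet_id, status):
    for term in tokenize(status):
        reverse_index[term].add(tweet_id)

def search(query):
    terms = tokenize(query)
    if not terms:
        return []
    # Intersect the posting sets, then rank (here: newest tweet id first).
    matches = set.intersection(*(reverse_index[term] for term in terms))
    return sorted(matches, reverse=True)

index_tweet(987, "hello world!")
index_tweet(988, "hello there")
print(search("hello world"))  # [987]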

Step 4: Scale the design

Identify and address bottlenecks, given the constraints.

[Scaled design diagram]

Important: Do not simply jump right into the final design from the initial design!

State that you would 1) benchmark/load test, 2) profile for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See Design a system that scales to millions of users on AWS as a sample of how to iteratively scale the initial design.

It’s important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a Load Balancer with multiple Web Servers? CDN? Master-Slave Replicas? What are the alternatives and Trade-Offs for each?

We’ll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.

To avoid repeating discussions, refer to the following system design topics for main talking points, tradeoffs, and alternatives:

The Fanout Service is a potential bottleneck. For Twitter users with millions of followers, the fanout process could take several minutes to complete. This could lead to race conditions with @replies to the tweet, which we could mitigate by re-ordering the tweets at serve time.

We could also avoid fanning out tweets from highly-followed users. Instead, we could search to find tweets for highly-followed users, merge the search results with the user’s home timeline results, then re-order the tweets at serve time.
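
A rough sketch of that serve-time merge, assuming the pre-computed timeline and each highly-followed account's recent tweets arrive as (timestamp, tweet) lists sorted newest first; the function and parameter names are hypothetical:

import heapq

def hybrid_home_timeline(precomputed, celebrity_feeds, limit=20):
    # Merge the fanned-out entries with tweets fetched at read time,
    # keeping everything in reverse-chronological order.
    merged = heapq.merge(precomputed, *celebrity_feeds,
                         key=lambda entry: entry[0], reverse=True)
    return [tweet for _, tweet in list(merged)[:limit]]

precomputed = [(1700000300, "tweet 2"), (1700000100, "tweet 1")]
celebrity_feeds = [[(1700000200, "celebrity tweet")]]
print(hybrid_home_timeline(precomputed, celebrity_feeds))
# ['tweet 2', 'celebrity tweet', 'tweet 1']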

Additional optimizations include:

We’ll also want to address the bottleneck with the SQL Database.

Although the Memory Cache should reduce the load on the database, it is unlikely the SQL Read Replicas alone would be enough to handle the cache misses. We’ll probably need to employ additional SQL scaling patterns.

The high volume of writes would overwhelm a single SQL Write Master-Slave, also pointing to a need for additional scaling techniques.
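
For example, one such scaling technique is sharding tweet writes across several SQL hosts by user id so a single Write Master is not a choke point; the host names and hashing scheme below are simplified assumptions:

import hashlib

SHARDS = ["tweets-db-0", "tweets-db-1", "tweets-db-2", "tweets-db-3"]  # hypothetical hosts

def shard_for(user_id):
    # Stable hash so all of a user's tweets land on the same shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for(123))  # the same user always routes to the same shard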

We should also consider moving some data to a NoSQL Database.

Additional talking points

Additional topics to dive into, depending on the problem scope and time remaining.

NoSQL

Caching

Asynchronism and microservices

Communications

Security

Refer to the security section.

Latency numbers

See Latency numbers every programmer should know.

Ongoing