Baseline System Design — Databases, Part 1/?

Adventures In Toy Facebook Continue

Curt Corginia
4 min readSep 1, 2023
Photo by Campaign Creators on Unsplash. I could have used my own marker, but how would I erase it from the mirror?

Last time I wrote about Facebook fanout, but got bogged down in trying to figure out what Facebook actually did. A requirement for Facebook is that users can publish posts; another requirement is that users can receive posts (“news”). We can implement this with the push/fanout-on-write approach, or the pull/fanout-on-load approach. In the former, the newsfeed is generated during read time. In the latter, the newsfeed is precomputed during write time and new posts are delivered to friends’ caches immediately after publication.

An advantage of using a cache is speed, but a disadvantage is the “hotkey problem” — it is really slow for celebrities who have many friends. We can use a hybrid approach and have the majority of users use fanout-on-write while celebrities use fanout-on-load, but this is not necessarily what Facebook actually does.

The Database Tangent Begins

I have purchased Grokking The System Design Interview, a kind of $60 EBook not to be confused with the similarly named Educative.io course, and whatever weird history caused DesignGurus to split from them before Educative apparently rolled out their own ripoff (to be clear, I am not sure if that is actually what happened and do not want to summon their lawyers; that is just how it looks to me). One thing I noticed is that the “Design Facebook Newsfeed” page quickly dives into database design, without really explaining what SQL is or how to read a database diagram. They include their own section on why you might choose SQL or NoSQL, but it seems they treat SQL itself as prerequisite knowledge.

So…let’s see:

This HackerNoon article is more or less an advertisement for software I have never used, but I do like how they include a diagram example.

Primary Keys and Foreign Keys are explained in the resource below, from a website called learnsql. I really like how the website includes actual code, in addition to its tables.

This is a huge simplification, but a primary key uniquely identifies each row in a table, and a foreign key links a table to another. A varchar is a set of character data of indeterminate length.

DesignGurus lays out a database design like this: The objects are User, Entity (such as page or group), and FeedItem/post. This is a little different from the database design proposed by a site called systemdesignprep, but please check out their site if you want to see a hand-drawn diagram.

Their user table has user id (primary key), username, email, creation date, and description. Their user post table has post id (primary key), text, media URL, and timestamp. Please consult the link itself to view the other tables and foreign keys:

The SQL vs NoSQL Question

A Quora post written ten years ago by a Facebook engineer says they use a variety of NoSQL solutions, including HBase, Hadoop, and Memcached; MySQL is their primary data store. This information could very well be out of date.

DesignGurus, in their solution, suggests that you COULD use a relational database (hence their diagram). Alex Xu’s solution in System Design Interviews, unless I misread something, does not seem to touch on the SQL vs. NoSQL question at all.

This is, once again, a huge oversimplification but SQL databases are relational and NoSQL databases are non-relational. Interviewers in system design questions commonly ask whether you would choose SQL or NoSQL, and why. SQL, from that integrate.io source, is ACID (atomicity, consistency, isolation, durability) compliant and NoSQL is not necessarily ACID-compliant; what you get with NoSQL is horizontal scalability. In the words of that IBM article I just linked, you can add additional servers in a NoSQL schema to increase load, whereas a SQL schema is vertically scalable and you would have to migrate to a larger server.

I had thought the trade-off was that NoSQL is simply faster, and SQL is ACID-compliant, but now I think I was wrong.

Closing Thoughts

For your entertainment, you can read Sebastian Carlos’ s***post about SQL. Just don’t take it seriously.

Sebastian Carlos is a man of focus, git commitment, and sheer will. He is currently on a one-man crusade to prove that the software engineer community on Medium has no sense of humor.

--

--

Curt Corginia

Founder, CEO, CTO, COO, and janitor at a company I made up called CORGICorporation