update
@ -0,0 +1,4 @@
|
||||
# System Design
|
||||
|
||||
- [Cache System](../cache/)
|
||||
- [System Design Primer](system-design-primer/index.md)
|
@ -0,0 +1 @@
|
||||
*.ipynb linguist-language=Python
|
@ -0,0 +1,64 @@
|
||||
# Byte-compiled / optimized / DLL files
|
||||
*.epub
|
||||
__pycache__/
|
||||
*.py[cod]
|
||||
|
||||
# C extensions
|
||||
*.so
|
||||
|
||||
# Distribution / packaging
|
||||
.Python
|
||||
env/
|
||||
build/
|
||||
develop-eggs/
|
||||
dist/
|
||||
downloads/
|
||||
eggs/
|
||||
lib/
|
||||
lib64/
|
||||
parts/
|
||||
sdist/
|
||||
var/
|
||||
*.egg-info/
|
||||
.installed.cfg
|
||||
*.egg
|
||||
|
||||
# PyInstaller
|
||||
# Usually these files are written by a python script from a template
|
||||
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
||||
*.manifest
|
||||
*.spec
|
||||
|
||||
# Installer logs
|
||||
pip-log.txt
|
||||
pip-delete-this-directory.txt
|
||||
|
||||
# Unit test / coverage reports
|
||||
htmlcov/
|
||||
.tox/
|
||||
.coverage
|
||||
.cache
|
||||
nosetests.xml
|
||||
coverage.xml
|
||||
|
||||
# Translations
|
||||
*.mo
|
||||
*.pot
|
||||
|
||||
# Django stuff:
|
||||
*.log
|
||||
|
||||
# Sphinx documentation
|
||||
docs/_build/
|
||||
|
||||
# PyBuilder
|
||||
target/
|
||||
|
||||
# IPython notebook
|
||||
.ipynb_checkpoints
|
||||
|
||||
# Repo scratch directory
|
||||
scratch/
|
||||
|
||||
# IPython Notebook templates
|
||||
template.ipynb
|
@ -0,0 +1,78 @@
|
||||
Contributing
|
||||
============
|
||||
|
||||
Contributions are welcome!
|
||||
|
||||
**Please carefully read this page to make the code review process go as smoothly as possible and to maximize the likelihood of your contribution being merged.**
|
||||
|
||||
## Bug Reports
|
||||
|
||||
For bug reports or feature requests, [submit an issue](https://github.com/donnemartin/system-design-primer/issues).
|
||||
|
||||
## Pull Requests
|
||||
|
||||
The preferred way to contribute is to fork the
|
||||
[main repository](https://github.com/donnemartin/system-design-primer) on GitHub.
|
||||
|
||||
1. Fork the [main repository](https://github.com/donnemartin/system-design-primer). Click on the 'Fork' button near the top of the page. This creates a copy of the code under your account on the GitHub server.
|
||||
|
||||
2. Clone this copy to your local disk:
|
||||
|
||||
$ git clone git@github.com:YourLogin/system-design-primer.git
|
||||
$ cd system-design-primer
|
||||
|
||||
3. Create a branch to hold your changes and start making changes. Don't work in the `master` branch!
|
||||
|
||||
$ git checkout -b my-feature
|
||||
|
||||
4. Work on this copy on your computer, using Git for version control. When you're done editing, run the following to record your changes in Git:
|
||||
|
||||
$ git add modified_files
|
||||
$ git commit
|
||||
|
||||
5. Push your changes to GitHub with:
|
||||
|
||||
$ git push -u origin my-feature
|
||||
|
||||
6. Finally, go to the web page of your fork of the `system-design-primer` repo and click 'Pull Request' to send your changes for review.
|
||||
|
||||
### GitHub Pull Requests Docs
|
||||
|
||||
If you are not familiar with pull requests, review the [pull request docs](https://help.github.com/articles/using-pull-requests/).
|
||||
|
||||
## Translations
|
||||
|
||||
We'd like the guide to be available in many languages. Here is the process for maintaining translations:
|
||||
|
||||
* The original version and content of the guide are maintained in English.
|
||||
* Translations follow the content of the original. Contributors must speak at least some English, so that translations do not diverge.
|
||||
* Each translation has a maintainer to update the translation as the original evolves and to review others' changes. This doesn't require a lot of time, but a review by the maintainer is important to maintain quality.
|
||||
|
||||
See [Translations](TRANSLATIONS.md).
|
||||
|
||||
### Changes to translations
|
||||
|
||||
* Changes to content should be made to the English version first, and then translated to each other language.
|
||||
* Changes that improve translations should be made directly on the file for that language. Pull requests should only modify one language at a time.
|
||||
* Submit a pull request with changes to the file in that language. Each language has a maintainer, who reviews changes in that language. Then the primary maintainer [@donnemartin](https://github.com/donnemartin) merges it in.
|
||||
* Prefix pull requests and issues with language codes if they are for that translation only, e.g. "es: Improve grammar", so maintainers can find them easily.
|
||||
* Tag the translation maintainer for a code review, see the list of [translation maintainers](TRANSLATIONS.md).
|
||||
* You will need to get a review from a native speaker (preferably the language maintainer) before your pull request is merged.
|
||||
|
||||
### Adding translations to new languages
|
||||
|
||||
Translations to new languages are always welcome! Keep in mind a translation must be maintained.
|
||||
|
||||
* Do you have time to be a maintainer for a new language? Please see the list of [translations](TRANSLATIONS.md) and tell us so we know we can count on you in the future.
|
||||
* Check the [translations](TRANSLATIONS.md), issues, and pull requests to see if a translation is in progress or stalled. If it's in progress, offer to help. If it's stalled, consider becoming the maintainer if you can commit to it.
|
||||
* If a translation has not yet been started, file an issue for your language so people know you are working on it and we'll coordinate. Confirm you are a native-level speaker of the language and are willing to maintain the translation, so it's not orphaned.
|
||||
* To get started, fork the repo, then submit a pull request to the main repo with the single file README-xx.md added, where xx is the language code. Use standard [IETF language tags](https://www.w3.org/International/articles/language-tags/), i.e. the same as is used by Wikipedia, *not* the code for a single country. These are usually just the two-letter lowercase code, for example, `fr` for French and `uk` for Ukrainian (not `ua`, which is for the country). For languages that have variations, use the shortest tag, such as `zh-Hant`.
|
||||
* Feel free to invite friends to help with your translation by having them fork your repo, then merging their pull requests into your fork. Translations are difficult and usually have errors that others need to find.
|
||||
* Add links to your translation at the top of every README-xx.md file. For consistency, the link should be added in alphabetical order by language code, and the anchor text should be in the native language.
|
||||
* When you've fully translated the English README.md, comment on the pull request in the main repo that it's ready to be merged.
|
||||
* You'll need to have a complete and reviewed translation of the English README.md before your translation will be merged into the `master` branch.
|
||||
* Once accepted, your pull request will be squashed into a single commit into the `master` branch.
|
||||
|
||||
### Translation template credits
|
||||
|
||||
Thanks to [The Art of Command Line](https://github.com/jlevy/the-art-of-command-line) for the translation template.
|
@ -0,0 +1,9 @@
|
||||
I am providing code and resources in this repository to you under an open source
|
||||
license. Because this is my personal repository, the license you receive to my
|
||||
code and resources is from me and not my employer (Facebook).
|
||||
|
||||
Copyright 2017 Donne Martin
|
||||
|
||||
Creative Commons Attribution 4.0 International License (CC BY 4.0)
|
||||
|
||||
http://creativecommons.org/licenses/by/4.0/
|
@ -0,0 +1,163 @@
|
||||
# Translations
|
||||
|
||||
**Thank you to our awesome translation maintainers!**
|
||||
|
||||
## Contributing
|
||||
|
||||
See the [Contributing Guidelines](CONTRIBUTING.md).
|
||||
|
||||
## Translation Statuses
|
||||
|
||||
* 🎉 **Live**: Merged into `master` branch
|
||||
* ⏳ **In Progress**: Under active translation for eventual merge into `master` branch
|
||||
* ❗ **Stalled***: Needs an active maintainer ✋
|
||||
|
||||
**Within the past 2 months, there has been 1) no active work in the translation fork, and 2) no discussion from previous maintainer(s) in the discussion thread.*
|
||||
|
||||
Languages not listed here have not been started, [contribute](CONTRIBUTING.md)!
|
||||
|
||||
Languages are grouped by status and are listed in alphabetical order.
|
||||
|
||||
## Live
|
||||
|
||||
### 🎉 Japanese
|
||||
|
||||
* [README-ja.md](README-ja.md)
|
||||
* Maintainer(s): [@tsukukobaan](https://github.com/tsukukobaan) 👏
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/100
|
||||
|
||||
### 🎉 Simplified Chinese
|
||||
|
||||
* [README-zh-Hans.md](README-zh-Hans.md)
|
||||
* Maintainer(s): [@sqrthree](https://github.com/sqrthree) 👏
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/38
|
||||
|
||||
### 🎉 Traditional Chinese
|
||||
|
||||
* [README-zh-TW.md](README-zh-TW.md)
|
||||
* Maintainer(s): [@kevingo](https://github.com/kevingo) 👏
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/88
|
||||
|
||||
## In Progress
|
||||
|
||||
### ⏳ Korean
|
||||
|
||||
* Maintainer(s): [@bonomoon](https://github.com/bonomoon), [@mingrammer](https://github.com/mingrammer) 👏
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/102
|
||||
* Translation Fork: https://github.com/bonomoon/system-design-primer, https://github.com/donnemartin/system-design-primer/pull/103
|
||||
|
||||
### ⏳ Russian
|
||||
|
||||
* Maintainer(s): [@voitau](https://github.com/voitau), [@DmitryOlkhovoi](https://github.com/DmitryOlkhovoi) 👏
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/87
|
||||
* Translation Fork: https://github.com/voitau/system-design-primer/blob/master/README-ru.md
|
||||
|
||||
## Stalled
|
||||
|
||||
**Notes**:
|
||||
|
||||
* If you're able to commit to being an active maintainer for a language, let us know in the discussion thread for your language and update this file with a pull request.
|
||||
* If you're listed here as a "Previous Maintainer" but can commit to being an active maintainer, also let us know.
|
||||
* See the [Contributing Guidelines](CONTRIBUTING.md).
|
||||
|
||||
### ❗ Arabic
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@aymns](https://github.com/aymns)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/170
|
||||
* Translation Fork: https://github.com/aymns/system-design-primer/blob/develop/README-ar.md
|
||||
|
||||
### ❗ Bengali
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@nutboltu](https://github.com/nutboltu)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/220
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/240
|
||||
|
||||
### ❗ Brazilian Portuguese
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@IuryAlves](https://github.com/IuryAlves)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/40
|
||||
* Translation Fork: https://github.com/IuryAlves/system-design-primer, https://github.com/donnemartin/system-design-primer/pull/67
|
||||
|
||||
### ❗ French
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@spuyet](https://github.com/spuyet)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/250
|
||||
* Translation Fork: https://github.com/spuyet/system-design-primer/blob/add-french-translation/README-fr.md
|
||||
|
||||
### ❗ German
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@Allaman](https://github.com/Allaman)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/186
|
||||
* Translation Fork: None
|
||||
|
||||
### ❗ Greek
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@Belonias](https://github.com/Belonias)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/130
|
||||
* Translation Fork: None
|
||||
|
||||
### ❗ Hebrew
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@EladLeev](https://github.com/EladLeev)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/272
|
||||
* Translation Fork: https://github.com/EladLeev/system-design-primer/tree/he-translate
|
||||
|
||||
### ❗ Italian
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@pgoodjohn](https://github.com/pgoodjohn)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/104
|
||||
* Translation Fork: https://github.com/pgoodjohn/system-design-primer
|
||||
|
||||
### ❗ Persian
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@hadisinaee](https://github.com/hadisinaee)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/pull/112
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/112
|
||||
|
||||
### ❗ Spanish
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@eamanu](https://github.com/eamanu)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/136
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/189
|
||||
|
||||
### ❗ Thai
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@iphayao](https://github.com/iphayao)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/187
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/221
|
||||
|
||||
### ❗ Turkish
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@hwclass](https://github.com/hwclass), [@canerbaran](https://github.com/canerbaran), [@emrahtoy](https://github.com/emrahtoy)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/39
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/239
|
||||
|
||||
### ❗ Ukrainian
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@Kietzmann](https://github.com/Kietzmann), [@Acarus](https://github.com/Acarus)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/248
|
||||
* Translation Fork: https://github.com/Acarus/system-design-primer
|
||||
|
||||
### ❗ Vietnamese
|
||||
|
||||
* Maintainer(s): **Help Wanted** ✋
|
||||
* Previous Maintainer(s): [@tranlyvu](https://github.com/tranlyvu), [@duynguyenhoang](https://github.com/duynguyenhoang)
|
||||
* Discussion Thread: https://github.com/donnemartin/system-design-primer/issues/127
|
||||
* Translation Fork: https://github.com/donnemartin/system-design-primer/pull/241, https://github.com/donnemartin/system-design-primer/pull/327
|
||||
|
||||
## Not Started
|
||||
|
||||
Languages not listed here have not been started, [contribute](CONTRIBUTING.md)!
|
@ -0,0 +1,3 @@
|
||||
title: System Design Primer
|
||||
creator: Donne Martin
|
||||
date: 2018
|
@ -0,0 +1,54 @@
|
||||
#!/usr/bin/env bash
|
||||
|
||||
generate_from_stdin() {
|
||||
outfile=$1
|
||||
language=$2
|
||||
|
||||
echo "Generating '$language' ..."
|
||||
|
||||
pandoc --metadata-file=epub-metadata.yaml --metadata=lang:"$language" --from=markdown -o "$outfile" <&0
|
||||
|
||||
echo "Done! You can find the '$language' book at ./$outfile"
|
||||
}
|
||||
|
||||
generate_with_solutions () {
|
||||
tmpfile=$(mktemp /tmp/system-design-primer-epub-generator.XXX)
|
||||
|
||||
cat ./README.md >> "$tmpfile"
|
||||
|
||||
for dir in ./solutions/system_design/*; do
|
||||
case $dir in *template*) continue;; esac
|
||||
case $dir in *__init__.py*) continue;; esac
|
||||
[[ -d "$dir" ]] && ( cd "$dir" && cat ./README.md >> "$tmpfile" && echo "" >> "$tmpfile" )
|
||||
done
|
||||
|
||||
cat "$tmpfile" | generate_from_stdin 'README.epub' 'en'
|
||||
|
||||
rm "$tmpfile"
|
||||
}
|
||||
|
||||
generate () {
|
||||
name=$1
|
||||
language=$2
|
||||
|
||||
cat "$name.md" | generate_from_stdin "$name.epub" "$language"
|
||||
}
|
||||
|
||||
# Check if dependencies exist
|
||||
check_dependencies () {
|
||||
for dependency in "${dependencies[@]}"
|
||||
do
|
||||
if ! [ -x "$(command -v $dependency)" ]; then
|
||||
echo "Error: $dependency is not installed." >&2
|
||||
exit 1
|
||||
fi
|
||||
done
|
||||
}
|
||||
|
||||
dependencies=("pandoc")
|
||||
|
||||
check_dependencies
|
||||
generate_with_solutions
|
||||
generate README-ja ja
|
||||
generate README-zh-Hans zh-Hans
|
||||
generate README-zh-TW zh-TW
|
[37 binary image files added (diagrams and screenshots, 18 KiB–1.8 MiB each)]
@ -0,0 +1,206 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design a call center"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* What levels of employees are in the call center?\n",
|
||||
" * Operator, supervisor, director\n",
|
||||
"* Can we assume operators always get the initial calls?\n",
|
||||
" * Yes\n",
|
||||
"* If there is no available operators or the operator can't handle the call, does the call go to the supervisors?\n",
|
||||
" * Yes\n",
|
||||
"* If there is no available supervisors or the supervisor can't handle the call, does the call go to the directors?\n",
|
||||
" * Yes\n",
|
||||
"* Can we assume the directors can handle all calls?\n",
|
||||
" * Yes\n",
|
||||
"* What happens if nobody can answer the call?\n",
|
||||
" * It gets queued\n",
|
||||
"* Do we need to handle 'VIP' calls where we put someone to the front of the line?\n",
|
||||
" * No\n",
|
||||
"* Can we assume inputs are valid or do we have to validate them?\n",
|
||||
" * Assume they're valid"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting call_center.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile call_center.py\n",
|
||||
"from abc import ABCMeta, abstractmethod\n",
|
||||
"from collections import deque\n",
|
||||
"from enum import Enum\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Rank(Enum):\n",
|
||||
"\n",
|
||||
" OPERATOR = 0\n",
|
||||
" SUPERVISOR = 1\n",
|
||||
" DIRECTOR = 2\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Employee(metaclass=ABCMeta):\n",
|
||||
"\n",
|
||||
" def __init__(self, employee_id, name, rank, call_center):\n",
|
||||
" self.employee_id = employee_id\n",
|
||||
" self.name = name\n",
|
||||
" self.rank = rank\n",
|
||||
" self.call = None\n",
|
||||
" self.call_center = call_center\n",
|
||||
"\n",
|
||||
" def take_call(self, call):\n",
|
||||
" \"\"\"Assume the employee will always successfully take the call.\"\"\"\n",
|
||||
" self.call = call\n",
|
||||
" self.call.employee = self\n",
|
||||
" self.call.state = CallState.IN_PROGRESS\n",
|
||||
"\n",
|
||||
" def complete_call(self):\n",
|
||||
" self.call.state = CallState.COMPLETE\n",
|
||||
" self.call_center.notify_call_completed(self.call)\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def escalate_call(self):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" def _escalate_call(self):\n",
|
||||
" self.call.state = CallState.READY\n",
|
||||
" call = self.call\n",
|
||||
" self.call = None\n",
|
||||
" self.call_center.notify_call_escalated(call)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Operator(Employee):\n",
|
||||
"\n",
|
||||
" def __init__(self, employee_id, name):\n",
|
||||
" super(Operator, self).__init__(employee_id, name, Rank.OPERATOR)\n",
|
||||
"\n",
|
||||
" def escalate_call(self):\n",
|
||||
" self.call.level = Rank.SUPERVISOR\n",
|
||||
" self._escalate_call()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Supervisor(Employee):\n",
|
||||
"\n",
|
||||
" def __init__(self, employee_id, name):\n",
|
||||
" super(Operator, self).__init__(employee_id, name, Rank.SUPERVISOR)\n",
|
||||
"\n",
|
||||
" def escalate_call(self):\n",
|
||||
" self.call.level = Rank.DIRECTOR\n",
|
||||
" self._escalate_call()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Director(Employee):\n",
|
||||
"\n",
|
||||
" def __init__(self, employee_id, name):\n",
|
||||
" super(Operator, self).__init__(employee_id, name, Rank.DIRECTOR)\n",
|
||||
"\n",
|
||||
" def escalate_call(self):\n",
|
||||
" raise NotImplemented('Directors must be able to handle any call')\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CallState(Enum):\n",
|
||||
"\n",
|
||||
" READY = 0\n",
|
||||
" IN_PROGRESS = 1\n",
|
||||
" COMPLETE = 2\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Call(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, rank):\n",
|
||||
" self.state = CallState.READY\n",
|
||||
" self.rank = rank\n",
|
||||
" self.employee = None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class CallCenter(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, operators, supervisors, directors):\n",
|
||||
" self.operators = operators\n",
|
||||
" self.supervisors = supervisors\n",
|
||||
" self.directors = directors\n",
|
||||
" self.queued_calls = deque()\n",
|
||||
"\n",
|
||||
" def dispatch_call(self, call):\n",
|
||||
" if call.rank not in (Rank.OPERATOR, Rank.SUPERVISOR, Rank.DIRECTOR):\n",
|
||||
" raise ValueError('Invalid call rank: {}'.format(call.rank))\n",
|
||||
" employee = None\n",
|
||||
" if call.rank == Rank.OPERATOR:\n",
|
||||
" employee = self._dispatch_call(call, self.operators)\n",
|
||||
" if call.rank == Rank.SUPERVISOR or employee is None:\n",
|
||||
" employee = self._dispatch_call(call, self.supervisors)\n",
|
||||
" if call.rank == Rank.DIRECTOR or employee is None:\n",
|
||||
" employee = self._dispatch_call(call, self.directors)\n",
|
||||
" if employee is None:\n",
|
||||
" self.queued_calls.append(call)\n",
|
||||
"\n",
|
||||
" def _dispatch_call(self, call, employees):\n",
|
||||
" for employee in employees:\n",
|
||||
" if employee.call is None:\n",
|
||||
" employee.take_call(call)\n",
|
||||
" return employee\n",
|
||||
" return None\n",
|
||||
"\n",
|
||||
" def notify_call_escalated(self, call): # ...\n",
|
||||
" def notify_call_completed(self, call): # ...\n",
|
||||
" def dispatch_queued_call_to_newly_freed_employee(self, call, employee): # ..."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,122 @@
|
||||
from abc import ABCMeta, abstractmethod
|
||||
from collections import deque
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class Rank(Enum):
|
||||
|
||||
OPERATOR = 0
|
||||
SUPERVISOR = 1
|
||||
DIRECTOR = 2
|
||||
|
||||
|
||||
class Employee(metaclass=ABCMeta):
|
||||
|
||||
def __init__(self, employee_id, name, rank, call_center):
|
||||
self.employee_id = employee_id
|
||||
self.name = name
|
||||
self.rank = rank
|
||||
self.call = None
|
||||
self.call_center = call_center
|
||||
|
||||
def take_call(self, call):
|
||||
"""Assume the employee will always successfully take the call."""
|
||||
self.call = call
|
||||
self.call.employee = self
|
||||
self.call.state = CallState.IN_PROGRESS
|
||||
|
||||
def complete_call(self):
|
||||
self.call.state = CallState.COMPLETE
|
||||
self.call_center.notify_call_completed(self.call)
|
||||
|
||||
@abstractmethod
|
||||
def escalate_call(self):
|
||||
pass
|
||||
|
||||
def _escalate_call(self):
|
||||
self.call.state = CallState.READY
|
||||
call = self.call
|
||||
self.call = None
|
||||
self.call_center.notify_call_escalated(call)
|
||||
|
||||
|
||||
class Operator(Employee):
|
||||
|
||||
def __init__(self, employee_id, name, call_center):
|
||||
super(Operator, self).__init__(employee_id, name, Rank.OPERATOR, call_center)
|
||||
|
||||
def escalate_call(self):
|
||||
self.call.level = Rank.SUPERVISOR
|
||||
self._escalate_call()
|
||||
|
||||
|
||||
class Supervisor(Employee):
|
||||
|
||||
def __init__(self, employee_id, name, call_center):
|
||||
super(Supervisor, self).__init__(employee_id, name, Rank.SUPERVISOR, call_center)
|
||||
|
||||
def escalate_call(self):
|
||||
self.call.level = Rank.DIRECTOR
|
||||
self._escalate_call()
|
||||
|
||||
|
||||
class Director(Employee):
|
||||
|
||||
def __init__(self, employee_id, name, call_center):
|
||||
super(Director, self).__init__(employee_id, name, Rank.DIRECTOR, call_center)
|
||||
|
||||
def escalate_call(self):
|
||||
raise NotImplementedError('Directors must be able to handle any call')
|
||||
|
||||
|
||||
class CallState(Enum):
|
||||
|
||||
READY = 0
|
||||
IN_PROGRESS = 1
|
||||
COMPLETE = 2
|
||||
|
||||
|
||||
class Call(object):
|
||||
|
||||
def __init__(self, rank):
|
||||
self.state = CallState.READY
|
||||
self.rank = rank
|
||||
self.employee = None
|
||||
|
||||
|
||||
class CallCenter(object):
|
||||
|
||||
def __init__(self, operators, supervisors, directors):
|
||||
self.operators = operators
|
||||
self.supervisors = supervisors
|
||||
self.directors = directors
|
||||
self.queued_calls = deque()
|
||||
|
||||
def dispatch_call(self, call):
|
||||
if call.rank not in (Rank.OPERATOR, Rank.SUPERVISOR, Rank.DIRECTOR):
|
||||
raise ValueError('Invalid call rank: {}'.format(call.rank))
|
||||
employee = None
|
||||
if call.rank == Rank.OPERATOR:
|
||||
employee = self._dispatch_call(call, self.operators)
|
||||
if call.rank == Rank.SUPERVISOR or employee is None:
|
||||
employee = self._dispatch_call(call, self.supervisors)
|
||||
if call.rank == Rank.DIRECTOR or employee is None:
|
||||
employee = self._dispatch_call(call, self.directors)
|
||||
if employee is None:
|
||||
self.queued_calls.append(call)
|
||||
|
||||
def _dispatch_call(self, call, employees):
|
||||
for employee in employees:
|
||||
if employee.call is None:
|
||||
employee.take_call(call)
|
||||
return employee
|
||||
return None
|
||||
|
||||
def notify_call_escalated(self, call):
|
||||
pass
|
||||
|
||||
def notify_call_completed(self, call):
|
||||
pass
|
||||
|
||||
def dispatch_queued_call_to_newly_freed_employee(self, call, employee):
|
||||
pass
|
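A minimal usage sketch (illustrative, not part of the repo) showing how the classes above fit together; the employee name and IDs are made up:

    # Wire up a call center with one free operator and dispatch a call
    call_center = CallCenter(operators=[], supervisors=[], directors=[])
    operator = Operator(employee_id=1, name='Alice', call_center=call_center)
    call_center.operators.append(operator)
    call = Call(Rank.OPERATOR)
    call_center.dispatch_call(call)            # Routed to the free operator
    assert call.employee is operator
    assert call.state == CallState.IN_PROGRESS
    operator.complete_call()                   # Notifies the call center
    assert call.state == CallState.COMPLETE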
@ -0,0 +1,195 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design a deck of cards"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* Is this a generic deck of cards for games like poker and black jack?\n",
|
||||
" * Yes, design a generic deck then extend it to black jack\n",
|
||||
"* Can we assume the deck has 52 cards (2-10, Jack, Queen, King, Ace) and 4 suits?\n",
|
||||
" * Yes\n",
|
||||
"* Can we assume inputs are valid or do we have to validate them?\n",
|
||||
" * Assume they're valid"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting deck_of_cards.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile deck_of_cards.py\n",
|
||||
"from abc import ABCMeta, abstractmethod\n",
|
||||
"from enum import Enum\n",
|
||||
"import sys\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Suit(Enum):\n",
|
||||
"\n",
|
||||
" HEART = 0\n",
|
||||
" DIAMOND = 1\n",
|
||||
" CLUBS = 2\n",
|
||||
" SPADE = 3\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Card(metaclass=ABCMeta):\n",
|
||||
"\n",
|
||||
" def __init__(self, value, suit):\n",
|
||||
" self.value = value\n",
|
||||
" self.suit = suit\n",
|
||||
" self.is_available = True\n",
|
||||
"\n",
|
||||
" @property\n",
|
||||
" @abstractmethod\n",
|
||||
" def value(self):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
" @value.setter\n",
|
||||
" @abstractmethod\n",
|
||||
" def value(self, other):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class BlackJackCard(Card):\n",
|
||||
"\n",
|
||||
" def __init__(self, value, suit):\n",
|
||||
" super(BlackJackCard, self).__init__(value, suit)\n",
|
||||
"\n",
|
||||
" def is_ace(self):\n",
|
||||
" return self._value == 1\n",
|
||||
"\n",
|
||||
" def is_face_card(self):\n",
|
||||
" \"\"\"Jack = 11, Queen = 12, King = 13\"\"\"\n",
|
||||
" return 10 < self._value <= 13\n",
|
||||
"\n",
|
||||
" @property\n",
|
||||
" def value(self):\n",
|
||||
" if self.is_ace() == 1:\n",
|
||||
" return 1\n",
|
||||
" elif self.is_face_card():\n",
|
||||
" return 10\n",
|
||||
" else:\n",
|
||||
" return self._value\n",
|
||||
"\n",
|
||||
" @value.setter\n",
|
||||
" def value(self, new_value):\n",
|
||||
" if 1 <= new_value <= 13:\n",
|
||||
" self._value = new_value\n",
|
||||
" else:\n",
|
||||
" raise ValueError('Invalid card value: {}'.format(new_value))\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Hand(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, cards):\n",
|
||||
" self.cards = cards\n",
|
||||
"\n",
|
||||
" def add_card(self, card):\n",
|
||||
" self.cards.append(card)\n",
|
||||
"\n",
|
||||
" def score(self):\n",
|
||||
" total_value = 0\n",
|
||||
" for card in self.cards:\n",
|
||||
" total_value += card.value\n",
|
||||
" return total_value\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class BlackJackHand(Hand):\n",
|
||||
"\n",
|
||||
" BLACKJACK = 21\n",
|
||||
"\n",
|
||||
" def __init__(self, cards):\n",
|
||||
" super(BlackJackHand, self).__init__(cards)\n",
|
||||
"\n",
|
||||
" def score(self):\n",
|
||||
" min_over = sys.MAXSIZE\n",
|
||||
" max_under = -sys.MAXSIZE\n",
|
||||
" for score in self.possible_scores():\n",
|
||||
" if self.BLACKJACK < score < min_over:\n",
|
||||
" min_over = score\n",
|
||||
" elif max_under < score <= self.BLACKJACK:\n",
|
||||
" max_under = score\n",
|
||||
" return max_under if max_under != -sys.MAXSIZE else min_over\n",
|
||||
"\n",
|
||||
" def possible_scores(self):\n",
|
||||
" \"\"\"Return a list of possible scores, taking Aces into account.\"\"\"\n",
|
||||
" # ...\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Deck(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, cards):\n",
|
||||
" self.cards = cards\n",
|
||||
" self.deal_index = 0\n",
|
||||
"\n",
|
||||
" def remaining_cards(self):\n",
|
||||
" return len(self.cards) - deal_index\n",
|
||||
"\n",
|
||||
" def deal_card():\n",
|
||||
" try:\n",
|
||||
" card = self.cards[self.deal_index]\n",
|
||||
" card.is_available = False\n",
|
||||
" self.deal_index += 1\n",
|
||||
" except IndexError:\n",
|
||||
" return None\n",
|
||||
" return card\n",
|
||||
"\n",
|
||||
" def shuffle(self): # ..."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,117 @@
|
||||
from abc import ABCMeta, abstractmethod
|
||||
from enum import Enum
|
||||
import sys
|
||||
|
||||
|
||||
class Suit(Enum):
|
||||
|
||||
HEART = 0
|
||||
DIAMOND = 1
|
||||
CLUBS = 2
|
||||
SPADE = 3
|
||||
|
||||
|
||||
class Card(metaclass=ABCMeta):
|
||||
|
||||
def __init__(self, value, suit):
|
||||
self.value = value
|
||||
self.suit = suit
|
||||
self.is_available = True
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def value(self):
|
||||
pass
|
||||
|
||||
@value.setter
|
||||
@abstractmethod
|
||||
def value(self, other):
|
||||
pass
|
||||
|
||||
|
||||
class BlackJackCard(Card):
|
||||
|
||||
def __init__(self, value, suit):
|
||||
super(BlackJackCard, self).__init__(value, suit)
|
||||
|
||||
def is_ace(self):
|
||||
return self._value == 1
|
||||
|
||||
def is_face_card(self):
|
||||
"""Jack = 11, Queen = 12, King = 13"""
|
||||
return 10 < self._value <= 13
|
||||
|
||||
@property
|
||||
def value(self):
|
||||
if self.is_ace():
|
||||
return 1
|
||||
elif self.is_face_card():
|
||||
return 10
|
||||
else:
|
||||
return self._value
|
||||
|
||||
@value.setter
|
||||
def value(self, new_value):
|
||||
if 1 <= new_value <= 13:
|
||||
self._value = new_value
|
||||
else:
|
||||
raise ValueError('Invalid card value: {}'.format(new_value))
|
||||
|
||||
|
||||
class Hand(object):
|
||||
|
||||
def __init__(self, cards):
|
||||
self.cards = cards
|
||||
|
||||
def add_card(self, card):
|
||||
self.cards.append(card)
|
||||
|
||||
def score(self):
|
||||
total_value = 0
|
||||
for card in self.cards:
|
||||
total_value += card.value
|
||||
return total_value
|
||||
|
||||
|
||||
class BlackJackHand(Hand):
|
||||
|
||||
BLACKJACK = 21
|
||||
|
||||
def __init__(self, cards):
|
||||
super(BlackJackHand, self).__init__(cards)
|
||||
|
||||
def score(self):
|
||||
min_over = sys.maxsize
|
||||
max_under = -sys.maxsize
|
||||
for score in self.possible_scores():
|
||||
if self.BLACKJACK < score < min_over:
|
||||
min_over = score
|
||||
elif max_under < score <= self.BLACKJACK:
|
||||
max_under = score
|
||||
return max_under if max_under != -sys.maxsize else min_over
|
||||
|
||||
def possible_scores(self):
|
||||
"""Return a list of possible scores, taking Aces into account."""
|
||||
pass
|
||||
|
||||
|
||||
class Deck(object):
|
||||
|
||||
def __init__(self, cards):
|
||||
self.cards = cards
|
||||
self.deal_index = 0
|
||||
|
||||
def remaining_cards(self):
|
||||
return len(self.cards) - self.deal_index
|
||||
|
||||
def deal_card(self):
|
||||
try:
|
||||
card = self.cards[self.deal_index]
|
||||
card.is_available = False
|
||||
self.deal_index += 1
|
||||
except IndexError:
|
||||
return None
|
||||
return card
|
||||
|
||||
def shuffle(self):
|
||||
pass
|
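A short usage sketch (illustrative, not part of the repo) that builds the 52-card deck and deals a two-card hand; `BlackJackHand.score` depends on the elided `possible_scores`, so it is not exercised here:

    # 4 suits x values 1..13 = 52 cards
    cards = [BlackJackCard(value, suit) for suit in Suit for value in range(1, 14)]
    deck = Deck(cards)
    hand = BlackJackHand([deck.deal_card(), deck.deal_card()])
    assert len(hand.cards) == 2
    assert deck.remaining_cards() == 50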
@ -0,0 +1,121 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design a hash map"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* For simplicity, are the keys integers only?\n",
|
||||
" * Yes\n",
|
||||
"* For collision resolution, can we use chaining?\n",
|
||||
" * Yes\n",
|
||||
"* Do we have to worry about load factors?\n",
|
||||
" * No\n",
|
||||
"* Can we assume inputs are valid or do we have to validate them?\n",
|
||||
" * Assume they're valid\n",
|
||||
"* Can we assume this fits memory?\n",
|
||||
" * Yes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting hash_map.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile hash_map.py\n",
|
||||
"class Item(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, key, value):\n",
|
||||
" self.key = key\n",
|
||||
" self.value = value\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class HashTable(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, size):\n",
|
||||
" self.size = size\n",
|
||||
" self.table = [[] for _ in range(self.size)]\n",
|
||||
"\n",
|
||||
" def _hash_function(self, key):\n",
|
||||
" return key % self.size\n",
|
||||
"\n",
|
||||
" def set(self, key, value):\n",
|
||||
" hash_index = self._hash_function(key)\n",
|
||||
" for item in self.table[hash_index]:\n",
|
||||
" if item.key == key:\n",
|
||||
" item.value = value\n",
|
||||
" return\n",
|
||||
" self.table[hash_index].append(Item(key, value))\n",
|
||||
"\n",
|
||||
" def get(self, key):\n",
|
||||
" hash_index = self._hash_function(key)\n",
|
||||
" for item in self.table[hash_index]:\n",
|
||||
" if item.key == key:\n",
|
||||
" return item.value\n",
|
||||
" raise KeyError('Key not found')\n",
|
||||
"\n",
|
||||
" def remove(self, key):\n",
|
||||
" hash_index = self._hash_function(key)\n",
|
||||
" for index, item in enumerate(self.table[hash_index]):\n",
|
||||
" if item.key == key:\n",
|
||||
" del self.table[hash_index][index]\n",
|
||||
" return\n",
|
||||
" raise KeyError('Key not found')"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,38 @@
|
||||
class Item(object):
|
||||
|
||||
def __init__(self, key, value):
|
||||
self.key = key
|
||||
self.value = value
|
||||
|
||||
|
||||
class HashTable(object):
|
||||
|
||||
def __init__(self, size):
|
||||
self.size = size
|
||||
self.table = [[] for _ in range(self.size)]
|
||||
|
||||
def _hash_function(self, key):
|
||||
return key % self.size
|
||||
|
||||
def set(self, key, value):
|
||||
hash_index = self._hash_function(key)
|
||||
for item in self.table[hash_index]:
|
||||
if item.key == key:
|
||||
item.value = value
|
||||
return
|
||||
self.table[hash_index].append(Item(key, value))
|
||||
|
||||
def get(self, key):
|
||||
hash_index = self._hash_function(key)
|
||||
for item in self.table[hash_index]:
|
||||
if item.key == key:
|
||||
return item.value
|
||||
raise KeyError('Key not found')
|
||||
|
||||
def remove(self, key):
|
||||
hash_index = self._hash_function(key)
|
||||
for index, item in enumerate(self.table[hash_index]):
|
||||
if item.key == key:
|
||||
del self.table[hash_index][index]
|
||||
return
|
||||
raise KeyError('Key not found')
|
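A brief usage sketch (illustrative, not part of the repo) exercising the chained hash table with colliding integer keys:

    hash_table = HashTable(size=10)
    hash_table.set(1, 'foo')
    hash_table.set(11, 'bar')           # 11 % 10 == 1: chains in the same bucket as key 1
    assert hash_table.get(1) == 'foo'
    assert hash_table.get(11) == 'bar'
    hash_table.remove(1)
    assert hash_table.get(11) == 'bar'  # Removal only unlinks the matching item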
@ -0,0 +1,141 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design an LRU cache"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* What are we caching?\n",
|
||||
" * We are caching the results of web queries\n",
|
||||
"* Can we assume inputs are valid or do we have to validate them?\n",
|
||||
" * Assume they're valid\n",
|
||||
"* Can we assume this fits memory?\n",
|
||||
" * Yes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting lru_cache.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile lru_cache.py\n",
|
||||
"class Node(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, results):\n",
|
||||
" self.results = results\n",
|
||||
" self.prev = None\n",
|
||||
" self.next = None\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class LinkedList(object):\n",
|
||||
"\n",
|
||||
" def __init__(self):\n",
|
||||
" self.head = None\n",
|
||||
" self.tail = None\n",
|
||||
"\n",
|
||||
" def move_to_front(self, node): # ...\n",
|
||||
" def append_to_front(self, node): # ...\n",
|
||||
" def remove_from_tail(self): # ...\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Cache(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, MAX_SIZE):\n",
|
||||
" self.MAX_SIZE = MAX_SIZE\n",
|
||||
" self.size = 0\n",
|
||||
" self.lookup = {} # key: query, value: node\n",
|
||||
" self.linked_list = LinkedList()\n",
|
||||
"\n",
|
||||
" def get(self, query)\n",
|
||||
" \"\"\"Get the stored query result from the cache.\n",
|
||||
" \n",
|
||||
" Accessing a node updates its position to the front of the LRU list.\n",
|
||||
" \"\"\"\n",
|
||||
" node = self.lookup.get(query)\n",
|
||||
" if node is None:\n",
|
||||
" return None\n",
|
||||
" self.linked_list.move_to_front(node)\n",
|
||||
" return node.results\n",
|
||||
"\n",
|
||||
" def set(self, results, query):\n",
|
||||
" \"\"\"Set the result for the given query key in the cache.\n",
|
||||
" \n",
|
||||
" When updating an entry, updates its position to the front of the LRU list.\n",
|
||||
" If the entry is new and the cache is at capacity, removes the oldest entry\n",
|
||||
" before the new entry is added.\n",
|
||||
" \"\"\"\n",
|
||||
" node = self.lookup.get(query)\n",
|
||||
" if node is not None:\n",
|
||||
" # Key exists in cache, update the value\n",
|
||||
" node.results = results\n",
|
||||
" self.linked_list.move_to_front(node)\n",
|
||||
" else:\n",
|
||||
" # Key does not exist in cache\n",
|
||||
" if self.size == self.MAX_SIZE:\n",
|
||||
" # Remove the oldest entry from the linked list and lookup\n",
|
||||
" self.lookup.pop(self.linked_list.tail.query, None)\n",
|
||||
" self.linked_list.remove_from_tail()\n",
|
||||
" else:\n",
|
||||
" self.size += 1\n",
|
||||
" # Add the new key and value\n",
|
||||
" new_node = Node(results)\n",
|
||||
" self.linked_list.append_to_front(new_node)\n",
|
||||
" self.lookup[query] = new_node"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,66 @@
|
||||
class Node(object):
|
||||
|
||||
def __init__(self, query, results):
|
||||
self.results = results
|
||||
self.query = query
self.prev = None
self.next = None
|
||||
|
||||
|
||||
class LinkedList(object):
|
||||
|
||||
def __init__(self):
|
||||
self.head = None
|
||||
self.tail = None
|
||||
|
||||
def move_to_front(self, node):
|
||||
pass
|
||||
|
||||
def append_to_front(self, node):
|
||||
pass
|
||||
|
||||
def remove_from_tail(self):
|
||||
pass
|
||||
|
||||
|
||||
class Cache(object):
|
||||
|
||||
def __init__(self, MAX_SIZE):
|
||||
self.MAX_SIZE = MAX_SIZE
|
||||
self.size = 0
|
||||
self.lookup = {} # key: query, value: node
|
||||
self.linked_list = LinkedList()
|
||||
|
||||
def get(self, query):
|
||||
"""Get the stored query result from the cache.
|
||||
|
||||
Accessing a node updates its position to the front of the LRU list.
|
||||
"""
|
||||
node = self.lookup.get(query)
|
||||
if node is None:
|
||||
return None
|
||||
self.linked_list.move_to_front(node)
|
||||
return node.results
|
||||
|
||||
def set(self, results, query):
|
||||
"""Set the result for the given query key in the cache.
|
||||
|
||||
When updating an entry, updates its position to the front of the LRU list.
|
||||
If the entry is new and the cache is at capacity, removes the oldest entry
|
||||
before the new entry is added.
|
||||
"""
|
||||
node = self.lookup.get(query)
|
||||
if node is not None:
|
||||
# Key exists in cache, update the value
|
||||
node.results = results
|
||||
self.linked_list.move_to_front(node)
|
||||
else:
|
||||
# Key does not exist in cache
|
||||
if self.size == self.MAX_SIZE:
|
||||
# Remove the oldest entry from the linked list and lookup
|
||||
self.lookup.pop(self.linked_list.tail.query, None)
|
||||
self.linked_list.remove_from_tail()
|
||||
else:
|
||||
self.size += 1
|
||||
# Add the new key and value
|
||||
new_node = Node(query, results)
|
||||
self.linked_list.append_to_front(new_node)
|
||||
self.lookup[query] = new_node
|
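The linked list methods above are left as stubs. One way the `pass` bodies in `LinkedList` could be filled in, given the `prev`/`next` pointers on `Node` — a sketch under those assumptions, not the repo's implementation:

    def move_to_front(self, node):
        if node is self.head:
            return
        # Unlink the node from its current position
        if node.prev is not None:
            node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        if node is self.tail:
            self.tail = node.prev
        self.append_to_front(node)

    def append_to_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def remove_from_tail(self):
        if self.tail is None:
            return
        if self.tail.prev is None:
            self.head = None
        else:
            self.tail.prev.next = None
        self.tail = self.tail.prev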
@ -0,0 +1,171 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design an online chat"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* Assume we'll focus on the following workflows:\n",
|
||||
" * Text conversations only\n",
|
||||
" * Users\n",
|
||||
" * Add a user\n",
|
||||
" * Remove a user\n",
|
||||
" * Update a user\n",
|
||||
" * Add to a user's friends list\n",
|
||||
" * Add friend request\n",
|
||||
" * Approve friend request\n",
|
||||
" * Reject friend request\n",
|
||||
" * Remove from a user's friends list\n",
|
||||
" * Create a group chat\n",
|
||||
" * Invite friends to a group chat\n",
|
||||
" * Post a message to a group chat\n",
|
||||
" * Private 1-1 chat\n",
|
||||
" * Invite a friend to a private chat\n",
|
||||
" * Post a meesage to a private chat\n",
|
||||
"* No need to worry about scaling initially"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting online_chat.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile online_chat.py\n",
|
||||
"from abc import ABCMeta\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class UserService(object):\n",
|
||||
"\n",
|
||||
" def __init__(self):\n",
|
||||
" self.users_by_id = {} # key: user id, value: User\n",
|
||||
"\n",
|
||||
" def add_user(self, user_id, name, pass_hash): # ...\n",
|
||||
" def remove_user(self, user_id): # ...\n",
|
||||
" def add_friend_request(self, from_user_id, to_user_id): # ...\n",
|
||||
" def approve_friend_request(self, from_user_id, to_user_id): # ...\n",
|
||||
" def reject_friend_request(self, from_user_id, to_user_id): # ...\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class User(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, user_id, name, pass_hash):\n",
|
||||
" self.user_id = user_id\n",
|
||||
" self.name = name\n",
|
||||
" self.pass_hash = pass_hash\n",
|
||||
" self.friends_by_id = {} # key: friend id, value: User\n",
|
||||
" self.friend_ids_to_private_chats = {} # key: friend id, value: private chats\n",
|
||||
" self.group_chats_by_id = {} # key: chat id, value: GroupChat\n",
|
||||
" self.received_friend_requests_by_friend_id = {} # key: friend id, value: AddRequest\n",
|
||||
" self.sent_friend_requests_by_friend_id = {} # key: friend id, value: AddRequest\n",
|
||||
"\n",
|
||||
" def message_user(self, friend_id, message): # ...\n",
|
||||
" def message_group(self, group_id, message): # ...\n",
|
||||
" def send_friend_request(self, friend_id): # ...\n",
|
||||
" def receive_friend_request(self, friend_id): # ...\n",
|
||||
" def approve_friend_request(self, friend_id): # ...\n",
|
||||
" def reject_friend_request(self, friend_id): # ...\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Chat(metaclass=ABCMeta):\n",
|
||||
"\n",
|
||||
" def __init__(self, chat_id):\n",
|
||||
" self.chat_id = chat_id\n",
|
||||
" self.users = []\n",
|
||||
" self.messages = []\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class PrivateChat(Chat):\n",
|
||||
"\n",
|
||||
" def __init__(self, first_user, second_user):\n",
|
||||
" super(PrivateChat, self).__init__()\n",
|
||||
" self.users.append(first_user)\n",
|
||||
" self.users.append(second_user)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class GroupChat(Chat):\n",
|
||||
"\n",
|
||||
" def add_user(self, user): # ...\n",
|
||||
" def remove_user(self, user): # ... \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Message(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, message_id, message, timestamp):\n",
|
||||
" self.message_id = message_id\n",
|
||||
" self.message = message\n",
|
||||
" self.timestamp = timestamp\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class AddRequest(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, from_user_id, to_user_id, request_status, timestamp):\n",
|
||||
" self.from_user_id = from_user_id\n",
|
||||
" self.to_user_id = to_user_id\n",
|
||||
" self.request_status = request_status\n",
|
||||
" self.timestamp = timestamp\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class RequestStatus(Enum):\n",
|
||||
"\n",
|
||||
" UNREAD = 0\n",
|
||||
" READ = 1\n",
|
||||
" ACCEPTED = 2\n",
|
||||
" REJECTED = 3"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,104 @@
|
||||
from abc import ABCMeta
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class UserService(object):
|
||||
|
||||
def __init__(self):
|
||||
self.users_by_id = {} # key: user id, value: User
|
||||
|
||||
def add_user(self, user_id, name, pass_hash):
|
||||
pass
|
||||
|
||||
def remove_user(self, user_id):
|
||||
pass
|
||||
|
||||
def add_friend_request(self, from_user_id, to_user_id):
|
||||
pass
|
||||
|
||||
def approve_friend_request(self, from_user_id, to_user_id):
|
||||
pass
|
||||
|
||||
def reject_friend_request(self, from_user_id, to_user_id):
|
||||
pass
|
||||
|
||||
|
||||
class User(object):
|
||||
|
||||
def __init__(self, user_id, name, pass_hash):
|
||||
self.user_id = user_id
|
||||
self.name = name
|
||||
self.pass_hash = pass_hash
|
||||
self.friends_by_id = {} # key: friend id, value: User
|
||||
self.friend_ids_to_private_chats = {} # key: friend id, value: private chats
|
||||
self.group_chats_by_id = {} # key: chat id, value: GroupChat
|
||||
self.received_friend_requests_by_friend_id = {} # key: friend id, value: AddRequest
|
||||
self.sent_friend_requests_by_friend_id = {} # key: friend id, value: AddRequest
|
||||
|
||||
def message_user(self, friend_id, message):
|
||||
pass
|
||||
|
||||
def message_group(self, group_id, message):
|
||||
pass
|
||||
|
||||
def send_friend_request(self, friend_id):
|
||||
pass
|
||||
|
||||
def receive_friend_request(self, friend_id):
|
||||
pass
|
||||
|
||||
def approve_friend_request(self, friend_id):
|
||||
pass
|
||||
|
||||
def reject_friend_request(self, friend_id):
|
||||
pass
|
||||
|
||||
|
||||
class Chat(metaclass=ABCMeta):
|
||||
|
||||
def __init__(self, chat_id):
|
||||
self.chat_id = chat_id
|
||||
self.users = []
|
||||
self.messages = []
|
||||
|
||||
|
||||
class PrivateChat(Chat):
|
||||
|
||||
def __init__(self, chat_id, first_user, second_user):
|
||||
super(PrivateChat, self).__init__(chat_id)
|
||||
self.users.append(first_user)
|
||||
self.users.append(second_user)
|
||||
|
||||
|
||||
class GroupChat(Chat):
|
||||
|
||||
def add_user(self, user):
|
||||
pass
|
||||
|
||||
def remove_user(self, user):
|
||||
pass
|
||||
|
||||
|
||||
class Message(object):
|
||||
|
||||
def __init__(self, message_id, message, timestamp):
|
||||
self.message_id = message_id
|
||||
self.message = message
|
||||
self.timestamp = timestamp
|
||||
|
||||
|
||||
class AddRequest(object):
|
||||
|
||||
def __init__(self, from_user_id, to_user_id, request_status, timestamp):
|
||||
self.from_user_id = from_user_id
|
||||
self.to_user_id = to_user_id
|
||||
self.request_status = request_status
|
||||
self.timestamp = timestamp
|
||||
|
||||
|
||||
class RequestStatus(Enum):
|
||||
|
||||
UNREAD = 0
|
||||
READ = 1
|
||||
ACCEPTED = 2
|
||||
REJECTED = 3
|
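A minimal usage sketch (illustrative, not part of the repo) of two users in a private chat, assuming the `chat_id` parameter added above; the IDs, names, and hashes are made up:

    alice = User(user_id=1, name='Alice', pass_hash='hash1')
    bob = User(user_id=2, name='Bob', pass_hash='hash2')
    chat = PrivateChat(chat_id=100, first_user=alice, second_user=bob)
    chat.messages.append(Message(message_id=1, message='Hi Bob!', timestamp=0))
    assert [user.name for user in chat.users] == ['Alice', 'Bob']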
@ -0,0 +1,204 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This notebook was prepared by [Donne Martin](https://github.com/donnemartin). Source and license info is on [GitHub](https://github.com/donnemartin/system-design-primer)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Design a parking lot"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Constraints and assumptions\n",
|
||||
"\n",
|
||||
"* What types of vehicles should we support?\n",
|
||||
" * Motorcycle, Car, Bus\n",
|
||||
"* Does each vehicle type take up a different amount of parking spots?\n",
|
||||
" * Yes\n",
|
||||
" * Motorcycle spot -> Motorcycle\n",
|
||||
" * Compact spot -> Motorcycle, Car\n",
|
||||
" * Large spot -> Motorcycle, Car\n",
|
||||
" * Bus can park if we have 5 consecutive \"large\" spots\n",
|
||||
"* Does the parking lot have multiple levels?\n",
|
||||
" * Yes"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Solution"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Overwriting parking_lot.py\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%%writefile parking_lot.py\n",
|
||||
"from abc import ABCMeta, abstractmethod\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class VehicleSize(Enum):\n",
|
||||
"\n",
|
||||
" MOTORCYCLE = 0\n",
|
||||
" COMPACT = 1\n",
|
||||
" LARGE = 2\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Vehicle(metaclass=ABCMeta):\n",
|
||||
"\n",
|
||||
" def __init__(self, vehicle_size, license_plate, spot_size):\n",
|
||||
" self.vehicle_size = vehicle_size\n",
|
||||
" self.license_plate = license_plate\n",
|
||||
" self.spot_size = spot_size\n",
|
||||
" self.spots_taken = []\n",
|
||||
"\n",
|
||||
" def clear_spots(self):\n",
|
||||
" for spot in self.spots_taken:\n",
|
||||
" spot.remove_vehicle(self)\n",
|
||||
" self.spots_taken = []\n",
|
||||
"\n",
|
||||
" def take_spot(self, spot):\n",
|
||||
" self.spots_taken.append(spot)\n",
|
||||
"\n",
|
||||
" @abstractmethod\n",
|
||||
" def can_fit_in_spot(self, spot):\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Motorcycle(Vehicle):\n",
|
||||
"\n",
|
||||
" def __init__(self, license_plate):\n",
|
||||
" super(Motorcycle, self).__init__(VehicleSize.MOTORCYCLE, license_plate, spot_size=1)\n",
|
||||
"\n",
|
||||
" def can_fit_in_spot(self, spot):\n",
|
||||
" return True\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Car(Vehicle):\n",
|
||||
"\n",
|
||||
" def __init__(self, license_plate):\n",
|
||||
" super(Car, self).__init__(VehicleSize.COMPACT, license_plate, spot_size=1)\n",
|
||||
"\n",
|
||||
" def can_fit_in_spot(self, spot):\n",
|
||||
" return True if (spot.size == LARGE or spot.size == COMPACT) else False\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Bus(Vehicle):\n",
|
||||
"\n",
|
||||
" def __init__(self, license_plate):\n",
|
||||
" super(Bus, self).__init__(VehicleSize.LARGE, license_plate, spot_size=5)\n",
|
||||
"\n",
|
||||
" def can_fit_in_spot(self, spot):\n",
|
||||
" return True if spot.size == LARGE else False\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class ParkingLot(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, num_levels):\n",
|
||||
" self.num_levels = num_levels\n",
|
||||
" self.levels = []\n",
|
||||
"\n",
|
||||
" def park_vehicle(self, vehicle):\n",
|
||||
" for level in levels:\n",
|
||||
" if level.park_vehicle(vehicle):\n",
|
||||
" return True\n",
|
||||
" return False\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class Level(object):\n",
|
||||
"\n",
|
||||
" SPOTS_PER_ROW = 10\n",
|
||||
"\n",
|
||||
" def __init__(self, floor, total_spots):\n",
|
||||
" self.floor = floor\n",
|
||||
" self.num_spots = total_spots\n",
|
||||
" self.available_spots = 0\n",
|
||||
" self.parking_spots = []\n",
|
||||
"\n",
|
||||
" def spot_freed(self):\n",
|
||||
" self.available_spots += 1\n",
|
||||
"\n",
|
||||
" def park_vehicle(self, vehicle):\n",
|
||||
" spot = self._find_available_spot(vehicle)\n",
|
||||
" if spot is None:\n",
|
||||
" return None\n",
|
||||
" else:\n",
|
||||
" spot.park_vehicle(vehicle)\n",
|
||||
" return spot\n",
|
||||
"\n",
|
||||
" def _find_available_spot(self, vehicle):\n",
|
||||
" \"\"\"Find an available spot where vehicle can fit, or return None\"\"\"\n",
|
||||
" # ...\n",
|
||||
"\n",
|
||||
" def _park_starting_at_spot(self, spot, vehicle):\n",
|
||||
" \"\"\"Occupy starting at spot.spot_number to vehicle.spot_size.\"\"\"\n",
|
||||
" # ...\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"class ParkingSpot(object):\n",
|
||||
"\n",
|
||||
" def __init__(self, level, row, spot_number, spot_size, vehicle_size):\n",
|
||||
" self.level = level\n",
|
||||
" self.row = row\n",
|
||||
" self.spot_number = spot_number\n",
|
||||
" self.spot_size = spot_size\n",
|
||||
" self.vehicle_size = vehicle_size\n",
|
||||
" self.vehicle = None\n",
|
||||
"\n",
|
||||
" def is_available(self):\n",
|
||||
" return True if self.vehicle is None else False\n",
|
||||
"\n",
|
||||
" def can_fit_vehicle(self, vehicle):\n",
|
||||
" if self.vehicle is not None:\n",
|
||||
" return False\n",
|
||||
" return vehicle.can_fit_in_spot(self)\n",
|
||||
"\n",
|
||||
" def park_vehicle(self, vehicle): # ...\n",
|
||||
" def remove_vehicle(self): # ..."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.4.3"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
@ -0,0 +1,125 @@
|
||||
from abc import ABCMeta, abstractmethod
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class VehicleSize(Enum):
|
||||
|
||||
MOTORCYCLE = 0
|
||||
COMPACT = 1
|
||||
LARGE = 2
|
||||
|
||||
|
||||
class Vehicle(metaclass=ABCMeta):
|
||||
|
||||
def __init__(self, vehicle_size, license_plate, spot_size):
|
||||
self.vehicle_size = vehicle_size
|
||||
self.license_plate = license_plate
|
||||
self.spot_size = spot_size
|
||||
self.spots_taken = []
|
||||
|
||||
def clear_spots(self):
|
||||
for spot in self.spots_taken:
|
||||
spot.remove_vehicle(self)
|
||||
self.spots_taken = []
|
||||
|
||||
def take_spot(self, spot):
|
||||
self.spots_taken.append(spot)
|
||||
|
||||
@abstractmethod
|
||||
def can_fit_in_spot(self, spot):
|
||||
pass
|
||||
|
||||
|
||||
class Motorcycle(Vehicle):
|
||||
|
||||
def __init__(self, license_plate):
|
||||
super(Motorcycle, self).__init__(VehicleSize.MOTORCYCLE, license_plate, spot_size=1)
|
||||
|
||||
def can_fit_in_spot(self, spot):
|
||||
return True
|
||||
|
||||
|
||||
class Car(Vehicle):
|
||||
|
||||
def __init__(self, license_plate):
|
||||
super(Car, self).__init__(VehicleSize.COMPACT, license_plate, spot_size=1)
|
||||
|
||||
def can_fit_in_spot(self, spot):
|
||||
return spot.vehicle_size in (VehicleSize.LARGE, VehicleSize.COMPACT)
|
||||
|
||||
|
||||
class Bus(Vehicle):
|
||||
|
||||
def __init__(self, license_plate):
|
||||
super(Bus, self).__init__(VehicleSize.LARGE, license_plate, spot_size=5)
|
||||
|
||||
def can_fit_in_spot(self, spot):
|
||||
return spot.vehicle_size == VehicleSize.LARGE
|
||||
|
||||
|
||||
class ParkingLot(object):
|
||||
|
||||
def __init__(self, num_levels):
|
||||
self.num_levels = num_levels
|
||||
self.levels = [] # List of Levels
|
||||
|
||||
def park_vehicle(self, vehicle):
|
||||
for level in self.levels:
|
||||
if level.park_vehicle(vehicle):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
class Level(object):
|
||||
|
||||
SPOTS_PER_ROW = 10
|
||||
|
||||
def __init__(self, floor, total_spots):
|
||||
self.floor = floor
|
||||
self.num_spots = total_spots
|
||||
self.available_spots = 0
|
||||
self.spots = [] # List of ParkingSpots
|
||||
|
||||
def spot_freed(self):
|
||||
self.available_spots += 1
|
||||
|
||||
def park_vehicle(self, vehicle):
|
||||
spot = self._find_available_spot(vehicle)
|
||||
if spot is None:
|
||||
return None
|
||||
else:
|
||||
spot.park_vehicle(vehicle)
|
||||
return spot
|
||||
|
||||
def _find_available_spot(self, vehicle):
|
||||
"""Find an available spot where vehicle can fit, or return None"""
|
||||
pass
|
||||
|
||||
def _park_starting_at_spot(self, spot, vehicle):
|
||||
"""Occupy starting at spot.spot_number to vehicle.spot_size."""
|
||||
pass
|
||||
|
||||
|
||||
class ParkingSpot(object):
|
||||
|
||||
def __init__(self, level, row, spot_number, spot_size, vehicle_size):
|
||||
self.level = level
|
||||
self.row = row
|
||||
self.spot_number = spot_number
|
||||
self.spot_size = spot_size
|
||||
self.vehicle_size = vehicle_size
|
||||
self.vehicle = None
|
||||
|
||||
def is_available(self):
|
||||
return self.vehicle is None
|
||||
|
||||
def can_fit_vehicle(self, vehicle):
|
||||
if self.vehicle is not None:
|
||||
return False
|
||||
return vehicle.can_fit_in_spot(self)
|
||||
|
||||
def park_vehicle(self, vehicle):
|
||||
pass
|
||||
|
||||
def remove_vehicle(self):
|
||||
pass
|
@ -0,0 +1,441 @@
|
||||
# Design Mint.com
|
||||
|
||||
*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
|
||||
|
||||
## Step 1: Outline use cases and constraints
|
||||
|
||||
> Gather requirements and scope the problem.
|
||||
> Ask questions to clarify use cases and constraints.
|
||||
> Discuss assumptions.
|
||||
|
||||
Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
|
||||
|
||||
### Use cases
|
||||
|
||||
#### We'll scope the problem to handle only the following use cases
|
||||
|
||||
* **User** connects to a financial account
|
||||
* **Service** extracts transactions from the account
|
||||
* Updates daily
|
||||
* Categorizes transactions
|
||||
* Allows manual category override by the user
|
||||
* No automatic re-categorization
|
||||
* Analyzes monthly spending, by category
|
||||
* **Service** recommends a budget
|
||||
* Allows users to manually set a budget
|
||||
* Sends notifications when approaching or exceeding budget
|
||||
* **Service** has high availability
|
||||
|
||||
#### Out of scope
|
||||
|
||||
* **Service** performs additional logging and analytics
|
||||
|
||||
### Constraints and assumptions
|
||||
|
||||
#### State assumptions
|
||||
|
||||
* Traffic is not evenly distributed
|
||||
* Automatic daily update of accounts applies only to users active in the past 30 days
|
||||
* Adding or removing financial accounts is relatively rare
|
||||
* Budget notifications don't need to be instant
|
||||
* 10 million users
|
||||
* 10 budget categories per user = 100 million budget items
|
||||
* Example categories:
|
||||
* Housing = $1,000
|
||||
* Food = $200
|
||||
* Gas = $100
|
||||
* Sellers are used to determine transaction category
|
||||
* 50,000 sellers
|
||||
* 30 million financial accounts
|
||||
* 5 billion transactions per month
|
||||
* 500 million read requests per month
|
||||
* 10:1 write to read ratio
|
||||
* Write-heavy: users make transactions daily, but few visit the site daily
|
||||
|
||||
#### Calculate usage
|
||||
|
||||
**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
|
||||
|
||||
* Size per transaction:
|
||||
* `user_id` - 8 bytes
|
||||
* `created_at` - 5 bytes
|
||||
* `seller` - 32 bytes
|
||||
* `amount` - 5 bytes
|
||||
* Total: ~50 bytes
|
||||
* 250 GB of new transaction content per month
|
||||
* 50 bytes per transaction * 5 billion transactions per month
|
||||
* 9 TB of new transaction content in 3 years
|
||||
* Assume most are new transactions instead of updates to existing ones
|
||||
* 2,000 transactions per second on average
|
||||
* 200 read requests per second on average
|
||||
|
||||
Handy conversion guide:
|
||||
|
||||
* 2.5 million seconds per month
|
||||
* 1 request per second = 2.5 million requests per month
|
||||
* 40 requests per second = 100 million requests per month
|
||||
* 400 requests per second = 1 billion requests per month
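
As a quick sanity check, the estimates above can be reproduced in a few lines of Python (using the ~50 byte transaction size and 2.5 million seconds per month stated earlier):

```python
BYTES_PER_TRANSACTION = 50
TRANSACTIONS_PER_MONTH = 5 * 10**9
READS_PER_MONTH = 500 * 10**6
SECONDS_PER_MONTH = 2.5 * 10**6

print(BYTES_PER_TRANSACTION * TRANSACTIONS_PER_MONTH / 10**9)       # 250 GB of new content per month
print(BYTES_PER_TRANSACTION * TRANSACTIONS_PER_MONTH * 36 / 10**12) # 9 TB in 3 years
print(TRANSACTIONS_PER_MONTH / SECONDS_PER_MONTH)                   # 2,000 transactions per second on average
print(READS_PER_MONTH / SECONDS_PER_MONTH)                          # 200 read requests per second on average
```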
|
||||
|
||||
## Step 2: Create a high level design
|
||||
|
||||
> Outline a high level design with all important components.
|
||||
|
||||

|
||||
|
||||
## Step 3: Design core components
|
||||
|
||||
> Dive into details for each core component.
|
||||
|
||||
### Use case: User connects to a financial account
|
||||
|
||||
We could store info on the 10 million users in a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms). We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).
|
||||
|
||||
* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
|
||||
* The **Web Server** forwards the request to the **Accounts API** server
|
||||
* The **Accounts API** server updates the **SQL Database** `accounts` table with the newly entered account info
|
||||
|
||||
**Clarify with your interviewer how much code you are expected to write**.
|
||||
|
||||
The `accounts` table could have the following structure:
|
||||
|
||||
```
|
||||
id int NOT NULL AUTO_INCREMENT
|
||||
created_at datetime NOT NULL
|
||||
last_update datetime NOT NULL
|
||||
account_url varchar(255) NOT NULL
|
||||
account_login varchar(32) NOT NULL
|
||||
account_password_hash char(64) NOT NULL
|
||||
user_id int NOT NULL
|
||||
PRIMARY KEY(id)
|
||||
FOREIGN KEY(user_id) REFERENCES users(id)
|
||||
```
|
||||
|
||||
We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id`, and `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
|
||||
|
||||
We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
|
||||
|
||||
```
|
||||
$ curl -X POST --data '{ "user_id": "foo", "account_url": "bar", \
|
||||
"account_login": "baz", "account_password": "qux" }' \
|
||||
https://mint.com/api/v1/account
|
||||
```
|
||||
|
||||
For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
|
||||
|
||||
Next, the service extracts transactions from the account.
|
||||
|
||||
### Use case: Service extracts transactions from the account
|
||||
|
||||
We'll want to extract information from an account in these cases:
|
||||
|
||||
* The user first links the account
|
||||
* The user manually refreshes the account
|
||||
* Automatically each day for users who have been active in the past 30 days
|
||||
|
||||
Data flow:
|
||||
|
||||
* The **Client** sends a request to the **Web Server**
|
||||
* The **Web Server** forwards the request to the **Accounts API** server
|
||||
* The **Accounts API** server places a job on a **Queue** such as [Amazon SQS](https://aws.amazon.com/sqs/) or [RabbitMQ](https://www.rabbitmq.com/) (a sketch of this step follows the list)
|
||||
* Extracting transactions could take a while, so we'd probably want to do this [asynchronously with a queue](https://github.com/donnemartin/system-design-primer#asynchronism), although this introduces additional complexity
|
||||
* The **Transaction Extraction Service** does the following:
|
||||
* Pulls from the **Queue** and extracts transactions for the given account from the financial institution, storing the results as raw log files in the **Object Store**
|
||||
* Uses the **Category Service** to categorize each transaction
|
||||
* Uses the **Budget Service** to calculate aggregate monthly spending by category
|
||||
* The **Budget Service** uses the **Notification Service** to let users know if they are nearing or have exceeded their budget
|
||||
* Updates the **SQL Database** `transactions` table with categorized transactions
|
||||
* Updates the **SQL Database** `monthly_spending` table with aggregate monthly spending by category
|
||||
* Notifies the user the transactions have completed through the **Notification Service**:
|
||||
* Uses a **Queue** (not pictured) to asynchronously send out notifications
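
A minimal sketch of the enqueueing step, assuming boto3 and Amazon SQS; the queue URL and message shape are illustrative, not part of the original design:

```python
import json

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/extract-transactions'  # hypothetical

def enqueue_extraction_job(account_id):
    """Hand the account off to the Transaction Extraction Service via the queue."""
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({'account_id': account_id}))
```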
|
||||
|
||||
The `transactions` table could have the following structure:
|
||||
|
||||
```
|
||||
id int NOT NULL AUTO_INCREMENT
|
||||
created_at datetime NOT NULL
|
||||
seller varchar(32) NOT NULL
|
||||
amount decimal NOT NULL
|
||||
user_id int NOT NULL
|
||||
PRIMARY KEY(id)
|
||||
FOREIGN KEY(user_id) REFERENCES users(id)
|
||||
```
|
||||
|
||||
We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `user_id`, and `created_at`.
|
||||
|
||||
The `monthly_spending` table could have the following structure:
|
||||
|
||||
```
|
||||
id int NOT NULL AUTO_INCREMENT
|
||||
month_year date NOT NULL
|
||||
category varchar(32)
|
||||
amount decimal NOT NULL
|
||||
user_id int NOT NULL
|
||||
PRIMARY KEY(id)
|
||||
FOREIGN KEY(user_id) REFERENCES users(id)
|
||||
```
|
||||
|
||||
We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id` and `user_id`.
|
||||
|
||||
#### Category service
|
||||
|
||||
For the **Category Service**, we can seed a seller-to-category dictionary with the most popular sellers. If we estimate 50,000 sellers and estimate each entry to take less than 255 bytes, the dictionary would only take about 12 MB of memory.
|
||||
|
||||
**Clarify with your interviewer how much code you are expected to write**.
|
||||
|
||||
```python
|
||||
class DefaultCategories(Enum):
|
||||
|
||||
HOUSING = 0
|
||||
FOOD = 1
|
||||
GAS = 2
|
||||
SHOPPING = 3
|
||||
...
|
||||
|
||||
seller_category_map = {}
|
||||
seller_category_map['Exxon'] = DefaultCategories.GAS
|
||||
seller_category_map['Target'] = DefaultCategories.SHOPPING
|
||||
...
|
||||
```
|
||||
|
||||
For sellers not initially seeded in the map, we could use a crowdsourcing effort by evaluating the manual category overrides our users provide. We could use a heap to quickly look up the top manual override per seller in O(1) time.
|
||||
|
||||
```python
|
||||
class Categorizer(object):
|
||||
|
||||
def __init__(self, seller_category_map, seller_category_crowd_overrides_map):
|
||||
self.seller_category_map = seller_category_map
|
||||
self.seller_category_crowd_overrides_map = \
|
||||
seller_category_crowd_overrides_map
|
||||
|
||||
def categorize(self, transaction):
|
||||
if transaction.seller in self.seller_category_map:
|
||||
return self.seller_category_map[transaction.seller]
|
||||
elif transaction.seller in self.seller_category_crowd_overrides_map:
|
||||
self.seller_category_map[transaction.seller] = \
|
||||
self.seller_category_crowd_overrides_map[transaction.seller].peek_min()
|
||||
return self.seller_category_map[transaction.seller]
|
||||
return None
|
||||
```
|
||||
|
||||
Transaction implementation:
|
||||
|
||||
```python
|
||||
class Transaction(object):
|
||||
|
||||
def __init__(self, created_at, seller, amount):
|
||||
self.created_at = created_at
|
||||
self.seller = seller
|
||||
self.amount = amount
|
||||
```
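
A quick usage sketch tying the two classes together, assuming the seeded `seller_category_map` above and an empty crowd-overrides map:

```python
categorizer = Categorizer(seller_category_map,
                          seller_category_crowd_overrides_map={})
transaction = Transaction(created_at='2016-01-01', seller='Exxon', amount=25)
categorizer.categorize(transaction)  # DefaultCategories.GAS
```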
|
||||
|
||||
### Use case: Service recommends a budget
|
||||
|
||||
To start, we could use a generic budget template that allocates category amounts based on income tiers. Using this approach, we would not have to store the 100 million budget items identified in the constraints, only those that the user overrides. If a user overrides a budget category, we could store the override in a `budget_overrides` table.
|
||||
|
||||
```python
|
||||
class Budget(object):
|
||||
|
||||
def __init__(self, income):
|
||||
self.income = income
|
||||
self.categories_to_budget_map = self.create_budget_template()
|
||||
|
||||
def create_budget_template(self):
|
||||
return {
|
||||
DefaultCategories.HOUSING: self.income * .4,
|
||||
DefaultCategories.FOOD: self.income * .2,
|
||||
DefaultCategories.GAS: self.income * .1,
|
||||
DefaultCategories.SHOPPING: self.income * .2,
|
||||
...
|
||||
}
|
||||
|
||||
def override_category_budget(self, category, amount):
|
||||
self.categories_to_budget_map[category] = amount
|
||||
```
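
For example, with a $5,000 monthly income the template above would allocate $2,000 to housing, which the user could then override:

```python
budget = Budget(income=5000)
budget.categories_to_budget_map[DefaultCategories.HOUSING]  # 2000.0
budget.override_category_budget(DefaultCategories.HOUSING, 1800)
```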
|
||||
|
||||
For the **Budget Service**, we can potentially run SQL queries on the `transactions` table to generate the `monthly_spending` aggregate table. The `monthly_spending` table would have far fewer rows than the 5 billion total transactions, since each user's many monthly transactions collapse into one row per category.
|
||||
|
||||
As an alternative, we can run **MapReduce** jobs on the raw transaction files to:
|
||||
|
||||
* Categorize each transaction
|
||||
* Generate aggregate monthly spending by category
|
||||
|
||||
Running analyses on the transaction files could significantly reduce the load on the database.
|
||||
|
||||
We could call the **Budget Service** to re-run the analysis if the user updates a category.
|
||||
|
||||
**Clarify with your interviewer how much code you are expected to write**.
|
||||
|
||||
Sample log file format, tab delimited:
|
||||
|
||||
```
|
||||
user_id timestamp seller amount
|
||||
```
|
||||
|
||||
**MapReduce** implementation:
|
||||
|
||||
```python
|
||||
class SpendingByCategory(MRJob):
|
||||
|
||||
def __init__(self, categorizer):
|
||||
self.categorizer = categorizer
|
||||
self.current_year_month = self.calc_current_year_month()
|
||||
...
|
||||
|
||||
def calc_current_year_month(self):
|
||||
"""Return the current year and month."""
|
||||
...
|
||||
|
||||
def extract_year_month(self, timestamp):
|
||||
"""Return the year and month portions of the timestamp."""
|
||||
...
|
||||
|
||||
def handle_budget_notifications(self, key, total):
|
||||
"""Call notification API if nearing or exceeded budget."""
|
||||
...
|
||||
|
||||
def mapper(self, _, line):
|
||||
"""Parse each log line, extract and transform relevant lines.
|
||||
|
||||
Argument line will be of the form:
|
||||
|
||||
user_id timestamp seller amount
|
||||
|
||||
Using the categorizer to convert seller to category,
|
||||
emit key value pairs of the form:
|
||||
|
||||
(user_id, 2016-01, shopping), 25
|
||||
(user_id, 2016-01, shopping), 100
|
||||
(user_id, 2016-01, gas), 50
|
||||
"""
|
||||
user_id, timestamp, seller, amount = line.split('\t')
|
||||
category = self.categorizer.categorize(Transaction(timestamp, seller, amount))
|
||||
period = self.extract_year_month(timestamp)
|
||||
if period == self.current_year_month:
|
||||
yield (user_id, period, category), float(amount)
|
||||
|
||||
def reducer(self, key, values):
|
||||
"""Sum values for each key.
|
||||
|
||||
(user_id, 2016-01, shopping), 125
|
||||
(user_id, 2016-01, gas), 50
|
||||
"""
|
||||
total = sum(values)
|
||||
yield key, total
|
||||
```
|
||||
|
||||
## Step 4: Scale the design
|
||||
|
||||
> Identify and address bottlenecks, given the constraints.
|
||||
|
||||

|
||||
|
||||
**Important: Do not simply jump right into the final design from the initial design!**
|
||||
|
||||
State you would do this iteratively: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
|
||||
|
||||
It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
|
||||
|
||||
We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
|
||||
|
||||
*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
|
||||
|
||||
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
|
||||
* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
|
||||
* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
|
||||
* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
|
||||
* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
|
||||
* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
|
||||
* [Cache](https://github.com/donnemartin/system-design-primer#cache)
|
||||
* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
|
||||
* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
|
||||
* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
|
||||
* [Asynchronism](https://github.com/donnemartin/system-design-primer#asynchronism)
|
||||
* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
|
||||
* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
|
||||
|
||||
We'll add an additional use case: **User** accesses summaries and transactions.
|
||||
|
||||
User sessions, aggregate stats by category, and recent transactions could be placed in a **Memory Cache** such as Redis or Memcached.
|
||||
|
||||
* The **Client** sends a read request to the **Web Server**
|
||||
* The **Web Server** forwards the request to the **Read API** server
|
||||
* Static content can be served from the **Object Store** such as S3, which is cached on the **CDN**
|
||||
* The **Read API** server does the following:
|
||||
* Checks the **Memory Cache** for the content
|
||||
* If the url is in the **Memory Cache**, returns the cached contents
|
||||
* Else
|
||||
* If the url is in the **SQL Database**, fetches the contents
|
||||
* Updates the **Memory Cache** with the contents
|
||||
|
||||
Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).
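
A minimal cache-aside read path for this use case, assuming a Redis-style client exposing `get`/`set` (the key scheme and the `db` helper are illustrative):

```python
def get_monthly_summary(user_id, month, cache, db):
    """Check the cache first; on a miss, load from the database and populate the cache."""
    key = 'summary:{}:{}'.format(user_id, month)
    summary = cache.get(key)
    if summary is None:
        summary = db.fetch_monthly_summary(user_id, month)  # hypothetical database call
        cache.set(key, summary)
    return summary
```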
|
||||
|
||||
Instead of keeping the `monthly_spending` aggregate table in the **SQL Database**, we could create a separate **Analytics Database** using a data warehousing solution such as Amazon Redshift or Google BigQuery.
|
||||
|
||||
We might only want to store a month of `transactions` data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 250 GB of new content per month.
|
||||
|
||||
To address the 200 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
|
||||
|
||||
2,000 *average* transaction writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**. We might need to employ additional SQL scaling patterns:
|
||||
|
||||
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
|
||||
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
|
||||
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
|
||||
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
|
||||
|
||||
We should also consider moving some data to a **NoSQL Database**.
|
||||
|
||||
## Additional talking points
|
||||
|
||||
> Additional topics to dive into, depending on the problem scope and time remaining.
|
||||
|
||||
#### NoSQL
|
||||
|
||||
* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
|
||||
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
|
||||
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
|
||||
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
|
||||
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
|
||||
|
||||
### Caching
|
||||
|
||||
* Where to cache
|
||||
* [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
|
||||
* [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
|
||||
* [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
|
||||
* [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
|
||||
* [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
|
||||
* What to cache
|
||||
* [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
|
||||
* [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
|
||||
* When to update the cache
|
||||
* [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
|
||||
* [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
|
||||
* [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
|
||||
* [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
|
||||
|
||||
### Asynchronism and microservices
|
||||
|
||||
* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
|
||||
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
|
||||
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
|
||||
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
|
||||
|
||||
### Communications
|
||||
|
||||
* Discuss tradeoffs:
|
||||
* External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
|
||||
* Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
|
||||
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
|
||||
|
||||
### Security
|
||||
|
||||
Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
|
||||
|
||||
### Latency numbers
|
||||
|
||||
See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
|
||||
|
||||
### Ongoing
|
||||
|
||||
* Continue benchmarking and monitoring your system to address bottlenecks as they come up
|
||||
* Scaling is an iterative process
|
After Width: | Height: | Size: 290 KiB |
After Width: | Height: | Size: 119 KiB |
@ -0,0 +1,57 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
from mrjob.job import MRJob
|
||||
|
||||
|
||||
class SpendingByCategory(MRJob):
|
||||
|
||||
def __init__(self, categorizer):
|
||||
self.categorizer = categorizer
|
||||
...
|
||||
|
||||
def current_year_month(self):
|
||||
"""Return the current year and month."""
|
||||
...
|
||||
|
||||
def extract_year_month(self, timestamp):
|
||||
"""Return the year and month portions of the timestamp."""
|
||||
...
|
||||
|
||||
def handle_budget_notifications(self, key, total):
|
||||
"""Call notification API if nearing or exceeded budget."""
|
||||
...
|
||||
|
||||
def mapper(self, _, line):
|
||||
"""Parse each log line, extract and transform relevant lines.
|
||||
|
||||
Emit key value pairs of the form:
|
||||
|
||||
(2016-01, shopping), 25
|
||||
(2016-01, shopping), 100
|
||||
(2016-01, gas), 50
|
||||
"""
|
||||
timestamp, category, amount = line.split('\t')
|
||||
period = self.extract_year_month(timestamp)
|
||||
if period == self.current_year_month():
|
||||
yield (period, category), float(amount)
|
||||
|
||||
def reducer(self, key, values):
|
||||
"""Sum values for each key.
|
||||
|
||||
(2016-01, shopping), 125
|
||||
(2016-01, gas), 50
|
||||
"""
|
||||
total = sum(values)
|
||||
self.handle_budget_notifications(key, total)
|
||||
yield key, total
|
||||
|
||||
def steps(self):
|
||||
"""Run the map and reduce steps."""
|
||||
return [
|
||||
self.mr(mapper=self.mapper,
|
||||
reducer=self.reducer)
|
||||
]
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
SpendingByCategory.run()
|
@ -0,0 +1,50 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
from enum import Enum
|
||||
|
||||
|
||||
class DefaultCategories(Enum):
|
||||
|
||||
HOUSING = 0
|
||||
FOOD = 1
|
||||
GAS = 2
|
||||
SHOPPING = 3
|
||||
# ...
|
||||
|
||||
|
||||
seller_category_map = {}
|
||||
seller_category_map['Exxon'] = DefaultCategories.GAS
|
||||
seller_category_map['Target'] = DefaultCategories.SHOPPING
|
||||
|
||||
|
||||
class Categorizer(object):
|
||||
|
||||
def __init__(self, seller_category_map, seller_category_overrides_map):
|
||||
self.seller_category_map = seller_category_map
|
||||
self.seller_category_overrides_map = seller_category_overrides_map
|
||||
|
||||
def categorize(self, transaction):
|
||||
if transaction.seller in self.seller_category_map:
|
||||
return self.seller_category_map[transaction.seller]
|
||||
if transaction.seller in self.seller_category_overrides_map:
|
||||
self.seller_category_map[transaction.seller] = \
|
||||
self.seller_category_overrides_map[transaction.seller].peek_min()
|
||||
return self.seller_category_map[transaction.seller]
|
||||
return None
|
||||
|
||||
|
||||
class Transaction(object):
|
||||
|
||||
def __init__(self, timestamp, seller, amount):
|
||||
self.timestamp = timestamp
|
||||
self.seller = seller
|
||||
self.amount = amount
|
||||
|
||||
|
||||
class Budget(object):
|
||||
|
||||
def __init__(self, template_categories_to_budget_map):
|
||||
self.categories_to_budget_map = template_categories_to_budget_map
|
||||
|
||||
def override_category_budget(self, category, amount):
|
||||
self.categories_to_budget_map[category] = amount
|
@ -0,0 +1,332 @@
|
||||
# Design Pastebin.com (or Bit.ly)
|
||||
|
||||
*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
|
||||
|
||||
**Design Bit.ly** is a similar question, except Pastebin requires storing the paste contents rather than the original unshortened url.
|
||||
|
||||
## Step 1: Outline use cases and constraints
|
||||
|
||||
> Gather requirements and scope the problem.
|
||||
> Ask questions to clarify use cases and constraints.
|
||||
> Discuss assumptions.
|
||||
|
||||
Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
|
||||
|
||||
### Use cases
|
||||
|
||||
#### We'll scope the problem to handle only the following use cases
|
||||
|
||||
* **User** enters a block of text and gets a randomly generated link
|
||||
* Expiration
|
||||
* Default setting does not expire
|
||||
* Can optionally set a timed expiration
|
||||
* **User** enters a paste's url and views the contents
|
||||
* **User** is anonymous
|
||||
* **Service** tracks analytics of pages
|
||||
* Monthly visit stats
|
||||
* **Service** deletes expired pastes
|
||||
* **Service** has high availability
|
||||
|
||||
#### Out of scope
|
||||
|
||||
* **User** registers for an account
|
||||
* **User** verifies email
|
||||
* **User** logs into a registered account
|
||||
* **User** edits the document
|
||||
* **User** can set visibility
|
||||
* **User** can set the shortlink
|
||||
|
||||
### Constraints and assumptions
|
||||
|
||||
#### State assumptions
|
||||
|
||||
* Traffic is not evenly distributed
|
||||
* Following a short link should be fast
|
||||
* Pastes are text only
|
||||
* Page view analytics do not need to be realtime
|
||||
* 10 million users
|
||||
* 10 million paste writes per month
|
||||
* 100 million paste reads per month
|
||||
* 10:1 read to write ratio
|
||||
|
||||
#### Calculate usage
|
||||
|
||||
**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
|
||||
|
||||
* Size per paste
|
||||
* 1 KB content per paste
|
||||
* `shortlink` - 7 bytes
|
||||
* `expiration_length_in_minutes` - 4 bytes
|
||||
* `created_at` - 5 bytes
|
||||
* `paste_path` - 255 bytes
|
||||
* total = ~1.27 KB
|
||||
* 12.7 GB of new paste content per month
|
||||
* 1.27 KB per paste * 10 million pastes per month
|
||||
* ~450 GB of new paste content in 3 years
|
||||
* 360 million shortlinks in 3 years
|
||||
* Assume most are new pastes instead of updates to existing ones
|
||||
* 4 paste writes per second on average
|
||||
* 40 read requests per second on average
|
||||
|
||||
Handy conversion guide:
|
||||
|
||||
* 2.5 million seconds per month
|
||||
* 1 request per second = 2.5 million requests per month
|
||||
* 40 requests per second = 100 million requests per month
|
||||
* 400 requests per second = 1 billion requests per month
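
As a quick sanity check on these estimates:

```python
BYTES_PER_PASTE = 1.27 * 10**3  # ~1.27 KB
PASTES_PER_MONTH = 10 * 10**6
READS_PER_MONTH = 100 * 10**6
SECONDS_PER_MONTH = 2.5 * 10**6

print(BYTES_PER_PASTE * PASTES_PER_MONTH / 10**9)       # 12.7 GB of new content per month
print(BYTES_PER_PASTE * PASTES_PER_MONTH * 36 / 10**9)  # ~457 GB in 3 years
print(PASTES_PER_MONTH / SECONDS_PER_MONTH)             # 4 paste writes per second on average
print(READS_PER_MONTH / SECONDS_PER_MONTH)              # 40 read requests per second on average
```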
|
||||
|
||||
## Step 2: Create a high level design
|
||||
|
||||
> Outline a high level design with all important components.
|
||||
|
||||

|
||||
|
||||
## Step 3: Design core components
|
||||
|
||||
> Dive into details for each core component.
|
||||
|
||||
### Use case: User enters a block of text and gets a randomly generated link
|
||||
|
||||
We could use a [relational database](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms) as a large hash table, mapping the generated url to a file server and path containing the paste file.
|
||||
|
||||
Instead of managing a file server, we could use a managed **Object Store** such as Amazon S3 or a [NoSQL document store](https://github.com/donnemartin/system-design-primer#document-store).
|
||||
|
||||
As an alternative to a relational database acting as a large hash table, we could use a [NoSQL key-value store](https://github.com/donnemartin/system-design-primer#key-value-store). We should discuss the [tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql). The following discussion uses the relational database approach.
|
||||
|
||||
* The **Client** sends a create paste request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
|
||||
* The **Web Server** forwards the request to the **Write API** server
|
||||
* The **Write API** server does the following:
|
||||
* Generates a unique url
|
||||
* Checks if the url is unique by looking at the **SQL Database** for a duplicate
|
||||
* If the url is not unique, it generates another url
|
||||
* If we supported a custom url, we could use the user-supplied url (also checking for a duplicate)
|
||||
* Saves to the **SQL Database** `pastes` table
|
||||
* Saves the paste data to the **Object Store**
|
||||
* Returns the url
|
||||
|
||||
**Clarify with your interviewer how much code you are expected to write**.
|
||||
|
||||
The `pastes` table could have the following structure:
|
||||
|
||||
```
|
||||
shortlink char(7) NOT NULL
|
||||
expiration_length_in_minutes int NOT NULL
|
||||
created_at datetime NOT NULL
|
||||
paste_path varchar(255) NOT NULL
|
||||
PRIMARY KEY(shortlink)
|
||||
```
|
||||
|
||||
Setting the primary key to be based on the `shortlink` column creates an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) that the database uses to enforce uniqueness. We'll create an additional index on `created_at` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>
|
||||
|
||||
To generate the unique url, we could:
|
||||
|
||||
* Take the [**MD5**](https://en.wikipedia.org/wiki/MD5) hash of the user's ip_address + timestamp
|
||||
* MD5 is a widely used hashing function that produces a 128-bit hash value
|
||||
* MD5 is uniformly distributed
|
||||
* Alternatively, we could also take the MD5 hash of randomly-generated data
|
||||
* [**Base 62**](https://www.kerstner.at/2012/07/shortening-strings-using-base-62-encoding/) encode the MD5 hash
|
||||
* Base 62 encodes to `[a-zA-Z0-9]` which works well for urls, eliminating the need for escaping special characters
|
||||
* There is only one hash result for the original input and Base 62 is deterministic (no randomness involved)
|
||||
* Base 64 is another popular encoding but provides issues for urls because of the additional `+` and `/` characters
|
||||
* The following [Base 62 encoding](http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener) runs in O(k) time where k is the number of digits = 7:
|
||||
|
||||
```python
|
||||
def base_encode(num, base=62):
    """Encode a non-negative integer as a base-62 string."""
    alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    digits = []
    while num > 0:
        num, remainder = divmod(num, base)
        digits.append(alphabet[remainder])
    return ''.join(reversed(digits))
|
||||
```
|
||||
|
||||
* Take the first 7 characters of the output, which results in 62^7 possible values (about 3.5 trillion, comfortably above our constraint of 360 million shortlinks in 3 years):
|
||||
|
||||
```python
|
||||
url = base_encode(md5(ip_address+timestamp))[:URL_LENGTH]
|
||||
```
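
Spelled out with Python's `hashlib` (a sketch; `ip_address` and `timestamp` stand in for the request attributes assumed above):

```python
import hashlib

URL_LENGTH = 7

def generate_shortlink(ip_address, timestamp):
    """MD5 the ip_address + timestamp, Base 62 encode, then take the first 7 characters."""
    digest = hashlib.md5((ip_address + timestamp).encode('utf-8')).hexdigest()
    return base_encode(int(digest, 16))[:URL_LENGTH]
```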
|
||||
|
||||
We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):
|
||||
|
||||
```
|
||||
$ curl -X POST --data '{ "expiration_length_in_minutes": "60", \
|
||||
"paste_contents": "Hello World!" }' https://pastebin.com/api/v1/paste
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```
|
||||
{
|
||||
"shortlink": "foobar"
|
||||
}
|
||||
```
|
||||
|
||||
For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).
|
||||
|
||||
### Use case: User enters a paste's url and views the contents
|
||||
|
||||
* The **Client** sends a get paste request to the **Web Server**
|
||||
* The **Web Server** forwards the request to the **Read API** server
|
||||
* The **Read API** server does the following:
|
||||
* Checks the **SQL Database** for the generated url
|
||||
* If the url is in the **SQL Database**, fetch the paste contents from the **Object Store**
|
||||
* Else, return an error message for the user
|
||||
|
||||
REST API:
|
||||
|
||||
```
|
||||
$ curl https://pastebin.com/api/v1/paste?shortlink=foobar
|
||||
```
|
||||
|
||||
Response:
|
||||
|
||||
```
|
||||
{
|
||||
"paste_contents": "Hello World"
|
||||
"created_at": "YYYY-MM-DD HH:MM:SS"
|
||||
"expiration_length_in_minutes": "60"
|
||||
}
|
||||
```
|
||||
|
||||
### Use case: Service tracks analytics of pages
|
||||
|
||||
Since realtime analytics are not a requirement, we could simply **MapReduce** the **Web Server** logs to generate hit counts.
|
||||
|
||||
**Clarify with your interviewer how much code you are expected to write**.
|
||||
|
||||
```python
|
||||
class HitCounts(MRJob):
|
||||
|
||||
def extract_url(self, line):
|
||||
"""Extract the generated url from the log line."""
|
||||
...
|
||||
|
||||
def extract_year_month(self, line):
|
||||
"""Return the year and month portions of the timestamp."""
|
||||
...
|
||||
|
||||
def mapper(self, _, line):
|
||||
"""Parse each log line, extract and transform relevant lines.
|
||||
|
||||
Emit key value pairs of the form:
|
||||
|
||||
(2016-01, url0), 1
|
||||
(2016-01, url0), 1
|
||||
(2016-01, url1), 1
|
||||
"""
|
||||
url = self.extract_url(line)
|
||||
period = self.extract_year_month(line)
|
||||
yield (period, url), 1
|
||||
|
||||
def reducer(self, key, values):
|
||||
"""Sum values for each key.
|
||||
|
||||
(2016-01, url0), 2
|
||||
(2016-01, url1), 1
|
||||
"""
|
||||
yield key, sum(values)
|
||||
```
|
||||
|
||||
### Use case: Service deletes expired pastes
|
||||
|
||||
To delete expired pastes, we could just scan the **SQL Database** for all entries whose expiration timestamp is older than the current timestamp. All expired entries would then be deleted (or marked as expired) from the table.
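
A sketch of the cleanup query such a scan could run against the `pastes` schema above, assuming non-expiring pastes store a sentinel value of `0` (MySQL-flavored; how the job is scheduled is left open):

```python
# Hypothetical cleanup query run periodically by a scheduled job
EXPIRE_PASTES_QUERY = """
DELETE FROM pastes
WHERE expiration_length_in_minutes > 0
  AND created_at + INTERVAL expiration_length_in_minutes MINUTE < NOW()
"""
```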
|
||||
|
||||
## Step 4: Scale the design
|
||||
|
||||
> Identify and address bottlenecks, given the constraints.
|
||||
|
||||

|
||||
|
||||
**Important: Do not simply jump right into the final design from the initial design!**
|
||||
|
||||
State you would do this iteratively: 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.
|
||||
|
||||
It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?
|
||||
|
||||
We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.
|
||||
|
||||
*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:
|
||||
|
||||
* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
|
||||
* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
|
||||
* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
|
||||
* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
|
||||
* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
|
||||
* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
|
||||
* [Cache](https://github.com/donnemartin/system-design-primer#cache)
|
||||
* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
|
||||
* [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
|
||||
* [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
|
||||
* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
|
||||
* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)
|
||||
|
||||
The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.
|
||||
|
||||
An **Object Store** such as Amazon S3 can comfortably handle the constraint of 12.7 GB of new content per month.
|
||||
|
||||
To address the 40 *average* read requests per second (higher at peak), traffic for popular content should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. The **SQL Read Replicas** should be able to handle the cache misses, as long as the replicas are not bogged down with replicating writes.
|
||||
|
||||
4 *average* paste writes per second (higher at peak) should be doable for a single **SQL Write Master-Slave**. Otherwise, we'll need to employ additional SQL scaling patterns:
|
||||
|
||||
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
|
||||
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
|
||||
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
|
||||
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)
|
||||
|
||||
We should also consider moving some data to a **NoSQL Database**.
|
||||
|
||||
## Additional talking points
|
||||
|
||||
> Additional topics to dive into, depending on the problem scope and time remaining.
|
||||
|
||||
#### NoSQL
|
||||
|
||||
* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
|
||||
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
|
||||
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
|
||||
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
|
||||
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)
|
||||
|
||||
### Caching
|
||||
|
||||
* Where to cache
|
||||
* [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
|
||||
* [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
|
||||
* [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
|
||||
* [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
|
||||
* [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
|
||||
* What to cache
|
||||
* [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
|
||||
* [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
|
||||
* When to update the cache
|
||||
* [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
|
||||
* [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
|
||||
* [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
|
||||
* [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)
|
||||
|
||||
### Asynchronism and microservices
|
||||
|
||||
* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
|
||||
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
|
||||
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
|
||||
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)
|
||||
|
||||
### Communications
|
||||
|
||||
* Discuss tradeoffs:
|
||||
* External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
|
||||
* Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
|
||||
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)
|
||||
|
||||
### Security
|
||||
|
||||
Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).
|
||||
|
||||
### Latency numbers
|
||||
|
||||
See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).
|
||||
|
||||
### Ongoing
|
||||
|
||||
* Continue benchmarking and monitoring your system to address bottlenecks as they come up
|
||||
* Scaling is an iterative process
|
After Width: | Height: | Size: 211 KiB |
@ -0,0 +1,46 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
from mrjob.job import MRJob
|
||||
|
||||
|
||||
class HitCounts(MRJob):
|
||||
|
||||
def extract_url(self, line):
|
||||
"""Extract the generated url from the log line."""
|
||||
pass
|
||||
|
||||
def extract_year_month(self, line):
|
||||
"""Return the year and month portions of the timestamp."""
|
||||
pass
|
||||
|
||||
def mapper(self, _, line):
|
||||
"""Parse each log line, extract and transform relevant lines.
|
||||
|
||||
Emit key value pairs of the form:
|
||||
|
||||
(2016-01, url0), 1
|
||||
(2016-01, url0), 1
|
||||
(2016-01, url1), 1
|
||||
"""
|
||||
url = self.extract_url(line)
|
||||
period = self.extract_year_month(line)
|
||||
yield (period, url), 1
|
||||
|
||||
def reducer(self, key, values):
|
||||
"""Sum values for each key.
|
||||
|
||||
(2016-01, url0), 2
|
||||
(2016-01, url1), 1
|
||||
"""
|
||||
yield key, sum(values)
|
||||
|
||||
def steps(self):
|
||||
"""Run the map and reduce steps."""
|
||||
return [
|
||||
self.mr(mapper=self.mapper,
|
||||
reducer=self.reducer)
|
||||
]
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
HitCounts.run()
|
After Width: | Height: | Size: 83 KiB |
@ -0,0 +1,306 @@
|
||||
# Design a key-value cache to save the results of the most recent web server queries
|
||||
|
||||
*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*
|
||||
|
||||
## Step 1: Outline use cases and constraints
|
||||
|
||||
> Gather requirements and scope the problem.
|
||||
> Ask questions to clarify use cases and constraints.
|
||||
> Discuss assumptions.
|
||||
|
||||
Without an interviewer to address clarifying questions, we'll define some use cases and constraints.
|
||||
|
||||
### Use cases
|
||||
|
||||
#### We'll scope the problem to handle only the following use cases
|
||||
|
||||
* **User** sends a search request resulting in a cache hit
|
||||
* **User** sends a search request resulting in a cache miss
|
||||
* **Service** has high availability
|
||||
|
||||
### Constraints and assumptions
|
||||
|
||||
#### State assumptions
|
||||
|
||||
* Traffic is not evenly distributed
|
||||
* Popular queries should almost always be in the cache
|
||||
* Need to determine how to expire/refresh
|
||||
* Serving from cache requires fast lookups
|
||||
* Low latency between machines
|
||||
* Limited memory in cache
|
||||
* Need to determine what to keep/remove
|
||||
* Need to cache millions of queries
|
||||
* 10 million users
|
||||
* 10 billion queries per month
|
||||
|
||||
#### Calculate usage
|
||||
|
||||
**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**
|
||||
|
||||
* Cache stores ordered list of key: query, value: results
|
||||
* `query` - 50 bytes
|
||||
* `title` - 20 bytes
|
||||
* `snippet` - 200 bytes
|
||||
* Total: 270 bytes
|
||||
* 2.7 TB of cache data per month if all 10 billion queries are unique and all are stored
|
||||
* 270 bytes per search * 10 billion searches per month
|
||||
* Assumptions state limited memory, need to determine how to expire contents
|
||||
* 4,000 requests per second
|
||||
|
||||
Handy conversion guide:
|
||||
|
||||
* 2.5 million seconds per month
|
||||
* 1 request per second = 2.5 million requests per month
|
||||
* 40 requests per second = 100 million requests per month
|
||||
* 400 requests per second = 1 billion requests per month
|
||||
|
## Step 2: Create a high level design

> Outline a high level design with all important components.

![Imgur](http://i.imgur.com/KqZ3dSx.png)

## Step 3: Design core components

> Dive into details for each core component.

### Use case: User sends a request resulting in a cache hit

Popular queries can be served from a **Memory Cache** such as Redis or Memcached to reduce read latency and to avoid overloading the **Reverse Index Service** and **Document Service**. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>

Since the cache has limited capacity, we'll use a least recently used (LRU) approach to expire older entries.

* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* The **Web Server** forwards the request to the **Query API** server
* The **Query API** server does the following:
    * Parses the query
        * Removes markup
        * Breaks up the text into terms
        * Fixes typos
        * Normalizes capitalization
        * Converts the query to use boolean operations
    * Checks the **Memory Cache** for the content matching the query
        * If there's a hit in the **Memory Cache**, the **Memory Cache** does the following:
            * Updates the cached entry's position to the front of the LRU list
            * Returns the cached contents
        * Else, the **Query API** does the following:
            * Uses the **Reverse Index Service** to find documents matching the query
                * The **Reverse Index Service** ranks the matching results and returns the top ones
            * Uses the **Document Service** to return titles and snippets
            * Updates the **Memory Cache** with the contents, placing the entry at the front of the LRU list

#### Cache implementation

The cache can use a doubly-linked list: new items will be added to the head while items to expire will be removed from the tail. We'll use a hash table for fast lookups to each linked list node.

**Clarify with your interviewer how much code you are expected to write**.

**Query API Server** implementation:

```python
class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results
```

**Node** implementation:

```python
class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results
```

**LinkedList** implementation:

```python
class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...
```

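For reference, here is one way the elided methods might be filled in, assuming we give each `Node` hypothetical `prev` and `next` pointers (a sketch, not part of the original solution):

```python
class LinkedListSketch(LinkedList):

    def append_to_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def move_to_front(self, node):
        if node is self.head:
            return
        # Unlink the node from its current position
        node.prev.next = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        self.append_to_front(node)

    def remove_from_tail(self):
        if self.tail is None:
            return
        # Drop the least recently used node
        self.tail = self.tail.prev
        if self.tail is not None:
            self.tail.next = None
        else:
            self.head = None
```
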
**Cache** implementation:

```python
class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup.get(query)
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node
```

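A quick usage sketch showing the LRU behavior end to end (this assumes the filled-in `LinkedListSketch` above; swapping it in by assignment is just for illustration):

```python
cache = Cache(MAX_SIZE=2)
cache.linked_list = LinkedListSketch()  # use the filled-in sketch

cache.set('dog breeds', ['result1'])
cache.set('cat breeds', ['result2'])
cache.get('dog breeds')                 # 'dog breeds' is now most recently used
cache.set('fish types', ['result3'])    # evicts 'cat breeds', the LRU entry

print(cache.get('cat breeds'))          # None
print(cache.get('dog breeds'))          # ['result1']
```
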
#### When to update the cache

The cache should be updated when:

* The page contents change
* The page is removed or a new page is added
* The page rank changes

The most straightforward way to handle these cases is to simply set a max time that a cached entry can stay in the cache before it is updated, usually referred to as time to live (TTL).

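As a sketch of how a TTL could bolt onto the cache-aside flow above (the `ttl_seconds` parameter and expiry field are illustrative, not part of the original design):

```python
import time

class TtlNode(Node):

    def __init__(self, query, results, ttl_seconds):
        super(TtlNode, self).__init__(query, results)
        self.expires_at = time.time() + ttl_seconds

    def is_expired(self):
        return time.time() >= self.expires_at
```

On a `get`, an expired node would be treated as a miss, so the next request repopulates the entry from the **Reverse Index Service** and **Document Service**.
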
Refer to [When to update the cache](https://github.com/donnemartin/system-design-primer#when-to-update-the-cache) for tradeoffs and alternatives. The approach above describes [cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside).

## Step 4: Scale the design

> Identify and address bottlenecks, given the constraints.

![Imgur](http://i.imgur.com/4j99mhe.png)

**Important: Do not simply jump right into the final design from the initial design!**

State that you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.

It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? A **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?

We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.

*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:

* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
* [Cache](https://github.com/donnemartin/system-design-primer#cache)
* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)

### Expanding the Memory Cache to many machines

To handle the heavy request load and the large amount of memory needed, we'll scale horizontally. We have three main options on how to store the data on our **Memory Cache** cluster:

* **Each machine in the cache cluster has its own cache** - Simple, although it will likely result in a low cache hit rate.
* **Each machine in the cache cluster has a copy of the cache** - Simple, although it is an inefficient use of memory.
* **The cache is [sharded](https://github.com/donnemartin/system-design-primer#sharding) across all machines in the cache cluster** - More complex, although it is likely the best option. We could use hashing to determine which machine could have the cached results of a query using `machine = hash(query)`. We'll likely want to use [consistent hashing](https://github.com/donnemartin/system-design-primer#under-development) so that adding or removing machines does not remap most queries, as sketched below.

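A minimal sketch of the difference (the modulo approach is what `machine = hash(query)` implies; the ring here is a simplified illustration of consistent hashing, not a production implementation):

```python
import bisect
import hashlib

def stable_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Naive sharding: adding or removing a machine remaps almost every query
def machine_for(query, num_machines):
    return stable_hash(query) % num_machines

# Consistent hashing: each machine owns a point on a ring; a query maps to
# the first machine clockwise from its hash, so adding a machine only
# remaps the keys that fall between it and its predecessor
class HashRing(object):

    def __init__(self, machines):
        self.ring = sorted((stable_hash(m), m) for m in machines)

    def machine_for(self, query):
        keys = [h for h, _ in self.ring]
        index = bisect.bisect(keys, stable_hash(query)) % len(self.ring)
        return self.ring[index][1]
```
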
## Additional talking points

> Additional topics to dive into, depending on the problem scope and time remaining.

### SQL scaling patterns

* [Read replicas](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)

#### NoSQL

* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)

### Caching

* Where to cache
    * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
    * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
    * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
    * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
    * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
* What to cache
    * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
    * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* When to update the cache
    * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
    * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
    * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
    * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)

### Asynchronism and microservices

* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)

### Communications

* Discuss tradeoffs:
    * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
    * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)

### Security

Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).

### Latency numbers

See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).

### Ongoing

* Continue benchmarking and monitoring your system to address bottlenecks as they come up
* Scaling is an iterative process

@ -0,0 +1,90 @@
# -*- coding: utf-8 -*-


class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_cluster):
        self.memory_cache = memory_cache
        self.reverse_index_cluster = reverse_index_cluster

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_cluster.process_search(query)
            self.memory_cache.set(query, results)
        return results


class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results


class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...


class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query):
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, query, results):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup.get(query)
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node

@ -0,0 +1,338 @@
# Design Amazon's sales rank by category feature

*Note: This document links directly to relevant areas found in the [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) to avoid duplication. Refer to the linked content for general talking points, tradeoffs, and alternatives.*

## Step 1: Outline use cases and constraints

> Gather requirements and scope the problem.
> Ask questions to clarify use cases and constraints.
> Discuss assumptions.

Without an interviewer to address clarifying questions, we'll define some use cases and constraints.

### Use cases

#### We'll scope the problem to handle only the following use cases

* **Service** calculates the past week's most popular products by category
* **User** views the past week's most popular products by category
* **Service** has high availability

#### Out of scope

* The general e-commerce site
    * Design components only for calculating sales rank

### Constraints and assumptions

#### State assumptions

* Traffic is not evenly distributed
* Items can be in multiple categories
* Items cannot change categories
* There are no subcategories, e.g. `foo/bar/baz`
* Results must be updated hourly
    * More popular products might need to be updated more frequently
* 10 million products
* 1,000 categories
* 1 billion transactions per month
* 100 billion read requests per month
* 100:1 read to write ratio

#### Calculate usage

**Clarify with your interviewer if you should run back-of-the-envelope usage calculations.**

* Size per transaction:
    * `created_at` - 5 bytes
    * `product_id` - 8 bytes
    * `category_id` - 4 bytes
    * `seller_id` - 8 bytes
    * `buyer_id` - 8 bytes
    * `quantity` - 4 bytes
    * `total_price` - 5 bytes
    * Total: ~40 bytes
* 40 GB of new transaction content per month
    * 40 bytes per transaction * 1 billion transactions per month
    * 1.44 TB of new transaction content in 3 years
    * Assume most are new transactions instead of updates to existing ones
* 400 transactions per second on average
* 40,000 read requests per second on average

Handy conversion guide:

* 2.5 million seconds per month
* 1 request per second = 2.5 million requests per month
* 40 requests per second = 100 million requests per month
* 400 requests per second = 1 billion requests per month

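Again, quick arithmetic backs these numbers up (a sketch; the field sizes are the assumptions stated above):

```python
# Field sizes from the assumptions above; the doc rounds 42 down to ~40 bytes
BYTES_PER_TRANSACTION = 5 + 8 + 4 + 8 + 8 + 4 + 5  # 42, call it ~40
TRANSACTIONS_PER_MONTH = 10**9
READ_WRITE_RATIO = 100
SECONDS_PER_MONTH = 2.5 * 10**6

monthly_gb = 40 * TRANSACTIONS_PER_MONTH / 10**9    # 40 GB of new content per month
three_year_tb = monthly_gb * 36 / 1000              # 1.44 TB over 3 years
writes_per_second = TRANSACTIONS_PER_MONTH / SECONDS_PER_MONTH  # 400
reads_per_second = READ_WRITE_RATIO * writes_per_second         # 40,000
```
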
## Step 2: Create a high level design

> Outline a high level design with all important components.

![Imgur](http://i.imgur.com/vwMa1Qe.png)

## Step 3: Design core components

> Dive into details for each core component.

### Use case: Service calculates the past week's most popular products by category

We could store the raw **Sales API** server log files on a managed **Object Store** such as Amazon S3, rather than managing our own distributed file system.

**Clarify with your interviewer how much code you are expected to write**.

We'll assume this is a sample log entry, tab delimited:

```
timestamp  product_id  category_id  qty  total_price  seller_id  buyer_id
t1         product1    category1    2    20.00        1          1
t2         product1    category2    2    20.00        2          2
t2         product1    category2    1    10.00        2          3
t3         product2    category1    3    7.00         3          4
t4         product3    category2    7    2.00         4          5
t5         product4    category1    1    5.00         5          6
...
```

The **Sales Rank Service** could use **MapReduce**, using the **Sales API** server log files as input and writing the results to an aggregate table `sales_rank` in a **SQL Database**. We should discuss the [use cases and tradeoffs between choosing SQL or NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql).

We'll use a multi-step **MapReduce**:

* **Step 1** - Transform the data to `(category, product_id), sum(quantity)`
* **Step 2** - Perform a distributed sort

```python
from mrjob.job import MRJob


class SalesRanker(MRJob):

    def within_past_week(self, timestamp):
        """Return True if timestamp is within past week, False otherwise."""
        ...

    def mapper(self, _, line):
        """Parse each log line, extract and transform relevant lines.

        Emit key value pairs of the form:

            (category1, product1), 2
            (category2, product1), 2
            (category2, product1), 1
            (category1, product2), 3
            (category2, product3), 7
            (category1, product4), 1
        """
        timestamp, product_id, category_id, quantity, total_price, seller_id, \
            buyer_id = line.split('\t')
        if self.within_past_week(timestamp):
            # quantity arrives as a string from split, cast before summing
            yield (category_id, product_id), int(quantity)

    def reducer(self, key, values):
        """Sum values for each key.

            (category1, product1), 2
            (category2, product1), 3
            (category1, product2), 3
            (category2, product3), 7
            (category1, product4), 1
        """
        yield key, sum(values)

    def mapper_sort(self, key, value):
        """Construct key to ensure proper sorting.

        Transform key and value to the form:

            (category1, 2), product1
            (category2, 3), product1
            (category1, 3), product2
            (category2, 7), product3
            (category1, 1), product4

        The shuffle/sort step of MapReduce will then do a
        distributed sort on the keys, resulting in:

            (category1, 1), product4
            (category1, 2), product1
            (category1, 3), product2
            (category2, 3), product1
            (category2, 7), product3
        """
        category_id, product_id = key
        quantity = value
        yield (category_id, quantity), product_id

    def reducer_identity(self, key, values):
        for value in values:
            yield key, value

    def steps(self):
        """Run the map and reduce steps."""
        return [
            self.mr(mapper=self.mapper,
                    reducer=self.reducer),
            self.mr(mapper=self.mapper_sort,
                    reducer=self.reducer_identity),
        ]
```

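If we were using [mrjob](https://github.com/Yelp/mrjob), as the `MRJob` base class suggests, the script would also need the standard entry point below, after which it could be run locally with `python sales_ranker.py sales_logs.tsv` (the file name is a placeholder):

```python
if __name__ == '__main__':
    SalesRanker.run()
```
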
The result would be the following sorted list, which we could insert into the `sales_rank` table:

```
(category1, 1), product4
(category1, 2), product1
(category1, 3), product2
(category2, 3), product1
(category2, 7), product3
```

The `sales_rank` table could have the following structure:

```
id int NOT NULL AUTO_INCREMENT
category_id int NOT NULL
total_sold int NOT NULL
product_id int NOT NULL
PRIMARY KEY(id)
FOREIGN KEY(category_id) REFERENCES Categories(id)
FOREIGN KEY(product_id) REFERENCES Products(id)
```

We'll create an [index](https://github.com/donnemartin/system-design-primer#use-good-indices) on `id`, `category_id`, and `product_id` to speed up lookups (log-time instead of scanning the entire table) and to keep the data in memory. Reading 1 MB sequentially from memory takes about 250 microseconds, while reading from SSD takes 4x and from disk takes 80x longer.<sup><a href=https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know>1</a></sup>

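As an illustration, the read path might issue a query like the following against that schema (a sketch; `category_id = 1234` and `LIMIT 10` are arbitrary example values):

```
SELECT product_id, total_sold
FROM sales_rank
WHERE category_id = 1234
ORDER BY total_sold DESC
LIMIT 10;
```
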
### Use case: User views the past week's most popular products by category

* The **Client** sends a request to the **Web Server**, running as a [reverse proxy](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* The **Web Server** forwards the request to the **Read API** server
* The **Read API** server reads from the **SQL Database** `sales_rank` table

We'll use a public [**REST API**](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest):

```
$ curl https://amazon.com/api/v1/popular?category_id=1234
```

Response:

```
[
    {
        "id": "100",
        "category_id": "1234",
        "total_sold": "100000",
        "product_id": "50"
    },
    {
        "id": "53",
        "category_id": "1234",
        "total_sold": "90000",
        "product_id": "200"
    },
    {
        "id": "75",
        "category_id": "1234",
        "total_sold": "80000",
        "product_id": "3"
    }
]
```

For internal communications, we could use [Remote Procedure Calls](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc).

## Step 4: Scale the design

> Identify and address bottlenecks, given the constraints.

![Imgur](http://i.imgur.com/MzExP06.png)

**Important: Do not simply jump right into the final design from the initial design!**

State that you would 1) **Benchmark/Load Test**, 2) **Profile** for bottlenecks, 3) address bottlenecks while evaluating alternatives and trade-offs, and 4) repeat. See [Design a system that scales to millions of users on AWS](../scaling_aws/README.md) as a sample on how to iteratively scale the initial design.

It's important to discuss what bottlenecks you might encounter with the initial design and how you might address each of them. For example, what issues are addressed by adding a **Load Balancer** with multiple **Web Servers**? A **CDN**? **Master-Slave Replicas**? What are the alternatives and **Trade-Offs** for each?

We'll introduce some components to complete the design and to address scalability issues. Internal load balancers are not shown to reduce clutter.

*To avoid repeating discussions*, refer to the following [system design topics](https://github.com/donnemartin/system-design-primer#index-of-system-design-topics) for main talking points, tradeoffs, and alternatives:

* [DNS](https://github.com/donnemartin/system-design-primer#domain-name-system)
* [CDN](https://github.com/donnemartin/system-design-primer#content-delivery-network)
* [Load balancer](https://github.com/donnemartin/system-design-primer#load-balancer)
* [Horizontal scaling](https://github.com/donnemartin/system-design-primer#horizontal-scaling)
* [Web server (reverse proxy)](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server)
* [API server (application layer)](https://github.com/donnemartin/system-design-primer#application-layer)
* [Cache](https://github.com/donnemartin/system-design-primer#cache)
* [Relational database management system (RDBMS)](https://github.com/donnemartin/system-design-primer#relational-database-management-system-rdbms)
    * [SQL write master-slave failover](https://github.com/donnemartin/system-design-primer#fail-over)
    * [Master-slave replication](https://github.com/donnemartin/system-design-primer#master-slave-replication)
* [Consistency patterns](https://github.com/donnemartin/system-design-primer#consistency-patterns)
* [Availability patterns](https://github.com/donnemartin/system-design-primer#availability-patterns)

The **Analytics Database** could use a data warehousing solution such as Amazon Redshift or Google BigQuery.

We might only want to store a limited time period of data in the database, while storing the rest in a data warehouse or in an **Object Store**. An **Object Store** such as Amazon S3 can comfortably handle the constraint of 40 GB of new content per month.

To address the 40,000 *average* read requests per second (higher at peak), traffic for popular content (and their sales rank) should be handled by the **Memory Cache** instead of the database. The **Memory Cache** is also useful for handling the unevenly distributed traffic and traffic spikes. With the large volume of reads, the **SQL Read Replicas** might not be able to handle the cache misses. We'll probably need to employ additional SQL scaling patterns.

400 *average* writes per second (higher at peak) might be tough for a single **SQL Write Master-Slave**, also pointing to a need for additional scaling techniques.

SQL scaling patterns include:

* [Federation](https://github.com/donnemartin/system-design-primer#federation)
* [Sharding](https://github.com/donnemartin/system-design-primer#sharding)
* [Denormalization](https://github.com/donnemartin/system-design-primer#denormalization)
* [SQL Tuning](https://github.com/donnemartin/system-design-primer#sql-tuning)

We should also consider moving some data to a **NoSQL Database**.

## Additional talking points

> Additional topics to dive into, depending on the problem scope and time remaining.

#### NoSQL

* [Key-value store](https://github.com/donnemartin/system-design-primer#key-value-store)
* [Document store](https://github.com/donnemartin/system-design-primer#document-store)
* [Wide column store](https://github.com/donnemartin/system-design-primer#wide-column-store)
* [Graph database](https://github.com/donnemartin/system-design-primer#graph-database)
* [SQL vs NoSQL](https://github.com/donnemartin/system-design-primer#sql-or-nosql)

### Caching

* Where to cache
    * [Client caching](https://github.com/donnemartin/system-design-primer#client-caching)
    * [CDN caching](https://github.com/donnemartin/system-design-primer#cdn-caching)
    * [Web server caching](https://github.com/donnemartin/system-design-primer#web-server-caching)
    * [Database caching](https://github.com/donnemartin/system-design-primer#database-caching)
    * [Application caching](https://github.com/donnemartin/system-design-primer#application-caching)
* What to cache
    * [Caching at the database query level](https://github.com/donnemartin/system-design-primer#caching-at-the-database-query-level)
    * [Caching at the object level](https://github.com/donnemartin/system-design-primer#caching-at-the-object-level)
* When to update the cache
    * [Cache-aside](https://github.com/donnemartin/system-design-primer#cache-aside)
    * [Write-through](https://github.com/donnemartin/system-design-primer#write-through)
    * [Write-behind (write-back)](https://github.com/donnemartin/system-design-primer#write-behind-write-back)
    * [Refresh ahead](https://github.com/donnemartin/system-design-primer#refresh-ahead)

### Asynchronism and microservices

* [Message queues](https://github.com/donnemartin/system-design-primer#message-queues)
* [Task queues](https://github.com/donnemartin/system-design-primer#task-queues)
* [Back pressure](https://github.com/donnemartin/system-design-primer#back-pressure)
* [Microservices](https://github.com/donnemartin/system-design-primer#microservices)

### Communications

* Discuss tradeoffs:
    * External communication with clients - [HTTP APIs following REST](https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest)
    * Internal communications - [RPC](https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc)
* [Service discovery](https://github.com/donnemartin/system-design-primer#service-discovery)

### Security

Refer to the [security section](https://github.com/donnemartin/system-design-primer#security).

### Latency numbers

See [Latency numbers every programmer should know](https://github.com/donnemartin/system-design-primer#latency-numbers-every-programmer-should-know).

### Ongoing

* Continue benchmarking and monitoring your system to address bottlenecks as they come up
* Scaling is an iterative process