System Design: Instagram/Twitter/Reddit
The Fork/Join model in multi-threaded programming:
(1) Initial setup: The Main Thread
(2) Fork: Spawn new subtasks
(3) Parallel execution
(4) Join: consolidate results
(5) Repeat
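The steps above can be sketched in Python with `concurrent.futures` (the function names and the toy summing task are illustrative, not from the original):

```python
from concurrent.futures import ThreadPoolExecutor

def square(n: int) -> int:
    # Subtask: runs in a worker thread.
    return n * n

def fork_join_sum(numbers: list[int]) -> int:
    # (1) Initial setup: the main thread creates the pool.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # (2) Fork: spawn one subtask per item.
        futures = [pool.submit(square, n) for n in numbers]
        # (3) Parallel execution happens in the pool's worker threads.
        # (4) Join: block until each result is ready, then consolidate.
        return sum(f.result() for f in futures)

print(fork_join_sum([1, 2, 3, 4]))  # 30
```

Step (5), repeat, would simply call `fork_join_sum` again on the next batch of work.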
Critical Section: a code region that accesses shared resources or variables and must not be executed by more than one thread at a time.
Race Condition: multiple threads reading and writing shared state concurrently, so the outcome depends on the timing of their interleaving.
Synchronization tools: coordinate threads around critical sections. They include: Mutexes, Read/Write locks, Semaphores, Condition variables, and Barriers.
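A minimal sketch of a mutex guarding a critical section (the counter and thread count are made up for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # The read-modify-write below is the critical section;
        # without the lock, concurrent threads can lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 with the lock; often less without it
```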
There are lots of different options for databases. How can we decide which one to choose? There are a couple of aspects to consider.
We will walk through them in turn: indexing, replication, failure detection, and consistency; the last part covers some existing databases as examples.
A database index is used to speed up reads based on a specific key: it slows down writes and speeds up reads.
A hash index is an in-memory hash table mapping each key to the location of its data, occasionally written to disk for persistence. However, it works poorly on disk.
Pros: easy to implement and very fast (RAM is fast).
Cons: all keys must fit in memory, and it is bad for range queries.
Scenarios: fast, but only practical when the full key set fits in memory, i.e. on relatively small datasets.
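A minimal in-memory sketch of a hash index over an append-only log, in the style of Bitcask (the class and method names are invented for illustration):

```python
class HashIndexStore:
    """Append-only log with an in-memory hash index (Bitcask-style sketch)."""

    def __init__(self) -> None:
        self.log: list[bytes] = []       # stands in for the on-disk data file
        self.index: dict[str, int] = {}  # key -> position of the latest value

    def put(self, key: str, value: bytes) -> None:
        # Write cost: one append plus one hash-table update (the slowdown).
        self.log.append(value)
        self.index[key] = len(self.log) - 1

    def get(self, key: str) -> bytes:
        # Read cost: one hash lookup plus one fetch; no scan (the speedup).
        return self.log[self.index[key]]

store = HashIndexStore()
store.put("user:1", b"alice")
store.put("user:1", b"alicia")  # overwrite: the old entry stays in the log
print(store.get("user:1"))      # b'alicia'
```

The dict keeps keys in no useful order, which is exactly why range queries are a weak spot for hash indexes.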
Block storage: raw blocks attached to a server as a volume. Mutable, with higher cost and higher performance, but lower scalability because a volume can only be attached to one server. Good for VMs and databases.
File storage: built on top of block storage at a higher level of abstraction, handling files and directories. Medium-to-high performance and cost, medium scalability. Provides general-purpose file-system access; good for sharing files/folders within an organization.
Object storage: sacrifices performance for higher durability and vast scalability at low cost. Objects are generally immutable, though versioning is supported. It targets relatively cold data; access is through RESTful APIs.
This blog is more about object storage. It provides RESTful APIs, including PUT object and GET object.
Business entities: bucket (folder) and object.
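A toy in-memory model of these two entities and the PUT/GET object calls (the class and its methods are hypothetical, not any real object store's SDK):

```python
class ObjectStore:
    """Toy in-memory model of an object store: buckets hold objects."""

    def __init__(self) -> None:
        self.buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        self.buckets.setdefault(bucket, {})

    def put_object(self, bucket: str, key: str, data: bytes) -> None:
        # PUT /<bucket>/<key> -- objects are written whole, not edited in place.
        self.buckets[bucket][key] = data

    def get_object(self, bucket: str, key: str) -> bytes:
        # GET /<bucket>/<key>
        return self.buckets[bucket][key]

store = ObjectStore()
store.create_bucket("photos")
store.put_object("photos", "2024/cat.jpg", b"\xff\xd8...")
print(store.get_object("photos", "2024/cat.jpg"))
```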
Consider maintaining a highly concurrent service, with 3k-5k requests per second hitting one server. This generates a large number of request logs. Among those logs, data-plane APIs (data-related) normally have a much larger volume than control-plane APIs (management-related).
We want to understand how healthy the service is and how healthy each API is; note that an API with lower volume is not less important.
How do we do that? When the request-log data is large, it is often advantageous to choose a smaller subset that summarizes the original dataset; this is called sampling. The main idea is to take a statistically significant sample of the data and analyze that sample rather than the whole original dataset.
By querying the sampled data, the system can efficiently provide a result that approximates the real answer.
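A small sketch of the idea, estimating mean request latency from a 1% sample of synthetic logs (the API name and latency distribution are made up):

```python
import random

random.seed(7)

# Hypothetical request logs: (api_name, latency_ms), ~50 ms mean latency.
logs = [("put_object", random.expovariate(1 / 50)) for _ in range(100_000)]

# Take a 1% random sample instead of scanning every log line.
sample = random.sample(logs, k=len(logs) // 100)

estimate = sum(latency for _, latency in sample) / len(sample)
actual = sum(latency for _, latency in logs) / len(logs)
print(f"estimated mean latency: {estimate:.1f} ms (actual: {actual:.1f} ms)")
```

The sample answer is approximate, but for a 1,000-element sample the error is typically a couple of milliseconds, at 1% of the query cost.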
Cache helps availability and resiliency by, for example, improving request latency (so the service is better able to handle incoming traffic) and by decreasing load on downstream dependencies.
On the flip side, a cache introduces modal behavior in your service: behavior differs depending on whether a given object is cached.
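A minimal sketch of those two modes, a fast hit path and a slow miss path (the lookup function and timings are illustrative):

```python
import time

cache: dict[str, str] = {}

def slow_lookup(key: str) -> str:
    time.sleep(0.05)  # stands in for a call to a downstream dependency
    return key.upper()

def get(key: str) -> str:
    # Miss: slow downstream path, then populate. Hit: fast in-memory path.
    if key not in cache:
        cache[key] = slow_lookup(key)
    return cache[key]

start = time.perf_counter(); get("a"); miss = time.perf_counter() - start
start = time.perf_counter(); get("a"); hit = time.perf_counter() - start
print(f"miss: {miss * 1000:.1f} ms, hit: {hit * 1000:.3f} ms")
```

The gap between the two timings is the modal behavior in miniature: the same call has two very different latency profiles, which matters when reasoning about tail latency and downstream load.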