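This article was translated from Chinese into English and then back into Chinese 😆. The Chinese internet's "eight-part essays" (rote interview-prep write-ups) have spread beyond their usual circle after all.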

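The original was published on Toutiao. I came across this article while reading Ruan Yifeng's weekly, then read the ref and found that it was a Chinese-to-English translation.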

Redis is a high-performance, in-memory key-value database. According to official test reports, it can support around 100,000 QPS (queries per second) on a single machine. However, Redis uses a single-threaded architecture in its design.

Redis 是一个高性能的内存键值数据库。根据官方测试报告,它在单台机器上可以支持大约 100,000 QPS(每秒查询数)。然而,Redis 在设计上采用了单线程架构。

Why does Redis still have such high performance with a single-threaded design? Wouldn’t it be better to use multiple threads for concurrent request processing?

为什么 Redis 在单线程设计下仍然具有如此高的性能?使用多线程进行并发请求处理不是更好吗?

In this article, let’s explore why Redis has a single-threaded architecture and still maintains its speed. The focus is on the following four aspects:

在本文中,让我们探讨为什么 Redis 采用单线程架构却仍然保持其速度。重点关注以下四个方面:

  • Data storage in memory

    数据存储在内存中

  • Efficient data structures

    高效数据结构

  • Single-threaded architecture

    单线程架构

  • Non-blocking I/O

    非阻塞 I/O

Let’s analyze each one in detail.

让我们详细分析每一个。

Data storage in memory 数据存储在内存中

Redis is completely based on memory, with data stored in memory. The vast majority of requests are pure memory operations, which are extremely fast. Compared with traditional disk file data storage, Redis avoids the overhead of reading data from disk into memory through disk I/O.

Redis 完全基于内存,数据存储在内存中。绝大多数请求都是纯内存操作,速度极快。与传统的磁盘文件数据存储相比,Redis 避免了通过磁盘 I/O 将数据从磁盘读取到内存的开销。

Efficient data structures 高效数据结构

Redis has a total of 5 data types: String, List, Hash, Set, and SortedSet.

Redis 总共有 5 种数据类型:字符串、列表、哈希、集合和有序集合。

Different data types use one or more data structures at the bottom to support them, with the aim of achieving faster speeds.

不同的数据类型在底层使用一种或多种数据结构来支持它们,目的是实现更快的速度。
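The idea that one data type can be backed by more than one structure can be sketched in a few lines of Python. `TinySortedSet` and its methods are hypothetical names made up for this illustration. The combination used here (a dict plus a sorted list) is an assumption chosen for brevity, not a description of Redis internals.

```python
import bisect


class TinySortedSet:
    """Toy sorted set: a dict for fast score lookup plus a sorted list for rank queries.

    Illustration only; this is not how Redis implements SortedSet.
    """

    def __init__(self):
        self._scores = {}   # member -> score
        self._ordered = []  # list of (score, member), kept sorted

    def add(self, member, score):
        # If the member already exists, drop its stale (score, member) entry first.
        old = self._scores.get(member)
        if old is not None:
            idx = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[idx]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def score(self, member):
        # Answered by the dict alone, without touching the sorted list.
        return self._scores.get(member)

    def range_by_rank(self, start, stop):
        # Inclusive rank range, answered by the sorted list alone.
        return [member for _, member in self._ordered[start:stop + 1]]
```

Each structure answers the query it is good at: the dict serves point lookups and the sorted list serves ordered ranges, at the cost of keeping the two in sync on every write.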

Single-threaded architecture 单线程架构

Using a single thread avoids the time and CPU cost of context switching. There are no race conditions, no locking problems to reason about, and no lock and unlock operations that add overhead or risk deadlock. It also allows the use of commands that would be “thread-unsafe” under concurrent execution, such as LPUSH.

使用单线程可以避免上下文切换带来的时间和 CPU 消耗。没有竞争条件,不需要考虑各种锁定问题,也没有会带来性能开销或死锁风险的加锁和解锁操作。此外,它允许使用各种“线程不安全”的命令,例如 LPUSH。

Note that when we emphasize single thread, we are referring to using one thread to handle network I/O and key-value pair read and write (file event dispatcher). In other words, one thread handles all network requests, but Redis’s other functions, such as persistence, asynchronous deletion, and cluster data synchronization, are actually executed by additional threads or child processes.

请注意,当我们强调单线程时,我们指的是使用一个线程来处理网络 I/O 和键值对的读写(文件事件调度器)。换句话说,一个线程处理所有网络请求,但 Redis 的其他功能,如持久化、异步删除和集群数据同步,实际上是由额外的线程或子进程执行的。

So why use a single thread? The official answer is that the CPU is not Redis’s bottleneck; the bottleneck is most likely machine memory or network bandwidth. Since a single thread is easy to implement and the CPU will not become a bottleneck, it makes sense to adopt a single-threaded solution.

所以为什么使用单线程?官方的回答是,因为 CPU 不是 Redis 的瓶颈,最可能的瓶颈是机器内存或网络带宽。由于单线程易于实现,并且 CPU 不会成为瓶颈,因此采用单线程解决方案是合理的。

Although a multi-threaded architecture allows an application to process tasks concurrently through context switching, it provides only a slight performance boost for Redis because most threads will eventually be blocked by network I/O.

尽管多线程架构允许应用程序通过上下文切换并发处理任务,但由于大多数线程最终会被网络 I/O 阻塞,因此对 Redis 的性能提升仅微乎其微。

It is also important to note that because Redis uses a single thread, a command that takes too long to execute (such as the hgetall command) can block every other request. Redis is a memory database designed for fast execution, so it is important to be cautious when using commands like lrange, smembers, hgetall, and so on.

还需要注意的是,由于 Redis 使用单线程,如果某个命令执行时间过长(例如 hgetall 命令),可能会导致阻塞。Redis 是一个为快速执行而设计的内存数据库,因此在使用 lrange、smembers、hgetall 等命令时需要谨慎。

Non-blocking I/O 非阻塞 I/O

Using a thread model based on network I/O multiplexing (non-blocking I/O) allows for concurrent connections to be handled and helps alleviate the problem of slow network I/O speeds.

使用基于网络 I/O 多路复用(非阻塞 I/O)的线程模型可以处理并发连接,并有助于缓解网络 I/O 速度慢的问题。

The multiplexing I/O model leverages the ability of select, poll, and epoll to simultaneously monitor I/O events of multiple streams. When idle, the current thread is blocked. When one or more streams have I/O events, the thread is awakened from its blocked state, and the program polls all streams (epoll only polls the streams that have generated events), then sequentially handles the ready streams. This approach avoids a large number of useless operations.

多路复用 I/O 模型利用 select、poll 和 epoll 同时监控多个流的 I/O 事件。当处于空闲状态时,当前线程被阻塞。当一个或多个流有 I/O 事件时,线程从阻塞状态中被唤醒,程序轮询所有流(epoll 仅轮询已生成事件的流),然后依次处理就绪的流。这种方法避免了大量无用的操作。
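The multiplexing behavior described above can be tried out with Python's standard `selectors` module, which wraps epoll, kqueue, or select depending on the platform. This is a minimal sketch, not Redis code: `demo_multiplexing` is a hypothetical name, and local socket pairs stand in for client connections.

```python
import selectors
import socket


def demo_multiplexing():
    """Watch three streams from one thread and read only the ones that are ready."""
    sel = selectors.DefaultSelector()  # epoll, kqueue, or select, whichever the OS offers
    pairs = [socket.socketpair() for _ in range(3)]
    for index, (server_side, _client_side) in enumerate(pairs):
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ, data=index)

    # Only streams 0 and 2 produce data; stream 1 stays idle.
    pairs[0][1].sendall(b"PING")
    pairs[2][1].sendall(b"GET k")

    received = {}
    while len(received) < 2:
        # select() blocks until a stream is ready and returns only the ready ones.
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing arrived within the timeout
        for key, _mask in events:
            received[key.data] = key.fileobj.recv(1024)

    sel.close()
    for server_side, client_side in pairs:
        server_side.close()
        client_side.close()
    return received
```

The idle stream never shows up in the result of `select()`, so the thread spends no time polling it.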

Here, “multi” refers to multiple network connections, and “plexing” (reuse) refers to reusing the same thread. The use of multiplexing I/O technology allows a single thread to efficiently handle multiple client network I/O connection requests (to minimize time spent on network I/O).

在这里,“多路”指的是多个网络连接,而“复用”指的是复用同一个线程。使用多路复用 I/O 技术可以让单个线程高效地处理多个客户端网络 I/O 连接请求(以最小化在网络 I/O 上花费的时间)。

Redis’s network event handler, also known as the file event handler, is based on the Reactor pattern.

Redis 的网络事件处理程序(也称为文件事件处理程序)基于反应器模式。

The file event handler uses I/O multiplexing to simultaneously listen to multiple sockets and associate tasks performed by the sockets with different event handlers.

文件事件处理程序使用 I/O 多路复用同时监听多个套接字,并将套接字执行的任务与不同的事件处理程序关联。

The file event handler runs in a single-threaded manner, but by using an I/O multiplexing program to listen to multiple sockets, it implements a high-performance network communication model.

文件事件处理程序以单线程方式运行,但通过使用 I/O 多路复用程序监听多个套接字,它实现了高性能的网络通信模型。

Redis handles client requests, including receiving (socket read), parsing, executing, and sending (socket write), on a single sequential main thread, which is the so-called single-threaded model.

Redis 通过一个单一的顺序主线程处理客户端请求,包括接收(套接字读取)、解析、执行和发送(套接字写入),这就是所谓的单线程模型。

Multiple sockets may generate different operations, and each operation corresponds to a different file event. However, the I/O multiplexing program listens to multiple sockets and queues events generated by the sockets. The event dispatcher retrieves an event from the queue each time, and passes the event to the corresponding event handler for processing.

多个套接字可能会生成不同的操作,每个操作对应于不同的文件事件。然而,I/O 多路复用程序监听多个套接字,并排队由套接字生成的事件。事件调度器每次从队列中获取一个事件,并将该事件传递给相应的事件处理程序进行处理。
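The queue-then-dispatch flow can be sketched in Python as follows. `EventDispatcher`, `demo_dispatch`, and the event type names are hypothetical, chosen only for illustration, and the queue is filled by hand where a real server would fill it from the I/O multiplexing program.

```python
from collections import deque


class EventDispatcher:
    """Toy event dispatcher: takes queued events one at a time and routes each to its handler."""

    def __init__(self):
        self._handlers = {}    # event type -> handler function
        self._queue = deque()  # events in arrival order

    def register(self, event_type, handler):
        self._handlers[event_type] = handler

    def enqueue(self, event_type, payload):
        # In a real server, the I/O multiplexing program would call this.
        self._queue.append((event_type, payload))

    def run(self):
        results = []
        while self._queue:
            event_type, payload = self._queue.popleft()  # one event per iteration
            handler = self._handlers[event_type]
            results.append(handler(payload))
        return results


def demo_dispatch():
    dispatcher = EventDispatcher()
    dispatcher.register("accept", lambda sock: f"accepted {sock}")
    dispatcher.register("read", lambda sock: f"read request from {sock}")
    dispatcher.register("write", lambda sock: f"wrote reply to {sock}")
    dispatcher.enqueue("accept", "socket-1")
    dispatcher.enqueue("read", "socket-1")
    dispatcher.enqueue("accept", "socket-2")
    dispatcher.enqueue("write", "socket-1")
    return dispatcher.run()
```

Because events leave the queue one at a time, handlers never run concurrently, even when several sockets become ready together.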

Redis client calls to the server go through three processes: sending commands, executing commands, and returning results. During the command execution phase, since Redis is single-threaded in handling commands, each command arriving at the server is not executed immediately. All commands are entered into a queue and executed one by one. The execution order of commands sent by multiple clients is uncertain. However, it is certain that two commands will not be executed simultaneously, avoiding concurrency issues. This is Redis’s basic single-threaded model.

Redis 客户端对服务器的调用经历三个过程:发送命令、执行命令和返回结果。在命令执行阶段,由于 Redis 在处理命令时是单线程的,抵达服务器的每个命令并不会立即执行。所有命令都会被放入一个队列中,逐个执行。多个客户端发送的命令的执行顺序是不确定的。然而,可以确定的是两个命令不会同时执行,从而避免了并发问题。这就是 Redis 的基本单线程模型。
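A short Python sketch of this model: commands from several clients land in one queue and are executed one by one against an in-memory store. The function name `run_commands`, the tuple layout, and the three supported commands are assumptions made for this example.

```python
from collections import deque


def run_commands(queue, store):
    """Execute queued (client, command, *args) tuples strictly one at a time."""
    replies = []
    while queue:
        client, name, *args = queue.popleft()
        if name == "SET":
            store[args[0]] = args[1]
            replies.append((client, "OK"))
        elif name == "GET":
            replies.append((client, store.get(args[0])))
        elif name == "INCR":
            # Read-modify-write is safe without a lock: no other command can interleave.
            store[args[0]] = int(store.get(args[0], 0)) + 1
            replies.append((client, store[args[0]]))
        else:
            replies.append((client, f"ERR unknown command {name}"))
    return replies


def demo_two_clients():
    store = {}
    # The interleaving of clients is arbitrary, but each command runs to completion.
    queue = deque([
        ("client-a", "INCR", "counter"),
        ("client-b", "INCR", "counter"),
        ("client-a", "GET", "counter"),
    ])
    return run_commands(queue, store), store
```

Both increments take effect regardless of which client's command is dequeued first, which is the "no concurrency issues" property the paragraph describes.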

Explanation of Redis 6.0 multi-threading Redis 6.0 多线程的解释

Why was multi-threading not used in Redis before version 6.0? 为什么在 Redis 6.0 之前没有使用多线程?

Redis used a single-threaded approach to achieve high maintainability. While multi-threading may perform well in certain aspects, it introduces uncertainty in the order of program execution, leading to a series of issues with concurrent reading and writing. This increases system complexity, and can potentially result in performance losses due to thread switching, locking and unlocking, and even deadlocks.

Redis 采用单线程方法以实现高可维护性。虽然多线程在某些方面可能表现良好,但它引入了程序执行顺序的不确定性,导致并发读写的一系列问题。这增加了系统复杂性,并可能由于线程切换、锁定和解锁,甚至死锁而导致性能损失。

Why did Redis 6.0 introduce multi-threading? 为什么 Redis 6.0 引入了多线程?

Redis 6.0 introduced multi-threading because its bottleneck was not in memory, but in the network I/O module, which consumed CPU time. Therefore, multi-threading was introduced to handle network I/O and make full use of CPU resources, reducing the performance loss caused by network I/O blocking.

Redis 6.0 引入了多线程,因为它的瓶颈不在内存,而在网络 I/O 模块,这消耗了 CPU 时间。因此,引入了多线程来处理网络 I/O,充分利用 CPU 资源,减少因网络 I/O 阻塞造成的性能损失。

How to enable multi-threading in Redis 6.0? 如何在 Redis 6.0 中启用多线程?

By default, multi-threading is disabled in Redis, and it can be enabled in the conf file:

默认情况下,Redis 中禁用了多线程,可以在配置文件中启用

io-threads-do-reads yes
io-threads [number of threads]

According to the official guidelines, use 2-3 threads on a 4-core machine and 6 threads on an 8-core machine. The number of threads should be less than the number of machine cores and, if possible, should not exceed 8.

根据官方指南,推荐的线程数量是为 4 核机器设置 2-3 个线程,为 8 核机器设置 6 个线程。线程数量应少于机器核心数量,并且如果可能的话,不应超过 8 个线程。
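The guideline can be turned into a small helper. `suggest_io_threads` is a hypothetical function, and the formula is one possible interpolation between the two data points quoted above, not an official rule. Returning 1 for small machines is treated here as "no extra I/O threads", which is an assumption of this sketch.

```python
def suggest_io_threads(cores: int) -> int:
    """Hypothetical heuristic fitted to the quoted guidance: 4 cores -> 2, 8 cores -> 6, cap at 8."""
    if cores < 4:
        return 1  # too few cores to spare any for I/O threads
    # Leave two cores free, and never go above 8 threads.
    return min(8, cores - 2)


def render_conf(cores: int) -> str:
    """Build the two config lines from the article for a given core count."""
    return f"io-threads-do-reads yes\nio-threads {suggest_io_threads(cores)}"
```

For a 4-core machine this picks the lower end of the 2-3 range. Whether 2 or 3 works better depends on the workload and is worth measuring.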

Are there any thread concurrency issues in multi-threaded mode? 在多线程模式下是否存在线程并发问题?

As shown in the diagram, a Redis request involves establishing a connection, getting the command to execute, executing the command, and finally writing the response to the socket.

如图所示,Redis 请求涉及建立连接、获取要执行的命令、执行命令,最后将响应写入套接字。

In Redis’ multi-threaded mode, receiving, sending, and parsing commands can be configured to be executed in multiple threads because they are the main time-consuming points we have identified. However, command execution, which involves memory operations, still runs in a single thread.

在 Redis 的多线程模式下,接收、发送和解析命令可以配置为在多个线程中执行,因为它们是我们识别出的主要耗时点。然而,涉及内存操作的命令执行仍然在单个线程中运行。

Therefore, Redis’ multi-threaded part is only used to handle network data reading and writing and protocol parsing. Command execution is still executed in a single thread sequentially, so there are no concurrency safety issues.

因此,Redis 的多线程部分仅用于处理网络数据的读写和协议解析。命令执行仍然在单线程中顺序执行,因此没有并发安全问题。
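The split between threaded I/O and single-threaded execution can be sketched with Python's `concurrent.futures`. Everything here is a simplified stand-in: `parse`, `execute`, `encode`, and `handle_batch` are hypothetical names, the wire format is a toy space-separated one, and Python threads only illustrate the structure, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def parse(raw: bytes):
    # Toy protocol for illustration: b"SET k v" -> ["SET", "k", "v"]
    return raw.decode().split()


def execute(store, command):
    name, *args = command
    if name == "SET":
        store[args[0]] = args[1]
        return "OK"
    if name == "GET":
        return store.get(args[0])
    raise ValueError(f"unknown command {name}")


def encode(reply) -> bytes:
    return b"(nil)" if reply is None else str(reply).encode()


def handle_batch(raw_requests, io_threads=3):
    store = {}
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        # Parsing is spread across worker threads; map() preserves request order.
        commands = list(pool.map(parse, raw_requests))
        # Execution stays on the calling thread, one command at a time, so the store needs no lock.
        replies = [execute(store, command) for command in commands]
        # Encoding the replies is again spread across worker threads.
        return list(pool.map(encode, replies))
```

Only the stateless steps touch the pool. The shared `store` is read and written from one thread, which is why this layout adds no locking.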

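Summary

Why Redis is fast

  • Redis stores data in memory, and most operations are pure memory operations, which avoids disk I/O overhead
  • Redis uses efficient data structures to support its different data types
  • The single-threaded architecture avoids context switching and locking problems and simplifies the implementation
  • Non-blocking I/O and I/O multiplexing handle concurrent connections efficiently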

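Redis's single-threaded model

  • The single thread mainly handles network I/O and key-value reads and writes; other functions are executed outside the main thread
  • The single-threaded model avoids concurrency issues, because all commands are executed in order
  • A file event handler based on the Reactor pattern provides high-performance network communication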

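Multi-threading support in Redis 6.0

  • Redis 6.0 introduced multi-threading to address the network I/O bottleneck, but command execution remains single-threaded
  • Multi-threaded mode is used only for network data reading and writing and for protocol parsing, so it introduces no concurrency safety issues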