This chapter is part of a series of articles on Linux application performance.
The design discussed in this article is more popularly known as “thread pool”. Essentially, there is a pre-created pool of threads that are ready to serve any incoming requests. This is comparable to the pre-forked server design. Whereas there was a process pool in the pre-forked architecture, we have a thread pool in this case.
Most of the code is very similar to the pre-forked server. Let's take a look at the main() function:
int main(int argc, char *argv[])
{
    int server_port;
    signal(SIGINT, print_stats);

    if (argc > 1)
        server_port = atoi(argv[1]);
    else
        server_port = DEFAULT_SERVER_PORT;

    if (argc > 2)
        strcpy(redis_host_ip, argv[2]);
    else
        strcpy(redis_host_ip, REDIS_SERVER_HOST);

    printf("ZeroHTTPd server listening on port %d\n", server_port);
    server_socket = setup_listening_socket(server_port);

    /* Create the thread pool up front. */
    for (int i = 0; i < THREADS_COUNT; ++i)
        create_thread(i);

    /* The main thread has nothing left to do; just wait for signals. */
    for (;;)
        pause();
}
We call create_thread() in a loop that runs THREADS_COUNT times. After that, the main thread calls pause() in an infinite loop. create_thread() itself is very simple:
void create_thread(int index)
{
    pthread_create(&threads[index], NULL, &enter_server_loop, NULL);
}
It calls pthread_create() with enter_server_loop() as the start routine. Let's take a look at that function now:
void *enter_server_loop(void *targ)
{
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);

    while (1) {
        /* Only the thread holding the mutex gets to block in accept(). */
        pthread_mutex_lock(&mlock);
        long client_socket = accept(server_socket,
                                    (struct sockaddr *)&client_addr,
                                    &client_addr_len);
        if (client_socket == -1)
            fatal_error("accept()");
        pthread_mutex_unlock(&mlock);

        /* The request is processed outside the critical section. */
        handle_client(client_socket);
    }
}
Rather than having all threads block on accept(), we have all threads call pthread_mutex_lock(). "Mutex" is short for "mutual exclusion": only one thread at a time can "acquire" the mutex and execute past pthread_mutex_lock(), while all the others block inside pthread_mutex_lock() until it is released. This is an incredibly useful idea. Once the thread holding the mutex returns from accept(), it calls pthread_mutex_unlock() to release the mutex so that some other thread can acquire it and call accept() in turn. This setup ensures that only one thread in the pool can actually block on accept() at any given moment, thus avoiding the thundering herd problem discussed in the pre-forked server architecture article.
Other parts of the server are pretty much the same code as in the pre-forked server.
Pre-threaded Server Performance
Threads are lightweight compared to processes, since they share most of the parent process's memory. As a result, creating a new thread carries far less operating system overhead than creating a new process. Let's bring up our performance numbers table:
| concurrency | iterative | forking | preforked | threaded | prethreaded | poll | epoll |
|-------------|-----------|---------|-----------|----------|-------------|------|-------|
| 20 | 7 | 112 | 2,100 | 1,800 | 2,250 | 1,900 | 2,050 |
| 50 | 7 | 190 | 2,200 | 1,700 | 2,200 | 2,000 | 2,000 |
| 100 | 7 | 245 | 2,200 | 1,700 | 2,200 | 2,150 | 2,100 |
| 200 | 7 | 330 | 2,300 | 1,750 | 2,300 | 2,200 | 2,100 |
| 300 | – | 380 | 2,200 | 1,800 | 2,400 | 2,250 | 2,150 |
| 400 | – | 410 | 2,200 | 1,750 | 2,600 | 2,000 | 2,000 |
| 500 | – | 440 | 2,300 | 1,850 | 2,700 | 1,900 | 2,212 |
| 600 | – | 460 | 2,400 | 1,800 | 2,500 | 1,700 | 2,519 |
| 700 | – | 460 | 2,400 | 1,600 | 2,490 | 1,550 | 2,607 |
| 800 | – | 460 | 2,400 | 1,600 | 2,540 | 1,400 | 2,553 |
| 900 | – | 460 | 2,300 | 1,600 | 2,472 | 1,200 | 2,567 |
| 1,000 | – | 475 | 2,300 | 1,700 | 2,485 | 1,150 | 2,439 |
| 1,500 | – | 490 | 2,400 | 1,550 | 2,620 | 900 | 2,479 |
| 2,000 | – | 350 | 2,400 | 1,400 | 2,396 | 550 | 2,200 |
| 2,500 | – | 280 | 2,100 | 1,300 | 2,453 | 490 | 2,262 |
| 3,000 | – | 280 | 1,900 | 1,250 | 2,502 | wide variations | 2,138 |
| 5,000 | – | wide variations | 1,600 | 1,100 | 2,519 | – | 2,235 |
| 8,000 | – | – | 1,200 | wide variations | 2,451 | – | 2,100 |
| 10,000 | – | – | wide variations | – | 2,200 | – | 2,200 |
| 11,000 | – | – | – | – | 2,200 | – | 2,122 |
| 12,000 | – | – | – | – | 970 | – | 1,958 |
| 13,000 | – | – | – | – | 730 | – | 1,897 |
| 14,000 | – | – | – | – | 590 | – | 1,466 |
| 15,000 | – | – | – | – | 532 | – | 1,281 |
Comparison to the threaded architecture
The pre-threaded server architecture performs about 40% better on average than the threaded server. It performs about the same as the pre-forked server, however, which confirms that under Linux, processes and threads are scheduled the same way and have very similar runtime performance characteristics. Whatever difference in overhead exists lies in creation: a new process shares comparatively little with its parent, while a new thread shares pretty much everything with the thread that created it.
Comparison to the pre-forked architecture
There is one other crucial difference compared to the pre-forked architecture. While our pre-forked server reliably scales to about 5,000 concurrent connections with decent performance, our pre-threaded server handles up to 11,000 concurrent connections without significant deterioration in performance. This is a clear advantage over the pre-forked server.
Is switching from pre-forked to pre-threaded worth it?
In most cases it is fairly easy to convert a server based on a pre-forked architecture to a pre-threaded one, especially if the pre-forked version uses few synchronization primitives. The effort is worthwhile, since the performance gains from moving from pre-forked to pre-threaded are non-trivial. However, this should be done only if you are expecting a lot of concurrent users: as the table shows, the performance numbers are very similar up to a few thousand concurrent users.
Now, for something completely different
Now that you’ve seen how servers based on processes and threads work, we are ready to see, in the next article in the series, how servers based on the event processing model work.