This chapter is part of a series of articles on Linux application performance.
If you came here without reading the poll()-based implementation description: all the explanation of how exactly we move from a process- or thread-based model to an event-based model is in the poll()-based article. Without going through it, this article might not make a lot of sense.
Continuing on with epoll
The main difference between a server built around poll() and one built around epoll is just the API; otherwise, the general program structure remains the same. The really major changes came when we moved from a threaded to an event-based model, where pretty much every function in the program had to change. In this article, we'll look at the main differences between using poll() and epoll_wait().
As we saw in the previous article in this series, the problem with poll() is that it takes as input a pointer to an array of structures with one element for every file descriptor you want to monitor. This array can grow significantly when you're dealing with thousands of concurrent requests. Also, when poll() returns, it only tells you how many descriptors are ready; it is your job to iterate through the array and check the revents member of each element to see whether that particular descriptor is the one that's ready. This does not scale well as the number of concurrent connections increases.
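To make the contrast concrete, here is a minimal sketch (not ZeroHTTPd's actual code) of the scan that poll() forces on you: the full pollfd array is handed to the kernel on every call, and every element's revents has to be checked afterwards, even when only one descriptor is ready.

#include <poll.h>

/* Sketch only: walk the whole array after poll() returns, since poll()
 * does not tell us which descriptors are ready, only how many. */
void check_ready(struct pollfd *poll_fds, nfds_t nfds) {
    int ready = poll(poll_fds, nfds, -1);  /* entire array passed in on every call */
    if (ready <= 0)
        return;

    for (nfds_t i = 0; i < nfds; i++) {    /* O(n) scan regardless of 'ready' */
        if (poll_fds[i].revents & (POLLIN | POLLOUT)) {
            /* handle poll_fds[i].fd here */
        }
    }
}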
With epoll, you first create an epoll descriptor:
epoll_fd = epoll_create1(0);
Now you can add descriptors to watch for activity by calling epoll_ctl(). Take a look at our utility function add_to_epoll_fd_list():
void add_to_epoll_fd_list(int fd, uint32_t ep_events) {
    struct epoll_event event;
    event.data.fd = fd;
    event.events = ep_events;

    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event))
        fatal_error("add epoll_ctl()");
}
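The counterpart operation, taking a descriptor off the interest list when its connection goes away, uses EPOLL_CTL_DEL. A minimal sketch of what such a utility could look like (the function name here is my own, not necessarily the one ZeroHTTPd uses):

void remove_from_epoll_fd_list(int fd) {
    /* Since Linux 2.6.9, the event argument may be NULL for EPOLL_CTL_DEL. */
    if (epoll_ctl(epoll_fd, EPOLL_CTL_DEL, fd, NULL))
        fatal_error("remove epoll_ctl()");
}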
Finally, you call epoll_wait(), which is the equivalent of calling poll(), except that you've already told the operating system which descriptors you want to monitor with epoll_ctl(). There is no huge array sitting in user space that you need to pass to the kernel every time you call epoll_wait():
void enter_server_loop(int server_socket) {
    int epoll_ret;
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);

    while (1) {
        epoll_ret = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        if (epoll_ret < 0)
            fatal_error("epoll_wait()");

        for (int i = 0; i < epoll_ret; i++) {
            if (events[i].events == 0)
                continue;

            if (events[i].data.fd == server_socket) {
                /* The listening socket is ready: accept the new connection */
                int client_socket = accept(
                        server_socket,
                        (struct sockaddr *)&client_addr,
                        &client_addr_len);
                if (client_socket > 0)
                    handle_client(client_socket);
            } else {
                if (!((events[i].events & EPOLLIN) || (events[i].events & EPOLLOUT))) {
                    printf("ERROR: Error condition in socket %d!\n", events[i].data.fd);
                    exit(1);
                }

                /* A descriptor handling one of the clients is ready */
                client_context *check_cc;
                HASH_FIND(hh_cs, cc_cs_hash, &events[i].data.fd, sizeof(int), check_cc);
                if (!check_cc) {
                    HASH_FIND(hh_rs, cc_rs_hash, &events[i].data.fd, sizeof(int), check_cc);
                    if (!check_cc) {
                        printf("Unable to find client context associated with socket %d.\n", events[i].data.fd);
                        exit(1);
                    }
                }
                check_cc->read_write_callback(check_cc);
            }
        }
    }
}
Take a look at enter_server_loop() above. epoll_wait() fills the events array with information about the descriptors that are ready and returns how many of them there are. This way, you never have to loop through the entire set of monitored file descriptors when only one of them is actually ready.
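For reference, enter_server_loop() and add_to_epoll_fd_list() lean on a few globals that the snippets above don't show. A minimal sketch of those declarations, where the MAX_EVENTS value is an assumption of mine for illustration:

#include <sys/epoll.h>

#define MAX_EVENTS 1024                   /* assumed value, not from the article */

int epoll_fd;                             /* set once by epoll_create1()  */
struct epoll_event events[MAX_EVENTS];    /* filled in by each epoll_wait() */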
Apart from the addition of epoll_create1() and epoll_ctl(), and the replacement of poll() with epoll_wait() (along with the changes that go with it), there are really no other changes needed to convert the poll()-based version of ZeroHTTPd to the epoll-based one.
Performance of the epoll-based server
Let's bring up our table of performance numbers (requests served per second at each concurrency level) again for easy reference:
| concurrency | iterative | forking | preforked | threaded | prethreaded | poll | epoll |
|---|---|---|---|---|---|---|---|
| 20 | 7 | 112 | 2,100 | 1,800 | 2,250 | 1,900 | 2,050 |
| 50 | 7 | 190 | 2,200 | 1,700 | 2,200 | 2,000 | 2,000 |
| 100 | 7 | 245 | 2,200 | 1,700 | 2,200 | 2,150 | 2,100 |
| 200 | 7 | 330 | 2,300 | 1,750 | 2,300 | 2,200 | 2,100 |
| 300 | – | 380 | 2,200 | 1,800 | 2,400 | 2,250 | 2,150 |
| 400 | – | 410 | 2,200 | 1,750 | 2,600 | 2,000 | 2,000 |
| 500 | – | 440 | 2,300 | 1,850 | 2,700 | 1,900 | 2,212 |
| 600 | – | 460 | 2,400 | 1,800 | 2,500 | 1,700 | 2,519 |
| 700 | – | 460 | 2,400 | 1,600 | 2,490 | 1,550 | 2,607 |
| 800 | – | 460 | 2,400 | 1,600 | 2,540 | 1,400 | 2,553 |
| 900 | – | 460 | 2,300 | 1,600 | 2,472 | 1,200 | 2,567 |
| 1,000 | – | 475 | 2,300 | 1,700 | 2,485 | 1,150 | 2,439 |
| 1,500 | – | 490 | 2,400 | 1,550 | 2,620 | 900 | 2,479 |
| 2,000 | – | 350 | 2,400 | 1,400 | 2,396 | 550 | 2,200 |
| 2,500 | – | 280 | 2,100 | 1,300 | 2,453 | 490 | 2,262 |
| 3,000 | – | 280 | 1,900 | 1,250 | 2,502 | wide variations | 2,138 |
| 5,000 | – | wide variations | 1,600 | 1,100 | 2,519 | – | 2,235 |
| 8,000 | – | – | 1,200 | wide variations | 2,451 | – | 2,100 |
| 10,000 | – | – | wide variations | – | 2,200 | – | 2,200 |
| 11,000 | – | – | – | – | 2,200 | – | 2,122 |
| 12,000 | – | – | – | – | 970 | – | 1,958 |
| 13,000 | – | – | – | – | 730 | – | 1,897 |
| 14,000 | – | – | – | – | 590 | – | 1,466 |
| 15,000 | – | – | – | – | 532 | – | 1,281 |
In this test, as in the poll()-based test, I had 10 epoll-based processes running. Their real competition is the pre-threaded version of ZeroHTTPd, of which I ran two processes, each with 10,000 threads in its pool. You can see something very interesting: while both servers are neck and neck in throughput, our epoll-based server holds up well all the way to 15,000 concurrent users, whereas the pre-threaded server loses steam around the 12,000 concurrent connections mark. I think the most important observation is that, while its raw throughput is merely decent, the epoll-based server scales very well to very high concurrency thanks to its relatively low operating system overhead.
There are still a couple of places where the poll()- and epoll-based servers can block: the initial read from the client right after we accept a connection, and all the writes to both the client and Redis. I believe that if we make these non-blocking as well, we should see even better performance compared to the pre-threaded version.
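Making those descriptors non-blocking mostly comes down to setting O_NONBLOCK on them and then handling EAGAIN/EWOULDBLOCK in the read and write paths. A minimal sketch of the flag change (this helper is not part of ZeroHTTPd as it stands):

#include <fcntl.h>

/* Sketch only: switch a socket to non-blocking mode so reads and writes
 * return -1 with errno set to EAGAIN/EWOULDBLOCK instead of blocking. */
int make_socket_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}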
Articles in this series
- Series Introduction
- Part I. Iterative Servers
- Part II. Forking Servers
- Part III. Pre-forking Servers
- Part IV. Threaded Servers
- Part V. Pre-threaded Servers
- Part VI: poll-based server
- Part VII: epoll-based server
Thank you!
If you made it all the way here through the seven articles, I hope you feel it was worth the journey. Systems programming is a fascinating subject, and knowing its nuances can be very rewarding and satisfying. Feel free to tweet at me with any feedback or just a high-five: @shuveb.