Linux Applications Performance: Part VII: epoll Servers

This chapter is part of a series of articles on Linux application performance.

If you came here without reading the poll()-based implementation description

All the explanation of how exactly we move from a process- or thread-based model to an event-based model is in the poll()-based article. Without going through it, this article might not make much sense.

Continuing on with epoll

The main difference between a server built around poll() and one built around epoll is just the API; otherwise, the general program structure remains the same. The really big changes came when we moved from a threaded model to an event-based model, where pretty much every function in the program had to change. In this article, we’ll look at the main differences between using poll() and epoll_wait().

As we saw in the previous article in this series, the problem with poll() is that it takes as input a pointer to an array of structures with one element for every file descriptor you want to monitor. This array can grow significantly when you’re dealing with thousands of concurrent requests. Also, when poll() returns, it only tells you how many descriptors are ready. It is your job to iterate through the array and check the revents member of each element to see whether that particular descriptor is the one that’s ready. This does not scale well as the number of concurrent connections increases.
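
To see why, here is a minimal sketch of the pattern a poll()-based loop forces on you. This is not ZeroHTTPd’s actual code; handle_ready_fd() is a hypothetical stand-in for whatever per-descriptor processing the server does, and fatal_error() is the same helper used elsewhere in the program.

#include <poll.h>

/* Sketch only: the whole pollfd array is handed to the kernel on every
 * call, and every element has to be scanned afterwards, even if only
 * one descriptor is actually ready. */
void poll_iteration(struct pollfd *poll_fds, int nfds) {
    int ready = poll(poll_fds, nfds, -1);
    if (ready < 0)
        fatal_error("poll()");

    for (int i = 0; i < nfds; i++) {
        if (poll_fds[i].revents & (POLLIN | POLLOUT))
            handle_ready_fd(poll_fds[i].fd);    /* hypothetical helper */
    }
}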

With epoll, you first create an epoll descriptor:

epoll_fd = epoll_create1(0);
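
In a real program you will also want to check its return value; a small sketch, reusing the program’s fatal_error() helper:

/* epoll_create1() returns -1 on failure. */
epoll_fd = epoll_create1(0);
if (epoll_fd < 0)
    fatal_error("epoll_create1()");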

Now you can add the descriptors you want watched for activity by calling epoll_ctl(). Take a look at our utility function add_to_epoll_fd_list().

void add_to_epoll_fd_list(int fd, uint32_t ep_events) {
    struct epoll_event event;
    event.data.fd = fd;         /* lets us recover the fd when the event fires */
    event.events = ep_events;   /* e.g. EPOLLIN and/or EPOLLOUT */
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event))
        fatal_error("add epoll_ctl()");
}
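
For example, to start watching the listening socket for incoming connections, the server would call it along these lines (a usage sketch, assuming server_socket holds the listening socket):

/* A readable listening socket means there is a connection waiting
 * to be accept()ed. */
add_to_epoll_fd_list(server_socket, EPOLLIN);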

Finally, you call epoll_wait(), which is the equivalent of calling poll(), but you’ve already told the operating system which descriptors you want to monitor with epoll_ctl(). There is no huge array sitting in user space that you need to pass to the kernel every time you call epoll_wait().
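
enter_server_loop() below relies on a global array that epoll_wait() fills in on each call. A declaration along these lines is assumed; note that MAX_EVENTS only caps how many ready descriptors a single epoll_wait() call reports, not how many descriptors you can watch.

#define MAX_EVENTS 1024                 /* assumed value; any reasonable cap works */
struct epoll_event events[MAX_EVENTS];  /* filled in by epoll_wait() */
int epoll_fd;                           /* from epoll_create1() */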

void enter_server_loop(int server_socket)
{
    int epoll_ret;
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);

    while (1) {
        epoll_ret = epoll_wait(epoll_fd, events, MAX_EVENTS, -1);
        if (epoll_ret < 0)
            fatal_error("epoll_wait()");

        for (int i = 0; i < epoll_ret; i++) {

            if (events[i].events == 0)
                continue;

            if (events[i].data.fd == server_socket) {
                /* Activity on the listening socket: accept the new connection. */
                int client_socket = accept(
                        server_socket,
                        (struct sockaddr *)&client_addr,
                        &client_addr_len);

                if (client_socket > 0)
                    handle_client(client_socket);
            } else {
                if (!((events[i].events & EPOLLIN) || (events[i].events & EPOLLOUT))) {
                    printf("ERROR: Error condition in socket %d!\n", events[i].data.fd);
                    exit(1);
                }
                /* A descriptor handling one of the clients is ready */
                client_context *check_cc;
                HASH_FIND(hh_cs, cc_cs_hash, &events[i].data.fd, sizeof(int), check_cc);
                if (!check_cc) {
                    HASH_FIND(hh_rs, cc_rs_hash, &events[i].data.fd, sizeof(int), check_cc);
                    if (!check_cc) {
                        printf("Unable to find client context associated with socket %d.\n", events[i].data.fd);
                        exit(1);
                    }
                }
                check_cc->read_write_callback(check_cc);
            }
        }
    }
}

Take a look at enter_server_loop() above. epoll_wait() fills the events array with information only about descriptors that are ready and returns how many of them there are. This way, you do not have to loop through your entire set of monitored file descriptors when only one of them is actually ready.

Apart from the addition of epoll_create1() and epoll_ctl(), and the replacement of poll() with epoll_wait() along with the changes that go with it, there are really no other changes needed to convert the poll()-based version of ZeroHTTPd into the epoll-based one.

Performance of the epoll-based server

Let’s bring up our table of performance numbers again for easy reference

requests/second

concurrency | iterative | forking         | preforked       | threaded        | prethreaded | poll            | epoll
20          | 7         | 112             | 2,100           | 1,800           | 2,250       | 1,900           | 2,050
50          | 7         | 190             | 2,200           | 1,700           | 2,200       | 2,000           | 2,000
100         | 7         | 245             | 2,200           | 1,700           | 2,200       | 2,150           | 2,100
200         | 7         | 330             | 2,300           | 1,750           | 2,300       | 2,200           | 2,100
300         | -         | 380             | 2,200           | 1,800           | 2,400       | 2,250           | 2,150
400         | -         | 410             | 2,200           | 1,750           | 2,600       | 2,000           | 2,000
500         | -         | 440             | 2,300           | 1,850           | 2,700       | 1,900           | 2,212
600         | -         | 460             | 2,400           | 1,800           | 2,500       | 1,700           | 2,519
700         | -         | 460             | 2,400           | 1,600           | 2,490       | 1,550           | 2,607
800         | -         | 460             | 2,400           | 1,600           | 2,540       | 1,400           | 2,553
900         | -         | 460             | 2,300           | 1,600           | 2,472       | 1,200           | 2,567
1,000       | -         | 475             | 2,300           | 1,700           | 2,485       | 1,150           | 2,439
1,500       | -         | 490             | 2,400           | 1,550           | 2,620       | 900             | 2,479
2,000       | -         | 350             | 2,400           | 1,400           | 2,396       | 550             | 2,200
2,500       | -         | 280             | 2,100           | 1,300           | 2,453       | 490             | 2,262
3,000       | -         | 280             | 1,900           | 1,250           | 2,502       | wide variations | 2,138
5,000       | -         | wide variations | 1,600           | 1,100           | 2,519       | -               | 2,235
8,000       | -         | -               | 1,200           | wide variations | 2,451       | -               | 2,100
10,000      | -         | -               | wide variations | -               | 2,200       | -               | 2,200
11,000      | -         | -               | -               | -               | 2,200       | -               | 2,122
12,000      | -         | -               | -               | -               | 970         | -               | 1,958
13,000      | -         | -               | -               | -               | 730         | -               | 1,897
14,000      | -         | -               | -               | -               | 590         | -               | 1,466
15,000      | -         | -               | -               | -               | 532         | -               | 1,281

In this test, as in the poll()-based test, I had 10 epoll-based server processes running. Their real competition is the pre-threaded version of ZeroHTTPd, of which I ran two processes, each with 10,000 threads in its pool. You can see something very interesting: while both servers are neck and neck in throughput, the epoll-based server holds up well all the way to 15,000 concurrent users, whereas the pre-threaded server loses steam around the 12,000 concurrent connection mark. I think the most important observation is that, thanks to its relatively low operating-system overhead, the epoll-based server not only performs decently but also scales very well when handling very high numbers of concurrent requests.

There are still a couple of places where the poll()- and epoll-based servers can block: the initial read from the client right after we accept a connection, and the writes to both the client and Redis. I believe that if we make these non-blocking as well (as sketched below), we should see even better performance compared to the pre-threaded version.
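
Making a descriptor non-blocking is itself a small change; here is a sketch using fcntl(), reusing the program’s fatal_error() helper. The real work lies in restructuring the code to cope with partial reads and writes and EAGAIN.

#include <fcntl.h>

/* Sketch: switch an already-open descriptor to non-blocking mode so that
 * read()/write() return -1 with errno set to EAGAIN/EWOULDBLOCK instead
 * of blocking. */
void make_socket_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        fatal_error("fcntl(F_GETFL)");
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        fatal_error("fcntl(F_SETFL)");
}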

Articles in this series

  1. Series Introduction
  2. Part I. Iterative Servers
  3. Part II. Forking Servers
  4. Part III. Pre-forking Servers
  5. Part IV. Threaded Servers
  6. Part V. Pre-threaded Servers
  7. Part VI: poll-based server
  8. Part VII: epoll-based server

Thank you!

If you made it all the way here through the 7 articles, I hope you feel it was worth the journey. Systems programming is a fascinating subject and knowing the nuances can be very rewarding and satisfying. Feel free to tweet at me for any feedback or just a hi-fi: @shuveb.