Linux Applications Performance: Part IV: Threaded Servers

This chapter is part of a series of articles on Linux application performance.

Threads were all the rage in the 90s. They allowed the cool guys to give jaw-dropping demos to their friends and colleagues. Like so many cool things from the 90s, they are today just another tool in any programmer’s arsenal. In Linux, the API to use is POSIX Threads, lovingly shortened to pthreads. Underneath it all, Linux implements threads using the clone() system call, but you don’t really have to worry about that. It is enough to know that the C library presents the well-known pthreads API and you just need to use it.

The threaded server is the equivalent of our forking server. That is, we create a new thread every time we get a connection from a client and handle that request in the new thread, after which the thread terminates. You might think that, now that we know creating a new process or thread for each incoming request incurs a lot of overhead, using this technique isn’t a great idea. You are right. A better idea would be the threaded equivalent of the pre-forked server: a pre-threaded, or thread-pool based, server. We are building this one for the sake of completeness, and it is also totally worth it since it reveals something very interesting: the difference in overhead between creating a process for each new connection and creating a thread for each new connection.

int main(int argc, char *argv[])
{
    int server_port;
    signal(SIGINT, print_stats);

    if (argc > 1)
        server_port = atoi(argv[1]);
    else
        server_port = DEFAULT_SERVER_PORT;

    if (argc > 2)
        strcpy(redis_host_ip, argv[2]);
    else
        strcpy(redis_host_ip, REDIS_SERVER_HOST);

    int server_socket = setup_listening_socket(server_port);
    printf("ZeroHTTPd server listening on port %d\n", server_port);
    enter_server_loop(server_socket);
    return (0);
}

We are back to a simple main() function. It calls enter_server_loop() directly.

void enter_server_loop(int server_socket)
{
    struct sockaddr_in client_addr;
    socklen_t client_addr_len = sizeof(client_addr);
    pthread_t tid;

    while (1)
    {
        int client_socket = accept(
                server_socket,
                (struct sockaddr *)&client_addr,
                &client_addr_len);
        if (client_socket == -1)
            fatal_error("accept()");

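        /* The descriptor is passed by value, packed into the pointer-sized argument */
        if (pthread_create(&tid, NULL, &handle_client, (void *)(intptr_t) client_socket) != 0)
            close(client_socket);   /* no thread could be started; drop this connection */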
    }
}

Here, we enter an infinite loop that calls accept(). Every time there is a new connection, we create a new thread with pthread_create(), passing handle_client() as the thread’s start routine and client_socket as its argument.
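Note that the socket descriptor travels to the new thread by value, packed into the void * argument, rather than as a pointer to client_socket. Passing the address of the loop variable would be a race, since the next accept() can overwrite it before the thread reads it. The following is a minimal sketch of that round trip in isolation; echo_fd() and roundtrip_fd() are hypothetical names used only for this illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical start routine: recovers the descriptor from the
   pointer-sized argument and hands it back the same way. */
static void *echo_fd(void *targ)
{
    int fd = (int)(intptr_t) targ;
    return (void *)(intptr_t) fd;
}

/* Starts a thread with `fd` passed by value and returns the value
   the thread saw, or -1 if the thread could not be run. */
int roundtrip_fd(int fd)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, echo_fd, (void *)(intptr_t) fd) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return (int)(intptr_t) result;
}
```

Going through intptr_t on both sides keeps the conversion between an int and a pointer well defined on platforms where the two differ in size.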

void *handle_client(void *targ)
{
    char line_buffer[1024];
    char method_buffer[1024];
    int method_line = 0;
    int client_socket = (int)(intptr_t) targ;
    int redis_server_socket;

    /* No one needs to do a pthread_join() for the OS to free up this thread's resources */
    pthread_detach(pthread_self());

    connect_to_redis_server(&redis_server_socket);

    while (1)
    {
        get_line(client_socket, line_buffer, sizeof(line_buffer));
        method_line++;

        unsigned long len = strlen(line_buffer);

        /*
         The first line has the HTTP method/verb. It's the only
         thing we care about. We read the rest of the lines and
         throw them away.
         */
        if (method_line == 1)
        {
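            if (len == 0)
            {
                /* Empty request: release both sockets before the thread exits */
                close(client_socket);
                close(redis_server_socket);
                return (NULL);
            }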

            strcpy(method_buffer, line_buffer);
        }
        else
        {
            if (len == 0)
                break;
        }
    }

    handle_http_method(method_buffer, client_socket);
    close(client_socket);
    close(redis_server_socket);
    return(NULL);
}

We see that the handle_client() function implemented here is not very different from the ones we saw in earlier server architectures. Right at the beginning, we call pthread_detach(), which tells the operating system that it can free this thread’s resources as soon as the thread ends, without another thread having to call pthread_join() to collect details about its termination. Calling pthread_join() on a thread is much like a parent process calling wait() on a child. Otherwise, this function makes all the usual calls needed to handle the client request. Remember, however, that once this function returns, the thread ends, since this is the function we specified as the thread’s start routine.
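An alternative to calling pthread_detach() inside the start routine is to create the thread already detached, using a thread attributes object. This is not what the server above does. It is a minimal sketch of the other standard pthreads route, and spawn_detached() and noop_start() are hypothetical names used only here.

```c
#include <pthread.h>

/* Hypothetical start routine that does nothing. */
void *noop_start(void *arg)
{
    (void) arg;
    return NULL;
}

/* Starts `start_routine` in a thread that is detached from birth,
   so nobody has to call pthread_join() or pthread_detach() on it.
   Returns 0 on success or a pthreads error number on failure. */
int spawn_detached(void *(*start_routine)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, start_routine, arg);

    /* The attributes object is no longer needed once the thread exists */
    pthread_attr_destroy(&attr);
    return rc;
}
```

With this approach the accept loop would decide the detach state, and the start routine would not need the pthread_detach(pthread_self()) call.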

Performance of the Threaded Server

Here is our table of performance numbers for easy reference:

requests/second (a dash means no figure is reported)

concurrency  iterative  forking          preforked        threaded         prethreaded  poll             epoll
20           7          112              2,100            1,800            2,250        1,900            2,050
50           7          190              2,200            1,700            2,200        2,000            2,000
100          7          245              2,200            1,700            2,200        2,150            2,100
200          7          330              2,300            1,750            2,300        2,200            2,100
300          -          380              2,200            1,800            2,400        2,250            2,150
400          -          410              2,200            1,750            2,600        2,000            2,000
500          -          440              2,300            1,850            2,700        1,900            2,212
600          -          460              2,400            1,800            2,500        1,700            2,519
700          -          460              2,400            1,600            2,490        1,550            2,607
800          -          460              2,400            1,600            2,540        1,400            2,553
900          -          460              2,300            1,600            2,472        1,200            2,567
1,000        -          475              2,300            1,700            2,485        1,150            2,439
1,500        -          490              2,400            1,550            2,620        900              2,479
2,000        -          350              2,400            1,400            2,396        550              2,200
2,500        -          280              2,100            1,300            2,453        490              2,262
3,000        -          280              1,900            1,250            2,502        wide variations  2,138
5,000        -          wide variations  1,600            1,100            2,519        -                2,235
8,000        -          -                1,200            wide variations  2,451        -                2,100
10,000       -          -                wide variations  -                2,200        -                2,200
11,000       -          -                -                -                2,200        -                2,122
12,000       -          -                -                -                970          -                1,958
13,000       -          -                -                -                730          -                1,897
14,000       -          -                -                -                590          -                1,466
15,000       -          -                -                -                532          -                1,281

The best comparison to make for the threaded server is with the forking server. At lower concurrency numbers, you can see around a 10x performance improvement over the forking server design. For concurrencies of several hundred connections, it outperforms the forking server by 5-6x! Also, very importantly, it is able to deal stably with concurrencies of up to 8,000 connections, although by around 3,000 concurrent connections we begin to see a drop in performance.
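To put numbers on the comparison, the sketch below divides the threaded figure by the forking figure for the four lowest concurrency levels in the table. The function names are made up for illustration, and the inputs are copied from the table above. The ratio falls from roughly 16x at a concurrency of 20 to a little over 5x at 200.

```c
#include <stdio.h>

/* Ratio of threaded to forking throughput, both in requests/second. */
double speedup(double threaded_rps, double forking_rps)
{
    return threaded_rps / forking_rps;
}

/* Prints the ratio for the table rows at concurrency 20, 50, 100 and 200. */
void print_speedups(void)
{
    static const int concurrency[] = { 20, 50, 100, 200 };
    static const double forking[]  = { 112, 190, 245, 330 };
    static const double threaded[] = { 1800, 1700, 1700, 1750 };

    for (int i = 0; i < 4; i++)
        printf("concurrency %3d: %.1fx\n",
               concurrency[i], speedup(threaded[i], forking[i]));
}
```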

In the next article in this series, we will see how using a thread pool improves performance over the thread-per-connection design described here.

Articles in this series

  1. Series Introduction
  2. Part I. Iterative Servers
  3. Part II. Forking Servers
  4. Part III. Pre-forking Servers
  5. Part IV. Threaded Servers
  6. Part V. Pre-threaded Servers
  7. Part VI: poll-based server
  8. Part VII: epoll-based server