The article "Understanding TCP/IP Network Stack & Writing Network Apps"
introduces the Transmission Control Protocol/Internet Protocol (TCP/IP).
Internet services would be useless without TCP/IP. Nowadays, all Internet
services we have developed and used at NHN are built on a solid basis, TCP/IP.
The article gives a lot of information about the overall operation of the
network stack, following the data flow and control flow in the Linux OS and the
hardware layer.
First of all, I want to
define each of the components. The Transmission Control Protocol/Internet
Protocol (TCP/IP) suite was created by the U.S. Department of Defense (DoD) to
ensure that communications could survive any conditions and that data integrity
would not be compromised under malicious attacks. TCP is connection-oriented in
the sense that the endpoints need to establish a connection before any data is
transmitted. The data units of TCP are segments, each of which has a header of
at least 20 bytes (exactly 20 bytes when no options are present) followed by a
variable-size data field. TCP breaks a stream of bytes down into segments and
reassembles them at the other end, retransmitting data that has been lost and
putting the segments back in the correct order. Next, the Internet Protocol (IP)
is the principal communications protocol in the Internet protocol suite for
relaying datagrams across network boundaries. Its routing function enables
internetworking and essentially establishes the Internet.
Usually, since TCP and
IP belong to different layers, it would be correct to describe them separately.
However, here I will describe them as one, through the key characteristics of
TCP. The first is that it is connection-oriented. A connection is made between
two endpoints (local and remote) and then data is transferred. Here, the "TCP
connection identifier" is a combination of the addresses of the two endpoints,
in the form <local IP address, local port number, remote IP address, remote port
number>. Next is the bidirectional byte stream. Bidirectional data communication
is carried out using a byte stream. Then, in-order delivery. A receiver receives
data in the order in which the sender sent it. For that, the data must carry its
order. To mark the order, a 32-bit integer is used. Next, reliability through
ACK. When a sender does not receive an acknowledgement (ACK) from the receiver
after sending data, the sender TCP re-sends the data. Therefore, the sender TCP
buffers data that the receiver has not yet acknowledged. Next is flow control. A
sender sends only as much data as the receiver can afford. The receiver tells
the sender the maximum number of bytes that it can receive (unused buffer size,
receive window). The sender sends only as many bytes as the receiver's receive
window allows. Next, congestion control. The congestion window is used
separately from the receive window to prevent network congestion by limiting the
volume of data flowing in the network. As with the receive window, the sender
sends only as many bytes as the congestion window allows. The congestion window
is maintained by the sender, and its size is determined by one of a variety of
algorithms such as TCP Vegas, Westwood, BIC, and CUBIC. Unlike flow control,
congestion control is implemented by the sender only.
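
To make the interplay of the 32-bit order marker, the receive window, and the congestion window more concrete, here is a minimal Python sketch. It is not taken from the article or from any kernel: the function names `seq_lt` and `usable_window`, and the parameter names `snd_una` (oldest unacknowledged byte) and `snd_nxt` (next byte to send), are assumptions chosen for illustration. It only shows the usual idea that a sender may have at most min(receive window, congestion window) unacknowledged bytes in flight.

```python
SEQ_MOD = 2 ** 32  # the order marker is a 32-bit integer, so it wraps around


def seq_lt(a: int, b: int) -> bool:
    """Return True if sequence number a comes before b, allowing wraparound."""
    return a != b and ((b - a) % SEQ_MOD) < SEQ_MOD // 2


def usable_window(rwnd: int, cwnd: int, snd_una: int, snd_nxt: int) -> int:
    """Bytes the sender may still send right now (hypothetical helper).

    rwnd: receive window advertised by the receiver (flow control)
    cwnd: congestion window kept by the sender (congestion control)
    """
    in_flight = (snd_nxt - snd_una) % SEQ_MOD  # sent but not yet acknowledged
    return max(0, min(rwnd, cwnd) - in_flight)


if __name__ == "__main__":
    # 5000 bytes in flight, congestion window is the tighter limit here.
    print(usable_window(rwnd=65535, cwnd=14600, snd_una=1000, snd_nxt=6000))
    # A small receive window stops the sender even if cwnd is large.
    print(usable_window(rwnd=4000, cwnd=14600, snd_una=1000, snd_nxt=6000))
```

The modulo arithmetic matters because a long-lived connection eventually runs past the largest 32-bit value, and a plain `a < b` comparison would then give the wrong order.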
The network stack has many layers, but they can be classified into three areas:
the user area, the kernel area, and the device area. The tasks in the user area
and the kernel area are performed by the CPU. These two areas are called the
"host" to distinguish them from the device area. When the write system call
is called, the data in the user area is copied to kernel memory and then
added to the end of the send socket buffer. This is so that data is sent in
order. In Figure 1, the light-gray box refers to the data in the socket buffer.
Then, TCP is called. There is a TCP Control Block (TCB) structure connected to
the socket. The TCB includes the data required for processing the TCP
connection, such as the connection state. A TCP segment has two parts, the TCP
header and the data.
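
As an illustration of the "header plus data" layout of a segment, the following Python sketch packs and unpacks the fixed 20-byte part of a TCP header with the standard `struct` module. It is a simplified sketch, not how the kernel builds segments: the helper names `build_segment` and `parse_segment` are made up for this example, TCP options are not handled, and the checksum field is simply left at zero.

```python
import struct

# Fixed part of the TCP header (20 bytes when no options are present):
# source port, destination port, sequence number, acknowledgement number,
# data offset + flags, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

FLAG_PSH = 0x08
FLAG_ACK = 0x10


def build_segment(src_port, dst_port, seq, ack, flags, window, payload):
    """Return header + payload. The checksum is left as 0 in this sketch."""
    data_offset_words = TCP_HEADER.size // 4  # 5 words = 20 bytes
    offset_and_flags = (data_offset_words << 12) | (flags & 0x01FF)
    header = TCP_HEADER.pack(src_port, dst_port, seq, ack,
                             offset_and_flags, window, 0, 0)
    return header + payload


def parse_segment(segment):
    """Split a segment into a dict of header fields and the payload bytes."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    header_len = (offset_and_flags >> 12) * 4  # data offset is in 32-bit words
    fields = {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "flags": offset_and_flags & 0x01FF,
        "window": window, "header_len": header_len,
    }
    return fields, segment[header_len:]


if __name__ == "__main__":
    seg = build_segment(50000, 80, 1000, 2000, FLAG_PSH | FLAG_ACK, 65535, b"hello")
    print(parse_segment(seg))
```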

The figure above shows
how each layer of the TCP/IP network stack operates when handling received
data. First, the NIC writes the packet into its own memory. It checks whether
the packet is valid by performing the CRC check and then sends the packet to
the memory buffer of the host. This buffer is memory that the driver has
requested from the kernel in advance and that has been allocated for receiving
packets. After the buffer has been allocated, the driver tells the NIC its
memory address and size. If the NIC receives a packet when there is no host
memory buffer allocated by the driver, the NIC may drop the packet.
After sending the packet
to the host memory buffer, the NIC sends an interrupt to the host OS.
Then, the driver checks
whether it can handle the new packet or not. Up to this point, the driver-NIC
communication protocol defined by the manufacturer is used.
When the driver needs to
send a packet to the upper layer, the packet must be wrapped in the packet
structure that the OS uses, so that the OS can understand it. For example,
sk_buff of Linux, mbuf of BSD-series kernels, and NET_BUFFER_LIST of Microsoft
Windows are the packet structures of the corresponding OS. The driver sends the
wrapped packets to the upper layer.
The Ethernet layer
checks whether the packet is valid and then de-multiplexes the upper protocol
(network protocol). At this time, it uses the ethertype value of the Ethernet
header. The IPv4 ethertype value is 0x0800. It removes the Ethernet header and
then sends the packet to the IP layer.
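
The de-multiplexing step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `demux_ethernet` is made up, VLAN tags are not handled, and the frame is assumed to arrive without its trailing CRC, since the NIC has already checked it.

```python
import struct

# A few well-known ethertype values; 0x0800 is the IPv4 value mentioned above.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + ethertype (2)


def demux_ethernet(frame: bytes):
    """Return (upper protocol name, payload with the Ethernet header removed)."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    _dst_mac, _src_mac, ethertype = struct.unpack_from("!6s6sH", frame)
    protocol = ETHERTYPES.get(ethertype, "unknown")
    return protocol, frame[ETH_HEADER_LEN:]


if __name__ == "__main__":
    frame = bytes(6) + bytes(6) + struct.pack("!H", 0x0800) + b"ip-packet"
    print(demux_ethernet(frame))
```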
The IP layer also checks
whether the packet is valid. In other words, it checks the IP header checksum.
It then performs IP routing logically, that is, it decides whether the local
system should handle the packet or whether the packet should be sent on to
another system. If the packet must be handled by the local system, the IP layer
de-multiplexes the upper protocol (transport protocol) by referring to the proto
value of the IP header. The TCP proto value is 6. It removes the IP header and
then sends the packet to the TCP layer.
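
The header checksum test and the proto lookup can be sketched as follows. This is a simplified illustration, not kernel code: `internet_checksum` and `parse_ipv4` are names chosen for this example, and routing, fragmentation, and IP options beyond skipping them are left out. The checksum is the 16-bit one's-complement sum used by IPv4, and a valid header sums to zero when the stored checksum is included.

```python
import struct

PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_ipv4(packet: bytes):
    """Validate the header checksum, then return (proto name, payload)."""
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    header = packet[:header_len]
    if internet_checksum(header) != 0:
        raise ValueError("bad IP header checksum")
    proto = packet[9]  # the proto field is the 10th byte of the header
    return PROTO_NAMES.get(proto, "unknown"), packet[header_len:]


if __name__ == "__main__":
    # Build a 20-byte header with proto = 6 (TCP) and a correct checksum.
    fields = [0x45, 0, 20 + 4, 0, 0, 64, 6, 0,
              bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2])]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = internet_checksum(header)
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    print(parse_ipv4(header + b"data"))
```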
Like the lower layers,
the TCP layer checks whether the packet is valid. It also checks the TCP
checksum. Since current network stacks use checksum offload, the TCP checksum
is computed by the NIC, not by the kernel. The size of the receive socket
buffer is the TCP receive window. Up to a certain point, the TCP throughput
increases when the receive window is large. In the past, the socket buffer size
was adjusted in the application or in the OS configuration. The latest network
stacks have a function to adjust the receive socket buffer size, i.e., the
receive window, automatically.
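
For reference, an application can inspect and set the receive socket buffer size through the standard `SO_RCVBUF` socket option. The sketch below uses Python's `socket` module; the helper name `receive_buffer_size` and the 256 KiB request are arbitrary choices for illustration. The values reported depend on the OS and its configuration, and on Linux the kernel documentation notes that setting `SO_RCVBUF` by hand turns off automatic tuning for that socket, so this is usually left alone unless there is a measured reason.

```python
import socket


def receive_buffer_size(sock: socket.socket) -> int:
    """Return the current receive socket buffer size in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        print("default:", receive_buffer_size(sock))
        # Request a larger buffer; the OS may round, cap, or adjust the value.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
        print("after request:", receive_buffer_size(sock))
```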
When the application
calls the read system call, control switches to the kernel area and the data
in the socket buffer is copied to memory in the user area. The copied data
is removed from the socket buffer. Then TCP is called. TCP increases the
receive window because there is new space in the socket buffer, and it sends a
packet if the protocol status requires one. If no packet is transferred, the
system call is terminated.
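
From the application's point of view, the write and read paths look like the short Python sketch below, which sends a few bytes over a TCP connection on the loopback interface and reads them back. The function name `loopback_roundtrip` is made up for this example, and it assumes that the loopback address 127.0.0.1 is available. Everything described above (copying into the socket buffer, TCP processing, window updates) happens inside the kernel behind the `sendall` and `recv` calls.

```python
import socket


def loopback_roundtrip(payload: bytes) -> bytes:
    """Send payload over a local TCP connection and return what was received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        with socket.create_connection(listener.getsockname()) as client:
            server, _peer = listener.accept()
            with server:
                # write path: user data is copied into the send socket buffer
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of data
                received = bytearray()
                while True:
                    # read path: data is copied from the receive socket buffer
                    chunk = server.recv(4096)
                    if not chunk:
                        break
                    received += chunk
                return bytes(received)


if __name__ == "__main__":
    print(loopback_roundtrip(b"hello tcp"))
```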
This article is really
helpful for those who want to develop network programs, run performance
tests, or troubleshoot network problems.