TCP/IP Deep Dive: Three-Way Handshake, Window Size, Congestion Control, TCP Offload, and Performance
TCP/IP is the protocol suite that underpins the entire internet. The three-way handshake establishes a reliable connection, the window size governs flow control, congestion control keeps the network from being overloaded, TCP offload moves TCP processing onto NIC hardware, and performance tuning adjusts the stack for maximum throughput.
Although TCP was specified back in 1981 (RFC 793), it remains the backbone of the internet: HTTP, HTTPS, SSH, SMTP, and database connections all run over TCP. The catch is that TCP was designed for the low-bandwidth, high-latency networks of the 1980s, so modern high-speed networks need substantial tuning to use their full bandwidth (TCP window scaling, selective ACK, congestion control algorithms).
Three-Way Handshake
| Step | Packet | Description |
|---|---|---|
| 1 | SYN | Client → Server: "request to connect" (carries the client's ISN, its Initial Sequence Number) |
| 2 | SYN-ACK | Server → Client: "agreed" (carries the server's ISN and acknowledges the client's ISN + 1) |
| 3 | ACK | Client → Server: "acknowledged" (acknowledges the server's ISN + 1); the connection is now established |
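
The sequence and acknowledgment numbers in the three steps above can be traced with a small simulation. This is only a sketch: the function name `three_way_handshake` is hypothetical, and no real packets are sent. It shows that each side acknowledges the peer's ISN plus one, with 32-bit wraparound.

```python
import random

MOD = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the (flags, seq, ack) tuples exchanged during connection setup.

    Illustrative model only; a real stack also negotiates options here.
    """
    # Step 1: the client sends its ISN; there is nothing to acknowledge yet.
    syn = ("SYN", client_isn, None)
    # Step 2: the SYN consumes one sequence number, so the server ACKs ISN + 1.
    syn_ack = ("SYN-ACK", server_isn, (client_isn + 1) % MOD)
    # Step 3: the client ACKs the server's ISN + 1 and the connection is open.
    ack = ("ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    steps = three_way_handshake(random.randrange(MOD), random.randrange(MOD))
    for flags, seq, ack in steps:
        print(f"{flags:8} seq={seq} ack={ack}")
```

Passing fixed ISNs instead of random ones makes the arithmetic easy to check by hand.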
TCP Header Fields
| Field | Size | Purpose |
|---|---|---|
| Source/Dest Port | 16 bits each | Identify application endpoints (0-65535) |
| Sequence Number | 32 bits | Tracks the byte position of the data in the stream |
| Acknowledgment Number | 32 bits | Next byte expected (cumulative ACK) |
| Window Size | 16 bits | Receive buffer available (flow control); max 65535 bytes without scaling |
| Flags | 9 bits | SYN, ACK, FIN, RST, PSH, URG, ECE, CWR, NS (NS is the experimental RFC 3540 bit, now historic; RFC 9293 defines the other eight) |
| Checksum | 16 bits | Error detection (pseudo-header + header + data) |
| Options | Variable | MSS, Window Scale, SACK, Timestamps |
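
To make the field layout above concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's `struct` module. The function name `parse_tcp_header` and the dictionary keys are illustrative choices, and the sketch ignores the historic NS bit and any options.

```python
import struct


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (options are not parsed here)."""
    # Network byte order: ports (2x16), seq (32), ack (32),
    # data offset + flags (16), window (16), checksum (16), urgent pointer (16).
    (src, dst, seq, ack, off_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    # The top 4 bits give the header length in 32-bit words.
    header_len = (off_flags >> 12) * 4
    # The low 8 bits hold the flags, FIN in bit 0 up to CWR in bit 7.
    flag_names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]
    flags = [name for bit, name in enumerate(flag_names) if off_flags & (1 << bit)]
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
    }
```

Feeding it the first 20 bytes of a captured segment returns the ports, sequence numbers, flags, and the unscaled window value.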
Window Size and Flow Control
| Concept | Description |
|---|---|
| Receive Window (rwnd) | Number of bytes the receiver can currently buffer, advertised in every TCP header; the sender may keep at most this much unacknowledged data in flight |
| Window Scaling | RFC 7323: the advertised window is multiplied by 2^scale (scale 0-14), which supports windows up to about 1 GiB |
| Sliding Window | The window slides forward as data is ACKed, letting the sender transmit new data up to the remaining window |
| Zero Window | Receiver buffer is full → it advertises window=0 → sender stops sending and probes periodically |
| BDP | Bandwidth-Delay Product = bandwidth × RTT; the window must be ≥ BDP to fill the pipe |
| Example | 1 Gbps link, 10 ms RTT → BDP = 10^9 bit/s × 0.010 s = 10 Mbit = 1.25 MB → window must be ≥ 1.25 MB |
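
The BDP example above can be reproduced in a few lines, together with the smallest window-scale shift that lets a 16-bit window field cover it. This is a sketch; the helper names `bdp_bytes` and `min_window_scale` are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-Delay Product in bytes: bits in flight during one RTT, over 8."""
    return bandwidth_bps * rtt_seconds / 8


def min_window_scale(window_bytes: float) -> int:
    """Smallest shift (0-14) such that 65535 << shift covers window_bytes."""
    for shift in range(15):
        if (65535 << shift) >= window_bytes:
            return shift
    raise ValueError("window exceeds the roughly 1 GiB limit of RFC 7323 scaling")


if __name__ == "__main__":
    bdp = bdp_bytes(1e9, 0.010)  # 1 Gbps link, 10 ms RTT
    print(f"BDP = {bdp / 1e6:.2f} MB, window scale >= {min_window_scale(bdp)}")
```

For the 1 Gbps, 10 ms case the required window of 1.25 MB needs a shift of at least 5, since 65535 × 2^4 is only about 1.05 MB.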
Congestion Control Algorithms
| Algorithm | How | Best For |
|---|---|---|
| TCP Reno | AIMD: slow start → congestion avoidance → fast retransmit/recovery | Legacy default; simple but not optimal |
| TCP CUBIC | Cubic function for window growth; aggressive recovery after loss | Linux default; good for high-BDP networks |
| TCP BBR (Google) | Model-based: estimate bandwidth + RTT → set rate accordingly (not loss-based) | Long-distance, lossy links (internet, cloud) |
| TCP Vegas | Delay-based: detect congestion from RTT increase (before loss) | Low-latency environments |
| DCTCP | ECN-based: use ECN marks to fine-tune window (data center optimized) | Data center (low latency, high throughput) |
| QUIC (HTTP/3) | Not a congestion control algorithm but a UDP-based transport: 0-RTT resumption (1-RTT for new connections), stream multiplexing, built-in encryption | Web traffic; replaces TCP+TLS for HTTP |
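
The Reno row above (slow start, congestion avoidance, halving on loss) can be illustrated with a toy trace of the congestion window. This is a simplified sketch, not a faithful Reno implementation: the function name `reno_cwnd_trace` is hypothetical, the window is counted in whole segments per RTT, and every loss is assumed to be detected by duplicate ACKs rather than a timeout.

```python
def reno_cwnd_trace(rounds, ssthresh, loss_rounds):
    """Return cwnd (in segments) at the start of each RTT round."""
    cwnd = 1
    trace = []
    for rtt in range(rounds):
        trace.append(cwnd)
        if rtt in loss_rounds:
            # Multiplicative decrease: halve the window and continue from there.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every RTT until ssthresh is reached.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: additive increase of one segment per RTT.
            cwnd += 1
    return trace


if __name__ == "__main__":
    print(reno_cwnd_trace(rounds=10, ssthresh=8, loss_rounds={6}))
```

The resulting sawtooth is the pattern that CUBIC and BBR try to improve on for high-BDP paths.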
TCP Options
| Option | Purpose | Impact |
|---|---|---|
| MSS (Maximum Segment Size) | Max data per segment (usually 1460 bytes on Ethernet: 1500-byte MTU minus 20-byte IP and 20-byte TCP headers) | Avoid fragmentation |
| Window Scale | Scale factor for receive window (negotiated in SYN) | Enable windows > 65535 bytes |
| SACK (Selective ACK) | ACK specific byte ranges → retransmit only lost segments | Much better than cumulative ACK for loss recovery |
| Timestamps | RTT measurement + PAWS (Protection Against Wrapped Sequences) | Accurate RTT, prevent sequence wrap issues |
| TFO (TCP Fast Open) | Send data in SYN → save 1 RTT on reconnection | Faster connection establishment |
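
On the wire, the options above are encoded as kind/length/value records after the fixed header. The following sketch walks such a byte string and extracts MSS (kind 2), Window Scale (kind 3), SACK-Permitted (kind 4), and Timestamps (kind 8). The function name `parse_tcp_options` and the result keys are illustrative, and other option kinds are simply skipped.

```python
import struct


def parse_tcp_options(raw: bytes) -> dict:
    """Parse the options area of a TCP header into a small dictionary."""
    options = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation, a single padding byte
            i += 1
            continue
        if i + 1 >= len(raw):
            break  # truncated option, stop parsing
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break  # malformed length, stop parsing
        value = raw[i + 2 : i + length]
        if kind == 2 and length == 4:
            options["mss"] = struct.unpack("!H", value)[0]
        elif kind == 3 and length == 3:
            options["window_scale"] = value[0]
        elif kind == 4 and length == 2:
            options["sack_permitted"] = True
        elif kind == 8 and length == 10:
            # (TSval, TSecr): sender timestamp and echoed peer timestamp
            options["timestamps"] = struct.unpack("!II", value)
        i += length
    return options
```

Applied to the options of a typical SYN, this yields the MSS, the scale shift, and whether SACK may be used on the connection.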
TCP Offload
| Type | What Is Offloaded | Benefit |
|---|---|---|
| TSO (TCP Segmentation Offload) | NIC splits a large buffer into TCP segments | Reduce CPU overhead for segmentation |
| LRO (Large Receive Offload) | NIC combines multiple small packets into a large buffer | Reduce interrupt + processing overhead |
| GRO (Generic Receive Offload) | Software counterpart of LRO that preserves packet boundaries | Better than LRO for forwarding/routing |
| Checksum Offload | NIC calculates TCP/IP checksums | Reduce CPU for checksum calculation |
| RSS (Receive Side Scaling) | Distribute packets across multiple CPU cores | Scale TCP processing across cores |
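
What TSO and LRO/GRO do in hardware or in the kernel can be mimicked in software to show the idea. In this sketch, `segment_payload` plays the role of TSO (one large buffer in, MSS-sized segments out) and `coalesce` plays the role of receive aggregation. Both names are hypothetical, and the model ignores headers, checksums, and reordering.

```python
MOD = 2**32  # TCP sequence space


def segment_payload(payload: bytes, mss: int, initial_seq: int):
    """TSO-style split: return (seq, chunk) pairs of at most mss bytes each."""
    segments = []
    for offset in range(0, len(payload), mss):
        seq = (initial_seq + offset) % MOD
        segments.append((seq, payload[offset : offset + mss]))
    return segments


def coalesce(segments):
    """LRO/GRO-style merge of contiguous, in-order segments into one buffer."""
    merged = bytearray()
    expected = None
    for seq, chunk in segments:
        if expected is not None and seq != expected:
            break  # gap in the sequence space: stop aggregating
        merged += chunk
        expected = (seq + len(chunk)) % MOD
    return bytes(merged)


if __name__ == "__main__":
    segs = segment_payload(b"x" * 4000, mss=1460, initial_seq=1000)
    print([(seq, len(chunk)) for seq, chunk in segs])
```

Doing this per packet on the CPU is exactly the work the offload features remove from the host.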
TCP Performance Tuning
| Parameter | Default | Tuned | Impact |
|---|---|---|---|
| tcp_rmem / tcp_wmem (min / default / max) | 4K / 16K / 4M (tcp_wmem; tcp_rmem defaults are larger on recent kernels) | 4K / 256K / 16M | Larger buffers for high-BDP connections |
| tcp_window_scaling | Enabled | Enabled (don't disable) | Allow windows > 64KB |
| tcp_sack | Enabled | Enabled (don't disable) | Better loss recovery |
| tcp_congestion_control | cubic | bbr (for internet-facing) | Better throughput on lossy links |
| tcp_fastopen | 1 (client only, on current Linux kernels) | 3 (client+server) | Save 1 RTT on reconnect |
| net.core.netdev_max_backlog | 1000 | 5000-50000 | Handle burst traffic without drops |
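
The tuned column above can be turned into a `sysctl.conf`-style fragment and compared with the running system. The sketch below assumes a Linux host where these keys are exposed under `/proc/sys`. The `TUNED` dictionary, the helper names, and the specific byte values (for example 5000 for the backlog) are illustrative picks from the ranges in the table, not recommendations.

```python
from pathlib import Path

# Illustrative values taken from the "Tuned" column; adjust per workload.
TUNED = {
    "net.ipv4.tcp_rmem": "4096 262144 16777216",
    "net.ipv4.tcp_wmem": "4096 262144 16777216",
    "net.ipv4.tcp_window_scaling": "1",
    "net.ipv4.tcp_sack": "1",
    "net.ipv4.tcp_congestion_control": "bbr",
    "net.ipv4.tcp_fastopen": "3",
    "net.core.netdev_max_backlog": "5000",
}


def proc_path(key: str) -> Path:
    """Map a sysctl key such as net.ipv4.tcp_sack to its /proc/sys file."""
    return Path("/proc/sys") / key.replace(".", "/")


def current_value(key: str):
    """Read the live value, or None if it is unavailable (e.g. not Linux)."""
    try:
        return " ".join(proc_path(key).read_text().split())
    except OSError:
        return None


def render_sysctl_conf(settings: dict) -> str:
    """Render the settings as lines suitable for a sysctl.conf fragment."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    for key, wanted in TUNED.items():
        print(f"{key}: current={current_value(key)!r} tuned={wanted!r}")
    print(render_sysctl_conf(TUNED))
```

The script only reads and prints; applying the fragment is left to the usual `sysctl -p` workflow after review.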
Closing Thoughts: TCP = Reliable But Needs Tuning for Modern Networks
TCP/IP Deep Dive
Handshake: SYN → SYN-ACK → ACK (3-way), FIN → ACK → FIN → ACK (4-way close)
Window: receive window (flow control), window scaling (up to about 1 GiB), BDP = bandwidth × RTT
Congestion: Reno (AIMD), CUBIC (Linux default), BBR (Google, model-based), DCTCP (data center)
Options: MSS (avoid fragmentation), SACK (selective retransmit), timestamps (RTT), TFO (save 1 RTT)
Offload: TSO (send segmentation), LRO/GRO (receive aggregation), RSS (multi-core), checksum offload
Tuning: buffer sizes (rmem/wmem), BBR for internet-facing hosts, window scaling + SACK always on, TFO for web
Key: TCP's default settings target the networks of decades past; tune buffers, use BBR, and enable SACK/TFO for modern performance
For further reading on QoS (Quality of Service, DiffServ, DSCP) and on network troubleshooting methodology with Wireshark, see siamlancard.com, icafeforex.com, and siam2r.com.