Something About Nginx
1. Nginx as a Traffic Gateway
- Definition
- Nginx is primarily known as a high‑performance HTTP server, reverse proxy, and load balancer.
- Written in C for efficiency and portability.
- Operates primarily at the application layer (Layer 7) when performing load balancing (though it can also handle TCP/UDP streams in later versions).
- Key Features
- Reverse Proxy: Nginx can sit in front of web servers or applications, intercepting and forwarding client requests, and can perform caching, SSL termination, compression, etc.
- Load Balancing: Distributes incoming requests across multiple backend servers, supporting algorithms like round-robin, IP hash, least connections, etc.
- High Concurrency & Low Memory Usage: Nginx uses an asynchronous, event-driven model that scales extremely well under high traffic.
- Caching: Can cache both static and dynamic content to improve performance.
- One Master Process + Multiple Worker Processes: This architecture separates management and configuration tasks (master) from the actual I/O handling (workers).
- Event-Driven, Asynchronous I/O
- Nginx relies on epoll (on Linux) or similar mechanisms (kqueue on FreeBSD, event ports on Solaris, etc.) to handle a large number of simultaneous connections in a non-blocking manner.
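The reverse-proxy and load-balancing features above can be sketched in a minimal `nginx.conf` fragment. This is an illustrative configuration, not from the original notes; the upstream name, backend addresses, and certificate paths are placeholders:

```nginx
# Backend pool: least_conn sends each request to the server
# with the fewest active connections.
upstream app_backend {
    least_conn;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 443 ssl;                      # SSL termination at the proxy
    server_name example.com;
    ssl_certificate     /etc/nginx/certs/example.crt;
    ssl_certificate_key /etc/nginx/certs/example.key;

    gzip on;                             # compress responses to clients

    location / {
        proxy_pass http://app_backend;   # forward to the backend pool
        proxy_set_header Host $host;     # preserve the original Host header
    }
}
```

Swapping `least_conn` for `ip_hash` (or removing it to get the round-robin default) selects a different balancing algorithm from the list above.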
2. Nginx Deployment Architecture
- Master Process
- Reads and evaluates configuration files.
- Spawns, reconfigures, or terminates worker processes.
- Does not handle incoming traffic directly; instead, it delegates that work to the worker processes.
- Worker Processes
- Each worker process handles incoming connections using asynchronous I/O.
- The worker processes all share the same listening sockets (set up by the master), and the OS distributes incoming connections among these workers.
- Because of the event-driven approach, each worker can handle thousands of concurrent connections without spawning new threads or processes for each connection.
- Multi-Worker Model
- Typically, you configure 1 worker per CPU core, but this can vary based on the workload.
- Workers can process many requests in parallel via non-blocking I/O.
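The master/worker model described above maps to a few top-level directives. A minimal sketch, with illustrative values:

```nginx
# One worker per CPU core; "auto" lets Nginx detect the core count.
worker_processes auto;

events {
    use epoll;                  # event mechanism on Linux
    worker_connections 4096;    # max concurrent connections per worker
}
```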
3. Comparison with HAProxy and LVS
There are three popular load balancing/reverse proxy tools: Nginx, HAProxy, and LVS. Although they may overlap in functionality, each has strengths and is often used in different scenarios:
- Nginx
- Layer 7 focused (application layer), though it can also do TCP/UDP (Layer 4) in newer versions.
- Excellent for HTTP and HTTPS load balancing, reverse proxying, caching, SSL termination, and compression.
- Can serve static files directly.
- Configuration can be more extensive (complex rewrites, caching rules, etc.).
- HAProxy
- Primarily focused on Layer 4 (TCP) and Layer 7 (HTTP) load balancing.
- Known for very high performance, stable load balancing features, and advanced metrics.
- Lacks some built-in features that Nginx has (like a native web server for static files or easy caching), but it’s extremely efficient and widely used for pure load balancing.
- LVS (Linux Virtual Server)
- Operates mostly at Layer 4 (Transport layer).
- Implemented in the Linux kernel (IPVS module) for IP-based load balancing.
- Highly efficient for raw TCP/UDP load balancing, but lacks Layer 7 capabilities (cannot do complex application-level logic, HTTP rewrites, etc.).
- Often used in very large-scale deployments where advanced application-layer features are not required.
In summary, if you need application-level manipulation (HTTP headers, caching, rewriting) and a built-in reverse proxy, Nginx is a good choice. If you only need extremely high-performance load balancing at either Layer 4 or 7, HAProxy is often preferred. LVS is best for very large-scale Layer 4 balancing scenarios when you don’t need application-layer features.
4. The epoll Model
- What is epoll?
- epoll is a Linux kernel system call interface for handling large numbers of file descriptors (network sockets) in an event-driven, non-blocking fashion.
- It uses a readiness notification model, telling you which descriptors are “ready” for read or write without having to actively poll them all the time.
- Why epoll?
- Scalability: With epoll, you can efficiently manage thousands (or even millions) of concurrent connections using a single or few worker threads.
- Performance: epoll reduces overhead because it avoids constantly scanning all connections; it only processes events for sockets that actually require attention.
- Memory Efficiency: epoll can be more memory-friendly compared to traditional poll/select.
- Nginx and epoll
- On Linux, Nginx’s worker processes typically rely on epoll to handle incoming connections.
- When an event (like a socket becoming readable) occurs, epoll notifies the worker, and the worker processes data without blocking on other I/O operations.
5. Core Components of Nginx
- Event Core
- Handles the registration and delivery of events (read/write) on network connections.
- Abstracted so that different OS-level event mechanisms (epoll, kqueue, etc.) can be used.
- HTTP/Stream Modules
- HTTP Module: Contains the main logic for handling HTTP requests, parsing headers, implementing rewrites, etc.
- Stream Module: Introduced for TCP/UDP load balancing, allowing Nginx to handle Layer 4 traffic.
- Configuration System
- Nginx config files (nginx.conf, plus various includes) define directives for modules and specify how traffic is handled.
- Core Infrastructure
- Master Process manages worker processes, re-reading configs, and graceful restarts.
- Worker Processes handle the actual client connections, using the event-driven model.
- Additional Modules
- Cache, SSL, Gzip compression, Access Control, etc.
- Third-party modules can extend functionality further (e.g., Lua modules, security modules).
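The Stream module mentioned above handles Layer 4 traffic with a configuration block parallel to the `http` block. A minimal sketch, with placeholder backend addresses:

```nginx
# Layer 4 (TCP) load balancing via the stream module.
stream {
    upstream db_pool {
        server 10.0.0.21:5432;
        server 10.0.0.22:5432;
    }
    server {
        listen 5432;
        proxy_pass db_pool;   # raw TCP forwarding; no HTTP parsing here
    }
}
```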
https://yyb345.github.io/NginxNotes/