WebSockets, the "why" and "how"
The WebSocket protocol provides a bidirectional, full-duplex communication channel for passing messages between a client and a server. It starts as an HTTP request and then runs over a single TCP/IP socket connection that stays open, so the connection between client and server is persistent.
Why the need for WebSocket?
The need for WebSockets arose from the limitations of HTTP.
With HTTP, a client requests a resource and the server responds with the requested data. HTTP is a strictly request-response protocol: the server can send data to the client only after the client has asked for it.
One way around this limitation is long-polling. The client makes an HTTP request with a long timeout period, and the server holds that request open until it has data to push to the client. Long-polling works, but server resources stay tied up for the whole length of the long-poll, even when no data is available to send. This makes it inefficient.
WebSockets allow for sending message-based data, similar to UDP, but with the reliability of TCP. WebSocket uses HTTP as the initial transport mechanism, but keeps the TCP connection alive after the HTTP response is received so that it can be used for sending messages between client and server.
How does it work?
WebSockets begin life as a standard HTTP request and response. Within that request/response exchange, the client asks to open a WebSocket connection. If the server agrees and this initial handshake succeeds, the client and server use the existing TCP/IP connection that was established for the HTTP request as a WebSocket connection.
Data can now flow over this connection using a basic framed message protocol. Once both parties acknowledge that the WebSocket connection should be closed, the TCP connection is torn down.
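To give a feel for what a "framed message" looks like, here is a minimal sketch of encoding and decoding a single text frame. The frame layout (a FIN/opcode byte, a mask bit plus length, a 4-byte masking key for client frames, then the payload) comes from RFC 6455 rather than from this article, and the function names encode_text_frame and decode_frame are made up for illustration. It ignores fragmentation, control frames and error handling.

```python
import os
import struct


def encode_text_frame(message, mask_key=None):
    """Encode one unfragmented, masked text frame (client-to-server style)."""
    payload = message.encode("utf-8")
    if mask_key is None:
        mask_key = os.urandom(4)  # clients pick a fresh 4-byte mask per frame
    header = bytearray([0x81])  # FIN bit set, opcode 0x1 (text)
    length = len(payload)
    if length < 126:
        header.append(0x80 | length)  # mask bit + 7-bit length
    elif length < 65536:
        header.append(0x80 | 126)  # 126 means: 16-bit length follows
        header += struct.pack("!H", length)
    else:
        header.append(0x80 | 127)  # 127 means: 64-bit length follows
        header += struct.pack("!Q", length)
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


def decode_frame(frame):
    """Return (opcode, payload) for one complete frame held in memory."""
    opcode = frame[0] & 0x0F
    is_masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack("!H", frame[2:4])
        offset = 4
    elif length == 127:
        (length,) = struct.unpack("!Q", frame[2:10])
        offset = 10
    if is_masked:
        mask_key = frame[offset:offset + 4]
        offset += 4
    payload = frame[offset:offset + length]
    if is_masked:
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return opcode, payload
```

For example, decode_frame(encode_text_frame("Hello")) gives back opcode 1 and the bytes of "Hello". A real client or server would normally rely on a WebSocket library for this instead of hand-rolling frames.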
Establishing a WebSocket connection
WebSockets do not follow the HTTP protocol once the connection is established, so they do not use the http:// or https:// scheme. WebSocket URIs use a new scheme, ws: (or wss: for a secure WebSocket). The remainder of the URI is the same as an HTTP URI: a host, port, path and any query parameters.
"ws:" "//" host [ ":" port ] path [ "?" query ]
"wss:" "//" host [ ":" port ] path [ "?" query ]
WebSocket connections can only be established to URIs that follow this scheme.
If a URI has a scheme of ws:// (or wss://), then both the client and the server MUST follow the connection protocol defined in the WebSocket specification.
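The URI rules above can be sketched with Python's standard urllib.parse module. The helper name parse_ws_uri and the shape of the returned dictionary are hypothetical. The default ports (80 for ws, 443 for wss) are taken from RFC 6455, not from this article.

```python
from urllib.parse import urlsplit

# Default ports per scheme, as given in RFC 6455.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_ws_uri(uri):
    """Split a WebSocket URI into scheme flag, host, port, path and query."""
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {uri!r}")
    return {
        "secure": parts.scheme == "wss",
        "host": parts.hostname,
        # Fall back to the scheme's default port when none is given.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "path": parts.path or "/",
        "query": parts.query,
    }
```

Calling parse_ws_uri("ws://example.com:8181/chat?room=1") yields host example.com, port 8181, path /chat and query room=1, while an http:// URI is rejected with a ValueError.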
WebSocket connections are established by upgrading an HTTP request/response pair. A client that supports WebSockets and wants to establish a connection will send an HTTP request that includes a few required headers:
Connection: Upgrade
The Connection header generally controls whether or not the network connection stays open after the current transaction finishes.
A common value for this header is keep-alive to make sure the connection is persistent to allow for subsequent requests to the same server.
During the WebSocket opening handshake we set the header to Upgrade, signaling that we want to keep the connection alive and use it for non-HTTP traffic.
Upgrade: websocket
The Upgrade header is used by clients to ask the server to switch to one of the listed protocols, in descending preference order. We specify websocket here to signal that the client wants to establish a WebSocket connection.
The Sec-WebSocket-Key header carries a one-time random value generated by the client: 16 randomly selected bytes, base64-encoded.
The only accepted version of the WebSocket protocol is 13:
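A value of this shape could be generated as follows. This is a minimal sketch using only the standard library, and the function name make_websocket_key is made up for illustration.

```python
import base64
import os


def make_websocket_key():
    """Return 16 random bytes, base64-encoded, for the Sec-WebSocket-Key header."""
    return base64.b64encode(os.urandom(16)).decode("ascii")
```

Because 16 bytes always encode to 24 base64 characters, every key has the same length. A new key should be generated for each connection attempt.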
Sec-WebSocket-Version: 13
Any other version listed in this header is invalid.
Together, these headers would result in an HTTP GET request from the client to a ws:// URI like in the following example:
GET ws://example.com:8181/ HTTP/1.1
Host: example.com:8181
Connection: Upgrade
Pragma: no-cache
Cache-Control: no-cache
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: <Base64 key>
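A request like the one above could be assembled by hand as in the sketch below. The function name build_handshake_request is hypothetical. Two choices here are assumptions rather than something the article prescribes: the request line carries only the path (which RFC 6455 allows), and the optional Pragma and Cache-Control headers are left out.

```python
import base64
import os


def build_handshake_request(host, port, path="/"):
    """Return (request_bytes, key) for a WebSocket opening handshake."""
    # One-time random key: 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        "Sec-WebSocket-Version: 13",
        f"Sec-WebSocket-Key: {key}",
    ]
    # HTTP headers end with CRLF, and a blank line terminates the request head.
    request = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
    return request, key
```

The key is returned alongside the request so that the caller can later check the server's Sec-WebSocket-Accept header against it.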
Once a client sends the initial request to open a WebSocket connection, it waits for the server’s reply. The reply must have an HTTP 101 Switching Protocols response code.
The HTTP 101 Switching Protocols response indicates that the server is switching to the protocol that the client requested in its Upgrade request header. In addition, the server must include HTTP headers that validate the connection was successfully upgraded:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: <Base64 key>
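// Connection: Upgrade
// Confirms that the connection has been upgraded.
// Upgrade: websocket
// Confirms that the connection now uses the WebSocket protocol.
// Sec-WebSocket-Accept: <Base64 key>
// Proves that the server processed the client's handshake: the value is the base64-encoded SHA-1 hash of the client's Sec-WebSocket-Key concatenated with the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11.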
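The server's reply can be checked on the client side roughly as follows. This is a sketch, not a full HTTP parser: the function names compute_accept and validate_handshake_response are made up for illustration, and the response is assumed to be available as a single decoded string. The GUID and the SHA-1/base64 derivation of Sec-WebSocket-Accept come from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(key):
    """Derive the expected Sec-WebSocket-Accept value from Sec-WebSocket-Key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def validate_handshake_response(response, key):
    """Return True if the response is a valid 101 upgrade for the given key."""
    head = response.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status_parts = status_line.split(" ", 2)
    if len(status_parts) < 2 or status_parts[1] != "101":
        return False
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    connection_tokens = [
        token.strip().lower() for token in headers.get("connection", "").split(",")
    ]
    return (
        headers.get("upgrade", "").lower() == "websocket"
        and "upgrade" in connection_tokens
        and headers.get("sec-websocket-accept") == compute_accept(key)
    )
```

With the sample key from RFC 6455, compute_accept("dGhlIHNhbXBsZSBub25jZQ==") returns "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=". If validation fails, the client should treat the handshake as failed and close the connection.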
After the client receives the server response, the WebSocket connection is open to start transmitting data as described at the start of the article.