First, we need to know what a URL is.
A URL (Uniform Resource Locator) is a unique address that identifies a specific web page or resource on the internet. It typically consists of a scheme (such as http://) and a domain name (such as www.example.com), followed by additional information about the location of the page or resource, such as its path (/help/index.html) and any query parameters (?q=example). URLs are used to locate and access web resources and can be typed into a web browser’s address bar to access a particular website or page.
A URL has four parts. Take http://example.com/help/index.html as an example. The first part is the scheme, which here is http://. This tells the browser to connect to the server using a protocol called HTTP. Another common scheme is HTTPS; with HTTPS, the connection is encrypted. The second part of the URL is the domain, here example.com. It is the domain name of the site. The third part of the URL is the path, and the fourth is the resource. The difference between these two is often not very clear. Just think of them as a directory and a file in a regular file system: here, /help/ is the path and index.html is the resource. Together, they specify the resource on the server we want to load.
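To make the four parts concrete, here is a minimal sketch using Python's standard `urllib.parse.urlsplit`. The function name `url_parts` and its dictionary keys are illustrative only. Splitting the path into "path" and "resource" at the last `/` follows the directory-and-file analogy above; it is an assumption for illustration, not a rule from the URL standard, which treats `/help/index.html` as a single path.

```python
from posixpath import split
from urllib.parse import parse_qs, urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into scheme, domain, path, resource, and query (illustrative)."""
    parts = urlsplit(url)
    # Treat the URL path like a file-system path: directory vs. final file name.
    directory, resource = split(parts.path)
    return {
        "scheme": parts.scheme,          # e.g. "http" or "https"
        "domain": parts.hostname,        # e.g. "example.com"
        "path": directory,               # directory part, e.g. "/help"
        "resource": resource,            # file part, e.g. "index.html"
        "query": parse_qs(parts.query),  # e.g. {"q": ["example"]}
    }


parts = url_parts("http://example.com/help/index.html?q=example")
print(parts["scheme"], parts["domain"], parts["path"], parts["resource"])
# http example.com /help index.html
print(parts["query"])  # {'q': ['example']}
```

Note that `urlsplit` reports the scheme without the `://` separator.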
Next, the browser needs to find the IP address of the server behind the domain. It does this with a process called DNS lookup.
DNS stands for Domain Name System.
Think of it as a phone book of the internet. DNS translates domain names to IP addresses so browsers can load resources.
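A program can do the same "phone book" lookup by asking the operating system. This minimal sketch uses Python's standard `socket.getaddrinfo`; the wrapper name `lookup` is hypothetical. Depending on system configuration, the call may be answered from the hosts file, a local cache, or a DNS server on the network. The demo uses `localhost` so it does not depend on internet access.

```python
import socket


def lookup(hostname: str) -> list[str]:
    """Ask the system resolver for the IP addresses of hostname (illustrative)."""
    # Restrict to TCP so each address is not repeated once per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(lookup("localhost"))
```

On most machines this prints a loopback address such as 127.0.0.1 and/or ::1.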
To make the lookup process fast, DNS information is heavily cached. First, the browser itself caches it for a short period of time. If the answer is not in the browser cache, the browser asks the operating system for it. The operating system has its own cache, which also keeps the answer for a short period of time.
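The layered caching described above can be sketched as a chain of time-limited caches. This is a simplified model under stated assumptions: `TTLCache`, `resolve`, and `fake_resolver` are hypothetical names; the 60- and 300-second lifetimes are arbitrary; and 192.0.2.10 is a reserved documentation address, not example.com's real IP. Real browsers and operating systems have their own expiry policies.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """A tiny cache whose entries expire after ttl seconds (illustrative)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[name]  # expired: forget the stale answer
            return None
        return ip

    def put(self, name: str, ip: str) -> None:
        self._entries[name] = (ip, self.clock() + self.ttl)


def resolve(name: str, browser_cache: TTLCache, os_cache: TTLCache,
            ask_resolver: Callable[[str], str]) -> tuple[str, str]:
    """Check the browser cache, then the OS cache, then a DNS resolver."""
    ip = browser_cache.get(name)
    if ip is not None:
        return ip, "browser cache"
    ip = os_cache.get(name)
    if ip is None:
        ip = ask_resolver(name)  # slow path: query a DNS resolver
        os_cache.put(name, ip)
        source = "DNS resolver"
    else:
        source = "OS cache"
    browser_cache.put(name, ip)  # remember the answer at the browser level too
    return ip, source


now = [0.0]                       # fake clock so the demo is deterministic
clock = lambda: now[0]
browser = TTLCache(ttl=60, clock=clock)
os_level = TTLCache(ttl=300, clock=clock)


def fake_resolver(name: str) -> str:
    return "192.0.2.10"           # documentation address, stands in for a real answer


print(resolve("example.com", browser, os_level, fake_resolver))  # ('192.0.2.10', 'DNS resolver')
print(resolve("example.com", browser, os_level, fake_resolver))  # ('192.0.2.10', 'browser cache')
now[0] = 120.0                    # browser entry has expired, OS entry is still fresh
print(resolve("example.com", browser, os_level, fake_resolver))  # ('192.0.2.10', 'OS cache')
```

Only the first lookup pays for the slow network query; later lookups are answered by whichever cache still holds a fresh entry.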
If the operating system doesn’t have it either, it sends a query over the internet to a DNS resolver.
This sets off a chain of requests until the IP address is resolved. It is an elaborate and elegant process. For now, just keep in mind that it involves many servers in the DNS infrastructure and that the answer is cached every step of the way. We’ll discuss this in detail in another video. Finally, the browser has the IP address of the server, in our case the server for example.com.
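As a rough illustration of that chain: a DNS resolver typically starts at a root server, is referred to the server for the top-level domain (`.com`), and is then referred to the domain's authoritative name server, which returns the address. This sketch simulates those referrals with hypothetical in-memory "servers" (`root`, `com-tld`, `ns.example.com`) and the documentation address 192.0.2.10. It sends no network traffic and ignores caching, record types, and every other detail of real DNS.

```python
# Hypothetical, hard-coded data standing in for real name servers.
# Each server maps a domain suffix to either an answer or a referral.
SERVERS = {
    "root": {"com": ("referral", "com-tld")},
    "com-tld": {"example.com": ("referral", "ns.example.com")},
    "ns.example.com": {"example.com": ("answer", "192.0.2.10")},
}


def ask(server: str, name: str) -> tuple[str, str]:
    """Return the server's record for the most specific suffix of name it knows."""
    records = SERVERS[server]
    labels = name.split(".")
    # Try "www.example.com", then "example.com", then "com".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in records:
            return records[suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve_iteratively(name: str, start: str = "root", max_hops: int = 10):
    """Follow referrals from the root until some server returns an answer."""
    server = start
    visited = [server]
    for _ in range(max_hops):
        kind, value = ask(server, name)
        if kind == "answer":
            return value, visited
        server = value  # referral: ask a more specific server next
        visited.append(server)
    raise LookupError(f"gave up on {name} after {max_hops} referrals")


ip, chain = resolve_iteratively("example.com")
print(ip)     # 192.0.2.10
print(chain)  # ['root', 'com-tld', 'ns.example.com']
```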
Next, the browser establishes a TCP connection with the server using the IP address it got for it. Establishing a TCP connection involves a handshake, and the browser must wait at least one full network round trip for it to complete before it can send any data. To keep the loading process fast, modern browsers use keep-alive connections to reuse an established TCP connection to the server as much as possible.
One thing to note is that if the protocol is HTTPS, establishing a new connection is even more involved. It requires a complicated process called the SSL/TLS handshake to set up the encrypted connection between the browser and the server. This handshake is expensive, and browsers use tricks like SSL session resumption to lower the cost.
Finally, the browser sends an HTTP request to the server over the established TCP connection. HTTP itself is a very simple protocol. The server processes the request and sends back a response. The browser receives the response and renders