
How the web works!


The internet is so closely woven into our lives today that I hate to think of a world without it. Yet we know so little about how the internet operates. We don’t really have a clue how the web works either, do we?


💡The "Internet" and the "Web" are not the same!

The internet refers to the mind-boggling infrastructure, or hardware, that helps millions of computers all over the world connect, network, and talk to each other.
The web, also referred to as the World Wide Web (WWW), is the software that sits on top of the internet. The web helps us access information through the internet.



🎥FourZeroThree - YouTube

Heads up! Did you know I also run a YouTube channel by the same name “FourZeroThree”? I try to create short and entertaining videos out of the articles I post. Don’t forget to check it out!


An overview of the Internet

Let me briefly explain what the internet is.

  • The local network of devices in your home connects to the networks outside through your modem and wireless router.
  • Your modem is connected to your Internet Service Provider (ISP), often through underground copper cables that eventually connect to fiber optic cables.
  • The ISP may have one or more regional and national networks that serve different cities in the country. Besides connecting within a country, these networks also connect to networks in other countries.
  • Continents are linked to each other by underwater fiber optic cables, laid and operated by many different cable companies across the world.

This infrastructure that makes up the internet is what makes sharing information possible. Staggering, isn’t it?

Cutting to the chase - The World Wide Web

The web that is built on top of the internet allows users to access information. All of it boils down to a “Client and Server” interaction.

Client

A client is the desktop, laptop, or mobile device that you use to connect to the web. The web browser (Chrome, Firefox, Brave, etc.) that you use on that device to access the web can also be considered a client.

The client is responsible for making “requests” for resources on a server. This is usually done by typing in a URL or clicking a link in your browser. A resource can be any file: a web (HTML) page, an image, a video, or any other file type.

Server

A server is a computer that hosts the resources (web pages, videos, images, etc.) of a website. It “responds” to the “requests” made by the client and delivers the requested resource to the client.
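
To make the request and response idea concrete, here is a minimal sketch in Python that runs a tiny server and a client on the same machine. The handler name, the page content and the /index.html path are made up for illustration; a real website would of course sit on a different computer than your browser.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical server side: answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello from the server</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once():
    """Start the server, make one client request, return (status, page)."""
    # Port 0 asks the operating system for any free port on this machine.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: send a request and read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/index.html")
        response = connection.getresponse()
        page = response.read().decode("utf-8")
        connection.close()
        return response.status, page
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

The client only knows an address and a path; the server decides what to send back. That split is the whole “Client and Server” interaction in miniature.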

Client and Server

Domain names

Take this example URL,

https://www.google.com/

Here, google.com is the “domain name” of the website. A domain name has a few parts to it, each separated by a period.

  • The “google” portion of “google.com” refers to the name of the website. It is called the Second-Level Domain (SLD).
  • The “.com” portion of “google.com” is called the Top-Level Domain (TLD), and it gives you an idea of what sort of entity the organization behind the website is. Entities could be commercial (.com), government organizations (.gov), educational institutions (.edu), etc. There are also country code TLDs: “.in” for India, “.fr” for France, “.hk” for Hong Kong and so on.
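
If you want to pull these parts out yourself, a rough sketch in Python could look like this. The function name split_domain is my own, and the splitting is deliberately naive: it assumes a single-label TLD such as “.com”, so it would give the wrong answer for suffixes like “.co.uk”.

```python
from urllib.parse import urlsplit


def split_domain(url):
    """Naively split the host of a URL into TLD, SLD and any subdomains."""
    host = urlsplit(url).hostname          # e.g. "www.google.com"
    labels = host.split(".")               # parts are separated by periods
    return {
        "host": host,
        "tld": labels[-1],                 # last label
        "sld": labels[-2],                 # the label just before the TLD
        "subdomains": labels[:-2],         # everything in front, if any
    }


if __name__ == "__main__":
    print(split_domain("https://www.google.com/"))
```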

Domain Name System

Every device, be it the client or the server, is tied to an IP (Internet Protocol) address. The IP address serves as an identifier for a device, so that it can be identified by, and talk with, other devices on the internet. An IP address is a set of four numbers, each separated by a period. An example would be 192.154.23.10. Each number in the set can range from 0 to 255, which means an IP address can range from 0.0.0.0 to 255.255.255.255. This version of IP addressing is called “IPv4”. It allows for about 4.3 billion (2^32) different IP addresses.
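
The arithmetic is easy to check with Python’s standard ipaddress module. This is just a small sketch; the helper name describe_ipv4 is mine, and the address is the example from the paragraph above.

```python
import ipaddress

# Four numbers of 8 bits each give 32 bits in total.
TOTAL_IPV4_ADDRESSES = 2 ** 32  # 4294967296, roughly 4.3 billion


def describe_ipv4(text):
    """Return the four numbers of an IPv4 address and its value as one integer."""
    address = ipaddress.IPv4Address(text)  # raises ValueError for e.g. "256.1.1.1"
    return {
        "numbers": list(address.packed),   # the four values, each 0 to 255
        "as_integer": int(address),        # the same address as a single number
    }


if __name__ == "__main__":
    print(describe_ipv4("192.154.23.10"))
    print(TOTAL_IPV4_ADDRESSES)
```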

Every website or application we visit (or rather, the server hosting it) has an IP address tied to it. Imagine having to type an IP address into the address bar of your browser for every website, rather than its name, like facebook.com or google.com. It would be practically impossible to remember the IP addresses of all the websites we use.

This is where the Domain Name System, or DNS, comes as such a boon. DNS allows human-readable “names” to be assigned to IP addresses, which makes our lives easy. Once a request for a website is fired from your browser, the Domain Name System works behind the scenes to resolve the name of the website to its corresponding IP address.
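
You can ask for this name-to-address translation yourself. The sketch below hands the name to the operating system’s resolver through Python’s socket module; the function name resolve is my own, and the addresses you get back depend on your network, so I have not listed any expected output.

```python
import socket


def resolve(name):
    """Ask the operating system's resolver for the IPv4 addresses of a name."""
    results = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    # Needs a working internet connection; the answer varies by location.
    print(resolve("www.google.com"))
```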

Connecting to the website (name resolution by DNS)

  1. Let’s say you either type the name of the website or click the link https://www.google.com in your browser.
  2. Your browser looks into its cache to check whether it already has the IP address for the domain name. If it doesn’t, it asks the DNS resolver, a piece of software in your operating system. If the resolver doesn’t have the answer either, the question goes to the “local DNS server”, commonly run by your Internet Service Provider (ISP), which is asked for the address of www.google.com.
  3. If the local DNS server does not have the IP address for the domain name in its cache, it asks a server called the “root name server”. The root name server stores information about all “Top Level Domain (TLD) name servers”. Since the TLD in this case is “.com”, it returns the IP address of the “.com” TLD name server to your local DNS server.
  4. The local DNS server asks the “.com” TLD name server for the address of the domain name, and the TLD name server responds with the address of the “name server” of the requested domain.
  5. The local DNS server then asks the name server of the requested domain, which responds with the IP address of the “web server” for www.google.com.
  6. The local DNS server finally responds to your browser with the IP address of the requested domain.
  7. Once the IP address of www.google.com is resolved, your browser establishes a socket connection with the server of www.google.com.
  8. Your browser sends a request to that IP address (the server for www.google.com).

The Google server receives the request and responds with the web page.

Name resolution by DNS
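
The chain of questions in steps 3 to 6 can be sketched as a toy simulation. Nothing here talks to a real DNS server: the class name, the lookup tables and every IP address are made up (the addresses come from ranges reserved for documentation), and real name servers do far more than a dictionary lookup.

```python
# Every record below is invented for illustration only.
ROOT_NAME_SERVER = {"com": "198.51.100.1"}  # TLD -> its TLD name server
TLD_NAME_SERVERS = {"198.51.100.1": {"google.com": "198.51.100.2"}}  # domain -> its name server
DOMAIN_NAME_SERVERS = {"198.51.100.2": {"www.google.com": "203.0.113.10"}}  # host -> web server


class LocalDNSServer:
    """Toy stand-in for the local DNS server from the steps above."""

    def __init__(self):
        self.cache = {}
        self.trace = []  # which servers were asked, in order

    def resolve(self, host):
        if host in self.cache:
            self.trace.append("cache")
            return self.cache[host]

        labels = host.split(".")
        tld, domain = labels[-1], ".".join(labels[-2:])

        tld_server = ROOT_NAME_SERVER[tld]                   # step 3: ask the root
        self.trace.append("root")
        name_server = TLD_NAME_SERVERS[tld_server][domain]   # step 4: ask the TLD server
        self.trace.append("tld")
        ip_address = DOMAIN_NAME_SERVERS[name_server][host]  # step 5: ask the domain's name server
        self.trace.append("domain")

        self.cache[host] = ip_address  # remembered for the next request
        return ip_address              # step 6: answer goes back to the browser


if __name__ == "__main__":
    dns = LocalDNSServer()
    print(dns.resolve("www.google.com"), dns.trace)
    print(dns.resolve("www.google.com"), dns.trace)
```

The second call is answered straight from the cache, which is why repeat visits to a website usually skip most of this journey.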

  • Now, you might assume that the whole web page is loaded in the browser with a single response from the server. It actually takes multiple requests from the browser to load a single web page.
  • Once the server responds, the browser parses the returned HTML (from the response) and may come across other resources like image files, CSS files and JavaScript files. For each resource encountered, the browser sends another request and receives another response.
  • A single web page is usually the sum of several HTTP requests and responses.
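
Here is a rough sketch of how a browser might discover those extra resources, using Python’s built-in HTML parser. The sample page and file names are invented, and a real browser looks at many more tags and attributes than the three handled here.

```python
from html.parser import HTMLParser

# A made-up page that refers to three other files.
PAGE = """
<html>
  <head>
    <link rel="stylesheet" href="style.css">
    <script src="app.js"></script>
  </head>
  <body><img src="logo.png"></body>
</html>
"""


class ResourceFinder(HTMLParser):
    """Collects the files a page refers to: images, scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.resources.append(attributes["src"])
        elif tag == "link" and attributes.get("rel") == "stylesheet" and attributes.get("href"):
            self.resources.append(attributes["href"])


def find_resources(html):
    finder = ResourceFinder()
    finder.feed(html)
    return finder.resources


if __name__ == "__main__":
    extra = find_resources(PAGE)
    # One request for the page itself, plus one for each resource found.
    print(extra, "total requests:", 1 + len(extra))
```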

A note on types of servers

If you have come this far, you may be tempted to believe that a single web server does all the heavy lifting behind the scenes. This may be true for a small website. In the real world, however, popular websites can get millions of requests per second, and a single server would find it difficult to handle that load. Besides handling requests, servers also have to run web application software and handle tons of data.

Servers are split into different types based on functionality to make handling requests easier. Some of these types are:

  • Web server - This server only handles the incoming HTTP requests. E.g. Apache.
  • Application server - An application server is responsible for running the software of the web application, written with back-end languages and frameworks like PHP, Python or Ruby on Rails.
  • Database server - This runs a database management system such as MySQL or MongoDB. This is where the data resides.

Handling requests at scale is made easier still by server farms. The idea here is to run many copies of each type of server and evenly distribute incoming requests among them with the help of another computer called a “load balancer”. This prevents any single server from being overloaded.
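
To give a feel for what “evenly distribute” can mean, here is a minimal sketch of a load balancer that simply takes turns between servers (often called round-robin). This is only one possible strategy, chosen because it is the simplest; the class and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a repeating cycle."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def route(self, request):
        # A real load balancer would forward the request over the network;
        # here we only report which server was chosen.
        return next(self._next_server)


def simulate(number_of_requests, servers):
    """Count how many requests each server receives."""
    balancer = RoundRobinBalancer(servers)
    return Counter(
        balancer.route(f"request-{i}") for i in range(number_of_requests)
    )


if __name__ == "__main__":
    print(simulate(9, ["web-1", "web-2", "web-3"]))
```

With nine requests and three servers, each server ends up with three requests, so no single machine takes the whole load.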