What Happens When You Type a URL in Your Browser and Press Enter
The Adventure Behind Entering A Website
Have you ever wondered what exactly happens when you type in a URL? Well, it can actually vary depending on what you’re entering! In this article, we’ll talk about some key points of the process:
- DNS Request
- TCP/IP
- Firewall
- HTTPS/SSL
- Load-Balancer
- Web Server
- Application Server
- Database
Along with the example URL: https://www.holbertonschool.com
Remember to check out the diagram I made for an overview of this process — it’s at the end of the article!
Let’s Get Started
Did you know the URL is just an alias?
Your browser, in fact, doesn’t actually find the web page by its URL… it’s found by the IP address!
IP stands for Internet Protocol, and there are two types: IPv4 and IPv6. Think of an IP address as the unique identifier for the server hosting the website.
So you, the client (or more specifically, your computer), need to get the IP address for the domain name holbertonschool.com in order to actually arrive at the website. How does your computer figure out the IP address? By making a DNS request!
DNS stands for Domain Name System… read this cute comic for a more visual representation!
Although DNS involves a good number of steps, you can think of it as being made up of two major parts: the cache and the translation.
The DNS Protocol
The protocol starts off by checking if the domain name request was cached somewhere previously. It’s a lot faster than having to translate it since that means the IP address would already be defined.
First it checks the browser’s cache, then the Operating System’s, and lastly the resolver’s. If the process makes it to the resolver and still cannot find the IP address, then that means the domain name will essentially have to be translated.
To do this, the resolver asks a root server which TLD server to go to, and the TLD server in turn identifies which authoritative name server (ANS) holds the IP address information.
Let’s break that down a bit further:
The resolver is a server that is usually run by your Internet Service Provider. Although the resolver differs from person to person, every resolver knows how to locate the root servers.
The root server will basically tell the resolver where to locate the Top-Level Domain (TLD) server of the request. In the case of holbertonschool.com, the TLD server would be the .com TLD server.
After saving this information, the resolver contacts the proper TLD server. There, it is told which Authoritative Name Servers (ANS) are responsible for our request. Yet again, it saves this information before going to the ANS, where it finally gets the IP address!
The IP address is saved in the resolver’s cache (where it stays until the record’s time-to-live expires), and finally! The resolver responds to the client’s request with the domain name’s IP address, such as 52.0.149.47. (That’s just one of the possibly multiple IPv4 addresses hosting the website!)
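To make the lookup order concrete, here’s a minimal sketch in Python. It doesn’t touch the network: the root, TLD, and authoritative servers are plain dictionaries, and names like ROOT_SERVER, "tld-com", and "ans-holberton" are made up purely for illustration. Real resolvers also deal with TTLs, retries, and many record types that are left out here.

```python
from typing import Optional

# Hypothetical stand-ins for real DNS infrastructure (illustration only).
ROOT_SERVER = {"com": "tld-com"}  # root: TLD -> TLD server
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "ans-holberton"}}  # TLD server: domain -> ANS
AUTHORITATIVE = {"ans-holberton": {"holbertonschool.com": "52.0.149.47"}}  # ANS: domain -> IP


def resolve(domain: str, browser_cache: dict, os_cache: dict,
            resolver_cache: dict) -> Optional[str]:
    """Return the IP for `domain`, checking caches before walking the hierarchy."""
    # Part 1: the caches, in the order described above.
    for cache in (browser_cache, os_cache, resolver_cache):
        if domain in cache:
            return cache[domain]

    # Part 2: the translation. The root points to the TLD server...
    tld = domain.rsplit(".", 1)[-1]
    tld_server = ROOT_SERVER.get(tld)
    if tld_server is None:
        return None

    # ...the TLD server points to the authoritative name server...
    ans = TLD_SERVERS[tld_server].get(domain)
    if ans is None:
        return None

    # ...and the ANS finally holds the IP address.
    ip = AUTHORITATIVE[ans].get(domain)
    if ip is not None:
        resolver_cache[domain] = ip  # remember the answer for next time
    return ip
```

Calling resolve("holbertonschool.com", {}, {}, {}) walks the whole chain once; a second call with the same resolver cache returns straight from the cache.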
Before we go further with this adventure on what happens once the client has the IP address…
How Are Requests and Responses Done?
We mentioned the client sending a request and receiving a response… but what does this mean exactly?
Well, these requests and responses actually go through another protocol: TCP/IP.
TCP/IP stands for Transmission Control Protocol/Internet Protocol. TCP is a connection-oriented protocol that uses a method of error-checking, while IP handles addressing and routing the packets.
TCP is the most common, reliable way of sending packets of information (data) over the internet. It basically follows this process of tracking each packet to make sure no piece of data is lost or corrupted during transit.
This is done by the recipient telling the sender that it has received a packet, which it does for each one in the process. If the sender doesn’t get that confirmation, it knows to resend the packet so the receiver gets it correctly.
There’s another commonly known protocol called UDP, User Datagram Protocol, that is faster than TCP… but only because it doesn’t confirm delivery or resend lost packets.
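Here’s a rough sketch of that “resend until acknowledged” idea. It’s a simplified stop-and-wait simulation, not how TCP is actually implemented (real TCP uses sequence numbers, timers, and sliding windows). The function name and the dropped_attempts parameter are my own inventions: the set lists which (packet number, attempt number) pairs the pretend network loses.

```python
def send_reliably(packets, dropped_attempts):
    """Stop-and-wait sketch: resend each packet until it is acknowledged.

    dropped_attempts is a set of (seq, attempt) pairs that the simulated
    network loses, which keeps the example deterministic.
    """
    received = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:
            transmissions += 1
            lost = (seq, attempt) in dropped_attempts
            if not lost:
                # The receiver gets the packet and acknowledges it.
                received.append(payload)
                break
            # No acknowledgment came back, so the sender tries again.
            attempt += 1
    return received, transmissions
```

If the second packet is lost twice, all three packets still arrive in order; it just takes five transmissions instead of three.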
Now… Let’s Continue Our Adventure!
Once the client has the IP address, it can make a formal request to whatever web server(s) are hosting the website.
This is where the adventure can really vary depending on how the website is hosted. In the case of https://www.holbertonschool.com, the client will send a request over the internet (by using TCP/IP) for the website, but before reaching the web server… it hits a firewall!
FIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIREFIRE —
Okay, no it’s not actually a wall made of fire (but do you like how I tried to make one?), it’s actually a kind of network security system/device! Firewalls can be implemented as either hardware, software, or a combo of both.
Firewalls are mostly used to prevent unauthorized users from accessing private networks that connect to the internet. They can allow remote access to a private network through secure authentication certificates and logins. Firewalls are also useful for blocking traffic that doesn’t meet whatever security criteria were specified. This helps prevent possibly very harmful interactions, like those of viruses or hackers.
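As a toy illustration of “security criteria”, here’s a sketch of a rule-based packet filter. The rules, IP ranges, and names (Rule, RULES, is_allowed) are all hypothetical, and real firewalls look at far more than a source address and a port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    source: str  # CIDR block the rule applies to
    port: int    # destination port the rule applies to


# Hypothetical rule set; the first matching rule wins.
RULES = [
    Rule("deny", "203.0.113.0/24", 443),  # a made-up blocked range
    Rule("allow", "0.0.0.0/0", 443),      # HTTPS is open to everyone else
    Rule("allow", "10.0.0.0/8", 22),      # SSH only from the private network
]


def is_allowed(source_ip: str, port: int, rules=RULES) -> bool:
    """Return True if the first matching rule allows the traffic."""
    for rule in rules:
        if port == rule.port and ip_address(source_ip) in ip_network(rule.source):
            return rule.action == "allow"
    return False  # nothing matched: deny by default
```

With these rules, a visitor from 198.51.100.7 can reach port 443 but not port 22, and anything from the blocked range is turned away.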
But before expanding on secure authentication certificates…
Remember what TCP/IP is? While packets of information are able to move over the internet and between locations without being lost or corrupted… how does the recipient actually know what to do with it?
It’s actually told by HTTP or, in the case of https://www.holbertonschool.com, HTTPS!
HTTP stands for Hypertext Transfer Protocol and the S stands for Secure. You can easily tell if a website is secure or not by looking for a padlock next to the URL. Some browsers actually warn their users if a website is not secure before connecting… for very good reasons too!
Using HTTPS means all information (so the packets from earlier) is encrypted. This allows your browser to communicate with the website confidentially. This is especially important for websites that hold sensitive information or handle a lot of online transactions that need to be kept private, such as online banking or shopping.
So to summarize, while TCP/IP transports data over the Internet, HTTP(S) adds information explaining the nature of the request/response so the recipient knows how to handle it.
An example of an HTTP(S) request method would be “GET”, a command used to request a file from a web server. Every HTTP(S) response carries a “Status Code”. If the requested file can’t be found or accessed for any reason, the status code changes depending upon the error encountered.
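To show what a GET request and a status code look like as text, here’s a small sketch. The helper names build_get_request and parse_status_line are my own; the status names come from Python’s standard http.HTTPStatus enum.

```python
from http import HTTPStatus


def build_get_request(host: str, path: str = "/") -> str:
    """Build the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the headers
    )


def parse_status_line(line: str) -> tuple:
    """Split a status line such as 'HTTP/1.1 404 Not Found' into its parts."""
    version, code, _reason = line.strip().split(" ", 2)
    status = HTTPStatus(int(code))  # look up the standard name for the code
    return version, status.value, status.phrase
```

For instance, parse_status_line("HTTP/1.1 404 Not Found") gives back the version, the number 404, and the phrase "Not Found", which is the response you’d see if the requested file didn’t exist.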
Okay… but what actually makes HTTPS secure?
Secure authentication certificates!
While there are different kinds, remember we are following the adventure of https://www.holbertonschool.com. The secure protocol our https uses to encrypt communications has historically been called SSL. This stands for Secure Sockets Layer, which uses an ‘asymmetric’ Public Key Infrastructure (PKI) system.
What does that mean? Well the way it works is like this:
There are two keys: one is public and one is private. Each one decrypts what the other encrypts. For example: when something is encrypted with the public key, only the private key can decrypt it, and vice versa.
In this case, the web server always keeps the private key, while the public key is distributed for clients to use. In the classic SSL handshake that opens an HTTPS connection, the client encrypts a secret with the public key and sends it to the web server, and only the server’s private key can decrypt it. Both sides then derive a shared session key from that secret, and the session key is what encrypts the requests and responses that follow in both directions.
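The key-pair idea can be sketched with a toy version of RSA. The numbers below are tiny textbook primes chosen only so the math is easy to follow; real keys are thousands of bits long and use padding schemes that are left out here, so please don’t use this for anything real.

```python
# Toy key pair built from two tiny primes (illustration only, not secure).
P, Q = 61, 53
N = P * Q                # modulus shared by both keys
PHI = (P - 1) * (Q - 1)  # used only to derive the private exponent
E = 17                   # public exponent
D = pow(E, -1, PHI)      # private exponent: modular inverse of E (Python 3.8+)


def encrypt(message: int, public_key=(E, N)) -> int:
    """Anyone holding the public key can encrypt."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(ciphertext: int, private_key=(D, N)) -> int:
    """Only the holder of the private key can decrypt."""
    d, n = private_key
    return pow(ciphertext, d, n)
```

Encrypting a small number such as 65 with the public key and then decrypting the result with the private key returns 65 again.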
So how does one get the public key? Through an SSL certificate!
When you request an HTTPS connection to a webpage, the server sends your browser its SSL certificate before any other requests/responses can be made. This certificate contains the public key needed to begin the session on a uniquely secured connection.
You’ll also see the name TLS, which stands for Transport Layer Security. It is the successor to SSL and uses a similar PKI system; it’s what browsers actually negotiate today, even though the name SSL has stuck around.
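To see the certificate step from the client’s side, here’s a small sketch using Python’s standard ssl and socket modules. The function names make_client_context and fetch_certificate are my own, and fetch_certificate needs network access, so treat it as something to try yourself rather than a guaranteed result.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context with the standard library's default settings.

    The defaults load the system's trusted certificate authorities, require
    the server to present a certificate, and check that it matches the host.
    """
    return ssl.create_default_context()


def fetch_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Open a secured connection and return the server's certificate fields.

    Requires network access; the handshake fails if the certificate
    cannot be verified.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as secured:
            return secured.getpeercert()
```

Calling fetch_certificate("www.holbertonschool.com") should return a dictionary describing the certificate (its subject, issuer, and validity dates), assuming the site is reachable from your machine.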
Let’s Step Back to the Firewall…
We just made it through and have arrived at the load balancer. Similar to the firewall, the load balancer can also be implemented as either hardware or software.
An example of a load balancer would be HAProxy — High Availability Proxy. It’s free, common software that specializes in TCP and HTTP(S)-based applications.
Its main purpose is to distribute web traffic to different servers. The number of servers can vary depending on what is being hosted and how much traffic it gets. There can be as few as a handful of servers or as many as thousands.
Distributing web traffic is important because it spreads the workload of the overall system over multiple “smaller” systems. This works by having multiple servers host the same content. It makes the website more readily available and increases its reliability and efficiency: even if one server crashes, there are other ones to direct traffic to.
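One simple way to spread traffic is “round robin”: hand each new request to the next server in line and skip any server that is down. The sketch below is my own illustration of that idea (the class and the server names are made up), not a description of how HAProxy is implemented.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

With three servers, requests go to the first, second, and third in turn and then wrap around. If the second one crashes and is marked down, traffic simply keeps flowing to the other two.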
Did you know there are different kinds of servers?
There are actually many kinds. First, there are just plain servers, which can be virtual or physical. Then there are web servers and application servers, which can also be hardware or software.
But again, to keep things simple, we’ll think in terms of https://www.holbertonschool.com.
So let’s say there are physical servers, and each one is hosting a web server, an application server, and a database.
Some examples of web servers would be Nginx and Apache.
The web server, in this case, is a piece of software that delivers the web pages. It has several parts, one of which is an HTTP server. The HTTP server is another piece of software that understands web addresses (which are URLs) and HTTP.
There are also two kinds of web servers: static and dynamic. A static server is also known as a stack and sends whatever files it hosts “as-is”. Meanwhile, a dynamic web server is made up of a static web server with extra software — which can actually be an application server and a database.
The application server, on the other hand, updates the hosted files on the web server before sending them back to the browser/client.
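Here’s a small sketch of the difference between serving a file “as-is” and updating it first. The hosted files, the $username placeholder, and the function names are all hypothetical; real web and application servers are much more involved.

```python
from string import Template

# Hypothetical files hosted by the web server.
HOSTED_FILES = {
    "/about.html": "<h1>About us</h1>",
    "/welcome.html": "<h1>Welcome back, $username!</h1>",
}


def serve_static(path):
    """Static behaviour: return the hosted file exactly as stored."""
    return HOSTED_FILES.get(path)  # None stands in for "not found"


def serve_dynamic(path, data):
    """Dynamic behaviour: fill in the file with data before sending it."""
    template = HOSTED_FILES.get(path)
    if template is None:
        return None
    return Template(template).safe_substitute(data)
```

Serving /welcome.html statically returns the placeholder untouched, while serve_dynamic("/welcome.html", {"username": "Ada"}) fills in the name before the page goes back to the browser.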
Now before we can say the adventure is over…
How do these servers even hold their information in the first place? Through databases!
One of the best-known database management systems is MySQL (the “SQL” stands for Structured Query Language). The kind of information being saved influences the type of database a server uses.
A database is essentially a collection of information that can vary depending on what needs to be stored. For example, a web site might need to hold onto certain information about its users. What it does with this information can vary wildly between websites. Maybe the database uses this information in order to give different permissions for each user. Such permissions would change how they are allowed to interact with the website.
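As a tiny example of the permissions idea, here’s a sketch that stores users and their roles and then checks what a user may do. To keep it self-contained it uses SQLite (built into Python) rather than MySQL, and the table, the user names, and the roles are all made up.

```python
import sqlite3


def build_demo_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, role TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "student")],
    )
    conn.commit()
    return conn


def can_edit_pages(conn: sqlite3.Connection, name: str) -> bool:
    """Only users stored with the 'admin' role may edit pages."""
    row = conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)
    ).fetchone()
    return row is not None and row[0] == "admin"
```

Here alice can edit pages, bob cannot, and someone who isn’t in the table at all is refused too.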
There are so many ways in which databases can be used by servers. They aren’t just meant to hold files or data; they also make it easy to manage, update, and access that information.
As there are multiple servers hosting the same content, it’s highly possible the databases on these servers are actually slaves (also called replicas). They are read-only and are basically clones of one master database. This way, an update only has to be made on the master, and each server’s copy follows along.
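The read-only clone idea can be sketched like this: every write goes to the master and is copied to each clone, while reads are spread across the clones. The class below is a hypothetical illustration using plain dictionaries; real database replication usually happens asynchronously and is configured in the database itself, not written by hand.

```python
class ReplicatedStore:
    """One writable master dictionary plus read-only copies of it."""

    def __init__(self, replica_count: int):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next = 0

    def write(self, key, value):
        # Writes only ever go to the master...
        self.master[key] = value
        # ...and are then copied to every clone (immediately, in this sketch).
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads take turns across the clones, never touching the master.
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)
```

After a single write, every clone returns the same value, no matter which one a read happens to land on.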
And There We Go! Our Adventure Complete!
You’re finally at https://www.holbertonschool.com and can see the webpage. Can you believe so much happened in the time it took to load?
Thank you for reading! Look below to see an overview of this process!