Why everyone should know how a URL is structured - HTTPS, Typosquatting and Open redirects
Awareness of how a URL is structured could help you be secure when faced with certain malicious situations, like phishing.
Of course, you know what a URL is, don’t you? The string of text that appears on the address bar of your browser? But do you know how a URL is structured? Wait, before you think knowing that is not necessary, let me tell you, this knowledge could help you be secure when faced with certain malicious situations.
So, here’s what I’ll do. I’ll explain the structure of a URL along with how you could use this awareness to stay careful in real-world malicious scenarios. That way you would have an incentive to want to know how a URL is structured :)
🎥FourZeroThree - YouTube
A quick shout out! Here's a “video version” of the article. If you are the visual type, I recommend watching the video. I bet you’ll enjoy it :)
WHAT IS A URL?
URL stands for “Uniform Resource Locator”. Let’s just say, a URL is something like an address that helps specify where a particular “resource” of a website is “located”. A resource could be any file like a web/HTML page, an image, or a document. So when you see something like
https://example.com/public/new-blog-post
understand that the “new-blog-post” (web/HTML page) is the resource located in the path/directory “public” of the website “https://example.com”. URLs could also simply refer to the address of a website - https://example.com.
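If you like seeing things in code, here is a minimal sketch of the same breakdown using Python's standard `urllib.parse` module. The helper name `describe_url` is made up for this illustration; it only pulls out the pieces discussed in this article.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts discussed in this article."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # protocol, e.g. "https"
        "host": parts.hostname,   # domain name, e.g. "example.com"
        "path": parts.path,       # where the resource lives on the website
    }


print(describe_url("https://example.com/public/new-blog-post"))
# {'scheme': 'https', 'host': 'example.com', 'path': '/public/new-blog-post'}
```

The sections that follow look at each of these parts in turn.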
STRUCTURE OF A URL
Let’s break the URL chunk by chunk.
Protocol/scheme (HTTP/HTTPS)
The highlighted portion of the URL refers to a “protocol” or “scheme”. HyperText Transfer Protocol, abbreviated as “HTTP”, is basically a set of rules that help browsers and servers talk to each other. It is this protocol (or set of rules) that helps in the transfer of hypermedia (which includes hypertext, sound, videos and graphics) on the web.
“HTTP” helps transfer data (from browser to server and vice-versa) in plain text. For example, when you type in your credentials on a website and click the login button, a “request” is sent from your browser to the server (hosting the website). This request is a data packet that would include your credentials.
Now, if the website were to use “HTTPS” (the S stands for Secure), the data packets would be encrypted and would look like garbage to anyone intercepting them (only your browser and the server hold the keys needed to decrypt them)! This is a “Transport Layer” protection, where the data being transferred is encrypted. Website owners have to obtain what is called a “Transport Layer Security” (TLS) certificate to serve HTTP over TLS. This ensures that data is transferred over an encrypted connection. You would know a website has implemented TLS when you see a “padlock” beside “https” on the address bar.
Real-world significance (Websites with and without HTTPS)
Being aware of a website’s implementation of TLS is especially significant when you browse websites on a public Wi-Fi/network. There are tools (example - Wireshark) that a malicious hacker could use to intercept data in your network. This means that the hacker could capture the data packet sent from your browser and read it before it could hit the intended address/server (Troy Hunt has written about this in detail).
Don't key in private details like usernames, passwords or credit card numbers (or anything else considered private) on websites that do not use HTTPS.
💡Remember
This is hardly the problem today. Internet users are well aware of this and most websites today employ HTTPS. The trouble lies in users thinking websites having HTTPS are inherently safe or legitimate. The padlock or HTTPS only signifies that data is transferred over an encrypted connection. It does not mean that the website itself is not malicious. TLS certificates are quite easy to get these days and there are scam websites masquerading as genuine entities. Not all websites with HTTPS are legitimate.
Domain/subdomain
The highlighted portion of the URL refers to the domain name of the website. The domain name basically resolves to an IP address of a computer (server) on the internet.
Here, example.com is the domain name of the website. A domain name has a few parts to it, each separated by a period.
The “example” portion of “example.com” refers to the name of the website. It is called the Second-Level Domain (SLD).
The “.com” portion of “example.com” is called the Top Level Domain (TLD) and it gives you an idea of what sort of an entity the organization behind the website is. Entities could be commercial (.com), government organizations (.gov), educational institutions (.edu) etc. There are also Country code TLDs - “.in” for India, “.fr” for France, “.hk” for Hong Kong and so on.
But what if you encounter a URL like this,
https://blog.example.com/new-blog-post
Sometimes website owners prefer to have specific sections on their website. So instead of the web page/HTML file “new-blog-post” existing in a path/directory like “public” (previous example), this URL shows that the web page/HTML file “new-blog-post” exists in the subdomain called blog. Here blog is the subdomain of domain example.
💡Takeaway
The takeaway to remember is that the part/label that comes just before the TLD (the SLD), together with the TLD, identifies the main/primary website. Anything to the left of it is a subdomain.
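The takeaway above can be sketched in a few lines of Python. This is a deliberately naive illustration: the function name `split_host` is made up, and it assumes the TLD is always the single last label, which does not hold for multi-part suffixes such as “.co.uk”.

```python
from urllib.parse import urlsplit


def split_host(url: str) -> dict:
    """Naively split a hostname into subdomain, SLD and TLD.

    Assumption: the TLD is the last label only (not true for e.g. ".co.uk").
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    return {
        "tld": labels[-1],                             # e.g. "com"
        "sld": labels[-2] if len(labels) >= 2 else "",  # e.g. "example"
        "subdomain": ".".join(labels[:-2]),            # e.g. "blog", or "" if none
    }


print(split_host("https://blog.example.com/new-blog-post"))
# {'tld': 'com', 'sld': 'example', 'subdomain': 'blog'}
```

Reading the labels from right to left like this is exactly the habit that helps when you inspect a suspicious link.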
Real-world significance (Typosquatting)
💡Pay attention to the URL!
Always pay attention to the domain/subdomain in the URL. Bad guys employ something called a typosquatting attack, where they use a fake URL to impersonate a genuine or legitimate entity in order to commit fraud or spread malware.
Let’s say there is a website called “https://example.com”.
Here are some of the ways scammers would try to spoof the domain name in order to trick you into clicking it.
https://example.scamwebsite.com - “scamwebsite.com” is the main website, not “example.com”.
https://ex-ample.com - This is a hyphenated domain. “ex-ample.com” is not the same as “example.com”.
https://example.com.scamwebsite.com - “scamwebsite.com” is the main website. Note that “example.com” here is a subdomain of “scamwebsite.com”.
https://example.co - “example.co” has a wrong TLD extension and is not the same as “example.com”.
https://www-example.com - This is another example of a hyphenated domain. “www-example.com” is not a subdomain of “example.com”. Note the hyphen in “www-example.com”. A genuine subdomain would be separated from the main website domain (SLD) by a period like in “www.example.com”.
https://exampIe.com - This is an example of a domain with a misspelling. Note that the “l” in the domain name has been replaced with “I” (“i” in capitals).
TLS certificates are quite easy to get these days and most scam websites serve HTTP over TLS making them look genuine to the naïve.
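To make the examples above concrete, here is a rough sketch of the check you are doing in your head: does the hostname equal the genuine domain, or end with a period followed by it? The function name `belongs_to` is hypothetical, and this is only an illustration of the idea, not a complete anti-phishing check.

```python
from urllib.parse import urlsplit


def belongs_to(url: str, trusted_domain: str) -> bool:
    """Return True if the URL's host is the trusted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    trusted = trusted_domain.lower()
    # A genuine subdomain is separated from the main domain by a period.
    return host == trusted or host.endswith("." + trusted)


candidates = [
    "https://example.scamwebsite.com",
    "https://ex-ample.com",
    "https://example.com.scamwebsite.com",
    "https://example.co",
    "https://www-example.com",
    "https://exampIe.com",      # capital "I" instead of "l"
    "https://www.example.com",  # the only genuine one in this list
]

for candidate in candidates:
    print(candidate, belongs_to(candidate, "example.com"))
```

Only https://www.example.com comes out as True; every spoofed variant from the list fails the check, including the look-alike with the capital “I”.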
Note: Google has an amazing website (Can you spot when you’re being phished?) that offers an interactive quiz which helps you learn how to identify phishing attacks. You could try to apply some concepts you learnt here as well :)
Path/Directory (Query strings and parameters)
The highlighted portion of the URL refers to the path/directory called “public” where the resource (web page/HTML file) “new-blog-post” resides. This is a clean URL and is called a “REST-style” URL.
However, there are URLs that do not conform to the REST-style pattern. A URL could also look like this,
https://example.com/?type=public&post=new-blog-post
The stuff appearing after the “?” symbol is called a query string
“&” is called a separator
“type” and “post” are called parameters
“public” and “new-blog-post” are called values
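The pieces listed above can be pulled apart with Python's standard `urllib.parse` module. The URL used here is an assumed example built from the parameters and values named above; the exact address is only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed example URL built from the parameters and values described above.
url = "https://example.com/?type=public&post=new-blog-post"

query_string = urlsplit(url).query  # everything after the "?"
params = parse_qs(query_string)     # split on "&", then on "="

print(query_string)  # type=public&post=new-blog-post
print(params)        # {'type': ['public'], 'post': ['new-blog-post']}
```

Each parameter maps to a list of values because the same parameter is allowed to appear more than once in a query string.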
Real-world significance (Open redirect attacks)
💡Open redirect
An open redirect is trickier to spot because the link itself points to a genuine website. Open redirect attacks typically rely on URLs that carry parameters.
Take this URL for example,
https://auth.example.com/login?redirect=https://example.com/account
Here, the website example.com hosts the login page in a subdomain called auth.example.com. Once login is successful, the website would redirect you to your account page in example.com. The URL would look like this after a successful login -
https://example.com/account
💡Pay attention to the parameters in the URL
An attacker could tamper with the “redirect” parameter to make it look like this -
https://auth.example.com/login?redirect=http://example.attacker.com
The attacker could send you this link via a phishing email and entice you to click it. Note how the attacker could cleverly make use of the “example” subdomain for his website attacker.com to make it look like example.com.
Since the URL begins with https://auth.example.com (if you hover over the link), the link may look genuine. If the website example.com is susceptible to an open redirect (secure websites are coded so that they are not susceptible to open redirects), after logging in (at https://auth.example.com), you would be redirected to the attacker’s website -
http://example.attacker.com
The website attacker.com could,
make you automatically download malware,
or spoof the login page of example.com to trick you into entering your credentials again.
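For the curious, here is a rough idea of how a website could avoid being an open redirect in the first place: only follow the “redirect” parameter when it points to a host the website trusts. This is a sketch under assumptions, not how any particular website does it; the names `ALLOWED_REDIRECT_HOSTS` and `safe_redirect_target` and the fallback address are made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the website is willing to redirect to.
ALLOWED_REDIRECT_HOSTS = {"example.com"}


def safe_redirect_target(target: str,
                         fallback: str = "https://example.com/account") -> str:
    """Return the requested target only if it is HTTPS and on the allowlist."""
    parts = urlsplit(target)
    if parts.scheme == "https" and parts.hostname in ALLOWED_REDIRECT_HOSTS:
        return target
    # Anything else (other hosts, plain HTTP, malformed input) is ignored.
    return fallback


print(safe_redirect_target("https://example.com/account"))
# https://example.com/account
print(safe_redirect_target("http://example.attacker.com"))
# https://example.com/account
```

With a check like this in place, the tampered link would simply land you on your own account page instead of the attacker's website.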
Open redirects need not always be straightforward to detect. The attacker may make the URL look like this -
https://auth.example.com/login?redirect=http%3a%2f%2fexample%2eattacker%2ecom
Here the symbols (“:”, “/” and “.”) are all URL-encoded (also called percent-encoded). At times the entire redirect URL could be encoded,
https://auth.example.com/login?redirect=%68%74%74%70%3a%2f%2f%65%78%61%6d%70%6c%65%2e%61%74%74%61%63%6b%65%72%2e%63%6f%6d
Identifying a malicious intent in this URL could be hard.
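Encoded links like the ones above are hard on the eyes, but decoding them is mechanical. Here is a small sketch that reveals where a “redirect” parameter really points; the function name `redirect_host` is made up, and the parameter name is an assumption since real websites may call it something else.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_host(url: str, param: str = "redirect"):
    """Decode the given query parameter and return the host it points to."""
    query = urlsplit(url).query
    values = parse_qs(query).get(param)  # parse_qs undoes the percent-encoding
    if not values:
        return None
    return urlsplit(values[0]).hostname


encoded = (
    "https://auth.example.com/login?redirect="
    "%68%74%74%70%3a%2f%2f%65%78%61%6d%70%6c%65%2e"
    "%61%74%74%61%63%6b%65%72%2e%63%6f%6d"
)
print(redirect_host(encoded))
# example.attacker.com
```

Once decoded, the host can be read from right to left just like any other domain name: the main website is attacker.com, not example.com.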
💡Don't click links you find suspicious
The best way to avoid falling prey to open redirects is to not click on links you are suspicious of, especially if the link is pretty shady like the one above. Do not immediately click links you receive via email, social media or text messages (unless you are very sure it is genuine). Make it a habit to inspect them first.
And hey, by the way, please do give FourZeroThree a shout-out to your friends and colleagues, would you? Would really appreciate it! Cheers and happy reading :)