Are URLs out of control?
While we all know how to search for things on the web, URLs are often overlooked. Geoff Meads explains how they work and why they keep expanding every time you try to search for something new.
For the most part, using the internet is pretty easy these days. You write a Tweet, post the Tweet and everyone loves your Tweet. Or something like that.
Just occasionally, though, you come across things that look weird. You thought you understood how the whole internet thing works but then you see something a little odd and are left wondering what on earth it’s all about.
One example of this is the inexplicably long URL. Imagine the scenario: you see an advert on Facebook that looks interesting. You click on it and like what you see, so you decide to send it to a friend on WhatsApp. But when you paste it into the message, the link URL is huge! Rather than a link that looks like this:
https://www.nicethings.com/look-at-this-thing/
You get something like this:
https://www.nicethings.com/look-at-this-thing/?utm_source=facebook&utm_medium=cpc&utm_campaign=23853889015160068&utm_content=ONJ-1__&origin=facebook&fb_params (actually, the real example URL was even longer than this but we’ve shortened it to save paper)
What’s going on?
What we’re seeing here is a basic URL plus a whole lot of extra bits of data called query strings. This is data that can be useful but is sometimes invasive and always annoying. To see what these query strings are doing we first need to eliminate the ‘good’ bits of the URL.
Domains & protocols
The first parts of this URL you will probably recognise. Firstly, we have the data protocol ‘https’. This stands for ‘Hyper Text Transfer Protocol Secure’. The protocol sets an agreed way of transferring the data, in this case web page data. The ‘S’ for secure is not always present (until recently we had just ‘http’) but should be there for most trustworthy websites these days.
The next part, ‘www.nicethings.com’, contains the sub-domain (‘www’), the domain (‘nicethings’) and the top-level domain or TLD (‘com’), listed from smallest to largest: a sub-domain is part of the domain, which is part of the TLD. For more detailed information on this subject, you might like to look up my previous articles on the Domain Name System (DNS) on the Connected website.
After the domain comes the path, ‘/look-at-this-thing/’, which identifies the page we want. Finally, we have the query strings themselves which, in the example above, start with the text ‘?utm_source=facebook’. Before we delve any further, we should talk a little about the character format of URLs and why they may be a little different to what you might expect.
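For readers who like to see this in code, here is a minimal sketch of splitting a URL into the parts described above. It uses Python's standard urllib.parse module and a shortened version of the example URL; the variable names are our own choice for illustration.

```python
from urllib.parse import urlsplit

# A shortened version of the example URL from this article.
url = ("https://www.nicethings.com/look-at-this-thing/"
       "?utm_source=facebook&utm_medium=cpc")

parts = urlsplit(url)

print(parts.scheme)    # https  (the protocol)
print(parts.hostname)  # www.nicethings.com  (sub-domain, domain and TLD)
print(parts.path)      # /look-at-this-thing/
print(parts.query)     # utm_source=facebook&utm_medium=cpc
```

Note that the ‘?’ itself is not part of the query text that comes back; it only marks where the query begins.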
URL escaping
You may also have noticed when looking at URLs in the past that not all characters are allowed. The most obvious of these are spaces. Where characters like the humble space are not allowed (we call them ‘illegal’ characters) we must encode the URL such that the illegal character becomes some other character or collection of characters. This is called ‘escaping’ and is used in many programming languages for a variety of reasons.
The piece of text that needs to be manipulated like this is referred to as a ‘string’, so the process is properly called ‘escaping a string’. Programming languages that deal with web content (such as JavaScript, PHP etc.) have inbuilt functions for this process. In PHP, for example, the ‘urlencode()’ function takes a string and encodes it into a new string that is suitable for use in a URL. Pretty self-explanatory really. (One wrinkle: PHP’s ‘urlencode()’ writes a space as ‘+’, while its sibling ‘rawurlencode()’ produces the ‘%20’ form.)
For URLs, escaping a space character replaces the space with the characters ‘%20’. As an example, the string ‘smart home installer’ would become ‘smart%20home%20installer’. If needed, a simple un-escaping function will replace each ‘%20’ with a space again and you’re back to normal text.
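As a rough illustration of escaping and un-escaping, here is a small sketch in Python rather than PHP. It assumes the standard library functions quote() and unquote() from urllib.parse, which play the same role as the functions mentioned above.

```python
from urllib.parse import quote, unquote

text = "smart home installer"

escaped = quote(text)        # each space becomes %20
restored = unquote(escaped)  # and back again

print(escaped)   # smart%20home%20installer
print(restored)  # smart home installer
```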
Static & dynamic
Now that we understand string escaping, let’s return to our query strings.
There are two types of query string data that can be used on the end of a URL like this. These are static queries and dynamic queries.
Static query strings are most often used to display a certain type of data on the destination page. For example, several versions of the page text, each in a different language, could be sent from the server to your browser along with some extra page code to tell the browser which language to show. By manipulating the query string on the end of the URL the user can see the page in any of the languages. The resulting URL for a page in English might look like this:
https://www.mywebsite.com/my-page?lang=en
The beginning of the query string is marked with the ‘?’ character. The ‘lang’ part is a language parameter and the ‘en’ part is the value of that parameter, in this case representing English. The same page in French might therefore be:
https://www.mywebsite.com/my-page?lang=fr
You could also use this method to display product costs in different currencies and a whole host of other possibilities.
While we are looking at that URL, note that the ‘?’ character is used only once, to mark the start of the query string. If more than one parameter needs to be sent in a URL then we separate the individual parameters with an ampersand (&). So, if we wanted a page in English with a dark layout then we might use:
https://www.mywebsite.com/my-page?lang=en&layout=dark
Note the new ‘layout’ parameter and its value ‘dark’.
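To make the parameter and value idea concrete, here is a hedged sketch that reads the parameters out of the URL above and then builds the French equivalent. It relies on Python's urllib.parse; the ‘mywebsite.com’ address and the ‘lang’ and ‘layout’ names are simply the illustrative ones used in this article.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "https://www.mywebsite.com/my-page?lang=en&layout=dark"

# Read the parameters. parse_qs returns a list for each name
# because the same parameter is allowed to appear more than once.
params = parse_qs(urlsplit(url).query)
print(params)  # {'lang': ['en'], 'layout': ['dark']}

# Build a new query string: urlencode joins the pairs with '&'
# and escapes any characters that are not allowed.
query = urlencode({"lang": "fr", "layout": "dark"})
french_url = "https://www.mywebsite.com/my-page?" + query
print(french_url)  # https://www.mywebsite.com/my-page?lang=fr&layout=dark
```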
Search queries
A great example of a dynamic query is a search term in a website search. This tells the server that we want specific content to be returned, not just a page with all possible content.
For example, if we want to see a set of results describing all content on a website that includes the words ‘smart’ and ‘home’ but not necessarily the two-word phrase ‘smart home’ then the URL might look like this:
https://www.mywebsite.com/search-results?terms=smart+home
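Note that the character between the words ‘smart’ and ‘home’ is not ‘%20’ but ‘+’. Inside a query string, ‘+’ is the alternative way of encoding a space that web forms use, so the server simply receives the text ‘smart home’. Whether it then treats the two words separately or as the specific two-word phrase ‘smart home’ is decided by the site’s own search code, although many sites search for the words separately unless the phrase is put in quotes.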
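A quick way to check what a ‘+’ means in practice is to decode it. In this sketch, which assumes Python's urllib.parse and the hypothetical ‘terms’ parameter from the URL above, a ‘+’ in a query string comes back as an ordinary space, so anything cleverer than that is down to the website's search code.

```python
from urllib.parse import parse_qs, quote_plus, unquote, unquote_plus

query = "terms=smart+home"

# Query-string parsing turns '+' back into a space.
decoded = parse_qs(query)
print(decoded)  # {'terms': ['smart home']}

# quote_plus is the form-style encoder: spaces become '+'.
print(quote_plus("smart home"))  # smart+home

# Plain unquote only handles %xx sequences and leaves '+' alone.
print(unquote("smart+home"))       # smart+home
print(unquote_plus("smart+home"))  # smart home
```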
Tracking queries
A rather more annoying use of query strings is that of tracking. In the full version of the example I gave right at the beginning of this article, deep within the very long query string, was the query ‘&fbclid=’. Now we already know that the ampersand is a separator between two queries but what about the ‘fbclid’ bit? Well, this is Facebook’s ‘click id’ parameter which allows them to track who clicked what and when. Did someone say Big Brother?
Can you clean a URL?
Now, if you have an inquisitive mind, you might be asking yourself: “What happens if I get rid of everything at the end of a URL starting with the question mark symbol?” Well, the content of the resulting web page will depend on whether the query strings are static or dynamic.
However, most likely the page will load just fine, and you won’t be tracked either. So, if you are sharing links with friends, sometimes it pays to look at the link you’re sharing and clean it before you share it.
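If you would rather not clean links by hand, the idea can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than a complete tool: the function names are our own, and the list of tracking parameters (anything starting ‘utm_’, plus ‘fbclid’) covers only the ones mentioned in this article.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: an illustrative, incomplete list of tracking parameters.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid"}


def strip_query(url):
    """Remove everything from the '?' onwards."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def strip_tracking(url):
    """Remove only the tracking parameters and keep the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES
        and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


long_url = ("https://www.nicethings.com/look-at-this-thing/"
            "?utm_source=facebook&utm_medium=cpc")
print(strip_query(long_url))
# https://www.nicethings.com/look-at-this-thing/

mixed_url = "https://www.mywebsite.com/my-page?lang=en&utm_source=facebook&fbclid=abc123"
print(strip_tracking(mixed_url))
# https://www.mywebsite.com/my-page?lang=en
```

The second function is the gentler option: it keeps parameters such as ‘lang’ that change what the page shows, so the link still behaves the same for the person you send it to.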