Since the early days of the World Wide Web, URLs have provided a uniform way to identify network resources unambiguously. The URL, specified in an RFC since 1994, gives internet users a general syntax for locating and retrieving public content on demand. This makes the URL one of the most basic technologies of the internet. Internet users rely on URLs every day to access resources through a browser, and their use isn’t limited to addressing web pages.

In this article, we introduce you to the structure behind a URL and focus on key application areas.

What is a URL?

The abbreviation “URL” stands for “Uniform Resource Locator”. A URL is a subtype of uniform resource identifier (URI), so URL structure follows URI syntax.

URIs make it possible to refer to resources by a unique identifier, both locally and worldwide on the internet. URLs, as one kind of identifier, are often used interchangeably with the term “internet address”. This is because of the URL’s main use: addressing web pages. However, URLs are not limited to this function. Files in the local file system can also be located using URLs, for example. In other words, every internet address is a URL, but not every URL is an internet address.

Definition

The abbreviation URL stands for “Uniform Resource Locator”. URLs allow you to uniquely address resources and request them as needed. For example, internet users enter URLs in the browser’s address bar to access web pages or download files.

URL structure

Every URL consists of a scheme and a scheme-specific part:

  • Scheme: the URL scheme specifies both the kind of resource and the method needed to access it. The scheme often has the same name as the application-layer protocol used for access. Common schemes are mailto, file, ftp and http/https
  • Scheme-specific part: depending on the scheme, the scheme-specific part of the URL is made up of a number of segments that contain the resource’s location as well as optional processing parameters

The scheme and the scheme-specific part are separated by a colon. Depending on the scheme, two slashes (//) follow the colon; where they appear, they introduce the authority segment described below.

A URL is based on the following URI-syntax:

scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]

Each segment of the scheme-specific part has its own function. The user, password, host and port segments together form the “authority”. The authority indicates which computer the resource can be found on and under which name it can be reached.

  • user and password: these segments contain the username and password of the person authorized to access the resource, separated from each other by a colon. Both details are only required if the resource requests authentication. An @ sign separates them from the host segment
  • host: the host segment usually contains a domain name made up of a top-level, second-level and third-level domain, indicating which host provides the resource. Alternatively, the computer can be specified by its IP address
  • port: a port number addresses a specific TCP/IP port on the host. Since most schemes already have a default port, this entry is optional. For example, the default ports are 80 for HTTP, 443 for HTTPS and 21 for FTP. A port number only needs to be given if the scheme defines no default port or if a non-default port is being used. The port number is separated from the host segment by a colon
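The authority segments described above can be read out programmatically. The following is a minimal sketch using Python’s standard urllib.parse module; the helper name authority_parts, the DEFAULT_PORTS table and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Default ports named in the text; this lookup table is a hypothetical helper.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def authority_parts(url):
    """Return (user, password, host, port), falling back to the default port."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not state a port explicitly.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.username, parts.password, parts.hostname, port


# No credentials and no explicit port: the HTTPS default applies.
print(authority_parts("https://www.example.org/index/page1"))
# Credentials and a non-default port are taken from the URL itself.
print(authority_parts("http://alice:secret@www.example.org:8080/index"))
```

The first call yields the host with port 443; the second returns the user, the password and port 8080 exactly as written in the URL.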

The host in the “authority” is usually specified in human-readable form as a domain name. Computers, on the other hand, work with IP addresses. Visiting a website therefore requires an intermediate step that is imperceptible to the user: name resolution based on the Domain Name System (DNS).

Note

DNS is an IP-based network service that resolves domain names into IP addresses. Internet service providers typically operate DNS servers for this purpose. When an internet user visits a web page, their router first forwards the request to the responsible DNS server. The DNS server then looks up the IP address that matches the requested domain and sends it back. Once the router has received the IP address, the corresponding web server can be addressed.

The URI’s authority is followed by an indication of where the resource is located on the computer, as well as two optional components: the query string and the fragment identifier.

  • path: the path segment references the resource and reveals its location on the target computer. The path always starts with a slash (/)
  • query: some websites contain executable components and, in addition to the path, expect a “query string” (also called a query part). This contains parameters (such as user input) that are passed along with the URL and processed by the server. This is customary for dynamic web pages that are only generated from database records at the time of retrieval. The query string always begins with a question mark (?)
  • fragment: if a specific location within a resource needs to be referenced, the URI ends with a fragment identifier. It is introduced by a hash sign (#) and usually refers to an element with a unique ID in an HTML document, such as a subheading
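To illustrate how path, query and fragment are separated, here is a short sketch with Python’s urllib.parse. The URL and its parameter names (q, size) are made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL of a dynamic shop page.
url = "https://www.example.org/shop/search?q=shoes&size=42#results"
parts = urlsplit(url)

path = parts.path          # everything between the authority and "?"
query = parts.query        # everything between "?" and "#"
fragment = parts.fragment  # everything after "#"

# parse_qs splits the query string into parameter names and lists of values.
params = parse_qs(query)

print(path)      # /shop/search
print(query)     # q=shoes&size=42
print(fragment)  # results
print(params)    # {'q': ['shoes'], 'size': ['42']}
```

Note that the fragment is evaluated by the browser; it is normally not sent to the server with the request.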

Which elements of the URI syntax a URL contains depends on the scheme. The structure of the URL is determined by the type of resource. The following list includes the most common URL types:

http

Web pages are retrieved using HTTP (Hypertext Transfer Protocol) or HTTPS (Hypertext Transfer Protocol Secure). The latter transmits data over a connection encrypted with TLS, the successor to SSL. The URL structure is the same for both protocols.

Retrieving a web page usually requires no authentication. The “authority” then only includes the domain under which the website can be accessed; the username and password are omitted.

mailto

Mailto is a URL scheme for email addresses that allows website operators to include email hyperlinks on their website. When an internet user clicks on a mailto link, most browsers open the system’s default email program with a new message window. The email address given in the scheme-specific part is entered as the recipient address. The user does not have to start the program themselves, nor do they have to copy the email address manually.

In URLs with the mailto scheme, the addressee’s email address is listed in the scheme-specific part. The scheme and scheme-specific part are separated by a colon only; the double slash is omitted. Using a query string, you can set mail headers, for example to prefill the subject and body of the email.
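As a rough sketch of how such a link could be assembled, the following Python helper builds a mailto URL with optional subject and body headers. The function name mailto_url and the address are hypothetical; spaces are percent-encoded as %20 here, on the assumption that mail clients handle this form most reliably.

```python
from urllib.parse import quote, urlencode


def mailto_url(address, subject="", body=""):
    """Build a mailto URL; empty headers are left out of the query string."""
    headers = {}
    if subject:
        headers["subject"] = subject
    if body:
        headers["body"] = body
    # No double slash after the colon: the address follows directly.
    url = "mailto:" + quote(address, safe="@")
    if headers:
        # quote_via=quote encodes spaces as %20 instead of "+".
        url += "?" + urlencode(headers, quote_via=quote)
    return url


print(mailto_url("user@example.org", subject="Hello there"))
# mailto:user@example.org?subject=Hello%20there
```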

file

The file scheme is used to open files on your own computer. If you enter the correct file path as a URL in the address bar of a web browser, it will open the requested directory or file.

Since the file scheme refers to a local resource, the authority is left empty. The file path always starts with a slash. This results in a URL with three consecutive slashes.
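The three slashes can be made visible with a small sketch: two slashes introduce the (empty) authority, and the third is the first character of the path. The helper file_url and the example path are assumptions; the sketch only covers absolute POSIX-style paths.

```python
from urllib.parse import quote, urlsplit


def file_url(absolute_path):
    """Turn an absolute POSIX-style path into a file URL (simplified sketch)."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX-style path")
    # "file:" + "//" + empty authority + path beginning with "/"
    return "file://" + quote(absolute_path)


url = file_url("/home/user/my notes.txt")
parts = urlsplit(url)

print(url)                  # file:///home/user/my%20notes.txt
print(repr(parts.netloc))   # '' (the authority is empty)
print(parts.path)           # /home/user/my%20notes.txt
```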

ftp

URLs with the ftp scheme allow access to files located on another machine (remote access). The File Transfer Protocol (FTP) is used for transmission.

A user who wants to access files in a remote file system using FTP usually has to authenticate. Therefore, URLs that reference FTP resources often contain access data (username and password).
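The following sketch shows how the access data could be extracted from an FTP URL in Python. The host, user and password are invented; the password contains an @ sign, which has to appear percent-encoded (%40) so that it is not mistaken for the separator before the host.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical FTP URL; "%40" is the encoded "@" inside the password.
url = "ftp://alice:p%40ss@ftp.example.org/pub/report.pdf"
parts = urlsplit(url)

user = unquote(parts.username)
password = unquote(parts.password)
host = parts.hostname
# No port in the URL, so the FTP default port 21 is assumed.
port = parts.port if parts.port is not None else 21

print(user, password, host, port)  # alice p@ss ftp.example.org 21
```

Because credentials in a URL are stored and transmitted in plain text, this form is best reserved for non-sensitive or anonymous access.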

Permitted characters in a URL

The URL standard only supports a limited set of characters from the American Standard Code for Information Interchange (ASCII). In addition, various characters already have fixed functions, such as delimiting individual segments so that a URL can be split into its parts and processed.

The following characters have been assigned a specific function in the URL standard:

: / ? # [ ] @ ! $ & ' ( ) * + , ; =

For example, the question mark (?) introduces the query string. The individual parameters in the query string are delimited with the ampersand (&). The separator between parameter name and value is the equals sign (=). The hash sign (#) introduces the fragment identifier.

Characters without a predefined function include all letters and digits and the special characters listed below:
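The role of these delimiters becomes clear when a query string is assembled from individual parameters. This is a minimal sketch with Python’s urllib.parse; the parameter names and values are invented. Note that urlencode writes spaces as “+”, a convention taken from HTML form submissions.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical parameters; the last value itself contains "&" and "=".
params = {"q": "url syntax", "lang": "en", "note": "a&b=c"}

# urlencode joins names and values with "=" and pairs with "&",
# encoding any "&" or "=" that occurs inside a value.
query = urlencode(params)
print(query)  # q=url+syntax&lang=en&note=a%26b%3Dc

# parse_qsl reverses the process and restores the original values.
round_trip = dict(parse_qsl(query))
print(round_trip == params)  # True
```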

A-Z, a-z
0-9
- . _ ~

Characters other than those listed here, including non-ASCII characters, cannot be used in URLs directly and must be encoded. A reserved character can also be encoded to prevent it from being interpreted with its predefined meaning. To encode ASCII characters, the URL standard uses the escape character % (percent) followed by the character’s value from the ASCII table in hexadecimal notation; a space, for example, becomes %20. Non-ASCII characters are also written in percent-encoded form. RFC 3986 recommends basing this encoding on UTF-8. This recommendation is not binding, and service providers ultimately decide which encoding is used. In contrast, special characters in domain names are converted to ASCII-compatible strings using Punycode. Learn more about encoding with Punycode in our article on internationalized domain names.
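Both kinds of encoding can be tried out in a few lines of Python. The sketch below assumes UTF-8 for percent-encoding (the default of urllib.parse.quote) and uses Python’s built-in idna codec for the domain name; the sample strings are made up.

```python
from urllib.parse import quote, unquote

# Percent-encoding: "é" becomes two UTF-8 bytes, the space becomes %20,
# and the reserved "&" becomes %26 so it loses its delimiter meaning.
text = "café & bar"
encoded = quote(text, safe="")
print(encoded)           # caf%C3%A9%20%26%20bar
print(unquote(encoded))  # café & bar

# Punycode for domain names: the built-in "idna" codec converts each
# non-ASCII label into an ASCII-compatible "xn--" form.
domain = "münchen.example"
ascii_domain = domain.encode("idna").decode("ascii")
print(ascii_domain)      # xn--mnchen-3ya.example
```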

Tip

A free URL encoder is available on web consultant Eric A. Meyer’s website.

The difference between absolute and relative URLs

URLs can be absolute or relative. Absolute URLs are universally valid and include all segments required for the given scheme. Relative URLs, on the other hand, are only valid in a specific context and inherit certain properties from it, so that the corresponding URL segments become redundant and can be omitted. The information provided by the context can include the protocol, the domain or even the path to the resource.

Relative URLs are used in web page hyperlinks that lead to other subpages of the same website. The link URL takes the missing information from the web page that contains the link.

The following examples show a link from www.example.org/index/page1 to www.example.org/index/page2, first with an absolute and then with a relative URL.

Hyperlink with an absolute URL:

<a href="http://www.example.org/index/page2">Link text</a>

Hyperlink with a relative URL:

<a href="/index/page2">Link text</a>

Relative URLs have the advantage of being significantly shorter, which contributes to streamlined, clear source code. In addition, hyperlinks with relative URLs facilitate moving to a new domain. If a website’s domain changes, every internal link with an absolute URL must be updated manually or handled with redirects. This effort is unnecessary for relative URLs, which have no “authority” and therefore don’t need domain information.
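How a browser completes a relative URL from its context can be imitated with urljoin from Python’s urllib.parse. The sketch below reuses the example pages from this section; the second domain, www.example.net, is an assumption used to show a domain move.

```python
from urllib.parse import urljoin

# The page that contains the link supplies the context.
base = "http://www.example.org/index/page1"

# Scheme and domain are inherited from the base URL.
print(urljoin(base, "/index/page2"))  # http://www.example.org/index/page2

# Without a leading slash, the directory of the base path is inherited too.
print(urljoin(base, "page2"))         # http://www.example.org/index/page2

# After a domain move, the same relative link resolves to the new domain.
moved_base = "http://www.example.net/index/page1"
print(urljoin(moved_base, "/index/page2"))  # http://www.example.net/index/page2
```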
