What is XML? An overview of the Extensible Markup Language

XML is a text-based data format used, among other things, to store data and to exchange it between platforms, for example over the internet. XML forms the basis for other languages such as XHTML and SVG, and it can be read by humans and machines alike.

What is XML?

XML stands for “Extensible Markup Language”. It is a markup language for representing data in a structured text file that both humans and computers can read. The World Wide Web Consortium (W3C) published this metalanguage as XML 1.0 in 1998, and it is now an established part of web development. The current edition of XML 1.0, the fifth edition, was published in 2008. Unless a document declares a different encoding (or begins with a UTF-16 byte order mark), XML is interpreted as UTF-8.

The usage and function of XML

XML can be used in many different ways. It all begins with an XML file: a plain-text document with the extension .xml that can be created with any text editor. For the document to be well-formed, it must follow the syntax rules of the language; it is only called valid if it additionally conforms to a document type definition (DTD) or schema. XML documents are useful wherever HTML reaches its limits, for example for website sitemaps, which are stored in a sitemap.xml file.
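
To make the difference between syntax and validity concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It only checks well-formedness; the helper name is_well_formed and the sample strings are hypothetical, and checking validity against a DTD or schema would need a different tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML.

    This checks syntax only. It does not validate the document against
    a DTD or schema, which ElementTree does not support.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<note><to>Alice</to></note>"))  # True
print(is_well_formed("<note><to>Alice</note>"))       # False: <to> is never closed
print(is_well_formed("<a></a><b></b>"))               # False: two root elements
```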

To read an XML document, you need a parser. The parser provides a programming interface through which applications can access the document. There are three common access models:

  • DOM: The XML document is loaded into memory as a tree structure that can be read. The tree can also be modified and written back out.
  • Pull API: The document is processed sequentially; the application requests the next event (such as a start tag or a piece of text) from the parser whenever it is ready for it.
  • SAX: The document is read as a sequential data stream, and the parser pushes events to callback functions in the application as it encounters them.
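
The three access models can be compared side by side with modules from Python's standard library: xml.dom.minidom for DOM, ElementTree.iterparse for pull-style processing and xml.sax for SAX. This is a sketch only; the sample document and the RoomHandler class are made up for illustration, and each approach extracts the same attribute values.

```python
import io
import xml.dom.minidom
import xml.etree.ElementTree as ET
import xml.sax

DOC = "<house><room number='1'/><room number='2'/></house>"

# 1) DOM: the whole document is loaded into an in-memory tree first.
dom = xml.dom.minidom.parseString(DOC)
dom_numbers = [r.getAttribute("number") for r in dom.getElementsByTagName("room")]

# 2) Pull: the loop asks iterparse for the next event when it wants one.
pull_numbers = []
for event, elem in ET.iterparse(io.BytesIO(DOC.encode("utf-8")), events=("start",)):
    if elem.tag == "room":
        pull_numbers.append(elem.get("number"))


# 3) SAX: the parser drives the reading and calls these methods for each event.
class RoomHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.numbers = []

    def startElement(self, name, attrs):
        if name == "room":
            self.numbers.append(attrs.getValue("number"))


handler = RoomHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)

print(dom_numbers, pull_numbers, handler.numbers)  # ['1', '2'] ['1', '2'] ['1', '2']
```

DOM is the most convenient when you need random access or want to modify the document, while the pull and SAX approaches avoid holding the whole tree in memory.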

Standard web browsers usually include a parser, which simplifies reading XML documents.

Use of XML

The extensible markup language is used across many different areas. For example, it serves as the basis for other languages: XHTML, a reformulation of the Hypertext Markup Language (HTML), is based on XML, and so is SVG, a widely used file format for vector graphics.

However, XML isn’t only relevant as a basis for other languages. In web development, plain XML is often used for website sitemaps. XML can also be used to build XML databases. These document-oriented databases store data as XML documents rather than in fixed tables, which makes them more flexible than relational databases for data whose structure varies.
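
As an illustration of the sitemap use case, the following sketch builds a minimal sitemap with Python's xml.etree.ElementTree. It assumes the sitemaps.org protocol namespace http://www.sitemaps.org/schemas/sitemap/0.9 and the required url/loc elements; the function name build_sitemap and the example.com URLs are hypothetical, and the xml_declaration argument needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal sitemap: one <url><loc>...</loc></url> per URL."""
    # The default namespace is written as a plain xmlns attribute on the root.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
    # Returns bytes that start with <?xml version='1.0' encoding='UTF-8'?>
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_sitemap(["https://www.example.com/", "https://www.example.com/about"])
print(xml_bytes.decode("utf-8"))
```

To produce an actual sitemap.xml file, the same element could be wrapped in ET.ElementTree(...) and saved with its write() method.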

How is XML structured?

At first glance, XML documents resemble HTML documents. That’s because both languages are derived from SGML and use the same notation: tags enclosed in angle brackets, with attributes written inside the start tag.

With XML you can distinguish between different document types. Document-centric XML documents are largely understandable to human readers without additional information and are only loosely structured. However, this makes them much harder for machines to process. Data-centric XML documents can be regarded as the opposite: their high degree of structure makes them easy for machines to process, but human readers find them hard to follow. Semi-structured documents are a compromise between the two.
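
The practical difference shows up when a program reads each kind of document. In this sketch (Python's xml.etree.ElementTree, with made-up sample content), the document-centric paragraph mixes text and markup, so the program can only reassemble the running text, while the data-centric record maps directly onto named fields.

```python
import xml.etree.ElementTree as ET

# Document-centric: prose with mixed content (text and markup interleaved).
doc_centric = ET.fromstring(
    "<para>XML is <em>extensible</em>: you define your own tags.</para>"
)
# Reading it as a human would means joining all text pieces in order.
prose = "".join(doc_centric.itertext())
print(prose)  # XML is extensible: you define your own tags.

# Data-centric: every value sits in its own clearly named element.
data_centric = ET.fromstring(
    "<contact><name>Jane Doe</name><email>jane@example.com</email>"
    "<phone>555-0100</phone></contact>"
)
record = {child.tag: child.text for child in data_centric}
print(record)  # {'name': 'Jane Doe', 'email': 'jane@example.com', 'phone': '555-0100'}
```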

Is XML a programming language?

The extensible markup language is a markup language, not a programming language in its own right. There’s no XML compiler, and you cannot create executable files with XML. Languages based on XML, such as XHTML and SVG, are generally not programming languages either.

The different XML elements

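The most important components of the extensible markup language are elements. You can choose their names freely, as long as you follow a few rules: a name must start with a letter or an underscore, cannot contain spaces, and should not begin with “xml”, which is reserved. Each element starts with a start tag and ends with an end tag. To declare the element “house” in your XML document, for example, the code would look like this: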

<house>
	You can list the contents of the house here.
</house>

Elements can be nested as deeply as you like, as long as they are properly nested: an element opened inside another element must also be closed inside it. You can also add attributes to elements to store additional information about them. For example, if you want to add a roof and two numbered rooms to your house, the XML document changes as follows:

<house>
	You can list the contents of the house here.
	<roof>
		The roof of the house.
	</roof>
	<room number="1">
		Room 1 in your house.
	</room>
	<room number="2">
		Room 2 in your house.
	</room>
</house>

The room numbers are stored in the attribute number, a name-value pair written inside the start tag with its value in quotation marks.
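
To see how a program reads elements and attributes, here is a short sketch that parses the house document with Python's xml.etree.ElementTree. find() and findall() locate child elements by name, and get() returns an attribute value, which is always a string.

```python
import xml.etree.ElementTree as ET

HOUSE = """
<house>
    You can list the contents of the house here.
    <roof>
        The roof of the house.
    </roof>
    <room number="1">
        Room 1 in your house.
    </room>
    <room number="2">
        Room 2 in your house.
    </room>
</house>
"""

house = ET.fromstring(HOUSE.strip())

# Element content: strip the indentation whitespace around the text.
print(house.find("roof").text.strip())  # The roof of the house.

# Attribute values are read with .get() and keyed here by room number.
rooms = {room.get("number"): room.text.strip() for room in house.findall("room")}
print(rooms)  # {'1': 'Room 1 in your house.', '2': 'Room 2 in your house.'}
```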

XML documents can also contain comments of the form <!-- This is a comment -->.
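
Comments are meant for human readers, so parsers often discard them. This sketch with Python's xml.etree.ElementTree (sample document made up) shows that the default parser drops comments, while a TreeBuilder created with insert_comments=True, available since Python 3.8, keeps them as nodes whose tag is ET.Comment.

```python
import xml.etree.ElementTree as ET

DOC = "<house><!-- renovated in spring --><room number='1'/></house>"

# By default, ElementTree drops comments while building the tree.
default_tree = ET.fromstring(DOC)
print(len(default_tree))  # 1 (only the <room> element)

# A TreeBuilder with insert_comments=True keeps them in the tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
tree_with_comments = ET.fromstring(DOC, parser=parser)
comments = [node.text for node in tree_with_comments if node.tag is ET.Comment]
print(comments)  # [' renovated in spring ']
```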

XML entities

XML also supports entities: named pieces of content that are defined once and can then be referenced elsewhere in the document. Some entities are predefined, and creating your own is just as easy. Entities can also be used to include content from other files in an XML document.

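Important predefined entities include &lt; for the < character and &gt; for the > character; the other three are &amp; for &, &quot; for " and &apos; for '. You can declare your own entity, here called eg, in the document type definition (DTD) and then reference it in the document as &eg;: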

<!ENTITY eg "Example entity">
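
Both kinds of entities can be tried out in Python. In this sketch, xml.sax.saxutils.escape() and unescape() convert between the characters &, < and > and their predefined entities, and ElementTree expands a custom entity declared in an internal DTD; the note document is a made-up example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# Predefined entities: escape() replaces &, < and > with &amp;, &lt; and &gt;.
print(escape("if a < b & b > c"))  # if a &lt; b &amp; b &gt; c
print(unescape("a &lt; b"))        # a < b

# A custom entity declared in the internal DTD subset and referenced as &eg;.
DOC = """<!DOCTYPE note [
  <!ENTITY eg "Example entity">
]>
<note>This is an &eg;.</note>"""

note = ET.fromstring(DOC)
print(note.text)  # This is an Example entity.
```

Because the parser expands entity references, documents from untrusted sources should be handled with care; entity expansion is the basis of the "billion laughs" attack described in the Python documentation on XML vulnerabilities.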

Differences between XML and HTML

HTML, now in its fifth major version with HTML5, is not based on XML in its usual syntax, although it looks very similar. Unlike HTML, XML has no predefined set of permissible tags and attributes; you define them yourself. XML is also case-sensitive, whereas HTML tag names are not. Another difference is that XML requires every tag to be closed. A line break written as <br> in HTML must be written as <br/> (or <br></br>) in XML.
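
An XML parser enforces these rules strictly where an HTML parser would be lenient. This sketch uses Python's xml.etree.ElementTree; the helper name parses is hypothetical, and each string is a small made-up fragment.

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if an XML parser accepts `text` as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<p>First line<br>Second line</p>"))   # False: <br> is never closed
print(parses("<p>First line<br/>Second line</p>"))  # True: self-closing empty element
print(parses("<Room></room>"))                      # False: tag names are case-sensitive
```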