Advanced Web Technologies

Introduction to XML

Objectives

The main objective of this unit is to enable the reader to describe and create well-formed XML documents. The reader should also learn how to convert XML documents to relational and open databases and vice versa.


1.1 What Is XML?

Many computer systems contain data in incompatible formats, and exchanging data between such systems is a time-consuming challenge. XML is a generic data storage format that comes with a number of tools and technologies that make it easier to exchange data in specific XML 'applications' between incompatible systems. Because XML is open and generic, it has been adopted by a growing number of organizations and people, both developers and data users. This makes XML a viable technology for many types of data exchange.

The eXtensible Markup Language (XML) is a meta-markup language defined by the World Wide Web Consortium (W3C). XML is used not only for exchanging information, but also for publishing Web pages. Because XML's syntax is very strict, software that reads it can be smaller and faster, which makes XML well suited for devices such as Personal Digital Assistants (PDAs) and cellphones. Web browsers that interpret HTML documents, on the other hand, need a large amount of extra code to cope with HTML's less strict syntax.

The types of data generally well suited for encoding as XML are those where field lengths are unknown and unpredictable and where field contents are predominantly textual.

An XML schema allows for the exchange of information in a standardized structure. A schema defines custom markup tags that can contain attributes to describe the content that is enclosed by these tags. Information from the tagged data in the XML document can be extracted using an application called a “parser”, and with the use of an XML stylesheet the data can be formatted for a Web page.
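As a minimal sketch of what a parser does, the example below uses Python's standard-library xml.etree.ElementTree module (an assumption; the text does not name a particular parser) to read a small, hypothetical document and pull values out of its tags and attributes.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical document that uses custom markup tags and an attribute.
document = """<book id="b1">
  <title>Learning XML</title>
  <price currency="USD">29.95</price>
</book>"""

# The parser turns the text into a tree of elements; <book> is the root.
root = ET.fromstring(document)

title = root.find("title").text                 # text between <title> and </title>
price = float(root.find("price").text)          # element text converted to a number
currency = root.find("price").get("currency")   # value of the currency attribute

print(root.get("id"), title, price, currency)   # b1 Learning XML 29.95 USD
```

Formatting the extracted values for a Web page would then be the job of a stylesheet or of the application that called the parser.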

XML's power lies in combining custom markup tags with content in a defined XML document. The purpose of the eXtensible Markup Language (XML) is to make information self-describing. XML is derived from SGML (Standard Generalized Markup Language) and is well suited to applications such as electronic commerce.


1.2 The Difference between XML and HTML

XML offers some key advantages over HTML for describing and exchanging data. The major differences between XML and HTML are summarized in the following table.

Exhibit 4: XML vs HTML

XML                            HTML
Information content            Information presentation
Extensible set of tags         Fixed set of tags
Data exchange language         Data presentation language
Greater hypertext linking      Limited hypertext linking


1.3 Examples of XML Applications

CML: Chemical Markup Language

WML: Wireless Markup Language for WAP services

MathML: Mathematical Markup Language

WeatherML: Weather Markup Language

S2ML: Security Services Markup Language

cXML: Commerce XML

BRML: Business Rules Markup Language

GML: Geography Markup Language

HRMML: Human Resource Management Markup Language


1.4 Key features of XML

  • XML is case sensitive.

  • XML is a widely accepted open standard.

  • XML allows content to be clearly separated from form (appearance).

  • XML is text-oriented.

  • XML is extensible.

  • XML is self-describing.

  • XML is universal: it is based on Unicode, so internationalization is straightforward.

  • XML is independent of platforms and programming languages.

  • XML provides a robust and durable format for information storage.

  • XML is easily transformable.

  • XML is a future-oriented technology.


XML improves the efficiency of data exchange in several important ways:

Write once and format many times: Once an XML file is created, it can be presented in multiple ways by applying different XML stylesheets. For instance, the same information might be displayed on a Web page or printed in a book.

Hardware and software independence: XML files are standard text files, which means they can be read by any application that can process text.

Write once and exchange many times: Once an industry agrees on an XML standard for data exchange, data can be readily exchanged between all members using that standard.

Faster and more precise Web searching: When the meaning of information can be determined by a computer (by reading the tags), Web searching is enhanced. For example, if you are looking for a specific book title, it is far more efficient for a computer to search for text between the pair of tags <booktitle> and </booktitle> than to search an entire file for the title. This also avoids spurious matches, such as the same words appearing elsewhere in the file.

Data validation: An XML document can be validated against an XML Schema (XSD) or a Document Type Definition (DTD), which acts as a contractual agreement between two interacting parties.
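The <booktitle> search described above can be sketched as follows, assuming Python's xml.etree.ElementTree and a hypothetical <library> document; find_titles is an illustrative helper, not part of any library. Only the text inside <booktitle> elements is searched, so a title that is merely mentioned in a summary is not reported.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue: the <booktitle> tag tells a program which text is a title.
catalogue = """<library>
  <book><booktitle>XML Basics</booktitle><summary>Mentions XML Basics in passing.</summary></book>
  <book><booktitle>Learning DTD</booktitle><summary>A guide to DTDs.</summary></book>
</library>"""

def find_titles(xml_text, wanted):
    """Return the <booktitle> texts that contain `wanted`, ignoring all other elements."""
    root = ET.fromstring(xml_text)
    # ".//booktitle" selects <booktitle> elements at any depth below the root.
    return [t.text for t in root.iterfind(".//booktitle") if wanted in t.text]

print(find_titles(catalogue, "XML"))  # ['XML Basics']
```

A plain substring search over the whole text would find "XML Basics" twice here, once in the title and once in the summary.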


1.5 The components of an XML document

An XML document consists of an XML declaration, elements, attributes, comments and character data (CDATA).

1.5.1 XML declarations

Syntax

<?xml version='1.0' encoding='character encoding' standalone='yes|no'?>

XML documents can contain an XML declaration, which, if present, must be the first construct in the document. An XML declaration is made up of as many as three name/value pairs, syntactically identical to attributes. The three attributes are a mandatory version attribute and optional encoding and standalone attributes. The order of these attributes within an XML declaration is fixed.

The XML declaration begins with the character sequence <?xml and ends with the character sequence ?>. Note that although this syntax is identical to that for processing instructions, the XML declaration is not considered to be a processing instruction. Every XML declaration must have a version attribute; for XML 1.0 documents, its value is 1.0.

The character encoding used for the document content can be specified through the encoding attribute. XML documents are inherently Unicode, even when stored in a non-Unicode character encoding. The XML recommendation defines several possible values for the encoding attribute. For example, UTF-8, UTF-16, ISO-10646-UCS-2, and ISO-10646-UCS-4 all refer to Unicode/ISO-10646 encodings, whereas ISO-8859-1 and ISO-8859-2 refer to 8-bit Latin character encodings. Encodings for other character sets, including Chinese, Japanese, and Korean characters, are also supported. It is recommended that encodings be referred to using the encoding names registered with the Internet Assigned Numbers Authority (IANA). All XML processors are required to be able to process documents encoded using UTF-8 or UTF-16, with or without an XML declaration. A UTF-16 document must begin with the Unicode byte order mark, which lets the processor detect its encoding; a document with neither a byte order mark nor an encoding declaration is read as UTF-8. The XML declaration is therefore mandatory if the encoding of the document is anything other than UTF-8 or UTF-16. In practice, this means that documents encoded using US-ASCII can also omit the XML declaration, because US-ASCII is a subset of UTF-8.
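The effect of the encoding declaration can be observed with Python's expat-based xml.etree.ElementTree parser, used here as an example XML processor. The sketch encodes the same element as UTF-8 and as ISO-8859-1; only the ISO-8859-1 bytes need a declaration, and without one the parser reads them as UTF-8 and reports an error.

```python
import xml.etree.ElementTree as ET

text = "<name>café</name>"

# UTF-8 bytes: no XML declaration is needed, because UTF-8 is the default.
utf8_bytes = text.encode("utf-8")
print(ET.fromstring(utf8_bytes).text)

# ISO-8859-1 bytes: the declaration tells the parser how to decode them.
latin1_bytes = ("<?xml version='1.0' encoding='ISO-8859-1'?>" + text).encode("iso-8859-1")
print(ET.fromstring(latin1_bytes).text)

# The same ISO-8859-1 bytes without a declaration are decoded as UTF-8 and rejected.
try:
    ET.fromstring(text.encode("iso-8859-1"))
except ET.ParseError as err:
    print("parse error:", err)
```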

Only one encoding can be used for an entire XML document. It is not possible to “redefine” the encoding part of the way through. If data in different encodings need to be represented, then external entities should be used. If an XML document can be read with no reference to external sources, it is said to be a stand-alone document. Such documents can be annotated with a standalone attribute with a value of yes in the XML declaration. If an XML document requires external sources to be resolved to parse correctly and/or to construct the entire data tree (for example, a document with references to external general entities), then it is not a stand-alone document. Such documents may be marked standalone='no', but because this is the default, such an annotation rarely appears in XML documents.

Examples of XML declarations

<?xml version='1.0' ?>

<?xml version='1.0' encoding='US-ASCII' ?>

<?xml version='1.0' encoding='US-ASCII' standalone='yes' ?>

<?xml version='1.0' encoding='UTF-8' ?>

<?xml version='1.0' encoding='UTF-16' ?>

<?xml version='1.0' encoding='ISO-10646-UCS-2' ?>

<?xml version='1.0' encoding='ISO-8859-1' ?>

<?xml version='1.0' encoding='Shift_JIS' ?>

1.5.2 XML Elements

Elements are the basic building blocks of an XML document.

Syntax of an XML Element

<name_of_the_element>text</name_of_the_element>

The following are the basic characteristics of XML elements:

  • Most of the data contained in an XML document is enclosed within XML elements.

  • An element starts with a start tag, such as <tutorial>, and ends with a matching end tag, such as </tutorial>. Every tag begins with "<" and ends with ">".

  • In XML documents, element names are user-defined, so authors can use element names of their own choice. Here is an example: <tutorial> We are learning fundamentals of XML </tutorial>

Example xml file: First.xml

<?xml version="1.0"?>

<xml>

<tutorial>w3resource

<one>html</one>

<two>xml

<subtopic1>Learning xml</subtopic1>

<subtopic2>Learning DTD</subtopic2>

<subtopic3>Learning XSLT</subtopic3>

<subtopic4>Learning xpath</subtopic4>

</two>

<three>css</three>

<four>javascript</four>

<five>ajax</five>

<six>php</six>

<seven>mysql</seven>

<eight>svg</eight>

</tutorial>

</xml>

Child elements reside within the start and end tags of their parent element.

<?xml version="1.0"?>

<world>

<continents>There are five continents </continents>

</world>

Note: In this example, the element world has a child element, continents. The continents element starts and ends within the start tag <world> and the end tag </world> of the element world, which is the parent of the continents element.
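Because every child element is nested inside its parent's tags, a parser can rebuild the parent/child hierarchy. The sketch below, assuming Python's xml.etree.ElementTree, prints an indented outline of an abridged fragment of First.xml; outline is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

# Abridged fragment of First.xml.
doc = """<tutorial>w3resource
  <one>html</one>
  <two>xml
    <subtopic1>Learning xml</subtopic1>
    <subtopic2>Learning DTD</subtopic2>
  </two>
</tutorial>"""

def outline(element, depth=0):
    """Return one indented line per element, listing each child under its parent."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an Element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(doc))))
```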

In the following example, the loss element does not have any content, so it is written as <loss></loss>:

<?xml version="1.0"?>

<series>

<loss></loss>

</series>

But writing it like the following is also allowed:

<?xml version="1.0"?>

<series>     

   <loss />

</series>

This is referred to as the empty-element shorthand.
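To a parser, the two spellings describe the same element. A quick check with Python's xml.etree.ElementTree (assumed here as the parser) shows that <loss></loss> and <loss /> both produce an element with no text and no children, and that ElementTree writes such an element back out in the shorthand form.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<series><loss></loss></series>")
short_form = ET.fromstring("<series><loss /></series>")

# Both spellings give an empty <loss> element: no text and no child elements.
for root in (long_form, short_form):
    loss = root.find("loss")
    print(loss.tag, loss.text, len(loss))  # loss None 0

# When serialising, ElementTree writes empty elements using the shorthand.
print(ET.tostring(long_form, encoding="unicode"))  # <series><loss /></series>
```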

  • Unlike HTML, XML does not have any predefined tag names, so the designers of an XML document decide its tag names. In all of the element examples above, the element names were chosen to suit the requirements of the document.

  • Element names of an XML document are case sensitive. For example, the following is incorrect, because the start tag <W3RESOURCE> does not match the end tag </w3resource>:

<?xml version="1.0"?>

<W3RESOURCE>This is the largest online tutorial on web development </w3resource>

But this is the correct use:

<?xml version="1.0"?>

<w3resource>This is the largest online tutorial on web development </w3resource>
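A parser rejects a document whose start and end tags differ only in case. The sketch below, assuming Python's xml.etree.ElementTree, wraps parsing in a hypothetical is_well_formed helper and also shows that <Name> and <name> are treated as two different elements.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Hypothetical helper: True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The start tag and end tag differ only in case, so they do not match.
print(is_well_formed("<W3RESOURCE>Web tutorials</w3resource>"))  # False
# The start tag and end tag match exactly.
print(is_well_formed("<w3resource>Web tutorials</w3resource>"))  # True

# Names that differ only in case are different elements.
root = ET.fromstring("<person><Name>Upper</Name><name>lower</name></person>")
print(root.find("Name").text, root.find("name").text)  # Upper lower
```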

Element names must begin with a letter or an underscore (_). The initial character can be followed by any number of letters, digits, periods (.), hyphens (-), underscores, or colons (:). Element names cannot contain spaces. However, because colons are used in the syntax of namespaces in XML, they should not be used in element names except as described by the Namespaces in XML specification.

Here is a table displaying correct and incorrect examples of XML element names:

[Table image: Examples of Names of XML elements]


  • Element names that start with the character sequence xml, in any combination of upper and lower case (for example XML or Xml), are reserved for future use.
  • The content of an element can be text, nothing at all (an empty element), other elements (child elements), or a mix of text and child elements.
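The naming rules above can be turned into a small checker. This is a simplified sketch, assuming ASCII-only names (the XML specification also allows many non-ASCII letters) and leaving out colons because they are reserved for namespaces; check_element_name is a hypothetical helper.

```python
import re

# Simplified, ASCII-only naming rule: a letter or underscore first, then letters,
# digits, periods, hyphens or underscores. Colons are deliberately excluded.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

def check_element_name(name):
    """Hypothetical helper: classify a proposed element name as valid, invalid or reserved."""
    if not NAME_PATTERN.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):  # reserved in any mix of upper and lower case
        return "reserved"
    return "valid"

for candidate in ["first_name", "_id", "chapter-1.2", "2nd_place", "first name", "XMLdata"]:
    print(candidate, "->", check_element_name(candidate))
```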



1.5.3 Attributes

An XML start tag may also contain attributes, written as name="value" pairs:

  <MyTag attribute1="nothing" attribute2="nothing else">Text</MyTag>

When attributes are declared in a Document Type Definition (DTD), each attribute is given one of the following types:

  • CDATA

  • ENTITY

  • Enumeration

  • ID

  • IDREF

  • IDREFS

  • NMTOKEN

  • NMTOKENS

  • NOTATION

XML Quotation marks for attribute values

Attribute values must be enclosed in quotation marks. In HTML, attribute values don't have to be in quotes for a browser to present a document, but that does not work in XML: if the quotes are removed, an XML parser will report an error. You can use either single or double quotes, but the opening and closing quotation marks of a value must match. The following examples show what is correct and what is wrong:

These are correct uses of quotes:

<tutorial type="text">
<tutorial type='text'>

But this is a wrong use of quotes, because the opening and closing quotation marks do not match:

<tutorial type="text'>
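Once a document is parsed, attribute values can be read by name, and quoting mistakes are caught by the parser. A minimal sketch, assuming Python's xml.etree.ElementTree; parse_or_none is a hypothetical helper that returns None when the parser reports an error.

```python
import xml.etree.ElementTree as ET

def parse_or_none(xml_text):
    """Hypothetical helper: return the root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None

double_quoted = parse_or_none('<tutorial type="text">XML</tutorial>')
single_quoted = parse_or_none("<tutorial type='text'>XML</tutorial>")
mismatched = parse_or_none('<tutorial type="text\'>XML</tutorial>')  # " opened, ' closed
unquoted = parse_or_none("<tutorial type=text>XML</tutorial>")

print(double_quoted.get("type"), single_quoted.get("type"))  # text text
print(mismatched, unquoted)                                    # None None
```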

1.5.4 Comments

Syntax for writing comments in XML: 

<!-- This is a comment -->

A comment can be written on one line or across multiple lines, for example:

<!--  Line 1

Line 2

Line 3 -->
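Comments are not part of a document's data. With Python's xml.etree.ElementTree (assumed here as the parser), comments are discarded by default; a TreeBuilder created with insert_comments=True (available from Python 3.8) keeps comments that appear inside the root element.

```python
import xml.etree.ElementTree as ET

doc = """<tutorial>
  <!-- Line 1
       Line 2
       Line 3 -->
  <topic>XML</topic>
</tutorial>"""

# Default parser: comments are dropped, so <topic> is the only child.
root = ET.fromstring(doc)
print([child.tag for child in root])  # ['topic']

# A TreeBuilder with insert_comments=True keeps comments found inside the root element.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root_with_comments = ET.fromstring(doc, parser=parser)
kinds = ["comment" if child.tag is ET.Comment else child.tag for child in root_with_comments]
print(kinds)  # ['comment', 'topic']
```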

1.5.5 XML Character Data (CDATA)

CDATA sections can be used to “block escape” literal text when replacing prohibited characters with entity references is undesirable. CDATA sections can appear inside element content and allow the characters < and & to appear literally. A CDATA section begins with the character sequence <![CDATA[ and ends with the character sequence ]]>. Between the two character sequences, an XML processor ignores all markup characters such as <, >, and &. The only markup an XML processor recognizes inside a CDATA section is the closing character sequence ]]>. For the same reason, the sequence ]]> must not appear in ordinary element content except to end a CDATA section; where it is needed as literal text, the greater-than character must be escaped using the entity reference &gt; (that is, written as ]]&gt;). CDATA sections cannot be nested.

Syntax of a CDATA section:

<![CDATA[ Information ]]>

Example

<sometext>

    <![CDATA[ They're saying "x < y" & that "z > y" so I guess that means that z > x ]]>

</sometext>

1.5.6 Entities

According to the XML specification, XML documents consist of a set of storage units called entities. Entities act as a replacement mechanism. A similar idea is mail merge in Microsoft Word: we create a database of names and addresses, attach it to an MS Word document, and the values are substituted into the document.

In other words, entities can be used as a kind of shortcut that allows you to embed blocks of text, or even entire documents and files, into an XML document. This makes updating documents across networks much easier.

Entities are used in XML documents for the following purposes:

  • Representing special markup characters, such as > and <.

  • Managing binary files and other data not native to XML.

  • Reducing the code in DTD by bundling declarations into entities.

  • Offering richer multilingual support.

  • Repeating frequently used names in a way that guarantees consistency in spelling and use.

  • Providing for easier updates. By using entities in your markup for items you know will change later, such as weather reports or software version numbers, you greatly improve dynamic document automation.

  • Merging multiple file links and interaction.

In general, there are three types of entities: internal entities, external entities, and parameter entities. Internal entities are entities whose definitions are found entirely within the document itself (in its DTD). External entities are entities whose content is stored outside the document, for example in a separate file. Parameter entities can be used only within the internal or external subset of a DTD.
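An internal entity can be declared in the document's internal DTD subset and then referenced by name. The sketch below, assuming Python's expat-based xml.etree.ElementTree, declares a hypothetical company entity and shows that each &company; reference is replaced by its text when the document is parsed.

```python
import xml.etree.ElementTree as ET

# The entity name "company" and its text are hypothetical.
doc = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ENTITY company "Example Widgets Ltd.">
]>
<memo>
  <from>&company;</from>
  <note>Welcome to &company;</note>
</memo>"""

root = ET.fromstring(doc)
print(root.find("from").text)  # Example Widgets Ltd.
print(root.find("note").text)  # Welcome to Example Widgets Ltd.
```

Changing the declaration in one place updates every reference, which is the "easier updates" benefit listed above.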

Predefined Internal Entities

XML predefines five internal entities. All XML processors are required to recognize references to these entities, even if they are not declared.

Here is a table containing predefined entities and their replacement text:

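Entity Names

Entity reference    Replacement text    Character name
&lt;                <                   less-than sign
&gt;                >                   greater-than sign
&amp;               &                   ampersand
&apos;              '                   apostrophe
&quot;              "                   quotation mark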

Example:

<html>

<head>

<title>XML Entity Example</title>

</head>

<body>

<p>An XML tag starts with &lt; and ends with &gt;</p>

</body>

</html>
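Text containing markup characters can be escaped with the predefined entities before it is placed in an element. A minimal sketch, assuming Python's xml.sax.saxutils.escape (which replaces &, < and >) and xml.etree.ElementTree to parse the result back.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "An XML tag starts with < and ends with >; use & with care."

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped = escape(raw)
print(escaped)

# The escaped text is well-formed inside an element, and the parser turns the
# predefined entity references back into the original characters.
root = ET.fromstring("<p>" + escaped + "</p>")
print(root.text == raw)  # True

# &apos; and &quot; are also predefined, so they need no declaration either.
print(ET.fromstring("<q>&apos;single&apos; and &quot;double&quot;</q>").text)
```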

Example XML Documents:

book.xml

<?xml version="1.0" encoding="ISO-8859-1"?>

<book>

            <title>My First XML</title>

            <production id="33-657" media="paper"></production >

            <chapter>Introduction to XML

                        <para>What is HTML</para>

                        <para>What is XML</para>

            </chapter>

            <chapter>XML Syntax

                        <para>Elements must have a closing tag</para>

                        <para>Elements must be properly nested</para>

            </chapter>

</book>

catalog.xml

<?xml version="1.0"?>

<catalog>

               <book id="bk101">

                        <author>Gambardella, Matthew</author>

                        <title>XML Developer's Guide</title>

                        <genre>Computer</genre>

                        <price>44.95</price>

              </book>

            <book id="bk112">

                        <author>Galos, Mike</author>

                        <title>Visual Studio 7: A Comprehensive Guide</title>

                        <genre>Computer</genre>

                        <price>49.95</price>

            </book>

</catalog>     
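Repeated elements such as <book> map naturally onto rows, which is the idea behind converting XML to database tables. A minimal sketch, assuming Python's xml.etree.ElementTree and an abridged copy of catalog.xml, turns each <book> into a dictionary of field names and values.

```python
import xml.etree.ElementTree as ET

# Abridged copy of catalog.xml: only the author and price fields are kept.
catalog_xml = """<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <price>44.95</price>
  </book>
  <book id="bk112">
    <author>Galos, Mike</author>
    <price>49.95</price>
  </book>
</catalog>"""

root = ET.fromstring(catalog_xml)

# Each <book> becomes one record: its id attribute plus one entry per child element,
# much like the columns of a table row.
books = []
for book in root.findall("book"):
    record = {"id": book.get("id")}
    for field in book:
        record[field.tag] = field.text
    books.append(record)

for record in books:
    print(record["id"], record["author"], record["price"])

total = sum(float(record["price"]) for record in books)
print(f"Total: {total:.2f}")  # Total: 94.90
```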


1.6 Well-formed XML

An XML document is called well-formed if it satisfies certain rules specified by the W3C. These rules are:

  • A well-formed XML document must have a corresponding end tag for all of its start tags.

  • Nesting of elements within each other in an XML document must be proper. For example, <tutorial><topic>XML</topic></tutorial> is a correct way of nesting but <tutorial><topic>XML</tutorial></topic> is not.

  • An element must not have two attributes with the same name. For example, <tutorial id="001"><topic>XML</topic></tutorial> is right, but <tutorial id="001" id="w3r"><topic>XML</topic></tutorial> is incorrect.

  • Markup characters must be properly escaped when they appear in text. For example, <topic>x &lt; y</topic> is right, but <topic>x < y</topic> is not.

  • An XML document must contain exactly one root element. The root element appears only once in an XML document and does not appear as a child element within any other element.

Example of a well-formed XML document

<?xml version="1.0" ?>

<w3resource>

<design>

html

xhtml

css

svg

xml

</design>

<programming>

php

mysql

</programming>

</w3resource>
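A parser applies these rules automatically, so a simple well-formedness check is just an attempt to parse. A minimal sketch, assuming Python's xml.etree.ElementTree; check_well_formed is a hypothetical helper that returns the parser's error message, and the exact wording of those messages depends on the parser.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Hypothetical helper: None if the text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

samples = {
    "proper nesting": "<tutorial><topic>XML</topic></tutorial>",
    "improper nesting": "<tutorial><topic>XML</tutorial></topic>",
    "duplicate attribute": '<tutorial id="001" id="w3r"><topic>XML</topic></tutorial>',
    "unescaped <": "<topic>x < y</topic>",
    "two root elements": "<design>html</design><programming>php</programming>",
}

for label, text in samples.items():
    print(f"{label}: {check_well_formed(text) or 'well-formed'}")
```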


1.7 Converting XML documents to relational and open databases and vice versa

1.7.1 Exporting

1.      Create a Student table in the StudentDB database in Microsoft Access.


Figure 1.1 Student Table

 

2.      Select the External Data->Export->XML File menu.


Figure 1.2 Selecting XML Menu for Exporting

 

3.      Specify the location and name of the XML file to save.


Figure 1.3 Specifying the file name

 

4.      Select the information to be exported.


Figure 1.4 Export settings with the user interface
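Access performs this export through its user interface. As a rough illustration of the idea only, the sketch below uses Python's built-in sqlite3 module as a stand-in for the StudentDB database; the Student columns (STU_NO, NAME, DEPARTMENT), the sample rows and the <Students> root element are hypothetical, and the layout is not the exact format Access writes.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for StudentDB: an in-memory SQLite table with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (STU_NO INTEGER PRIMARY KEY, NAME TEXT, DEPARTMENT TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", [(200, "Alice", "IT"), (201, "Ravi", "CS")])

def table_to_xml(connection, table):
    """Export every row of `table` as an element with one child element per column."""
    # The table name is inserted into the SQL directly, so it must come from trusted code.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table + "s")  # single root element, e.g. <Students>
    for row in cursor:
        record = ET.SubElement(root, table)
        for name, value in zip(columns, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

print(table_to_xml(conn, "Student"))
```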

1.7.2 Importing

 

1.      Create an XML file named Marks.xml:

<?xml version="1.0"?>

<MARKS>

      <STU_NO>200</STU_NO>

      <SEMESTER_ID>S11819</SEMESTER_ID>

      <XML>90</XML>

      <GUI>78</GUI>

</MARKS>

 

2.      Open the StudentDB database in Microsoft Access and select the External Data->Import & Link->XML File menu.


Figure 1.5 Selecting XML Menu for Importing

3.      Select the Marks.xml file


Figure 1.6 Select the xml file

4.      Select the import options based on your requirements.


Figure 1.7 Select the import options

5.      The XML data is now imported as a Marks table in your StudentDB database.


    Figure 1.8 Marks table in your database
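Importing works in the opposite direction: element names become column names and element text becomes values. The sketch below assumes Python's sqlite3 as a stand-in for Access and handles only the single-record layout of Marks.xml; import_record is a hypothetical helper, and because it builds SQL from element names it should only be used with trusted files.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Marks.xml from step 1, with the closing </GUI> tag written correctly.
marks_xml = """<?xml version="1.0"?>
<MARKS>
  <STU_NO>200</STU_NO>
  <SEMESTER_ID>S11819</SEMESTER_ID>
  <XML>90</XML>
  <GUI>78</GUI>
</MARKS>"""

def import_record(connection, xml_text):
    """Create a table named after the root element and insert its children as one row."""
    root = ET.fromstring(xml_text)
    columns = [child.tag for child in root]   # element names become column names
    values = [child.text for child in root]   # element text becomes the row's values
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f"CREATE TABLE IF NOT EXISTS {root.tag} ({column_list})")
    connection.execute(f"INSERT INTO {root.tag} ({column_list}) VALUES ({placeholders})", values)
    return root.tag

conn = sqlite3.connect(":memory:")
table = import_record(conn, marks_xml)
print(conn.execute(f"SELECT * FROM {table}").fetchall())  # [('200', 'S11819', '90', '78')]
```

Access offers more choices during import (see the import options step); this sketch simply stores every value as text.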


Unit Summary

XML (Extensible Markup Language) is used to store, share, and exchange data from various sources. XML is also used to create new markup languages. An XML document is a collection of elements, attributes, comments, character data (CDATA) and an XML declaration. For an XML processor to process a document correctly, the document must be well-formed, which means that its markup follows the rules of XML syntax. XML documents can be converted to database tables and vice versa.


Laboratory Exercises

1.      Create a well-formed XML document to store your current semester timetable.

2.      Create a well-formed XML document to store information about the students and teachers of your department.

3.      Create a well-formed XML document to store information about doctors and departments in a hospital.

4.      Create a well-formed XML document to store details of the products available in a supermarket.

5.      Create a well-formed XML document to store information about the books available in your college library.


