What is XML?&&&&&&&&
*********************************************************************************************
The main
difference between XML and HTML
XML was designed to carry data.
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
XML was designed to describe data
and to focus on what data is.
HTML was designed to display data and to focus on how data looks.
HTML is about displaying
information, XML is about describing information.
*********************************************************************************************
XML does not DO
anything
XML was not designed to DO
anything.
This may be a little hard to
grasp, but XML does not DO anything by itself. XML was created to structure,
store, and send information.
The following example is a note to Tove from Jani, stored as XML:
|
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> |
The note has a header, and a
message body. It also has sender and receiver information. But still, this XML
document does not DO anything. It is just pure information wrapped in XML tags.
Someone must write a piece of software to send, receive, or display it.
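As a minimal sketch of such a piece of software, the Python snippet below reads the note and turns it into a plain-text message. It assumes Python's standard xml.etree.ElementTree module; the function name display_note and the output layout are illustrative choices, not part of XML.

```python
import xml.etree.ElementTree as ET

NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def display_note(xml_text: str) -> str:
    """Turn the note document into a plain-text message (hypothetical layout)."""
    note = ET.fromstring(xml_text)
    return (
        f"To: {note.findtext('to')}\n"
        f"From: {note.findtext('from')}\n"
        f"{note.findtext('heading')}: {note.findtext('body')}"
    )


print(display_note(NOTE_XML))
```

The XML itself stays pure data; all of the "doing" lives in the program that reads it.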
*************************************************************************************************************
XML is free and
extensible
XML tags are not predefined. You
must "invent" your own tags.
The tags used to mark up HTML
documents and the structure of HTML documents are
predefined. The author of HTML documents can only use tags that are defined in
the HTML standard (like <p>, <h1>, etc.).
XML allows the author to define his
own tags and his own document structure.
The tags in the example above (like
<to> and <from>), are not defined in any XML standard. These tags
are "invented" by the author of the XML document.
*************************************************************************************************************
XML is a
complement to HTML
XML is not a replacement for HTML.
It is important to understand that
XML is not a replacement for HTML. In future Web development it is most likely
that XML will be used to describe the data, while HTML will be used to format
and display the same data.
My best description of XML is: XML
is a cross-platform, software and hardware independent tool for transmitting
information.
************************************************************************************************************
XML in future
Web development
XML is going to be everywhere.
We have been participating in XML
development since its creation. It has been amazing to see how quickly the XML
standard has been developed, and how quickly a large number of software vendors
have adopted the standard.
We strongly believe that XML will
be as important to the future of the Web as HTML has been to the foundation of
the Web, and that XML will be the most common tool for all data manipulation
and data transmission.
*********************************************************************************
How can XML be
Used?&&&&&&&
It is important to understand
that XML was designed to store, carry, and exchange data. XML was not designed
to display data.
*********************************************************************************
XML can Separate
Data from HTML
With XML, your data is stored
outside your HTML.
When HTML is used to display data,
the data is stored inside your HTML. With XML, data can be stored in separate
XML files. This way you can concentrate on using HTML for data layout and
display, and be sure that changes in the underlying data will not require any
changes to your HTML.
XML data can also be stored inside
HTML pages as "
***********************************************************************************
XML is used to
Exchange Data
With XML, data can be exchanged
between incompatible systems.
In the real world, computer systems
and databases contain data in incompatible formats. One of the most
time-consuming challenges for developers has been to exchange data between such
systems over the Internet.
Converting the data to XML can
greatly reduce this complexity and create data that can be read by many
different types of applications.
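A rough sketch of this idea in Python: one system converts its records to XML, and another reads them back without knowing anything about the first system's internal format. The records, the tag names customers and customer, and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records: list[dict[str, str]]) -> str:
    """Serialize simple records as XML (hypothetical <customers> layout)."""
    root = ET.Element("customers")
    for record in records:
        customer = ET.SubElement(root, "customer")
        for field, value in record.items():
            ET.SubElement(customer, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_records(xml_text: str) -> list[dict[str, str]]:
    """Read the same layout back into plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text or "" for field in customer} for customer in root]


# Hypothetical sample data, used only to show the round trip.
records = [{"name": "Tove", "city": "Oslo"}, {"name": "Jani", "city": "Bergen"}]
xml_text = records_to_xml(records)
round_trip = xml_to_records(xml_text)
```

Because the exchanged text is plain XML, the receiving side could just as well be written in another language on another platform.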
************************************************************************************
XML and B2B
With XML, financial information can
be exchanged over the Internet.
Expect to see a lot about XML and
B2B (Business To Business) in the near future.
XML is going to be the main
language for exchanging financial information between businesses over the
Internet. A lot of interesting B2B applications are
under development.
************************************************************************************
XML can be used to
Share Data
With XML, plain text files can be
used to share data.
Since XML data is stored in plain
text format, XML provides a software- and hardware-independent way of sharing
data.
This makes it much easier to create
data that different applications can work with. It also makes it easier to
expand or upgrade a system to new operating systems, servers, applications, and
new browsers.
*************************************************************************************
XML can be used to
Store Data
With XML, plain text files can be
used to store data.
XML can also be used to store data
in files or in databases. Applications can be written to store and retrieve
information from the store, and generic applications can be used to display the
data.
**********************************************************************************************
XML can make your
Data more Useful
With XML, your data is available to
more users.
Since XML is independent of
hardware, software and application, you can make your data available to more
than just standard HTML browsers.
Other clients and applications can
access your XML files as data sources, just as they access databases. Your
data can be made available to all kinds of "reading machines"
(agents), and it is easier to make your data available to blind people or
people with other disabilities.
***********************************************************************************************
XML can be used to Create new Languages
XML is the mother of WAP and WML.
The Wireless Markup Language (WML),
used to mark up Internet applications for handheld devices like mobile phones,
is written in XML.
**********************************************************************************************
If Developers have
Sense
If they DO have sense, all future
applications will exchange their data in XML.
The future might give us word
processors, spreadsheet applications and databases that can read each other's
data in a pure text format, without any conversion utilities in between.
We can only pray that Microsoft and
all the other software vendors will agree.
**********************************************************************************************
XML Syntax&&&&&&&
The syntax rules of XML are very simple and very strict. The
rules are very easy to learn, and very easy to use.
Because of this, creating software that can read and manipulate
XML is very easy to do.
**********************************************************************************
XML documents use a
self-describing and simple syntax.
|
<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> |
The first line in the document -
the XML declaration - defines the XML version and the character encoding used
in the document. In this case the document conforms to the 1.0 specification of
XML and uses the ISO-8859-1 (Latin-1/West European) character set.
The next line describes the root
element of the document (as if it were saying: "this document is a
note"):
|
<note> |
The next 4 lines describe 4 child
elements of the root (to, from, heading, and body):
|
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body> |
And finally the last line defines
the end of the root element:
|
</note> |
Can you detect from this example
that the XML document contains a Note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?
****************************************************************************************************
All XML elements
must have a closing tag
With XML, it is illegal to omit the
closing tag.
In HTML some elements do not have
to have a closing tag. The following code is legal in HTML:
|
<p>This is a paragraph
<p>This is another paragraph |
In XML all elements must have a
closing tag, like this:
|
<p>This is a paragraph</p>
<p>This is another paragraph</p> |
Note: You might have noticed from
the previous example that the XML declaration did not have a closing tag. This
is not an error. The declaration is not an XML element, so it does not take
a closing tag.
***************************************************************************************************
XML tags are case
sensitive
Unlike HTML, XML tags are case
sensitive.
With XML, the tag <Letter> is
different from the tag <letter>.
Opening and closing tags must
therefore be written with the same case:
|
<Message>This is incorrect</message>
<message>This is correct</message> |
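***************************************************************************************************
All XML elements must be properly
nested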
Improper nesting of tags makes no
sense to XML.
In HTML some elements can be
improperly nested within each other like this:
|
<b><i>This text is
bold and italic</b></i> |
In XML all elements must be
properly nested within each other like this:
|
<b><i>This text is
bold and italic</i></b> |
**********************************************************************************************
All XML documents
must have a root tag
The first tag in an XML document is
the root tag.
All XML documents must contain a
single tag pair to define the root element. All other elements must be nested
within the root element.
All elements can have sub elements
(children). Sub elements must be correctly nested within their parent element:
|
<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root> |
****************************************************************************************
Attribute values
must always be quoted
With XML, it is illegal to omit
quotation marks around attribute values.
XML elements can have attributes in
name/value pairs just like in HTML. In XML the attribute value must always be
quoted. Study the two XML documents below. The first one is incorrect, the
second is correct:
|
<?xml
version="1.0" encoding="ISO-8859-1"?> <note date= <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> |
|
<?xml
version="1.0" encoding="ISO-8859-1"?> <note date=" <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> |
The error in the first document is that the date
attribute in the note element is not quoted.
This is correct: date="
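The well-formedness rules in this section can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree, which raises ParseError on a document that breaks one of the rules. The helper name is_well_formed is illustrative, and the date value 12/11/99 is an assumed sample value, not taken from the examples above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "missing closing tag": "<p>This is a paragraph",
    "mismatched case": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>This text is bold and italic</b></i>",
    "two root elements": "<note/><note/>",
    # Assumed sample date value, used only for illustration.
    "unquoted attribute": "<note date=12/11/99><to>Tove</to></note>",
    "quoted attribute": '<note date="12/11/99"><to>Tove</to></note>',
    "properly nested": "<root><child><subchild>.....</subchild></child></root>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the last two samples are accepted; every other one violates exactly one of the rules described in this section.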
******************************************************************************************************
With XML, White
Space is Preserved
With XML, the white space in your
document is not truncated.
This is unlike HTML. With HTML, a sentence like this:
Hello
my name is Tove,
will be displayed like this:
Hello my name is Tove,
because HTML strips off the white space.
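A small sketch of the difference, assuming Python's standard xml.etree.ElementTree parser: the text handed to the application still contains every space, and collapsing the spaces (roughly what an HTML renderer does) is a separate step. The element name greeting is made up for this example.

```python
import xml.etree.ElementTree as ET

# Eleven spaces between "Hello" and "my", built explicitly so the count is visible.
sentence = "Hello" + " " * 11 + "my name is Tove,"
element = ET.fromstring("<greeting>" + sentence + "</greeting>")

preserved = element.text                  # the XML parser keeps the white space
collapsed = " ".join(preserved.split())   # roughly what an HTML page would display
```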
****************************************************************************************
With XML, CR / LF
is Converted to LF
With XML, a new line is always
stored as LF.
Do you know what a typewriter is? Well, a typewriter is a mechanical device that was
used in the previous century :-)
After you have typed one line of
text on a typewriter, you have to manually return the printing carriage to the
left margin position and manually feed the paper up one line.
In Windows applications, a new line
in the text is normally stored as a pair of CR LF (carriage return, line feed)
characters. In Unix applications, a new line is
normally stored as a LF character. Classic Macintosh applications use only a CR
character to store a new line.
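The normalization can be observed directly. This sketch assumes Python's standard xml.etree.ElementTree, whose parser follows the XML rule that both CR LF and a lone CR are passed to the application as a single LF. The element name text is arbitrary.

```python
import xml.etree.ElementTree as ET

# The same two-line text written with Windows, classic Macintosh, and Unix line endings.
windows_text = ET.fromstring(b"<text>line one\r\nline two</text>").text
macintosh_text = ET.fromstring(b"<text>line one\rline two</text>").text
unix_text = ET.fromstring(b"<text>line one\nline two</text>").text

# After parsing, the application sees the same string in all three cases.
all_equal = windows_text == macintosh_text == unix_text
```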
********************************************************************************************
Comments in XML
The syntax for writing comments in
XML is similar to that of HTML.
<!-- This is a comment -->
*********************************************************************************************
There is Nothing
Special about XML
There is nothing special about XML.
It is just plain text with the addition of some XML tags enclosed in angle
brackets.
Software that can handle plain text
can also handle XML. In a simple text editor, the XML tags will be visible and
will not be handled specially.
In an XML-aware application
however, the XML tags can be handled specially. The tags may or may not be
visible, or have a functional meaning, depending on the nature of the
application.
*********************************************************************************************
XML Elements&&&&&&&&
XML Elements are extensible and
they have relationships.
XML Elements have simple naming
rules.
XML Elements are Extensible
XML documents can be extended to
carry more information.
*********************************************************************************************
Look at the
following XML NOTE example:
|
<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note> |
Let's imagine that we created an
application that extracted the <to>, <from>, and <body>
elements from the XML document to produce this output:
|
MESSAGE
To: Tove
From: Jani
Don't forget me this weekend! |
Imagine that the author of the XML
document added some extra information to it:
|
<note>
  <date>1999-08-01</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> |
Should the application break or
crash?
No. The application should still be
able to find the <to>, <from>, and <body> elements in the XML
document and produce the same output.
XML documents are Extensible.
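Here is a sketch of such an application in Python, using the standard xml.etree.ElementTree module. It looks elements up by name, so the extra date and heading elements are simply ignored. The function name extract_message and the exact output layout are illustrative.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
  <date>1999-08-01</date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""


def extract_message(xml_text: str) -> str:
    """Pick out <to>, <from>, and <body> by name and ignore everything else."""
    note = ET.fromstring(xml_text)
    return "\n".join([
        "MESSAGE",
        f"To: {note.findtext('to')}",
        f"From: {note.findtext('from')}",
        note.findtext("body"),
    ])


same_output = extract_message(ORIGINAL) == extract_message(EXTENDED)
```

An application that relied on element positions instead (for example "the first child is the receiver") would break when the date element was added, which is why lookup by name is the safer habit.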
****************************************************************************************************
XML Elements
have Relationships
Elements are related as parents and
children.
To understand XML terminology, you
have to know how relationships between XML elements are named, and how element
content is described.
Imagine that this is a description
of a book:
|
Book Title: My First XML
Chapter 1: Introduction to XML
Chapter 2: XML Syntax
|
Imagine that this XML document
describes the book:
|
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book> |
Book is the root element.
Title, prod, and chapter are child elements of book. Book is the parent
element of title, prod, and chapter. Title, prod, and chapter are siblings
(or sister elements) because they have the same parent.
********************************************************************************************************
Elements have
Content
Elements can have different
content types.
An XML element is everything
from (including) the element's start tag to (including) the element's end tag.
An element can have element
content, mixed content, simple content, or empty content.
An element can also have attributes.
In the example above, book has element
content, because it contains other elements. Chapter has mixed content
because it contains both text and other elements. Para has simple content
(or text content) because it contains only text. Prod has empty
content, because there is nothing between its start and end tags.
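The four content types can be told apart mechanically. The sketch below assumes Python's standard xml.etree.ElementTree and a simplification: text that consists only of white space (the indentation between tags) is treated as no text. The function name content_type is illustrative.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>"""


def content_type(element: ET.Element) -> str:
    """Classify an element as element, mixed, simple, or empty content."""
    has_children = len(element) > 0
    # Text can appear before the first child (.text) or after any child (.tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


book = ET.fromstring(BOOK_XML)
kinds = {element.tag: content_type(element) for element in book.iter()}
```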
In the example above only the prod
element has attributes. The attribute named id has the value
"33-657". The attribute named media has the value
"paper".
***********************************************************************************************************
Element Naming
XML elements must follow these
naming rules:
Take care when you
"invent" element names and follow these simple rules:
Any name can be used, no words are
reserved, but the idea is to make names descriptive. Names with an underscore
separator are nice.
Examples: <first_name>,
<last_name>.
Avoid "-" and
"." in names. It could be a mess if your software tried to subtract
name from first (first-name) or think that "name" is a property of
the object "first" (first.name).
Element names can be as long as you
like, but don't exaggerate. Names should be short and simple, like this: <book_title> not like this: <the_title_of_the_book>.
XML documents often have a corresponding
database, in which fields exist corresponding to elements in the XML document.
A good practice is to use the naming rules of your database for the elements in
the XML documents.
Non-English letters like éòá are perfectly legal in XML element names, but watch out
for problems if your software vendor doesn't support them.
The ":" should not be
used in element names because it is reserved to be used for something called
namespaces (more later).
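A quick way to try out a candidate name is to let a parser judge it. This sketch assumes Python's standard xml.etree.ElementTree; the helper is_legal_element_name is a made-up name. Note that the parser only enforces what XML itself requires (for example, a name may not start with a digit or contain a space), so "first-name" is accepted even though the advice above is to avoid it.

```python
import xml.etree.ElementTree as ET


def is_legal_element_name(name: str) -> bool:
    """Return True if the parser accepts <name/> as an element with that exact name."""
    try:
        return ET.fromstring(f"<{name}/>").tag == name
    except ET.ParseError:
        return False


checks = {
    name: is_legal_element_name(name)
    for name in ["first_name", "first-name", "1st_name", "first name", "book:title"]
}
```

The name "book:title" is rejected here because this parser is namespace-aware and the prefix "book" has not been declared.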
****************************************************************************************************
XML Attributes
XML elements can have attributes
in the start tag, just like HTML.
Attributes are used to provide
additional information about elements.
*****************************************************************************************************
XML elements can
have attributes.
From HTML you will remember this:
<IMG SRC="computer.gif">. The SRC
attribute provides additional information about the IMG element.
In HTML (and in XML) attributes
provide additional information about elements:
|
<img src="computer.gif"> <a href="demo.asp"> |
Attributes often provide
information that is not a part of the data. In the example below, the file type
is irrelevant to the data, but important to the software that wants to
manipulate the element:
|
<file type="gif">computer.gif</file> |
************************************************************************************************************
Quote Styles,
"female" or 'female'?
Attribute values must always be
enclosed in quotes, but either single or double quotes can be used. For a
person's sex, the person tag can be written like this:
|
<person sex="female"> |
or like this:
|
<person sex='female'> |
Double quotes are the most common,
but sometimes (if the attribute value itself contains quotes) it is necessary
to use single quotes, like in this example:
|
<gangster name='George "Shotgun" Ziegler'> |
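To a parser the two quote styles are equivalent. The sketch below, assuming Python's standard xml.etree.ElementTree, also shows a third option that is part of XML: keeping double quotes and writing the inner quotes as the predefined entity &quot;. The tags are closed with "/>" here only so that each snippet is a complete document.

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<person sex="female"/>')
single_quoted = ET.fromstring("<person sex='female'/>")

# Single quotes outside, literal double quotes inside.
gangster = ET.fromstring("""<gangster name='George "Shotgun" Ziegler'/>""")
# Double quotes outside, inner quotes escaped as &quot;.
escaped = ET.fromstring('<gangster name="George &quot;Shotgun&quot; Ziegler"/>')

same_sex_value = double_quoted.get("sex") == single_quoted.get("sex")
same_name_value = gangster.get("name") == escaped.get("name")
```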
************************************************************************************************************
Use of Elements
vs. Attributes
Data can be stored in child
elements or in attributes.
Take a look at these examples:
|
<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person> |
|
<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person> |
In the first example sex is an
attribute. In the last, sex is a child element. Both examples provide the same
information.
There are no rules about when to
use attributes, and when to use child elements. My experience is that
attributes are handy in HTML, but in XML you should try to avoid them. Use
child elements if the information feels like data.
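From a program's point of view the two styles differ mainly in how the value is read. A short sketch, assuming Python's standard xml.etree.ElementTree:

```python
import xml.etree.ElementTree as ET

attribute_style = ET.fromstring(
    '<person sex="female">'
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)
element_style = ET.fromstring(
    "<person><sex>female</sex>"
    "<firstname>Anna</firstname><lastname>Smith</lastname>"
    "</person>"
)

sex_from_attribute = attribute_style.get("sex")    # attributes are read with get()
sex_from_element = element_style.findtext("sex")   # child elements are looked up by name
```

A child element can later grow children of its own, while an attribute value can only ever be a single string.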
********************************************************************************************************
My
I like to store data in child
elements.
The following three XML documents
contain exactly the same information:
A date attribute is used in the
first example:
|
<note date=" <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> |
A date element is used in the
second example:
|
<note> <date> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note> |
An expanded date element is used in
the third: (THIS IS MY FAVORITE):
|
<note>
  <date>
    <day>12</day>
    <month>11</month>
    <year>99</year>
  </date>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> |
***********************************************************************************************************
Avoid using
attributes?
Should you avoid using
attributes?
Here are some of the problems using
attributes:
If you use attributes as containers
for data, you end up with documents that are difficult to read and maintain.
Try to use elements to describe data. Use attributes only to provide
information that is not relevant to the data.
Don't end up like this (if you think this looks like XML, you have not understood
the point):
|
<note day="12" month="11"
year="99" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this
weekend!"> </note> |
************************************************************************************************************
An Exception to
my Attribute rule
Rules always have exceptions.
My rule about attributes has one
exception:
Sometimes I assign ID references to
elements. These ID references can be used to access XML elements in much the
same way as the NAME or ID attributes in HTML. This example demonstrates this:
|
<messages>
  <note id="p501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="p502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages> |
The ID in these examples is just a
counter, or a unique identifier, to identify the different notes in the XML
file, and not a part of the note data.
What I am trying to say here is
that metadata (data about data) should be stored as attributes, and that data
itself should be stored as elements.
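A sketch of how such an ID can be used to reach one particular note, assuming Python's standard xml.etree.ElementTree and its limited XPath support. The function name find_note is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MESSAGES_XML = """<messages>
  <note id="p501">
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
  <note id="p502">
    <to>Jani</to>
    <from>Tove</from>
    <heading>Re: Reminder</heading>
    <body>I will not!</body>
  </note>
</messages>"""


def find_note(root: ET.Element, note_id: str) -> Optional[ET.Element]:
    """Return the <note> whose id attribute matches, or None if there is none."""
    return root.find(f"note[@id='{note_id}']")


messages = ET.fromstring(MESSAGES_XML)
reply = find_note(messages, "p502")
missing = find_note(messages, "p999")
```

The id is used only to locate the note; the note's actual data is still read from its child elements.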
**************************************************************************************************************
Introduction to
DTD&&&&&&&
The purpose of a Document Type Definition is to define
the legal building blocks of an XML document. It defines the document structure
with a list of legal elements.
A DTD can be declared inline in your XML document, or as
an external reference.
**************************************************************************************************
Internal DOCTYPE
declaration
If the DTD is included in your XML
source file, it should be wrapped in a DOCTYPE definition with the following
syntax:
|
<!DOCTYPE root-element [element-declarations]> |
Example XML document with a DTD:
|
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note> |
The DTD above is interpreted like this:
!DOCTYPE note (in line 2) defines that this is a
document of the type note.
!ELEMENT note (in line 3) defines the note element as having four
elements: "to,from,heading,body".
!ELEMENT to (in line 4) defines the to element
to be of the type "#PCDATA".
!ELEMENT from (in line 5) defines the from
element to be of the type "#PCDATA"
and so on.....
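To make the interpretation concrete, the sketch below checks a document by hand against the same content model. This is not real DTD validation: Python's standard xml.etree.ElementTree reads the DOCTYPE but does not validate against it, so the rules are restated in code. The names REQUIRED_CHILDREN and follows_note_dtd are made up for this example.

```python
import xml.etree.ElementTree as ET

# Child order required by <!ELEMENT note (to,from,heading,body)>.
REQUIRED_CHILDREN = ["to", "from", "heading", "body"]


def follows_note_dtd(xml_text: str) -> bool:
    """Hand-written stand-in for validating against the note DTD."""
    root = ET.fromstring(xml_text)
    if root.tag != "note":
        return False
    if [child.tag for child in root] != REQUIRED_CHILDREN:
        return False
    # (#PCDATA) means text only, so no child may contain elements of its own.
    return all(len(child) == 0 for child in root)


VALID = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend</body>
</note>"""

# Same elements, but <from> comes before <to>, which the DTD does not allow.
WRONG_ORDER = (
    "<note><from>Jani</from><to>Tove</to>"
    "<heading>Reminder</heading>"
    "<body>Don't forget me this weekend</body></note>"
)

valid_result = follows_note_dtd(VALID)
wrong_order_result = follows_note_dtd(WRONG_ORDER)
```

A validating parser performs this kind of check automatically from the DTD, for any document type, without hand-written rules.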
********************************************************************************************************
External DOCTYPE
declaration
If the DTD is external to your XML
source file, it should be referenced in a DOCTYPE definition with the following
syntax:
|
<!DOCTYPE root-element SYSTEM "filename"> |
This is the same XML document as
above, but with an external DTD:
|
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note> |
And this is a copy of the file
"note.dtd" containing the DTD:
|
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)> |
*************************************************************************************************
Why use a DTD?
With a DTD, each of your XML files
can carry a description of its own format with it.
With a DTD, independent groups of
people can agree to use a common DTD for interchanging data.
Your application can use a standard
DTD to verify that the data you receive from the outside world is valid.
You can also use a DTD to verify
your own data.
**************************************************************************************************
Introduction to
XML Schema
XML Schema is an XML-based
alternative to DTD.
An XML schema describes the
structure of an XML document.
The XML Schema language is also
referred to as XML Schema Definition (XSD).
******************************************************************************************************
What You Should
Already Know
Before you study the XML Schema
Language, you should have a basic understanding of XML and XML Namespaces. It
will also help to have some basic understanding of DTD.
*****************************************************************************************************
What is an XML Schema?
The purpose of an XML Schema is to
define the legal building blocks of an XML document, just like a DTD.
An XML Schema:
· defines elements that can appear in a document
· defines attributes that can appear in a document
· defines which elements are child elements
· defines the order of child elements
· defines the number of child elements
· defines whether an element is empty or can include text
· defines data types for elements and attributes
· defines default and fixed values for elements and attributes
******************************************************************************
XML Schemas are the Successors of DTDs
We think that very soon XML Schemas
will be used in most Web applications as a replacement for DTDs.
Here are some reasons:
· XML Schemas are extensible to future additions
· XML Schemas are richer and more useful than DTDs
· XML Schemas are written in XML
· XML Schemas support data types
· XML Schemas support namespaces
*********************************************************************************
XML Schema is a W3C Recommendation
XML Schema was originally proposed
by Microsoft, but became an official W3C recommendation in May 2001.
The specification is now stable and
has been reviewed by the W3C Membership.
****************************************************************************************************************
Presentation:
Data and XML
-- Originally
designed to isolate content from presentation.
-- Useful in data
exchange.
-- Ability to
integrate data and documents
-- Designed to
communicate content in a flexible and extensible representation.
**********************************************************************************
XML and database
-- XML documents can be stored in a relational or
object-oriented database management system by
translating the documents into relations or objects.
-- Database software
can present existing relations or objects in a database as XML.
-- A new DBMS can be
created with a data model based on XML (a native XML
database).
**********************************************************************************
Data-Centric (Data-Processing-Oriented or
Message-Oriented) Documents:
--Documents that
use XML as data transport.
--Documents are
characterized by fairly regular structure with many repetitions of those data
structures.
--Processing focuses
on their use and exchange by applications.
-- Desired
operations:
searching for combinations of elements and data.
modifying, adding
or deleting a single element or data item.
--example of data-centric
<SalesOrder SONumber="12345">
<Customer CustNumber="543">
<CustName>ABC
Industries</CustName>
<Street>
<City>
<State>IL</State>
<PostCode>60609</PostCode>
</Customer>
<OrderDate>981215</OrderDate>
<Item ItemNumber="1">
<Part PartNumber="123">
<Description>
<p><b>
Stainless steel, one-piece construction,
lifetime
guarantee.</p>
</Description>
<Price>9.95</Price>
</Part>
<Quantity>10</Quantity>
</Item>
<Item ItemNumber="2">
<Part PartNumber="456">
<Description>
<p><b>Stuffing separator:</b><br />
Aluminum, one-year guarantee.</p>
</Description>
<Price>13.27</Price>
</Part>
<Quantity>5</Quantity>
</Item>
</SalesOrder>
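The desired operations listed for data-centric documents (searching, modifying, deleting a single element) map directly onto a few parser calls. The sketch below assumes Python's standard xml.etree.ElementTree and uses a trimmed-down version of the sales order: only the item numbers, part numbers, prices, and quantities from the example are kept, and the customer and description parts are left out.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed version of the sales order above (customer and descriptions omitted).
ORDER_XML = """<SalesOrder SONumber="12345">
  <Item ItemNumber="1">
    <Part PartNumber="123"><Price>9.95</Price></Part>
    <Quantity>10</Quantity>
  </Item>
  <Item ItemNumber="2">
    <Part PartNumber="456"><Price>13.27</Price></Part>
    <Quantity>5</Quantity>
  </Item>
</SalesOrder>"""

order = ET.fromstring(ORDER_XML)

# Searching: item numbers of all items ordered in quantities of 10 or more.
large_items = [
    item.get("ItemNumber")
    for item in order.findall("Item")
    if int(item.findtext("Quantity")) >= 10
]

# Combining elements and data: order total as price times quantity per item.
total = sum(
    Decimal(item.findtext("Part/Price")) * int(item.findtext("Quantity"))
    for item in order.findall("Item")
)

# Modifying a single element: change the quantity of item 2.
order.find("Item[@ItemNumber='2']/Quantity").text = "6"

# Deleting a single element: remove item 1 from the order.
order.remove(order.find("Item[@ItemNumber='1']"))
remaining_items = [item.get("ItemNumber") for item in order.findall("Item")]
```

The regular, repeated Item structure is what makes this kind of processing straightforward, and it is also why such documents map well onto database tables.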
*******************************************************************************
Document-Centric (Document-Processing-Oriented or
Presentation-Oriented) Documents:
--Documents are designed for human consumption (they capture natural language).
--Complex or
irregular structure
and mixed content.
--Processing
focuses on the final presentation of information to the end users.
--Desired
operation:
retrieving the
entire document.
searching for a
word.
modifying or
reordering a section.
*******************************************************************************
--Example of document-centric
<Product>
<Intro>
The <ProductName>
Fabrication Labs, Inc.</Developer> is
<Summary>like a monkey wrench,
but not as big.</Summary>
</Intro>
<Description>
<
handed versions (skyhook optional)</i>, is made of the <b>finest
stainless steel</b>. The Readi-grip rubberized handle quickly adapts
to your hands, even in the greasiest
situations. Adjustment is
possible through a variety of custom
dials.</
<
<List>
<Item><Link URL="Order.html">Order your own turkey
wrench</Link></Item>
<Item><Link URL="Wrenches.htm">Read more about
wrenches</Link></Item>
<Item><Link URL="Catalog.zip">Download
the catalog</Link></Item>
</List>
<
order now, comes with a <b>hand-crafted
shrimp hammer</b> as a
bonus gift.</
</Description>
</Product>
*****************************************************************************
Distinction of Data-Centric and
Document-Centric
--Will help you
decide which kind of database to use.
--As a general
rule, data is stored in a traditional database, and documents are stored in
a native XML
database or content management system.
*****************************************************************************
Why use XML
database
--XML is useful
for data exchange between databases and applications, especially between different
enterprises.
-- XML
repositories may meet all needs.
-- It is easier to
represent complex data in XML than as relations in a relational database.
-- XML has some
features that are similar to objects in object-oriented programming languages.
****************************************************************************