XML Notes - I

Vibha Verma
Ranch Hand

Joined: Jan 15, 2002
Posts: 107
As this is a long document, I am posting the first part of it now. I am still typing the rest and will post it as soon as it's done.

BASIC XML
 XML markup describes and provides structure to the content of an XML document or data packet.
 Unlike HTML, XML is case-sensitive including element-tags and attribute values.
 XML uses most of the characters defined in the 16-bit unicode character set.
 2 Unicode formats are the basis of XML characters - UTF-8 and UTF-16.
 3 control characters are:
Horizontal Tab(HT) 09
Line Feed (LF) 0A
Carriage-Return (CR) 0D
 5 special markup characters are: < > & ' " These characters have alternate representations in the form of entity references.
 Legal XML names:
First char: Unicode character, underscore, colon
Other chars (NmToken): Unicode character, Unicode number, underscore, colon, hyphen, period
 Colon char should not be used except as a namespace delimiter
 XML names should not begin with the string "XML" in any form (upper, lower or mixed case).
 Elements are the basic building blocks of XML markup. Tags consist of element type names.
 Everything between the start-tag and the end-tag of an element is contained within that element.
 Examples: (here "sp" is a space)
1. < sp ElementName> not allowed
2. <Name sp> allowed
3. <sp /Name> not allowed
4. </sp Name> not allowed
5. </Name sp> allowed
6. </Name /Name> not allowed
7. <Name sp/> allowed
8. <Name / sp> not allowed
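
The numbered cases above can be checked mechanically. This is only a rough sketch, assuming Python's standard xml.etree.ElementTree parser (non-validating, so it reports well-formedness only); the helper name is_well_formed is made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each entry mirrors one numbered case, with a real space in place of "sp".
cases = {
    "< Name></Name>": False,       # 1. space after "<"
    "<Name ></Name>": True,        # 2. space before ">" in a start-tag
    "<Name>< /Name>": False,       # 3. space between "<" and "/"
    "<Name></ Name>": False,       # 4. space between "</" and the name
    "<Name></Name >": True,        # 5. space before ">" in an end-tag
    "<Name></Name /Name>": False,  # 6. extra text inside the end-tag
    "<Name />": True,              # 7. space before "/>"
    "<Name / >": False,            # 8. space between "/" and ">"
}

for text, expected in cases.items():
    verdict = "allowed" if is_well_formed(text) else "not allowed"
    print(f"{text!r}: {verdict} (notes say {'allowed' if expected else 'not allowed'})")
```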
 Empty element tags may have associated attributes
 XML documents have three parts - prolog (optional), body (required) and epilog (optional)
 The document root (document entity) is the root of the whole XML document and is not visible in the markup. It has a subtree (the body), and the root element of that subtree is called the document element or root element.
 Prolog may contain - XML declaration, comments, PIs, DOCTYPE declaration
 Epilog may contain - PIs or comments.
 XML data is in the form of a simple hierarchical tree.
 All elements must be properly nested, no overlapping of tags is allowed.
 String literals are used for the values of attributes, internal entities and external identifiers.
 All string literals are enclosed by apos (') or quot (")
 Attributes are comprised of name-value pairs.
 Attributes:
1. Permissible values may be:
Text characters
Entity references
character references
2. Forbidden characters in attribute values: < and &. Use the entity references instead.
3. Only one instance of attribute name is allowed within a given tag.
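
A small sketch of the attribute rules above, assuming Python's standard xml.etree.ElementTree; the element and attribute names (item, label, id) are invented for the example.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if `text` is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Entity and character references stand in for the forbidden "<" and "&".
item = ET.fromstring('<item label="1 &lt; 2 &amp; 3 &#62; 2"/>')
print(item.get("label"))  # the parser returns the decoded text

# A raw "<" inside an attribute value is rejected.
raw_less_than = parse_error('<item label="1 < 2"/>')
print("raw '<':", raw_less_than)

# The same attribute name twice in one tag is rejected as well.
duplicate = parse_error('<item id="a" id="b"/>')
print("duplicate name:", duplicate)
```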
 All whitespace characters in the content are preserved and whitespace within element tags and attribute values may be removed.
 3 combinations of chars for end-of-line are: CR-LF, CR only, LF only. All these strings are converted to a single LF character.
 Except for the 5 built-in entity references, all entities must be defined prior to their use.
 Comments:
1. Can't have double hyphen within the string
2. Can't be nested
3. Can't be put in the start or end tag
4. Extra hyphen at the end is illegal
 CDATA Section:
1. Can't be empty
2. Can't be nested
3. Text in the CDATA section can't contain "]]>"
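
Some of the comment and CDATA rules above, tried out in a short sketch. It assumes Python's standard xml.etree.ElementTree, and the sample strings are made up; it is an illustration rather than a full test of the rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Inside a CDATA section, "<" and "&" are plain text, not markup.
code = ET.fromstring("<code><![CDATA[if (a < b && c) { run(); }]]></code>")
print(code.text)

checks = {
    "<a><!-- fine --></a>": True,
    "<a><!-- two -- hyphens --></a>": False,              # "--" inside a comment
    "<a><!-- trailing hyphen ---></a>": False,            # extra hyphen at the end
    "<a <!-- inside a tag -->></a>": False,               # comment inside a start-tag
    "<a><!-- outer <!-- inner --> --></a>": False,        # nested comment
    "<a><![CDATA[ outer <![CDATA[ inner ]]> ]]></a>": False,  # nested CDATA
}

for text, expected in checks.items():
    print(f"{text!r}: well-formed={is_well_formed(text)} expected={expected}")
```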
 XML Declaration
1. Order of attributes: version, encoding, standalone is fixed.
2. Version attribute is required, encoding and standalone are optional.
3. Default value for standalone is "no"
4. If encoding is other than UTF-8 or UTF-16, it must be specified.
5. Encoding values are not case-sensitive
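
The ordering rules for the XML declaration can be seen with a quick sketch like the one below. It assumes Python's standard xml.etree.ElementTree, and the documents are passed as bytes so that the encoding attribute is actually read by the parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the bytes in `data` parse as a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


declarations = {
    b'<?xml version="1.0"?><a/>': True,                                    # version alone
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><a/>': True,  # full form
    b'<?xml version="1.0" encoding="utf-8"?><a/>': True,                   # encoding name in lower case
    b'<?xml encoding="UTF-8" version="1.0"?><a/>': False,                  # wrong order
    b'<?xml encoding="UTF-8"?><a/>': False,                                # version missing
    b'<?xml version="1.0" standalone="yes" encoding="UTF-8"?><a/>': False,  # standalone before encoding
}

for data, expected in declarations.items():
    print(data.decode("ascii"), "->", is_well_formed(data), "expected", expected)
```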
 Special meaning attributes - xml:lang and xml:space (can have values preserve or default)
 XML document has logical and physical structure. Physical - document has storage units: entities. Logical - document is composed of declarations, elements, comments, char references and PIs
 Document Type Declaration contains or points to markup declaration that provides a grammar for a class of documents. This grammar is known as Document Type Definition.
 No attribute name may appear more than once in the same start tag or empty element tag.
 Attribute values cannot contain direct or indirect entity references to external entities.
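
The end-of-line and whitespace points from this section can be observed directly. A minimal sketch, assuming Python's standard xml.etree.ElementTree; the document is given as bytes so Python does no newline translation of its own, and the names a and note are invented.

```python
import xml.etree.ElementTree as ET

# CR-LF, a lone CR and a lone LF all appear in the content;
# a literal tab appears inside the attribute value.
doc = b'<a note="tab\there">one\r\ntwo\rthree\nfour</a>'
root = ET.fromstring(doc)

# Every end-of-line form in the content arrives as a single LF.
print(repr(root.text))

# Whitespace characters inside an attribute value are normalized to spaces.
print(repr(root.get("note")))
```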

Document Type Definitions (DTD)
 DTDs are a set of rules that define how XML data should be structured.
 Cooperating applications can share a single description of data known as XML vocabulary. A group of XML documents that share common XML vocabulary is known as document type and each individual document that conforms to a document type is a document instance.
 Multiple documents and applications can share DTDs
 Validity constraints ensure that any XML data conforms to its associated DTD.
 Only one DTD may be associated with a given XML document or data object.
 DTD has 2 parts - internal subset, external subset. DTD declarations in internal subset have priority over those in external subset.
 An XML document can be associated with only one DTD using a single DOCTYPE declaration.
 Syntax of DOCTYPE declaration:
1. <!DOCTYPE doc_element SYSTEM location [internal_subset]>
2. <!DOCTYPE doc_element PUBLIC identifier location [internal_subset]>
 Only comments and PIs can be inserted between XML declaration and DOCTYPE declaration.
 DTDs are associated with the entire element tree via the document element.
 "#" character as URI fragment identifier cannot be used in the location of a DTD.
 The use of PUBLIC identifier should be limited to internal systems and legacy SGML applications.
 Four basic keywords used in DTD declaration are:
1. ELEMENT
2. ATTLIST
3. NOTATION
4. ENTITY
 ELEMENT:
Syntax: <!ELEMENT ele_name content_category>
<!ELEMENT ele_name (content_model)cardinality>
 Content_category : ANY or EMPTY
 Content_Model : Text only, Element only, Mixed
 Child elements in mixed content can appear (or not) in any order, any number of times.
 Syntax of mixed content: <!ELEMENT foo (#PCDATA | child1 | child2)*>
1. No fixed sequence
2. #PCDATA must be the first item
3. "*" operator is needed as the mixed content doesn't constrain the no. of occurrences of the child elements.
 ATTLIST declaration Syntax: <!ATTLIST element_name attrName attrType attrDefault defaultValue>
 Attribute defaults: #REQUIRED, #IMPLIED, #FIXED, Default values
 Attribute types: (10 in number)
1. CDATA
2. Enumeration
3. ID
4. IDREF
5. IDREFS
6. NMTOKEN
7. NMTOKENS
8. NOTATION
9. ENTITY
10. ENTITIES
 Order of attributes cannot be enforced
 ID attribute type must not be used with #FIXED
 ID value must be unique within a given document
 Only one ID attribute for each element type
 NMTOKEN attribute prevents the inclusion of whitespace and some punctuation characters
 NOTATION can be used to identify
1. The format of unparsed entities
2. The format of element attributes of ENTITY and ENTITIES type
3. The application associated with a PI
 Entities can be used to include a document inside a DTD
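
To tie the DTD declarations together, here is a rough sketch with an internal subset that uses ELEMENT (mixed content), ATTLIST (a required ID, an enumeration with a default, a #FIXED value) and ENTITY. It assumes Python's standard xml.etree.ElementTree, which reads the internal subset but does not validate, and the names note, em, priority and source are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = """\
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
  <!ATTLIST note
      id       ID                #REQUIRED
      priority (low|normal|high) "normal"
      version  CDATA             #FIXED "1.0">
  <!ENTITY source "JavaRanch">
]>
<note id="n1">Posted on &source;, <em>part one</em>.</note>"""

note = ET.fromstring(DOC)

# Defaulted and #FIXED attributes are supplied from the ATTLIST declaration.
print(note.attrib)

# The internal entity is expanded inside the mixed content.
print(repr(note.text), note[0].tag, repr(note[0].text), repr(note[0].tail))

# Because this parser is non-validating, a missing #REQUIRED attribute
# is not reported; a validating parser would flag it.
loose = ET.fromstring(DOC.replace(' id="n1"', ""))
print(loose.attrib)
```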

SCHEMA
 Advantages of XML Schema
1. Support for data-types
2. Uses XML syntax
3. Support for content model (mixed content, exact number of occurrences of elements, named group of elements)
4. Extensible
5. Self documenting
 Schema components is a generic term for the blocks that make up the abstract data model of the schema
 3 groups of components : Primary, Secondary, Helper
 Primary components:
1. Element declaration
2. Simple type definition
Built-in types: Primitive, Derived
3 varieties of data-types: Atomic, List, Union
User derived types
3. Complex type definition
4. Attribute Declaration
 Default value of minOccurs and maxOccurs is 1
 Simple type can not have any child elements or carry attributes.
 Simple types are the atoms of information considered distinct to XML Schema and they cannot be split up.
 Primitive data types are data types in their own right and they are not defined in terms of other types
 Derived types are built from the definitions of other data-types
 User derived types are derived by the author of the schema and are particular to that schema.
 An atomic data type is one whose value cannot be divided, at least not in the context of XML Schema. Atomic is not the same as primitive: an atomic type can be primitive or derived.
 Built-in list types are: IDREFS, ENTITIES, NMTOKENS
 Named complex type is created when the content model is to be reused, otherwise anonymous types can be created.
 "schema" is the root element of the Schema document.
 Attributes are to be defined as part of the complex type because simple types can only hold atomic values and not carry attributes or have child elements.
 Content models - ANY, EMPTY, Element only, Mixed
 ANY is the default content model
 EMPTY: define a complex type and restrict it from "anyType" so that it can only carry attributes.
 Secondary components:
1. Model group definition
2. Attribute groups
3. Notation declaration
4. Identity constraints
Unique Values
Key and KeyRef
Default or Fixed element content
Specifying null values
 Attribute groups can nest other attribute groups inside of them and rather like attribute declarations should appear at the end of the complex type.
 Notation declaration - associates a name with an identifier for an application used to view that sort of a notation.
 Key and KeyRef - primary and foreign key respectively.
 XML Schema data types are composed of three parts: Value space, Lexical space, Set of facets.
 Value spaces have certain facets: order, bound, cardinality, equality, numeric or non-numeric dichotomy.
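
The difference between lexical space and value space may be easier to see in code. This is only an illustration in plain Python, not an XML Schema processor: Decimal stands in for xs:decimal, a small lookup table stands in for xs:boolean, and the helper parse_boolean is hypothetical. The whiteSpace facet is ignored here.

```python
from decimal import Decimal

# Several lexical forms (the strings written in a document)
# denote the same member of the xs:decimal value space.
decimal_forms = ["1.0", "1.00", "+1", "01"]
decimal_values = {Decimal(text) for text in decimal_forms}
print(decimal_values)

# xs:boolean has four lexical forms but only two values.
XS_BOOLEAN = {"true": True, "1": True, "false": False, "0": False}


def parse_boolean(text):
    """Map an xs:boolean lexical form to its value; reject anything else."""
    try:
        return XS_BOOLEAN[text]
    except KeyError:
        raise ValueError(f"{text!r} is not in the lexical space of xs:boolean")


print(parse_boolean("1"), parse_boolean("false"))
```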
ENTITIES
 All XML documents are comprised of units of storage - entities.
 Document entity serves as the starting point for an XML parser.
 External and internal subsets of DTD are also entities, but unnamed ones.
 Main categories of entities:
 Internal vs External
 Parsed vs Unparsed
 General vs Parameter
 Internal entities can only be parsed
 External entities can be both parsed/unparsed
 General entities can be both parsed/unparsed
 Parameter entities are always parsed entities and so can be internal or external
 General entities are referenced by using entity reference "&name;". Parameter entities are referenced as "%name;"
 Unparsed entities:
1. May or may not be text
2. Need not be XML text
3. Must have associated notation
4. Can only be used as the value of an attribute having ENTITY/ENTITIES type
 The defining declaration should precede any references to the entity
 General entities cause fatal XML Parse errors if:
1. Any reference to an unparsed entity
2. Any char or general entity reference in DTD except within an entity or attribute value
3. Any reference to an external entity from within an attribute value
 Unparsed entities are always external
 Entities can never be empty
 An entity reference must not contain the name of an unparsed entity.
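
Several of the entity rules above show up as parse errors. A small sketch, assuming Python's standard xml.etree.ElementTree (non-validating, and it does not fetch external entities); the entity names greeting, logo and chapter and the file names are made up.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ATTLIST doc img ENTITY #IMPLIED
                ref CDATA  #IMPLIED>
  <!ENTITY greeting "hello">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ENTITY chapter SYSTEM "chapter1.xml">
]>"""


def parse_error(body):
    """Parse DTD + body; return the error message, or None on success."""
    try:
        ET.fromstring(DTD + body)
    except ET.ParseError as exc:
        return str(exc)
    return None


# An internal parsed entity is simply expanded.
print(ET.fromstring(DTD + "<doc>&greeting;</doc>").text)

# An unparsed entity is named in an ENTITY-typed attribute, never referenced with "&...;".
print(ET.fromstring(DTD + '<doc img="logo"/>').get("img"))

errors = {
    "reference to an unparsed entity": parse_error("<doc>&logo;</doc>"),
    "external entity in an attribute value": parse_error('<doc ref="&chapter;"/>'),
    "entity that was never declared": parse_error("<doc>&missing;</doc>"),
}
for label, message in errors.items():
    print(label, "->", message)
```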
NAMESPACES
 XML Namespace is a named collection of names
 Qualified name - namespace prefix:local name
 A namespace declaration applies to the element in which it is declared.
 Unqualified attribute names do not belong to any namespace
 Qualified attribute names belong to the associated namespace
 Attributes are not explicitly part of any default namespace
 Default namespace can be disabled by using an empty value in the default namespace declaration
 XML namespaces do not work well with DTDs
 An XML namespace is a collection of element type and attribute names
 Two part naming system is the only thing defined by the XML namespace recommendation
 XML namespaces contain names of element types and attributes not the elements or attributes themselves
 If an element type or attribute name is not specifically declared to be in an XML namespace and there is no default namespace then that name is not in any XML namespace
 XML namespaces do not apply to entity names, notation names or PI targets
 No namespace declarations apply to DTDs
 XML namespace prefix cannot be undeclared, it can be overridden by redeclaring the same namespace prefix to some other URI.
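
A short sketch of how the namespace rules above play out, assuming Python's standard xml.etree.ElementTree, which reports each name as {namespace-URI}local-name. The urn:example:... URIs and the element names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<lib:catalog xmlns:lib="urn:example:library" xmlns="urn:example:default">
  <book id="b1" lib:shelf="A3">
    <note xmlns="">plain</note>
    <lib:tag xmlns:lib="urn:example:other"/>
  </book>
</lib:catalog>"""

root = ET.fromstring(DOC)
book = root[0]
note, tag = book[0], book[1]

print(root.tag)     # qualified name: prefix resolved to its URI
print(book.tag)     # unprefixed element: picks up the default namespace
print(book.attrib)  # unprefixed attribute stays outside any namespace
print(note.tag)     # xmlns="" switches the default namespace off
print(tag.tag)      # same prefix redeclared with a different URI
```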

XLinks and XPointers
 Links to external resources such as other XML documents, HTML documents or images
 Utility:
1. To define relationships between similar documents
2. To define a sequence in which documents should be navigated
3. To embed non-XML content in an XML document
 XLink attributes:
1. Type (possible values are: simple, extended, resource, locator, arc, title)
2. Title - human readable string
3. Href - destination URI of the link
4. Role - function of link's content
5. Arcrole - function of link
6. Show - how to render the link (new, replace, embed, other, none)
7. Actuate - when to trigger the link (onRequest, onLoad, other, none)
 Simple links (xlink:type = "simple") offer similar functionality to HTML hyperlinks while extended links offer greater capabilities
 Simple links are a subset of extended links.
 Simple links link two locations in one direction and the start of the link is always the declaration of the link itself.
 Combinations like xlink:show = "replace" and xlink:actuate = "onLoad" do not make any sense.
 Extended links allow more than one resource to be linked together and they may be specified out-of-line
 3 types of extended links: inbound, outbound, third-party
 Elements that have extended Xlink attributes have 4 sub-elements : Locator element, Resource elements, arc element and title element and 3 attributes: type, title, role
 Extended links do not imply that their source is the document in which the link is located.
 Locator element: To specify the locations participating in an extended link. Attributes: href, role, title, label
 Resource element: To define participants in the link that are within the scope of extended link element. Attributes: role, title, label
 Arc element: To define the navigable connections between locators participating in an extended link. Attributes: arcrole, title, show, actuate, from, to
 Title element: Attributes: type
 Inline links: Extended links may be embedded in one of the resources participating in an extended link.
 Out-of-line extended links - a special type of arc element is used to indicate to an XLink-aware processor that an out-of-line link exists for a particular document.
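
To make the locator and arc elements more concrete, here is a rough sketch that reads an extended link and lists the traversals its arcs define. It assumes Python's standard xml.etree.ElementTree and the XLink namespace http://www.w3.org/1999/xlink; the element names (reading, part, go), the labels and the file names are hypothetical, and this is not a full XLink processor.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

DOC = """<reading xmlns:xlink="http://www.w3.org/1999/xlink"
         xlink:type="extended" xlink:title="XML Notes">
  <part xlink:type="locator" xlink:href="xml-notes-1.xml" xlink:label="part1"/>
  <part xlink:type="locator" xlink:href="xml-notes-2.xml" xlink:label="part2"/>
  <go xlink:type="arc" xlink:from="part1" xlink:to="part2"
      xlink:show="replace" xlink:actuate="onRequest"/>
</reading>"""


def q(name):
    """Build the namespace-qualified form of an XLink attribute name."""
    return "{%s}%s" % (XLINK_NS, name)


def traversals(link):
    """Return (source href, target href) pairs described by the arc elements."""
    locators = {}
    for child in link:
        if child.get(q("type")) == "locator":
            locators.setdefault(child.get(q("label")), []).append(child.get(q("href")))
    pairs = []
    for child in link:
        if child.get(q("type")) == "arc":
            for source in locators.get(child.get(q("from")), []):
                for target in locators.get(child.get(q("to")), []):
                    pairs.append((source, target))
    return pairs


link = ET.fromstring(DOC)
print(link.get(q("type")), traversals(link))
```

Neither end of the arc is the document that holds the link, which is the sense in which an extended link's source need not be the document it lives in.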
 XPointer: to point to some portion of an XML document - individual sub-tree, attributes or even individual characters that are part of the text content.
 HTML pointers use "#" (fragment identifier) to indicate that the text following it refers to a named anchor point, or fragment identifier in the targeted document.
 3 ways to specify fragment identifiers: Bare names, Child Sequences, Full XPointers
 Bare Names: Shorthand notation is provided for pointing to elements with IDs
 Child sequences: pointed to by walking through the child element tree, e.g. /1/1/4/2
 Points: point location may be a node or a particular location within character content
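
A child sequence such as /1/1/4/2 can be followed by hand: each number picks the n-th child element, starting with the document element as /1. The sketch below assumes Python's standard xml.etree.ElementTree; the function follow_child_sequence and the sample document are made up, and it is not an XPointer implementation.

```python
import xml.etree.ElementTree as ET

DOC = """<notes>
  <section><title>BASIC XML</title><point>one</point><point>two</point></section>
  <section><title>DTD</title></section>
</notes>"""


def follow_child_sequence(root, sequence):
    """Walk a child sequence like "/1/2/1"; return the element or None."""
    steps = [int(step) for step in sequence.strip("/").split("/")]
    # The first step selects the document element, and there is only one.
    if not steps or steps[0] != 1:
        return None
    node = root
    for index in steps[1:]:
        children = list(node)  # element children only, in document order
        if not 1 <= index <= len(children):
            return None
        node = children[index - 1]
    return node


root = ET.fromstring(DOC)
print(follow_child_sequence(root, "/1/1/3").text)
print(follow_child_sequence(root, "/1/2/1").text)
print(follow_child_sequence(root, "/1/3"))
```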
Vasudha Deepak
Ranch Hand

Joined: Mar 15, 2002
Posts: 86
Thanks Vibha!


IBM Certified Developer - XML and Related Technologies (141)
SCJP2 SCWCD
Xinyi Zhang
Ranch Hand

Joined: Apr 28, 2001
Posts: 42
Thanks a lot. It is helpful.


Xinyi
Yingtang Tang
Ranch Hand

Joined: Nov 27, 2001
Posts: 42
Thanks a lot.
Yingtang
 