Whitespace

XML 1.0 defines whitespace as a space, tab, carriage return, or line feed. XML 1.1 keeps that definition but additionally treats the next-line character NEL (#x85) and the Unicode line separator (#x2028) as line ends, which parsers translate to line feeds. Whitespace serves the same purpose in XML as it does in most programming and natural languages: to separate tokens and language elements from one another. To an XML parser, all whitespace in element content is significant and will be passed to the client application. Whitespace within tags (for instance, between attributes) is not significant. Consider the following example:

<p>  This sentence has extraneous 
  line breaks.</p>

After parsing, the character data from this example element is passed to the underlying application as:

  This sentence has extraneous 
  line breaks.
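
The behavior can be checked with any conforming parser. The following is a minimal sketch that assumes Python's standard xml.etree.ElementTree module as the client application; the variable names are illustrative only. It parses the example element above, then parses two spellings of a hypothetical tag that differ only in the whitespace between attributes.

```python
import xml.etree.ElementTree as ET

# The example element, with the line break written as an explicit "\n".
source = "<p>  This sentence has extraneous \n  line breaks.</p>"
p = ET.fromstring(source)

# Leading spaces, the trailing space, and the line break all reach the application.
print(repr(p.text))

# Whitespace inside a tag is not significant: both spellings of this
# (hypothetical) tag produce the same attribute dictionary.
compact = ET.fromstring('<p id="a" lang="en"/>')
spread = ET.fromstring('<p   id="a"\n     lang="en"  />')
print(compact.attrib == spread.attrib)
```

The first print shows the character data with every space and the line break intact; the second prints True because the parser discards the extra spacing between attributes.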

Although XML specifies that all whitespace in element content be preserved for use by the client application, an additional facility lets the XML author signal explicitly that the spacing and formatting of an element's character data should be preserved. For more information, see the discussion of the xml:space attribute in Special Attributes later in this chapter.
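
As a rough illustration that xml:space is a hint to the application rather than a change in parser behavior, the sketch below again assumes Python's xml.etree.ElementTree; the pre element and its content are made up for the example.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

plain = ET.fromstring("<pre>  two  spaces  </pre>")
hinted = ET.fromstring('<pre xml:space="preserve">  two  spaces  </pre>')

# The parser hands over identical character data in both cases.
print(plain.text == hinted.text)

# The hint arrives as an ordinary attribute for the application to act on.
print(hinted.get(f"{{{XML_NS}}}space"))
```

Whether to honor the hint, for example by skipping any whitespace trimming of its own, is left to the application.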

To simplify the lives of software developers, parsers are expected to normalize all occurrences of the carriage return (#xD) character to a single line feed (#xA) character. When the carriage return character appears directly before a line feed, it is simply removed. This results in a document that contains only single line feed characters to mark line ends. In XML 1.1, this normalization to a line feed character also occurs for the Unicode characters #x85 (NEXT LINE, NEL) and #x2028 (LINE SEPARATOR).
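
The normalization rule can be sketched as a small text transformation. This is an illustration of the rule only, not how any particular parser implements it; the function name normalize_line_ends and its version parameter are hypothetical. The XML 1.1 pattern also collapses the pair #xD #x85, which the XML 1.1 specification lists alongside the cases described above.

```python
import re

# Line-end sequences that each collapse to one line feed (#xA).
# Longer alternatives come first so that "\r\n" is matched as a pair.
XML10_LINE_ENDS = re.compile("\r\n|\r")
XML11_LINE_ENDS = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_line_ends(text: str, version: str = "1.0") -> str:
    """Replace every line-end sequence with a single line feed (#xA)."""
    pattern = XML11_LINE_ENDS if version == "1.1" else XML10_LINE_ENDS
    return pattern.sub("\n", text)


sample = "one\r\ntwo\rthree\x85four\u2028five"
print(repr(normalize_line_ends(sample, "1.0")))
print(repr(normalize_line_ends(sample, "1.1")))
```

Under the XML 1.0 rule only the carriage returns in the sample are affected, so #x85 and #x2028 pass through unchanged; under the XML 1.1 rule every separator becomes a single line feed.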