Output encodings

This section explains how character data in the generated HTML output are encoded. There are various aspects to this topic, and it is quite easy to get totally confused. Because of this, I will first explain how character data change their encoding during the default processing steps, and then how this behaviour can be modified.

The standard way of encoding characters

Phase 1: Parsing XML, and the internal representation

Most of the character data are read from the XML files containing the UI definition, but some strings are also added dynamically by the program (e.g. read from a database or another background store). The XML data are parsed, and the result is an in-memory representation as an XML tree. This tree can be seen as the reference point for the various recoding steps, as it expresses what is meant. An example makes this clearer:

    <ui:variable name="company">
      <ui:string-value>Meyer &amp; Son</ui:string-value>
    </ui:variable>

This literal XML fragment is parsed and represented as a tree:

    ui:variable
      |
      +-- attribute "name" has value "company"
      +-- ui:string-value
            |
            +-- text "Meyer & Son"

In particular, the ampersand is now represented as a plain ampersand and no longer needs any escaping notation.

Of course, there are many more data structures than just XML trees. We have declared a variable here, and this creates a container for the variable. The important point is that the initial value of the variable can be taken directly from the XML tree; here it is "Meyer & Son". If the value is changed later (e.g. overwritten by some database record), no encoding changes are necessary. The general idea is that the internal representation never escapes characters.

Phase 2: Internal processing

In order to get HTML output, the XML tree needs to be transformed; for example, template calls must be expanded. The transformation never changes the way character data are encoded.
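The effect described for Phase 1 and Phase 2 can be reproduced with any XML parser. The following is only a sketch in Python using the standard xml.etree.ElementTree module, not the toolkit's own parser; the namespace URI urn:example:ui and the variables dictionary are placeholders made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The namespace URI is a placeholder so that the "ui:" prefix can be parsed;
# it is an assumption of this sketch, not part of the UI definition language.
SOURCE = """
<ui:variable xmlns:ui="urn:example:ui" name="company">
  <ui:string-value>Meyer &amp; Son</ui:string-value>
</ui:variable>
"""

UI = "{urn:example:ui}"


def initial_value(xml_text: str) -> str:
    """Parse the fragment and return the text of the ui:string-value child."""
    root = ET.fromstring(xml_text)
    node = root.find(UI + "string-value")
    return node.text


value = initial_value(SOURCE)
print(value)  # the in-memory string contains a plain ampersand, no "&amp;"

# Hypothetical variable store: overwriting the value later (for example with
# a database record) stores the new string as-is, without any escaping.
variables = {"company": value}
variables["company"] = "Smith <& Daughters>"
print(variables["company"])
```

The escaping notation exists only in the XML source text; once parsed, the string is stored as the characters it means, and it stays that way until output is written.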
Phase 3: Writing the HTML output

The result of the transformation step is an HTML tree that must be written as a text stream. There are essentially two major cases:
How to modify the way output is encoded

Forcing the algorithm for attribute context

One drawback of the normal output encoding is that it is impossible to generate raw HTML dynamically. Imagine you have a database containing HTML pages. How do you include these pages in your generated output? Let us assume the variable html_page contains the page. If you include it by

    <ui:dynamic variable="html_page"/>

the ui:dynamic statement expands to a text node, and the normal encoding escapes all HTML meta characters. As a result, the browser displays the source code of the page instead of interpreting it.

It is possible to force the algorithm that is used for attribute context. The important point is that this algorithm does not escape within text nodes. The ui:special element selects this algorithm, e.g.

    <ui:special>
      <ui:dynamic variable="html_page"/>
    </ui:special>

Now the HTML meta characters are left as they are, without any escaping, and the browser interprets the HTML code.

Additional output encodings

The HTML pre tag preserves the formatting of the inner character block. Sometimes it would be nice to simulate the effect of pre without using it, by replacing spaces with &nbsp;, newlines with <br>, and by expanding tabs. The ui:encode element allows one to add an escaping algorithm to the currently active set of encoders:

    <ui:encode enc="pre">
    This is the first line.
    Second line.
    </ui:encode>

The two lines are first encoded by the HTML-escaping algorithm, which is the default algorithm. The ui:encode element takes the result of this and applies pre-style escaping to it. The printed HTML code is:

    This&nbsp;is&nbsp;the&nbsp;first&nbsp;line.<br>
    Second&nbsp;line.<br>

Another example: you want to generate a Javascript function that pops up an alert box on the screen:

    <ui:template name="alert" from-caller="body">
      <script type="text/javascript">
        <ui:special>
          window.alert("${body/js}");
        </ui:special>
      </script>
    </ui:template>

The ui:special element turns HTML escaping off.
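To make the layering in the two examples above more concrete, here is a rough sketch in Python that treats each encoding as a plain string function. The function names html_escape, pre_encode and js_escape, the tab width of 8, and the exact set of characters escaped for Javascript are all assumptions of this sketch; the real encoders may differ in detail.

```python
import html


def html_escape(text: str) -> str:
    """Default step: escape the HTML meta characters."""
    return html.escape(text, quote=True)


def pre_encode(text: str, tab_width: int = 8) -> str:
    """Assumed pre-style step: expand tabs, then turn spaces into &nbsp;
    and put a <br> in front of every newline."""
    text = text.expandtabs(tab_width)
    return text.replace(" ", "&nbsp;").replace("\n", "<br>\n")


def js_escape(text: str) -> str:
    """Assumed js-style step: escape characters that cannot appear
    literally inside a quoted Javascript string."""
    replacements = {
        "\\": "\\\\",
        '"': '\\"',
        "'": "\\'",
        "\n": "\\n",
        "\r": "\\r",
    }
    return "".join(replacements.get(ch, ch) for ch in text)


# ui:encode enc="pre": HTML escaping runs first, pre-style escaping second.
lines = "This is the first line.\nSecond line.\n"
print(pre_encode(html_escape(lines)))

# Alert template: HTML escaping is switched off, only js escaping is applied.
body = 'He said "hi"'
print('window.alert("' + js_escape(body) + '");')
```

The order of the two steps in the first call matters: if the pre-style step ran first, the HTML-escaping step would turn the generated &nbsp; into &amp;nbsp; and the <br> into &lt;br&gt;. In the second call, HTML escaping would be wrong because the quotation marks inside the script would be written as &quot; instead of being escaped for Javascript.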
The /js notation applies the js encoding to the value of body. This encoding escapes characters that cannot occur literally in Javascript strings, e.g. the quotation mark itself.

The list of defined output encodings

The following names can be used in ui:encode, when expanding parameters (${param/encname}), and in bracket expressions ($[expr/encname]):
You can define your own encodings by calling the add_output_encoding method of the application object. The encodings can be referred to in a number of places: