XML
This module provides procedures for parsing and generating XML. It also has an optional dependency on LibXML for XML/HTML parsing; see details below.
To use the bindings from this module:
(import :std/xml)
Overview
This module adds utilities for working with XML. Gerbil Scheme uses SXML to represent XML data. The module is mostly ported from Oleg's XML package. More detailed information about SXML can be found at [http://okmij.org/ftp/Scheme/xml.html].
Compiling Gerbil with LibXML support provides the C-based parser procedures parse-xml and parse-html, along with their parser options.
Parsing
read-xml
(read-xml source [namespaces: ()]) -> sxml | error
source := port | string | u8vector
namespaces := alist or hash-table mapping urls to namespace prefixes
Reads and parses XML from source and returns the result as SXML. namespaces is an optional alist or hash table mapping a URI (string) to a namespace prefix (string). It has the same interface as in parse-xml, so that the two implementations can be swapped. Signals an error on an invalid source value.
Examples
> (import :std/xml)
> (read-xml "<foo><element id=\"1\">foobar</element><element id=\"2\">barbaz</element></foo>")
(*TOP* (foo (element (@ (id "1")) "foobar") (element (@ (id "2")) "barbaz")))
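Since source may also be a port, the same parse can be driven from an input port. The following is a minimal sketch, assuming the string and port paths produce the same tree; the sample text and the names sample, from-string, and from-port are made up for illustration.

```scheme
(import :std/xml)

;; Hypothetical XML snippet, used only for illustration.
(def sample "<foo><element id=\"1\">foobar</element></foo>")

;; Parse directly from a string ...
(def from-string (read-xml sample))

;; ... and from an input port wrapping the same text.
(def from-port (read-xml (open-input-string sample)))

;; Both calls are expected to yield the same SXML tree.
(equal? from-string from-port)
```

For a file, a port obtained from call-with-input-file could be passed in the same way.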
parse-xml [libxml]
(parse-xml source [url: "none.xml"] [encoding: (encoding "UTF-8")] [options: (options parse-xml-default-options)] [namespaces: (ns [])]) -> sxml | error
source := string | u8vector | port
url := string for base URL
encoding := string | #f; strings are always UTF-8 encoded for parsing.
options := fixnum (ior of libxml parser options)
namespaces := alist or hash-table mapping urls to namespace prefixes
Parses XML data from source into SXML + CDATA, unless CDATA is collapsed with XML_PARSE_NOCDATA.
Namespace prefixes defined in the document are ignored; the namespace URL is used as the canonical prefix if it is not in the namespaces mapping. Signals an error on invalid source.
options is a fixnum denoting the XML parsing options. The available options are:
Options values
XML_PARSE_RECOVER
XML_PARSE_NOENT
XML_PARSE_DTDLOAD
XML_PARSE_DTDATTR
XML_PARSE_DTDVALID
XML_PARSE_NOERROR
XML_PARSE_NOWARNING
XML_PARSE_PEDANTIC
XML_PARSE_NOBLANKS
XML_PARSE_XINCLUDE
XML_PARSE_NONET
XML_PARSE_NODICT
XML_PARSE_NSCLEAN
XML_PARSE_NOCDATA
XML_PARSE_NOXINCNODE
XML_PARSE_COMPACT
XML_PARSE_HUGE
Details of the above flags can be found in the LibXML reference documentation.
The default options are held in the parse-xml-default-options variable, which is currently set to:
(fxior XML_PARSE_NOENT
       XML_PARSE_NONET
       XML_PARSE_NOBLANKS)
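Because options is a plain fixnum, extra flags can be combined with the defaults using fxior. The sketch below assumes Gerbil was built with LibXML, so that parse-xml and the XML_PARSE_* constants are available from :std/xml; the input document and the names my-options and doc are hypothetical.

```scheme
(import :std/xml)

;; Assumption: Gerbil was compiled with LibXML support.
;; Start from the defaults and additionally collapse CDATA sections.
(def my-options
  (fxior parse-xml-default-options XML_PARSE_NOCDATA))

;; Hypothetical input containing a CDATA section.
(def doc
  (parse-xml "<note><body><![CDATA[1 < 2]]></body></note>"
             options: my-options))
```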
Examples
> (import :std/net/request)
> (import :std/xml)
> (parse-xml (request-text (http-get "https://www.w3schools.com/xml/note.xml")))
(*TOP* (note (to "Tove") (from "Jani") (heading "Reminder") (body "Don't forget me this weekend!")))
parse-html [libxml]
(parse-html source [url: "none.xml"] [encoding: (encoding "UTF-8")] [options: (options parse-html-default-options)] [filter: (filter-els [])]) -> sxml | error
source := string | u8vector | port
url := string for base URL
encoding := string | #f; strings are always UTF-8 encoded for parsing.
options := fixnum (ior of libxml parser options)
filter := list of strings
Parses HTML data from the given source into SXML + CDATA. source, encoding, url, and options are as described for parse-xml above. filter is a list of strings naming the elements to be removed from the tree. Signals an error on invalid source.
The available option values are:
Options values
HTML_PARSE_RECOVER
HTML_PARSE_NODEFDTD
HTML_PARSE_NOERROR
HTML_PARSE_NOWARNING
HTML_PARSE_PEDANTIC
HTML_PARSE_NOBLANKS
HTML_PARSE_NONET
HTML_PARSE_NOIMPLIED
HTML_PARSE_COMPACT
HTML_PARSE_IGNORE_ENC
Details of the above flags can be found in the LibXML reference documentation.
The default options are held in the parse-html-default-options variable, which is currently set to:
(fxior HTML_PARSE_RECOVER
       HTML_PARSE_NOERROR
       HTML_PARSE_NOWARNING
       HTML_PARSE_NONET
       HTML_PARSE_NOBLANKS)
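The filter keyword described above can be used to drop unwanted elements while parsing. This is a sketch only, assuming a LibXML-enabled build; the page text and the names page and tree are hypothetical, and the exact shape of the resulting tree is not shown because it depends on the parser options.

```scheme
(import :std/xml)

;; Assumption: Gerbil was compiled with LibXML support.
;; Hypothetical HTML page containing a script element we do not want.
(def page
  "<html><body><p>Hello</p><script>alert(1)</script></body></html>")

;; Parse with the default HTML options, removing script elements.
(def tree (parse-html page filter: '("script")))
```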
SXML Queries
sxpath
(sxpath path) -> sxml
path := list
Evaluate an abbreviated SXPath
sxpath:: AbbrPath -> Converter, or
sxpath:: AbbrPath -> Node|Nodeset -> Nodeset
AbbrPath is a list. It is translated to the full SXPath according to the following rewriting rules:
(sxpath '()) -> (node-join)
(sxpath '(path-component ...)) ->
(node-join (sxpath1 path-component) (sxpath '(...)))
(sxpath1 '//) -> (node-or
(node-self (node-typeof? '*any*))
(node-closure (node-typeof? '*any*)))
(sxpath1 '(equal? x)) -> (select-kids (node-equal? x))
(sxpath1 '(eq? x)) -> (select-kids (node-eq? x))
(sxpath1 ?symbol) -> (select-kids (node-typeof? ?symbol))
(sxpath1 procedure) -> procedure
(sxpath1 '(?symbol ...)) -> (sxpath1 '((?symbol) ...))
(sxpath1 '(path reducer ...)) ->
(node-reduce (sxpath path) (sxpathr reducer) ...)
(sxpathr number) -> (node-pos number)
(sxpathr path-filter) -> (filter (sxpath path-filter))
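The rewriting rules above can be tried on a small tree. The following sketch reuses the SXML from the read-xml example and assumes the node tests behave as in Oleg's SXPath, where *text* matches string nodes and @ matches the attribute list; the variable names are made up for illustration.

```scheme
(import :std/xml)

(def doc
  '(*TOP* (foo (element (@ (id "1")) "foobar")
               (element (@ (id "2")) "barbaz"))))

;; sxpath returns a converter; apply it to a node to get a nodeset.
(def elements ((sxpath '(foo element)) doc))
;; => ((element (@ (id "1")) "foobar") (element (@ (id "2")) "barbaz"))

;; Text children of each element.
(def texts ((sxpath '(foo element *text*)) doc))
;; => ("foobar" "barbaz")

;; Values of the id attributes.
(def ids ((sxpath '(foo element @ id *text*)) doc))
;; => ("1" "2")

;; // looks at the node itself and all of its descendants.
(def deep ((sxpath '(// element)) doc))
```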
sxml-select
(sxml-select n predf [mapf = values]) -> sxml
n := sxml nodes
predf := predicate function
mapf := transform function
Collects all children of node n that satisfy the predicate predf, optionally transforming each result with the mapping function mapf. Once a node satisfies the predicate, its children are not traversed.
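As a minimal sketch of the description above, the following collects nodes by element type. The predicate element-node? and the other names are hypothetical helpers written for this example; the predicate guards against non-pair nodes such as strings.

```scheme
(import :std/xml)

(def doc
  '(*TOP* (foo (element (@ (id "1")) "foobar")
               (element (@ (id "2")) "barbaz"))))

;; Hypothetical predicate: true for nodes whose element type is `element`.
(def (element-node? n)
  (and (pair? n) (eq? (car n) 'element)))

;; Collect the matching nodes.
(def matches (sxml-select doc element-node?))

;; Collect again, mapping each match to its id attribute.
(def ids
  (sxml-select doc element-node?
               (lambda (n) (sxml-attribute-e n 'id))))
```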
sxml-attributes
(sxml-attributes n) -> list | #f
n := sxml node
Returns the attributes of the given node n, or #f if the node does not have any attributes.
sxml-e
(sxml-e n) -> symbol | #f
n := sxml node
Returns the element type of node n or #f if no type is found.
sxml-find
(sxml-find n predf [mapf = values]) -> sxml
n := sxml nodes
predf := predicate function
mapf := transform function
Finds the first child that satisfies the predicate predf, using depth-first search. predf is a lambda that takes a node as its parameter and returns a boolean. If the optional mapf is given, the result satisfying predf is transformed with it.
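A short sketch of a depth-first lookup, assuming the behavior described above. The predicate element-node? is a hypothetical helper, and no output is shown because only the first match in document order is expected.

```scheme
(import :std/xml)

(def doc
  '(*TOP* (foo (element (@ (id "1")) "foobar")
               (element (@ (id "2")) "barbaz"))))

;; Hypothetical predicate: true for nodes whose element type is `element`.
(def (element-node? n)
  (and (pair? n) (eq? (car n) 'element)))

;; Find the first match; a miss is signalled by a false result.
(def first-match (sxml-find doc element-node?))
(def no-match (sxml-find doc (lambda (n) #f)))
```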
sxml-select*
(sxml-select* n predf [mapf = values]) -> sxml
n := sxml nodes
predf := predicate function
mapf := transform function
Selects from the immediate children of node n using the predicate function predf. Results satisfying predf are transformed with the optional mapping function mapf, if given.
sxml-attribute-e
(sxml-attribute-e n key) -> any | #f
n := sxml node
key := symbol; attribute name
Returns the value of the attribute key of node n, or #f if the value is not found.
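The accessors can be tried on a single node. This sketch assumes attribute names are looked up as symbols, as they appear in the SXML attribute list; the node is taken from the read-xml example.

```scheme
(import :std/xml)

(def node '(element (@ (id "1")) "foobar"))

;; Element type of the node.
(def type (sxml-e node))                      ; => element

;; Attribute lookup by symbol.
(def id (sxml-attribute-e node 'id))          ; => "1"

;; A missing attribute yields #f.
(def missing (sxml-attribute-e node 'class))  ; => #f
```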
sxml-attribute-getq
(sxml-attribute-getq key attrs) -> any
key := symbol; attribute name
attrs := alist; attribute list
Returns the value associated with key in the attribute list attrs.
sxml-class?
(sxml-class? klass) -> lambda
klass := string; node class to match
Returns a predicate (lambda) that matches nodes with the DOM class klass.
sxml-find*
(sxml-find* n pred [mapf = values]) -> sxml | #f
n := sxml node
pred := predicate fn
mapf := transform fn
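Finds the first immediate child of node n that satisfies the predicate pred, or returns #f if there is none. If the optional mapf is given, the result is transformed with it.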
sxml-e?
(sxml-e? el) -> lambda
el := sxml element type to match
Returns a predicate (lambda) that matches nodes whose element type is el.
sxml-id?
(sxml-id? id) -> lambda
id := sxml node id value to match
Returns a predicate (lambda) that matches nodes with the DOM id id.
sxml-children
(sxml-children n) -> list
n := sxml node
Returns the children of node n as a list.
sxml-find/context
(sxml-find/context n predf [mapf = values]) -> sxml
n := sxml node
predf := predicate fn to match
mapf := transform fn to apply to matches
find with context
Printing
write-xml
(write-xml sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Writes the given sxml data as XML to the output port. Signals an error on an invalid port.
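Since port is an ordinary output port, a string port can be used to capture the serialized text. This is a minimal sketch; the note tree and the names note and xml-text are made up for illustration, and the exact output text is not shown.

```scheme
(import :std/xml)

;; Hypothetical SXML tree used for illustration.
(def note '(note (@ (lang "en")) (to "Tove") (body "Hi")))

;; Write to the default port, (current-output-port).
(write-xml note)
(newline)

;; Write to a string port so the XML text can be captured.
(def xml-text
  (let ((port (open-output-string)))
    (write-xml note port)
    (get-output-string port)))
```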
write-html
(write-html sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Writes the given sxml data as HTML to the output port. Signals an error on an invalid port.
print-sxml->html
(print-sxml->html sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Write given sxml into port after converting it to HTML. Indents the result to multiple lines.
print-sxml->html*
(print-sxml->html* sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->html but skips formatting the result.
print-sxml->html-fast
(print-sxml->html-fast sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->html but skips formatting the result.
print-sxml->xhtml
(print-sxml->xhtml sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Write given sxml into port after converting it to XHTML. Indents the result to multiple lines.
print-sxml->xhtml*
(print-sxml->xhtml* sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->xhtml but skips formatting the result.
print-sxml->xhtml-fast
(print-sxml->xhtml-fast sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->xhtml but skips formatting the result.
print-sxml->xml
(print-sxml->xml sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Write given sxml into port after converting it to XML. Indents the result to multiple lines.
print-sxml->xml*
(print-sxml->xml* sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->xml but skips formatting the result.
print-sxml->xml-fast
(print-sxml->xml-fast sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Same as print-sxml->xml but skips formatting the result.
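To compare the indented and unformatted printers, both can be pointed at string ports. The sketch below assumes the signatures listed above; the page tree and the capture helper are hypothetical, and the exact output of each printer is not shown.

```scheme
(import :std/xml)

;; Hypothetical SXML tree used for illustration.
(def page '(html (body (h1 "Title") (p "Some text"))))

;; Hypothetical helper: run a printer against a string port
;; and return whatever it wrote.
(def (capture printer)
  (let ((port (open-output-string)))
    (printer page port)
    (get-output-string port)))

(def indented (capture print-sxml->xml))
(def compact (capture print-sxml->xml-fast))

;; Show both results for comparison.
(display indented)
(newline)
(display compact)
(newline)
```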
pretty-print-sxml->xml-file
(pretty-print-sxml->xml-file item outpath [noblanks]) -> void
item := SXML nodes
outpath := path of the output file
Serializes SXML data from item into XML and writes the result to the file at outpath. This depends on the external xmllint program being in PATH.
pretty-print-sxml->xhtml-file
(pretty-print-sxml->xhtml-file sxml [port = (current-output-port)]) -> void
sxml := SXML nodes
port := output port
Serializes SXML data from sxml into XHTML and writes the result to a port. This depends on the external xmllint program being in PATH.
sxml->html-string-fragment
(sxml->html-string-fragment item [maybe-level] [quote-char = #\"]) -> string
item := SXML nodes
maybe-level := #f | fixnum; how much to indent the result or skip indent if #f
quote-char := quote character to use
Serializes the given SXML nodes in item into HTML and returns the result as a string. If maybe-level is given, the result is indented.
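A minimal usage sketch, assuming the optional arguments can be omitted as the signature suggests. The fragment tree and the names fragment and html are hypothetical, and the exact output text is not shown.

```scheme
(import :std/xml)

;; Hypothetical SXML fragment used for illustration.
(def fragment '(p (@ (class "intro")) "Hello"))

;; Serialize to a string instead of writing to a port.
(def html (sxml->html-string-fragment fragment))

(display html)
(newline)
```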
sxml->xhtml-string
(sxml->xhtml-string item [maybe-level] [quote-char = #\"]) -> string
item := SXML nodes
maybe-level := #f | fixnum; how much to indent the result or skip indent if #f
quote-char := quote character to use