WurML is provided as a diagnostic tool for WAP developers.
When run with a URL specified, it will load the indicated document. If that document contains links to other documents, those links will also be visited/traversed to the specified depth (default 3).
Links that refer to external documents (those not on the same host as the original document) are visited but not traversed.
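The traversal rule described above can be sketched as a breadth-first walk. This is only an illustration in Python, not WurML's actual implementation: the `traverse` function and the `links` mapping (a stand-in for fetching and parsing documents) are hypothetical, and it assumes the start document counts as depth 0.

```python
from collections import deque
from urllib.parse import urlsplit


def traverse(start, links, max_depth=3):
    """Return URLs in the order they are visited.

    `links` maps a URL to the absolute URLs it links to; it is a
    hypothetical stand-in for loading and parsing each document.
    """
    home = urlsplit(start).hostname
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        external = urlsplit(url).hostname != home
        if external or depth >= max_depth:
            # The document was visited, but its links are not followed.
            continue
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited
```

With this sketch, a document on another host appears in the result, but nothing it links to does.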
Please note that WurML does _not_ handle HDML documents. Conventionally, WAP gateways (which your phone/device uses to access Internet resources) convert HDML into WML on the fly.
As desktop applications typically do not have access to these gateways, HDML responses will generate warnings and not be traversed.
WurML runs with defaults of --depth=3, --agent="Wurl".
- no configuration
- perform no variable substitution on <INPUT> tags
- respect robots.txt and <META name="robots"> tags
- traverse links to a depth of 3 and do not revisit previously visited links
The general syntax for WurML is:
Usage: wurml [OPTIONS] url1 [url2 ...]
-c, --config=CONF : use specified configuration file
-d, --depth=INT : traverse urls to specified depth (default 3)
-a, --agent=STRING : set User-Agent (default 'WurML', overrides config)
-p, --proxy=HOST[:PORT] : use HTTP proxy
-s, --show-cookies : show cookies sent and received in output
-m, --dump : dump received WML documents to output
-i, --ignore-robots : ignore 'robots.txt' on server host
-r, --revisit : while traversing, return to previously fetched urls
-n, --no-validation : do not validate; document is still checked for well-formedness but not validated against DTD
-y, --dry-run : load first document and show urls; do not fetch them
-v, --version : show the version and exit
-u, --usage : show this message
Valid WML documents are checked for several common errors beyond basic syntax:
Local ‘$vars’ found in URL references or <postfield> tags are tested to see if there is a valid source for their definition. The document is scanned for <SELECT>, <INPUT> and <SETVAR> tags with appropriate names which might provide values. If none of these are found, WurML checks whether the specific variable has been defined in an earlier document. Variable definitions are retained throughout a session, in the same manner that hand-held browsers are expected to retain them. A warning is generated if no definition for a given variable is found. URLs which contain unresolved variables are skipped in the traversal chain.
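A rough sketch of this variable check, in Python, may make the rule concrete. It is not WurML's code: `referenced_vars`, `undefined_vars` and the `session_vars` parameter are hypothetical names, the tag names are written in lowercase as WML itself uses them, and the regular expression assumes the `$name`, `$(name)` and `$(name:conversion)` reference forms with `$$` as a literal dollar sign.

```python
import re
import xml.etree.ElementTree as ET

# Matches $name, $(name) and $(name:conversion).
VAR = re.compile(r"\$(?:\((\w+)(?::\w+)?\)|(\w+))")


def referenced_vars(text):
    """Return the set of variable names referenced in an attribute value."""
    text = text.replace("$$", "")  # "$$" is an escaped dollar, not a variable
    return {a or b for a, b in VAR.findall(text)}


def undefined_vars(wml, session_vars=frozenset()):
    """Return variables used in hrefs/postfields with no visible definition."""
    root = ET.fromstring(wml)
    defined = {
        el.get("name")
        for el in root.iter()
        if el.tag in ("select", "input", "setvar") and el.get("name")
    }
    used = set()
    for el in root.iter():
        if el.get("href"):
            used |= referenced_vars(el.get("href"))
        if el.tag == "postfield":
            used |= referenced_vars(el.get("value", ""))
    # Variables defined earlier in the session also count as defined.
    return used - defined - set(session_vars)
```

Each name returned would correspond to one warning, and any URL that references it would be skipped.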
References to local targets (“#target”) are checked to verify that a valid target with that name exists. A warning is generated if no tag has the specified id, if more than one tag has the specified id, or if the target tag is not a <CARD>.
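The three warning conditions for local targets could be checked along these lines. This is a minimal Python sketch under the assumption that the document is well-formed XML; `check_target` and its message strings are hypothetical, not WurML output.

```python
import xml.etree.ElementTree as ET


def check_target(wml, fragment):
    """Return a warning string for a "#target" reference, or None if it is fine."""
    name = fragment.lstrip("#")
    matches = [el for el in ET.fromstring(wml).iter() if el.get("id") == name]
    if not matches:
        return f"no tag with id '{name}'"
    if len(matches) > 1:
        return f"more than one tag with id '{name}'"
    if matches[0].tag != "card":
        return f"target '{name}' is a <{matches[0].tag}>, not a <card>"
    return None
```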
Image tags <IMG> are tested to confirm that the tag contains a ‘src’ attribute. If it is present, the image is loaded and the MIME type of the result is checked. If the image is not reachable or its type is anything other than ‘image/vnd.wap.wbmp’, a warning is generated. A warning is also produced when no ‘alt’ text is provided.
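As an illustration of the image checks, the following Python sketch applies the same rules to each image tag. The `check_images` function and its `fetch_mime` callback are hypothetical: the callback stands in for loading the image, and is assumed to return the MIME type or `None` when the image cannot be reached.

```python
import xml.etree.ElementTree as ET

WBMP = "image/vnd.wap.wbmp"


def check_images(wml, fetch_mime):
    """Return a list of warning strings for the <img> tags in a document."""
    warnings = []
    for img in ET.fromstring(wml).iter("img"):
        src = img.get("src")
        if not src:
            warnings.append("img has no src attribute")
        else:
            mime = fetch_mime(src)  # hypothetical loader, see lead-in
            if mime is None:
                warnings.append(f"{src}: not reachable")
            elif mime != WBMP:
                warnings.append(f"{src}: type {mime}, expected {WBMP}")
        if not img.get("alt"):
            warnings.append(f"{src or 'img'}: no alt text")
    return warnings
```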
Links are loaded from: <CARD ontimer="…", onenterforward="…", onenterbackward="…">, <OPTION onpick="…"> and any valid tag with an ‘href’ attribute. Tags with an ‘href’ and a ‘method’ attribute whose value is ‘POST’ are scanned for enclosed <postfield> tags. These URLs are treated as ‘POST’.
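The link sources listed above can be gathered in a single pass over the document. The Python sketch below is an assumption about how this might look, not WurML's implementation; `extract_links` and the `(method, url, postfields)` tuple shape are made up for illustration, and tag names are lowercase as in WML.

```python
import xml.etree.ElementTree as ET

CARD_EVENTS = ("ontimer", "onenterforward", "onenterbackward")


def extract_links(wml):
    """Return (method, url, postfields) tuples in document order."""
    links = []
    for el in ET.fromstring(wml).iter():
        if el.tag == "card":
            links += [("GET", el.get(a), {}) for a in CARD_EVENTS if el.get(a)]
        if el.tag == "option" and el.get("onpick"):
            links.append(("GET", el.get("onpick"), {}))
        href = el.get("href")
        if href:
            if el.get("method", "get").upper() == "POST":
                # Collect the enclosed <postfield> name/value pairs.
                fields = {
                    pf.get("name"): pf.get("value", "")
                    for pf in el.iter("postfield")
                }
                links.append(("POST", href, fields))
            else:
                links.append(("GET", href, {}))
    return links
```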
WurML, by default, avoids revisiting URLs already checked during a given session. A link is considered to be identical if the string representing the URL is the same and the cookies have not changed since it was last visited. A cookie which has been reassigned with values identical to its previous definition is not considered to have changed. If the order of the query parameters in the URL differs from a previous invocation, the URL is considered to be different.
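One way to express this identity rule is to key each visit on the exact URL string plus the current cookie values. The Python sketch below is illustrative only; `visit_key`, `should_visit` and the representation of cookies as a name-to-value dictionary are assumptions.

```python
def visit_key(url, cookies):
    """Build a hashable key from the exact URL string and the cookie values."""
    # The URL is compared as a plain string, so "?a=1&b=2" and "?b=2&a=1" differ.
    return (url, frozenset(cookies.items()))


def should_visit(url, cookies, seen):
    """Return True the first time a (url, cookies) pair is encountered."""
    key = visit_key(url, cookies)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Because the key is built from cookie values rather than from assignment events, a cookie that is set again to the same value produces the same key and does not trigger a revisit.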
WurML generally confines itself to the site on which the initial URL resides. Links which refer to external sites are loaded and validated, but no sanity checks are performed, nor are links within those external documents traversed.
When a URL contains variables, the link is visited with every available value for that variable substituted. For example, if an ‘href’ contains both ‘$yy’ and ‘$mm’ and 2 values are available for each (either through options in a <SELECT> or through a list of values configured in the wurmlconf.xml file), the link will be visited 4 times, once for each combination of values.
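The combination count in this example is a Cartesian product of the value lists. As a sketch in Python (the `expand` function is hypothetical, handles only the `$name` and `$(name)` forms, and does no URL escaping of the substituted values):

```python
import re
from itertools import product

VAR = re.compile(r"\$(?:\((\w+)\)|(\w+))")


def expand(href, values):
    """Return one URL per combination of available variable values."""
    names = []
    for a, b in VAR.findall(href):
        name = a or b
        if name not in names:
            names.append(name)
    if any(n not in values for n in names):
        return []  # unresolved variable: the URL would be skipped
    urls = []
    for combo in product(*(values[n] for n in names)):
        binding = dict(zip(names, combo))
        urls.append(VAR.sub(lambda m: binding[m.group(1) or m.group(2)], href))
    return urls
```

For an ‘href’ of `cal.wml?y=$yy&m=$mm` with two values for each variable, this yields four URLs, matching the count given above.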
WurML can be customised to let the developer build test cases for a site.
In addition to custom configuration of the user agent, default content encoding and url ignores, you can build powerful test cases for automated inputs.
Your custom configuration can be run with the command
wurml --config mywurmlconf.xml
An annotated example config file ‘wurmlconf-example.xml’ is provided. Feel free to copy this file and modify it.
WurML uses no configuration file if none is explicitly indicated on the command line.
<!DOCTYPE wurmlconf PUBLIC "-//SHADOWPLAY.NET//DTD WORMCONF 1.1//EN" "http://wurml.shadowplay.net/dtd/wurmlconf.0.1.dtd">
<!-- default content encoding to use if the server doesn't provide one
with the document. Defaults to US-ASCII.
We can only *support* encodings available to your JVM -->
<!-- prefixes of urls to be ignored -->
<!-- HTTP headers to be added to all requests -->
<!-- the User-Agent field may be overridden on the command line -->
<!-- variables to be used to fill in <INPUT>s found in documents.
The <url> will be matched as a prefix of the URL of the document in which
the <INPUT> tags are found. In the event of a conflict, the longest
matching url will be taken.