Type:
Class
Constants:
LETTER : '[:alpha:]'
DIGIT : '[:digit:]'
COMBININGCHAR : ''
EXTENDER : ''
NCNAME_STR : "[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"
NAME_STR : "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
UNAME_STR : "(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
NAMECHAR : '[\-\w\.:]'
NAME : "([\\w:]#{NAMECHAR}*)"
NMTOKEN : "(?:#{NAMECHAR})+"
NMTOKENS : "#{NMTOKEN}(\\s+#{NMTOKEN})*"
REFERENCE : "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
REFERENCE_RE : /#{REFERENCE}/
DOCTYPE_START : /\A\s*<!DOCTYPE\s/um
DOCTYPE_PATTERN : /\s*<!DOCTYPE\s+(.*?)(\[|>)/um
ATTRIBUTE_PATTERN : /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um
COMMENT_START : /\A<!--/u
COMMENT_PATTERN : /<!--(.*?)-->/um
CDATA_START : /\A<!\[CDATA\[/u
CDATA_END : /^\s*\]\s*>/um
CDATA_PATTERN : /<!\[CDATA\[(.*?)\]\]>/um
XMLDECL_START : /\A<\?xml\s/u
XMLDECL_PATTERN : /<\?xml\s+(.*?)\?>/um
INSTRUCTION_START : /\A<\?/u
INSTRUCTION_PATTERN : /<\?(.*?)(\s+.*?)?\?>/um
TAG_MATCH : /^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um
CLOSE_MATCH : /^\s*<\/(#{NAME_STR})\s*>/um
VERSION : /\bversion\s*=\s*["'](.*?)['"]/um
ENCODING : /\bencoding\s*=\s*["'](.*?)['"]/um
STANDALONE : /\bstandalone\s*=\s*["'](.*?)['"]/um
ENTITY_START : /^\s*<!ENTITY/
IDENTITY : /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
ELEMENTDECL_START : /^\s*<!ELEMENT/um
ELEMENTDECL_PATTERN : /^\s*(<!ELEMENT.*?)>/um
SYSTEMENTITY : /^\s*(%.*?;)\s*$/um
ENUMERATION : "\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
NOTATIONTYPE : "NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
ENUMERATEDTYPE : "(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
ATTTYPE : "(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
ATTVALUE : "(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
DEFAULTDECL : "(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
ATTDEF : "\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
ATTDEF_RE : /#{ATTDEF}/
ATTLISTDECL_START : /^\s*<!ATTLIST/um
ATTLISTDECL_PATTERN : /^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
NOTATIONDECL_START : /^\s*<!NOTATION/um
PUBLIC : /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
SYSTEM : /^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
TEXT_PATTERN : /\A([^<]*)/um
PUBIDCHAR : "\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
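
Many of the constants above are plain strings that are interpolated into larger patterns before being compiled. As a rough illustration, the sketch below copies the NAMECHAR, NAME and REFERENCE definitions from the list into a standalone script and uses the resulting REFERENCE_RE to pick entity and character references out of a string. The helper name find_references is hypothetical and not part of the library.

```ruby
# Definitions copied from the constant list above.
NAMECHAR     = '[\-\w\.:]'
NAME         = "([\\w:]#{NAMECHAR}*)"
REFERENCE    = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
REFERENCE_RE = /#{REFERENCE}/

# Hypothetical helper: returns every full reference found in text.
def find_references(text)
  found = []
  # The pattern contains a capture group, so read the whole match from $~.
  text.scan(REFERENCE_RE) { found << Regexp.last_match[0] }
  found
end

p find_references("a &amp; b &#38; c &#x26; d & e")
# => ["&amp;", "&#38;", "&#x26;"]
```

The bare ampersand in "d & e" is not reported, because REFERENCE requires a name or a numeric code followed by a semicolon.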

Entity constants

SYSTEMLITERAL : %Q{((?:"[^"]*")|(?:'[^']*'))}
PUBIDLITERAL : %Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
EXTERNALID : "(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
NDATADECL : "\\s+NDATA\\s+#{NAME}"
PEREFERENCE : "%#{NAME};"
ENTITYVALUE : %Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
PEDEF : "(?:#{ENTITYVALUE}|#{EXTERNALID})"
ENTITYDEF : "(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
PEDECL : "<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
GEDECL : "<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
ENTITYDECL : /\s*(?:#{GEDECL})|(?:#{PEDECL})/um
EREFERENCE : /&(?!#{NAME};)/
DEFAULT_ENTITIES : { 'gt' => [/&gt;/, '&gt;', '>', />/], 'lt' => [/&lt;/, '&lt;', '<', /</], 'quot' => [/&quot;/, '&quot;', '"', /"/], "apos" => [/&apos;/, "&apos;", "'", /'/] }
MISSING_ATTRIBUTE_QUOTES : /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um

Patterns such as MISSING_ATTRIBUTE_QUOTES identify common markup errors, so that error messages can be more informative.
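
To make the idea concrete, here is a minimal sketch of how such a pattern could be used to choose a more specific message. The constant definitions are copied from the list above. The helper describe_problem and its message strings are assumptions made for illustration and do not come from the library.

```ruby
# Definitions copied from the constant list above.
LETTER        = '[:alpha:]'
COMBININGCHAR = ''
EXTENDER      = ''
NCNAME_STR    = "[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"
NAME_STR      = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
MISSING_ATTRIBUTE_QUOTES = /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um

# Hypothetical helper: picks a more specific message for one common mistake.
def describe_problem(markup)
  if MISSING_ATTRIBUTE_QUOTES.match?(markup)
    "attribute value is not quoted"
  else
    "unrecognized problem"
  end
end

puts describe_problem("<a b=c>")    # attribute value is not quoted
puts describe_problem("<a b='c'>")  # unrecognized problem
```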

Using the Pull Parser

This API is experimental, and subject to change.

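require 'rexml/parsers/pullparser'

# Print the 'att' attribute of every <b> start tag.
parser = REXML::Parsers::PullParser.new( "<a>text<b att='val'/>txet</a>" )
while parser.has_next?
  res = parser.pull  # pull returns the next event
  puts res[1]['att'] if res.start_element? and res[0] == 'b'
end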

See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
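
As a small illustration of what the events carry, the sketch below records the type and first content item of every event in a document. It assumes that the rexml library is installed and that PullEvent provides event_type and index access as used here. The helper name summarize_events is hypothetical.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: lists each event as [type, first content item].
def summarize_events(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  summary = []
  while parser.has_next?
    event = parser.pull
    # event[0] is the first argument of the event, e.g. the tag name
    # for a start element or the raw text for a text event.
    summary << [event.event_type, event[0]]
  end
  summary
end

summarize_events("<a>text<b att='val'/>txet</a>").each { |row| p row }
```

For a start element, the second content item (event[1]) is the attribute hash, which is why the example above reads res[1]['att'].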

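Notice that an event can also represent an error, which the caller can check with error?:

require 'rexml/parsers/pullparser'

parser = REXML::Parsers::PullParser.new( "<a>BAD DOCUMENT" )
while parser.has_next?
  res = parser.pull
  raise res[1] if res.error?
end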

Nat Price gave me some good ideas for the API.

entity

entity( reference, entities ) Instance Public methods

stream=

stream=( source ) Instance Public methods

normalize

normalize( input, entities=nil, entity_filter=nil ) Instance Public methods Escapes all possible entities.

has_next?

has_next?() Instance Public methods Returns true if there are more events.

new

new( source ) Class Public methods

unnormalize

unnormalize( string, entities=nil, filter=nil ) Instance Public methods Unescapes all possible entities.
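
The DEFAULT_ENTITIES table listed among the constants holds, for each entity, a pattern for the escaped form, the escaped text, the raw character, and a pattern for the raw character. The sketch below shows one way escaping and unescaping could be driven by that table. It is not the library's implementation: the helper names are hypothetical, custom entities and filters are ignored, and the handling of the ampersand is an assumption.

```ruby
# Table copied from the constant list.
DEFAULT_ENTITIES = {
  'gt'   => [/&gt;/,   '&gt;',   '>', />/],
  'lt'   => [/&lt;/,   '&lt;',   '<', /</],
  'quot' => [/&quot;/, '&quot;', '"', /"/],
  'apos' => [/&apos;/, '&apos;', "'", /'/]
}

# Hypothetical helper: escape the ampersand first (an assumption), so the
# ampersands introduced by the other replacements are not escaped again.
def escape_defaults(text)
  copy = text.gsub('&', '&amp;')
  DEFAULT_ENTITIES.each_value do |_, escaped, _, raw_pattern|
    copy = copy.gsub(raw_pattern, escaped)
  end
  copy
end

# Hypothetical helper: reverse the steps, restoring the ampersand last.
def unescape_defaults(text)
  copy = text.dup
  DEFAULT_ENTITIES.each_value do |escaped_pattern, _, raw, _|
    copy = copy.gsub(escaped_pattern, raw)
  end
  copy.gsub('&amp;', '&')
end

escaped = escape_defaults("a < b & c > 'd'")
puts escaped                     # a &lt; b &amp; c &gt; &apos;d&apos;
puts unescape_defaults(escaped)  # a < b & c > 'd'
```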

unshift

unshift(token) Instance Public methods Push an event back on the head of the stream.
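
A short sketch of pushing an event back, assuming the rexml library is installed and that PullParser returns pushed-back events before reading further input:

```ruby
require 'rexml/parsers/pullparser'

parser = REXML::Parsers::PullParser.new("<a><b/></a>")

first = parser.pull       # start of <a>
parser.unshift(first)     # push it back so it is delivered again
again = parser.pull       # the pushed-back event
following = parser.pull   # parsing then continues with <b>

puts again.equal?(first)
puts following[0]
```

This can be handy when code needs to look at an event before deciding which routine should consume it.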

position

position() Instance Public methods

add_listener

add_listener( listener ) Instance Public methods

pull

pull() Instance Public methods Returns the next event. This is a PullEvent object.
