Type:
Class
Constants:
LETTER
:
'[:alpha:]'
DIGIT
:
'[:digit:]'
COMBININGCHAR
:
''
EXTENDER
:
''
NCNAME_STR
:
"[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"
NAME_STR
:
"(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
UNAME_STR
:
"(?:#{NCNAME_STR}:)?#{NCNAME_STR}"
NAMECHAR
:
'[\-\w\.:]'
NAME
:
"([\\w:]#{NAMECHAR}*)"
NMTOKEN
:
"(?:#{NAMECHAR})+"
NMTOKENS
:
"#{NMTOKEN}(\\s+#{NMTOKEN})*"
REFERENCE
:
"&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
REFERENCE_RE
:
/#{REFERENCE}/
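The way REFERENCE is assembled from NAMECHAR and NAME can be tried in isolation. The following is a minimal sketch that copies those constants into a hypothetical `ReferenceSketch` module (the module and the `references_in` helper are illustrative names, not part of the library) and lists the references found in a string.

```ruby
# Hypothetical sketch: local copies of the constants listed above,
# so the example runs without loading REXML.
module ReferenceSketch
  NAMECHAR     = '[\-\w\.:]'
  NAME         = "([\\w:]#{NAMECHAR}*)"
  REFERENCE    = "&(?:#{NAME};|#\\d+;|#x[0-9a-fA-F]+;)"
  REFERENCE_RE = /#{REFERENCE}/

  # Return every named, decimal, or hexadecimal reference in text.
  def self.references_in(text)
    found = []
    # NAME contains a capture group, so collect the whole match explicitly.
    text.scan(REFERENCE_RE) { found << Regexp.last_match[0] }
    found
  end
end

p ReferenceSketch.references_in('a &lt; b &#60; c &#x3C; d & e &bad')
# => ["&lt;", "&#60;", "&#x3C;"]
```

A bare `&` and an unterminated `&bad` are skipped because every alternative requires a closing `;`.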
DOCTYPE_START
:
/\A\s*<!DOCTYPE\s/um
DOCTYPE_PATTERN
:
/\s*<!DOCTYPE\s+(.*?)(\[|>)/um
ATTRIBUTE_PATTERN
:
/\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um
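The `\4` backreference in ATTRIBUTE_PATTERN only makes sense once the capture groups are counted: NAME_STR contributes two groups (prefix and local name) inside the outer name group, so the opening quote is group 4 and the value is group 5. A minimal sketch, assuming local copies of the constants in a hypothetical `AttributeSketch` module:

```ruby
# Hypothetical sketch: local copies of the constants listed above.
module AttributeSketch
  LETTER        = '[:alpha:]'
  COMBININGCHAR = ''
  EXTENDER      = ''
  NCNAME_STR    = "[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"
  NAME_STR      = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
  ATTRIBUTE_PATTERN = /\s*(#{NAME_STR})\s*=\s*(["'])(.*?)\4/um

  # Return [qualified_name, prefix, local_name, value] for each attribute.
  # Group 4 (the quote character) is only needed by the pattern itself.
  def self.attributes(text)
    text.scan(ATTRIBUTE_PATTERN).map do |name, prefix, local, _quote, value|
      [name, prefix, local, value]
    end
  end
end

p AttributeSketch.attributes(%q{ href='x' ns:id="it's"})
# => [["href", nil, "href", "x"], ["ns:id", "ns", "id", "it's"]]
```

Because the closing quote must equal the opening one, an apostrophe inside a double-quoted value does not end the value.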
COMMENT_START
:
/\A<!--/u
COMMENT_PATTERN
:
/<!--(.*?)-->/um
CDATA_START
:
/\A<!\[CDATA\[/u
CDATA_END
:
/^\s*\]\s*>/um
CDATA_PATTERN
:
/<!\[CDATA\[(.*?)\]\]>/um
XMLDECL_START
:
/\A<\?xml\s/u
XMLDECL_PATTERN
:
/<\?xml\s+(.*?)\?>/um
INSTRUCTION_START
:
/\A<\?/u
INSTRUCTION_PATTERN
:
/<\?(.*?)(\s+.*?)?\?>/um
TAG_MATCH
:
/^<((?>#{NAME_STR}))\s*((?>\s+#{UNAME_STR}\s*=\s*(["']).*?\5)*)\s*(\/)?>/um
CLOSE_MATCH
:
/^\s*<\/(#{NAME_STR})\s*>/um
VERSION
:
/\bversion\s*=\s*["'](.*?)['"]/um
ENCODING
:
/\bencoding\s*=\s*["'](.*?)['"]/um
STANDALONE
:
/\bstandalone\s*=\s*["'](.*?)['"]/um
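XMLDECL_PATTERN captures the body of an XML declaration, and VERSION, ENCODING, and STANDALONE each pull one pseudo-attribute out of that body. The following sketch shows one way the four could be combined; the `XmlDeclSketch` module and its `parse` method are hypothetical and are not how the library itself is organized.

```ruby
# Hypothetical sketch: local copies of the constants listed above.
module XmlDeclSketch
  XMLDECL_PATTERN = /<\?xml\s+(.*?)\?>/um
  VERSION         = /\bversion\s*=\s*["'](.*?)['"]/um
  ENCODING        = /\bencoding\s*=\s*["'](.*?)['"]/um
  STANDALONE      = /\bstandalone\s*=\s*["'](.*?)['"]/um

  # Return a hash of the declaration's pseudo-attributes,
  # or nil when text contains no XML declaration.
  def self.parse(text)
    body = text[XMLDECL_PATTERN, 1]
    return nil unless body

    {
      version:    body[VERSION, 1],
      encoding:   body[ENCODING, 1],
      standalone: body[STANDALONE, 1] # nil when the pseudo-attribute is absent
    }
  end
end

decl = XmlDeclSketch.parse(%q{<?xml version="1.0" encoding='UTF-8'?>})
puts decl[:version]   # 1.0
puts decl[:encoding]  # UTF-8
```

Note that these three patterns accept a closing quote that differs from the opening one, unlike ATTRIBUTE_PATTERN, which uses a backreference.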
ENTITY_START
:
/^\s*<!ENTITY/
IDENTITY
:
/^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
ELEMENTDECL_START
:
/^\s*<!ELEMENT/um
ELEMENTDECL_PATTERN
:
/^\s*(<!ELEMENT.*?)>/um
SYSTEMENTITY
:
/^\s*(%.*?;)\s*$/um
ENUMERATION
:
"\\(\\s*#{NMTOKEN}(?:\\s*\\|\\s*#{NMTOKEN})*\\s*\\)"
NOTATIONTYPE
:
"NOTATION\\s+\\(\\s*#{NAME}(?:\\s*\\|\\s*#{NAME})*\\s*\\)"
ENUMERATEDTYPE
:
"(?:(?:#{NOTATIONTYPE})|(?:#{ENUMERATION}))"
ATTTYPE
:
"(CDATA|ID|IDREF|IDREFS|ENTITY|ENTITIES|NMTOKEN|NMTOKENS|#{ENUMERATEDTYPE})"
ATTVALUE
:
"(?:\"((?:[^<&\"]|#{REFERENCE})*)\")|(?:'((?:[^<&']|#{REFERENCE})*)')"
DEFAULTDECL
:
"(#REQUIRED|#IMPLIED|(?:(#FIXED\\s+)?#{ATTVALUE}))"
ATTDEF
:
"\\s+#{NAME}\\s+#{ATTTYPE}\\s+#{DEFAULTDECL}"
ATTDEF_RE
:
/#{ATTDEF}/
ATTLISTDECL_START
:
/^\s*<!ATTLIST/um
ATTLISTDECL_PATTERN
:
/^\s*<!ATTLIST\s+#{NAME}(?:#{ATTDEF})*\s*>/um
NOTATIONDECL_START
:
/^\s*<!NOTATION/um
PUBLIC
:
/^\s*<!NOTATION\s+(\w[\-\w]*)\s+(PUBLIC)\s+(["'])(.*?)\3(?:\s+(["'])(.*?)\5)?\s*>/um
SYSTEM
:
/^\s*<!NOTATION\s+(\w[\-\w]*)\s+(SYSTEM)\s+(["'])(.*?)\3\s*>/um
TEXT_PATTERN
:
/\A([^<]*)/um
PUBIDCHAR
:
"\x20\x0D\x0Aa-zA-Z0-9\\-()+,./:=?;!*@$_%#"
Entity constants
SYSTEMLITERAL
:
%Q{((?:"[^"]*")|(?:'[^']*'))}
PUBIDLITERAL
:
%Q{("[#{PUBIDCHAR}']*"|'[#{PUBIDCHAR}]*')}
EXTERNALID
:
"(?:(?:(SYSTEM)\\s+#{SYSTEMLITERAL})|(?:(PUBLIC)\\s+#{PUBIDLITERAL}\\s+#{SYSTEMLITERAL}))"
NDATADECL
:
"\\s+NDATA\\s+#{NAME}"
PEREFERENCE
:
"%#{NAME};"
ENTITYVALUE
:
%Q{((?:"(?:[^%&"]|#{PEREFERENCE}|#{REFERENCE})*")|(?:'([^%&']|#{PEREFERENCE}|#{REFERENCE})*'))}
PEDEF
:
"(?:#{ENTITYVALUE}|#{EXTERNALID})"
ENTITYDEF
:
"(?:#{ENTITYVALUE}|(?:#{EXTERNALID}(#{NDATADECL})?))"
PEDECL
:
"<!ENTITY\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
GEDECL
:
"<!ENTITY\\s+#{NAME}\\s+#{ENTITYDEF}\\s*>"
ENTITYDECL
:
/\s*(?:#{GEDECL})|(?:#{PEDECL})/um
EREFERENCE
:
/&(?!#{NAME};)/
DEFAULT_ENTITIES
:
{
'gt' => [/&gt;/, '&gt;', '>', />/],
'lt' => [/&lt;/, '&lt;', '<', /</],
'quot' => [/&quot;/, '&quot;', '"', /"/],
"apos" => [/&apos;/, "&apos;", "'", /'/]
}
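Each DEFAULT_ENTITIES value has four slots. The sketch below assumes they are, in order, a pattern for the escaped form (for example `&gt;`), the escaped text, the raw character, and a pattern for the raw character; that reading is inferred from the table's layout rather than stated by the source. The `EntitySketch` module and its two methods are hypothetical.

```ruby
# Hypothetical sketch: a local copy of the table, with the assumed slot order
# [escaped pattern, escaped text, raw character, raw pattern].
module EntitySketch
  DEFAULT_ENTITIES = {
    'gt'   => [/&gt;/,   '&gt;',   '>', />/],
    'lt'   => [/&lt;/,   '&lt;',   '<', /</],
    'quot' => [/&quot;/, '&quot;', '"', /"/],
    'apos' => [/&apos;/, '&apos;', "'", /'/]
  }.freeze

  # Replace each escaped form with its raw character (slots 0 and 2).
  def self.unescape(text)
    DEFAULT_ENTITIES.values.reduce(text) do |result, (escaped_re, _escaped, raw, _raw_re)|
      result.gsub(escaped_re, raw)
    end
  end

  # Replace each raw character with its escaped form (slots 3 and 1).
  def self.escape(text)
    DEFAULT_ENTITIES.values.reduce(text) do |result, (_escaped_re, escaped, _raw, raw_re)|
      result.gsub(raw_re, escaped)
    end
  end
end

puts EntitySketch.escape("a < b > 'c'")   # a &lt; b &gt; &apos;c&apos;
puts EntitySketch.unescape('&quot;x&quot;') # "x"
```

The table has no entry for `&amp;`, so this sketch leaves ampersands untouched in both directions.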
MISSING_ATTRIBUTE_QUOTES
:
/^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um
These are patterns to identify common markup errors, to make the error messages more informative.
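As an illustration of how such a pattern can make an error message more specific, the sketch below checks a failing tag against a local copy of MISSING_ATTRIBUTE_QUOTES before falling back to a generic message. The `QuoteCheckSketch` module, the `describe_error` method, and both message strings are hypothetical.

```ruby
# Hypothetical sketch: local copies of the constants listed above.
# COMBININGCHAR and EXTENDER are empty strings, so they are omitted here.
module QuoteCheckSketch
  LETTER     = '[:alpha:]'
  NCNAME_STR = "[#{LETTER}_:][-[:alnum:]._:]*"
  NAME_STR   = "(?:(#{NCNAME_STR}):)?(#{NCNAME_STR})"
  MISSING_ATTRIBUTE_QUOTES = /^<#{NAME_STR}\s+#{NAME_STR}\s*=\s*[^"']/um

  # Pick a message for markup that has already failed to parse as a tag.
  def self.describe_error(markup)
    if markup.match?(MISSING_ATTRIBUTE_QUOTES)
      'attribute value is missing its opening quote'
    else
      'malformed tag'
    end
  end
end

puts QuoteCheckSketch.describe_error('<a href=foo>')   # attribute value is missing its opening quote
puts QuoteCheckSketch.describe_error("<a href='foo'>") # malformed tag
```

The pattern only inspects the first attribute of a tag, so an unquoted value later in the tag would get the generic message in this sketch.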
Using the Pull Parser
This API is experimental, and subject to change.
  parser = PullParser.new( "<a>text<b att='val'/>txet</a>" )
  while parser.has_next?
    res = parser.next
    puts res[1]['att'] if res.start_tag? and res[0] == 'b'
  end
See the PullEvent class for information on the content of the results. The data is identical to the arguments passed for the various events to the StreamListener API.
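For a version that can be run as a standalone script, here is a minimal sketch of the same loop. It assumes the parser is loaded from `rexml/parsers/pullparser` and that events are fetched with `pull` and tested with `start_element?`; those are the method names I would expect in the REXML releases bundled with recent Ruby versions, and they differ from the `next` and `start_tag?` calls in the snippet above, so treat them as assumptions and check them against your installed version. The `b_attributes` helper is a hypothetical name.

```ruby
require 'rexml/parsers/pullparser'

# Collect the 'att' attribute of every <b> start tag in the document.
def b_attributes(xml)
  parser = REXML::Parsers::PullParser.new(xml)
  values = []
  while parser.has_next?
    event = parser.pull
    # For a start-element event, event[0] is the tag name
    # and event[1] is the attribute hash.
    values << event[1]['att'] if event.start_element? && event[0] == 'b'
  end
  values
end

p b_attributes("<a>text<b att='val'/>txet</a>")
```

Returning the collected values instead of printing inside the loop keeps the parsing logic easy to test.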
Notice that a parse error is also delivered as a result, which the loop can check with error?:
  parser = PullParser.new( "<a>BAD DOCUMENT" )
  while parser.has_next?
    res = parser.next
    raise res[1] if res.error?
  end
Nat Price gave me some good ideas for the API.