#primitive_errinfo returns important information regarding the last error as a 5-element array:
```ruby
[result, enc1, enc2, error_bytes, readagain_bytes]
```
result is the last result of primitive_convert.
Other elements are only meaningful when result is :invalid_byte_sequence, :incomplete_input or :undefined_conversion.
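Because the return value is a positional array, it can help to give the fields names. Below is a minimal sketch that assumes a hypothetical `ErrInfo` struct and `errinfo_of` helper, neither of which is part of Ruby. It also shows that the four detail fields are `nil` when `result` is not one of the three error symbols, as happens with `PARTIAL_INPUT`.

```ruby
# Results for which enc1, enc2, error_bytes and readagain_bytes are meaningful.
ERROR_RESULTS = %i[invalid_byte_sequence incomplete_input undefined_conversion].freeze

# Hypothetical wrapper (not part of Ruby's API) naming the five elements
# returned by Encoding::Converter#primitive_errinfo.
ErrInfo = Struct.new(:result, :enc1, :enc2, :error_bytes, :readagain_bytes) do
  def error?
    ERROR_RESULTS.include?(result)
  end
end

# Hypothetical helper: snapshot the converter's last error as an ErrInfo.
def errinfo_of(ec)
  ErrInfo.new(*ec.primitive_errinfo)
end

# A lone EUC-JP lead byte without PARTIAL_INPUT is an error, so details are set.
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")
ec.primitive_convert("\xa4".b, String.new, nil, 10)
incomplete = errinfo_of(ec)

# The same byte with PARTIAL_INPUT is not an error, so the details are nil.
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")
ec.primitive_convert("\xa4".b, String.new, nil, 10, Encoding::Converter::PARTIAL_INPUT)
waiting = errinfo_of(ec)

p incomplete.result  #=> :incomplete_input
p incomplete.error?  #=> true
p waiting.result     #=> :source_buffer_empty
p waiting.error?     #=> false
```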
enc1 and enc2 name the conversion step in which the error occurred, as a pair of strings. For example, a converter from EUC-JP to ISO-8859-1 converts a string as follows: EUC-JP -> UTF-8 -> ISO-8859-1. So [enc1, enc2] is either ["EUC-JP", "UTF-8"] or ["UTF-8", "ISO-8859-1"].
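The individual steps of a converter can be listed with `Encoding::Converter#convpath`. The sketch below assumes only that method and `primitive_errinfo`. It prints the two steps of the EUC-JP to ISO-8859-1 converter and checks that `[enc1, enc2]` reported after an undefined conversion matches one of them. `grep(Array)` drops any newline decorators, which `convpath` reports as plain strings rather than encoding pairs.

```ruby
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")

# Each step is a pair of Encoding objects; turn them into names for comparison.
steps = ec.convpath.grep(Array).map { |pair| pair.map(&:name) }
p steps  #=> [["EUC-JP", "UTF-8"], ["UTF-8", "ISO-8859-1"]]

# HIRAGANA LETTER A cannot be represented in ISO-8859-1, so the second step fails.
ec.primitive_convert("\xa4\xa2".b, String.new, nil, 10)
enc1, enc2 = ec.primitive_errinfo[1, 2]
p [enc1, enc2]                  #=> ["UTF-8", "ISO-8859-1"]
p steps.include?([enc1, enc2])  #=> true
```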
error_bytes and readagain_bytes are the byte sequences that caused the error. error_bytes is the discarded portion; readagain_bytes is the buffered portion, which is read again on the next conversion.
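The following sketch reuses the UTF-16LE input from the examples to show what "discarded" and "buffered" mean in practice. The error bytes are gone. The read-again bytes are held inside the converter, so `src` is already empty. `Encoding::Converter#putback` hands the buffered bytes back to the caller. Byte arrays are printed instead of strings so the output does not depend on which encoding is tagged on the returned strings.

```ruby
ec = Encoding::Converter.new("UTF-16LE", "UTF-8")
src = "\x00\xd8@\x00".b  # high surrogate U+D800 followed by "@", not a low surrogate
dst = String.new

result = ec.primitive_convert(src, dst, nil, 10)
p result  #=> :invalid_byte_sequence

_, _, _, error_bytes, readagain_bytes = ec.primitive_errinfo
p error_bytes.bytes      #=> [0, 216]  ("\x00\xD8", discarded)
p readagain_bytes.bytes  #=> [64, 0]   ("@\x00", buffered inside ec)
p src.bytesize           #=> 0

# putback returns the buffered bytes; a second call finds nothing left.
first  = ec.putback
second = ec.putback
p first.bytes   #=> [64, 0]
p second.bytes  #=> []
```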
Example:
```ruby
# \xff is invalid as EUC-JP.
ec = Encoding::Converter.new("EUC-JP", "Shift_JIS")
ec.primitive_convert(src="\xff", dst="", nil, 10)
p ec.primitive_errinfo
#=> [:invalid_byte_sequence, "EUC-JP", "UTF-8", "\xFF", ""]

# HIRAGANA LETTER A (\xa4\xa2 in EUC-JP) is not representable in ISO-8859-1.
# Since this error occurs in the UTF-8 to ISO-8859-1 conversion,
# error_bytes is HIRAGANA LETTER A in UTF-8 (\xE3\x81\x82).
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")
ec.primitive_convert(src="\xa4\xa2", dst="", nil, 10)
p ec.primitive_errinfo
#=> [:undefined_conversion, "UTF-8", "ISO-8859-1", "\xE3\x81\x82", ""]

# A partial character at the end of the input is an error.
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")
ec.primitive_convert(src="\xa4", dst="", nil, 10)
p ec.primitive_errinfo
#=> [:incomplete_input, "EUC-JP", "UTF-8", "\xA4", ""]

# Encoding::Converter::PARTIAL_INPUT prevents errors caused by
# partial characters.
ec = Encoding::Converter.new("EUC-JP", "ISO-8859-1")
ec.primitive_convert(src="\xa4", dst="", nil, 10, Encoding::Converter::PARTIAL_INPUT)
p ec.primitive_errinfo
#=> [:source_buffer_empty, nil, nil, nil, nil]

# \xd8\x00\x00@ is invalid as UTF-16BE because
# no low surrogate follows the high surrogate (\xd8\x00).
# It is detected by the 3rd byte (\x00), which is part of the next character.
# So the high surrogate (\xd8\x00) is discarded and
# the 3rd byte is read again later.
# Since the byte is buffered in ec, it is dropped from src.
ec = Encoding::Converter.new("UTF-16BE", "UTF-8")
ec.primitive_convert(src="\xd8\x00\x00@", dst="", nil, 10)
p ec.primitive_errinfo
#=> [:invalid_byte_sequence, "UTF-16BE", "UTF-8", "\xD8\x00", "\x00"]
p src
#=> "@"

# Similar to UTF-16BE, \x00\xd8@\x00 is invalid as UTF-16LE.
# The problem is detected by the 4th byte.
ec = Encoding::Converter.new("UTF-16LE", "UTF-8")
ec.primitive_convert(src="\x00\xd8@\x00", dst="", nil, 10)
p ec.primitive_errinfo
#=> [:invalid_byte_sequence, "UTF-16LE", "UTF-8", "\x00\xD8", "@\x00"]
p src
#=> ""
```
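A common use of primitive_errinfo is inside a primitive_convert loop that decides how to recover from each error. The sketch below is illustrative only. It assumes that writing "?" through `Encoding::Converter#insert_output` is an acceptable replacement, and the helper name `convert_marking_errors` is hypothetical. It is roughly the recovery that `String#encode` performs with `invalid: :replace` and `undef: :replace`, but each error's errinfo is kept for inspection.

```ruby
# Hypothetical helper: converts `bytes` from `from` to `to`, writing "?" in
# place of every byte sequence reported as an error, and logs each errinfo.
def convert_marking_errors(bytes, from, to)
  ec = Encoding::Converter.new(from, to)
  src = bytes.b      # mutable binary copy; primitive_convert consumes it
  dst = String.new
  log = []
  loop do
    result = ec.primitive_convert(src, dst)  # unlimited destination size
    case result
    when :finished
      return [dst, log]
    when :invalid_byte_sequence, :incomplete_input, :undefined_conversion
      log << ec.primitive_errinfo
      # error_bytes are already discarded; readagain_bytes stay buffered in ec
      # and are converted by the next primitive_convert call.
      ec.insert_output("?")
    else
      raise "unexpected result: #{result.inspect}"
    end
  end
end

# \xff is invalid EUC-JP; \xa4\xa2 (HIRAGANA LETTER A) has no ISO-8859-1 form.
out, log = convert_marking_errors("a\xffb\xa4\xa2c", "EUC-JP", "ISO-8859-1")
p out               #=> "a?b?c"
p log.map(&:first)  #=> [:invalid_byte_sequence, :undefined_conversion]
```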