Smartmatch Operator

First available in Perl 5.10.1 (the 5.10.0 version behaved differently), binary ~~ does a "smartmatch" between its arguments. This is mostly used implicitly in the when construct described in perlsyn, although not all when clauses call the smartmatch operator. Unique among all of Perl's operators, the smartmatch operator can recurse. The smartmatch operator is experimental and its behavior is subject to change.

It is also unique in that all other Perl operators impose a context (usually string or numeric context) on their operands, autoconverting those operands to those imposed contexts. In contrast, smartmatch infers contexts from the actual types of its operands and uses that type information to select a suitable comparison mechanism.

The ~~ operator compares its operands "polymorphically", determining how to compare them according to their actual types (numeric, string, array, hash, etc.). Like the equality operators with which it shares the same precedence, ~~ returns 1 for true and "" for false. It is often best read aloud as "in", "inside of", or "is contained in", because the left operand is often looked for inside the right operand. That makes the order of the operands to the smartmatch operator often opposite that of the regular match operator. In other words, the "smaller" thing is usually placed in the left operand and the larger one in the right.

The behavior of a smartmatch depends on what type of things its arguments are, as determined by the following table. The first row of the table whose types apply determines the smartmatch behavior. Because what actually happens is mostly determined by the type of the second operand, the table is sorted on the right operand instead of on the left.

Left      Right      Description and pseudocode
===============================================================
Any       undef      check whether Any is undefined
                     like: !defined Any

Any       Object     invoke ~~ overloading on Object, or die

Right operand is an ARRAY:

Left      Right      Description and pseudocode
===============================================================
ARRAY1    ARRAY2     recurse on paired elements of ARRAY1 and ARRAY2[2]
                     like: (ARRAY1[0] ~~ ARRAY2[0])
                             && (ARRAY1[1] ~~ ARRAY2[1]) && ...
HASH      ARRAY      any ARRAY elements exist as HASH keys
                     like: grep { exists HASH->{$_} } ARRAY
Regexp    ARRAY      any ARRAY elements pattern match Regexp
                     like: grep { /Regexp/ } ARRAY
undef     ARRAY      undef in ARRAY
                     like: grep { !defined } ARRAY
Any       ARRAY      smartmatch each ARRAY element[3]
                     like: grep { Any ~~ $_ } ARRAY

Right operand is a HASH:

Left      Right      Description and pseudocode
===============================================================
HASH1     HASH2      all same keys in both HASHes
                     like: keys HASH1 ==
                              grep { exists HASH2->{$_} } keys HASH1
ARRAY     HASH       any ARRAY elements exist as HASH keys
                     like: grep { exists HASH->{$_} } ARRAY
Regexp    HASH       any HASH keys pattern match Regexp
                     like: grep { /Regexp/ } keys HASH
undef     HASH       always false (undef can't be a key)
                     like: 0 == 1
Any       HASH       HASH key existence
                     like: exists HASH->{Any}

Right operand is CODE:

Left      Right      Description and pseudocode
===============================================================
ARRAY     CODE       sub returns true on all ARRAY elements[1]
                     like: !grep { !CODE->($_) } ARRAY
HASH      CODE       sub returns true on all HASH keys[1]
                     like: !grep { !CODE->($_) } keys HASH
Any       CODE       sub passed Any returns true
                     like: CODE->(Any)

Right operand is a Regexp:

Left      Right      Description and pseudocode
===============================================================
ARRAY     Regexp     any ARRAY elements match Regexp
                     like: grep { /Regexp/ } ARRAY
HASH      Regexp     any HASH keys match Regexp
                     like: grep { /Regexp/ } keys HASH
Any       Regexp     pattern match
                     like: Any =~ /Regexp/

Other:

Left      Right      Description and pseudocode
===============================================================
Object    Any        invoke ~~ overloading on Object,
                     or fall back to...

Any       Num        numeric equality
                     like: Any == Num
Num       nummy[4]   numeric equality
                     like: Num == nummy
undef     Any        check whether undefined
                     like: !defined(Any)
Any       Any        string equality
                     like: Any eq Any

Notes:

1. Empty hashes or arrays match.
2. That is, each element smartmatches the element of the same index in the other array.[3]
3. If a circular reference is found, fall back to referential equality.
4. Either an actual number, or a string that looks like one.
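
As a rough illustration of the last rows of the table, the following sketch compares values that differ only in whether they are actual numbers or strings that look like numbers. The variable names are made up for this example, and newer versions of Perl may additionally emit experimental or deprecation warnings for ~~.

```perl
use v5.10.1;

my $num_one     = 1;        # an actual number
my $float_one   = 1.0;      # also an actual number
my $str_one     = "1";      # a string that looks like a number
my $str_one_dot = "1.0";    # a different string that looks like a number

# Any ~~ Num: the right operand is a number, so the comparison is numeric.
my $any_num = $str_one_dot ~~ $num_one;

# Num ~~ nummy: the left operand is a number and the right one looks
# like a number, so the comparison is numeric again.
my $num_nummy = $float_one ~~ $str_one;

# Any ~~ Any: neither rule above applies, so the strings are compared with eq.
my $any_any = $str_one_dot ~~ $str_one;

say '"1.0" ~~ 1   is ', ($any_num   ? "true" : "false");
say '1.0 ~~ "1"   is ', ($num_nummy ? "true" : "false");
say '"1.0" ~~ "1" is ', ($any_any   ? "true" : "false");
```

Only the last comparison is false, because "1.0" and "1" are equal as numbers but not as strings.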

The smartmatch implicitly dereferences any non-blessed hash or array reference, so the HASH and ARRAY entries apply in those cases. For blessed references, the Object entries apply. Smartmatches involving hashes only consider hash keys, never hash values.
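
A minimal sketch of both points follows, using hash names invented for this example: the first two hashes have the same keys but entirely different values, and the match gives the same answer whether the hashes or references to them are supplied.

```perl
use v5.10.1;

my %by_number = (name => 1,   rank => 2);
my %by_letter = (name => "a", rank => "b");
my %longer    = (name => 1,   rank => 2, serial_num => 3);

# Same set of keys, so these match even though no values agree.
my $same_keys = %by_number ~~ %by_letter;

# Unblessed references are dereferenced implicitly, so this is the same test.
my $via_refs = \%by_number ~~ \%by_letter;

# An extra key on either side makes the match fail.
my $extra_key = %by_number ~~ %longer;

say "same keys"         if $same_keys;
say "same keys by ref"  if $via_refs;
say "key sets differ"   unless $extra_key;
```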

The "like" code entry is not always an exact rendition. For example, the smartmatch operator short-circuits whenever possible, but grep does not. Also, grep in scalar context returns the number of matches, but ~~ returns only true or false.
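
The difference in return values can be seen in a small sketch such as this one, where the array contents are chosen only for illustration:

```perl
use v5.10.1;

my @array = qw(red blue red);

# grep in scalar context counts every matching element.
my $count = grep { $_ eq "red" } @array;

# Smartmatch only reports whether there was a match at all.
my $found = "red" ~~ @array;

say "grep counted $count matches";
say "smartmatch returned ", ($found ? "true" : "false");
```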

Unlike most operators, the smartmatch operator knows to treat undef specially:

use v5.10.1;
my @array = (1, 2, 3, undef, 4, 5);
say "some elements undefined" if undef ~~ @array;

Each operand is considered in a modified scalar context, the modification being that array and hash variables are passed by reference to the operator, which implicitly dereferences them. Both elements of each pair are the same:

use v5.10.1;

my %hash = (red    => 1, blue   => 2, green  => 3,
            orange => 4, yellow => 5, purple => 6,
            black  => 7, grey   => 8, white  => 9);

my @array = qw(red blue green);

say "some array elements in hash keys" if  @array ~~  %hash;
say "some array elements in hash keys" if \@array ~~ \%hash;

say "red in array" if "red" ~~  @array;
say "red in array" if "red" ~~ \@array;

say "some keys end in e" if /e$/ ~~  %hash;
say "some keys end in e" if /e$/ ~~ \%hash;

Two arrays smartmatch if each element in the first array smartmatches (that is, is "in") the corresponding element in the second array, recursively.

use v5.10.1;
my @little = qw(red blue green);
my @bigger = ("red", "blue", [ "orange", "green" ] );
if (@little ~~ @bigger) {  # true!
    say "little is contained in bigger";
}

Because the smartmatch operator recurses on nested arrays, this will still report that "red" is in the array.

use v5.10.1;
my @array = qw(red blue green);
my $nested_array = [[[[[[[ @array ]]]]]]];
say "red in array" if "red" ~~ $nested_array;

If two arrays smartmatch each other, then they are deep copies of each other's values, as this example reports:

use v5.12.0;
my @a = (0, 1, 2, [3, [4, 5], 6], 7); 
my @b = (0, 1, 2, [3, [4, 5], 6], 7); 

if (@a ~~ @b && @b ~~ @a) {
    say "a and b are deep copies of each other";
} 
elsif (@a ~~ @b) {
    say "a smartmatches in b";
} 
elsif (@b ~~ @a) {
    say "b smartmatches in a";
} 
else {
    say "a and b don't smartmatch each other at all";
}

If you were to set $b[3] = 4, then instead of reporting that "a and b are deep copies of each other", it would report that "b smartmatches in a". That's because the corresponding position in @a contains an array that (eventually) has a 4 in it.
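
That change can be tried directly. The following is only a sketch that restates the two arrays with the fourth element of @b already replaced by 4; the result variables are names made up for this example.

```perl
use v5.10.1;

my @a = (0, 1, 2, [3, [4, 5], 6], 7);
my @b = (0, 1, 2, 4,              7);

# The nested array in @a is compared numerically against 4, which fails.
my $a_in_b = @a ~~ @b;

# 4 is looked for inside the nested array in @a, where it is found.
my $b_in_a = @b ~~ @a;

say "a smartmatches in b" if $a_in_b;
say "b smartmatches in a" if $b_in_a;
```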

Smartmatching one hash against another reports whether both contain the same keys, no more and no less. This could be used to see whether two records have the same field names, without caring what values those fields might have. For example:

use v5.10.1;
sub make_dogtag {
    state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };

    my ($class, $init_fields) = @_;

    die "Must supply (only) name, rank, and serial number"
        unless $init_fields ~~ $REQUIRED_FIELDS;

    ...
}

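or, if other non-required fields are allowed, check that every required field is present. ARRAY ~~ HASH is not enough here, because it succeeds as soon as any one array element is a hash key; ARRAY ~~ CODE succeeds only if the sub returns true for every element:

use v5.10.1;
sub make_dogtag {
    state $REQUIRED_FIELDS = { name=>1, rank=>1, serial_num=>1 };

    my ($class, $init_fields) = @_;

    # each required field name is passed to the sub as its first argument
    die "Must supply (at least) name, rank, and serial number"
        unless [keys %{$REQUIRED_FIELDS}]
            ~~ sub { exists $init_fields->{$_[0]} };

    ...
}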

The smartmatch operator is most often used as the implicit operator of a when clause. See the section on "Switch Statements" in perlsyn.
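
As a small sketch of that implicit use, the when clause below is given an array, so the topic is smartmatched against it much as if $_ ~~ @primary had been written. The variable names are invented for this example; perlsyn lists the cases in which when does not smartmatch.

```perl
use v5.10.1;    # also enables the "switch" feature needed for given/when

my @primary = qw(red green blue);
my $color   = "green";
my $kind;

given ($color) {
    when (@primary) { $kind = "primary" }    # like: $_ ~~ @primary
    default         { $kind = "other"   }
}

say "$color is a $kind color";
```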

Smartmatching of Objects

To avoid relying on an object's underlying representation, if the smartmatch's right operand is an object that doesn't overload ~~, it raises the exception "Smart matching a non-overloaded object breaks encapsulation". That's because one has no business digging around to see whether something is "in" an object. These are all illegal on objects without a ~~ overload:

 %hash ~~ $object
    42 ~~ $object
"fred" ~~ $object

However, you can change the way an object is smartmatched by overloading the ~~ operator. This is allowed to extend the usual smartmatch semantics. For objects that do have a ~~ overload, see overload.
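
A minimal sketch of such an overload might look like the following. The Local::Range class and its contains method are hypothetical and exist only for this illustration; they show one way an object on the right of ~~ could define what "in" means for it.

```perl
use v5.10.1;

{
    # Hypothetical class used only for this illustration.
    package Local::Range;

    # When a Local::Range is the right operand of ~~, call contains().
    use overload '~~' => \&contains;

    sub new {
        my ($class, $min, $max) = @_;
        return bless { min => $min, max => $max }, $class;
    }

    # Receives the object first and the other operand second.
    sub contains {
        my ($self, $value) = @_;
        return $value >= $self->{min} && $value <= $self->{max};
    }
}

my $range   = Local::Range->new(1, 10);
my $inside  = 5  ~~ $range;
my $outside = 50 ~~ $range;

say "5 is ",  ($inside  ? "" : "not "), "in the range";
say "50 is ", ($outside ? "" : "not "), "in the range";
```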

Using an object as the left operand is allowed, although not very useful. Smartmatching rules take precedence over overloading, so even if the object in the left operand has smartmatch overloading, this will be ignored. A left operand that is a non-overloaded object falls back on a string or numeric comparison of the reference itself, that is, of its stringified or numified form. That means that

$object ~~ X

does not invoke the overload method with X as an argument. Instead the above table is consulted as normal, and based on the type of X, overloading may or may not be invoked. For simple strings or numbers, "in" becomes equivalent to this:

$object ~~ $number          $object == $number
$object ~~ $string          $object eq $string

For example, this reports that the handle smells IOish (but please don't really do this!):

use v5.10.1;
use IO::Handle;
my $fh = IO::Handle->new();
if ($fh ~~ /\bIO\b/) {
    say "handle smells IOish";
}

That's because it treats $fh as a string like "IO::Handle=GLOB(0x8039e0)", then pattern matches against that.
