class sklearn.neighbors.DistanceMetric
DistanceMetric class
This class provides a uniform interface to fast distance metric functions. The various metrics can be accessed via the get_metric class method and the metric string identifier (see below). For example, to use the Euclidean distance:
>>> from sklearn.neighbors import DistanceMetric
>>> dist = DistanceMetric.get_metric('euclidean')
>>> X = [[0, 1, 2], [3, 4, 5]]
>>> dist.pairwise(X)
array([[ 0.        ,  5.19615242],
       [ 5.19615242,  0.        ]])
Available Metrics
The following lists the string metric identifiers and the associated distance metric classes:
Metrics intended for real-valued vector spaces:
identifier      class name            args      distance function
"euclidean"     EuclideanDistance     -         sqrt(sum((x - y)^2))
"manhattan"     ManhattanDistance     -         sum(|x - y|)
"chebyshev"     ChebyshevDistance     -         max(|x - y|)
"minkowski"     MinkowskiDistance     p         sum(|x - y|^p)^(1/p)
"wminkowski"    WMinkowskiDistance    p, w      sum(w * |x - y|^p)^(1/p)
"seuclidean"    SEuclideanDistance    V         sqrt(sum((x - y)^2 / V))
"mahalanobis"   MahalanobisDistance   V or VI   sqrt((x - y)' V^-1 (x - y))
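As an illustration of the formulas in the table above, here is a minimal pure-Python sketch. It is not the library's compiled implementation, and the function names minkowski and chebyshev are chosen for this example only.

```python
import math

def minkowski(x, y, p=2):
    """sum(|x - y|^p)^(1/p); p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """max(|x - y|) over all coordinates."""
    return max(abs(a - b) for a, b in zip(x, y))

# The two points used in the Euclidean example earlier on this page.
x, y = [0, 1, 2], [3, 4, 5]

manhattan = minkowski(x, y, p=1)   # 3 + 3 + 3
euclidean = minkowski(x, y, p=2)   # sqrt(27), about 5.196
largest = chebyshev(x, y)          # every coordinate differs by 3
```

With p=2 the result agrees with the off-diagonal entry 5.19615242 returned by dist.pairwise(X) in the introductory example.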
Metrics intended for two-dimensional vector spaces:
Note that the haversine distance metric requires data in the form of [latitude, longitude], and both inputs and outputs are in units of radians.
identifier    class name          distance function
"haversine"   HaversineDistance   2 arcsin(sqrt(sin^2(0.5*dx) + cos(x1)cos(x2)sin^2(0.5*dy)))
Here x1 and x2 are the latitudes of the two points, dx is the difference in latitude, and dy is the difference in longitude.
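The haversine formula can be sketched in pure Python as follows. This is an illustrative re-implementation rather than the library's code; the city coordinates are approximate and the Earth radius constant is an assumption introduced here to convert the result from radians to kilometres.

```python
import math

def haversine(p1, p2):
    """Great-circle distance in radians between two [latitude, longitude]
    points, both given in radians."""
    lat1, lon1 = p1
    lat2, lon2 = p2
    a = (math.sin(0.5 * (lat2 - lat1)) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(0.5 * (lon2 - lon1)) ** 2)
    # Clamp to 1.0 to guard against rounding error before taking arcsin.
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not part of the original text

# Approximate coordinates in degrees, converted to radians before use.
paris = [math.radians(48.8566), math.radians(2.3522)]
london = [math.radians(51.5074), math.radians(-0.1278)]

angle = haversine(paris, london)   # distance in radians
km = angle * EARTH_RADIUS_KM       # distance in kilometres

# Two points on the equator a quarter turn apart are pi/2 radians apart.
quarter = haversine([0.0, 0.0], [0.0, math.pi / 2])
```

Forgetting to convert degrees to radians is the most common mistake with this metric, which is why the conversion is done explicitly above.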
Metrics intended for integer-valued vector spaces:
Though intended for integer-valued vectors, these are also valid metrics in the case of real-valued vectors.
identifier     class name           distance function
"hamming"      HammingDistance      N_unequal(x, y) / N_tot
"canberra"     CanberraDistance     sum(|x - y| / (|x| + |y|))
"braycurtis"   BrayCurtisDistance   sum(|x - y|) / (sum(|x|) + sum(|y|))
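The three formulas above can be written out directly in pure Python. This is a sketch for illustration, not the library's implementation. Skipping coordinates where both values are zero in canberra is an assumption made here to avoid a 0/0 term.

```python
def hamming(x, y):
    """Fraction of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def canberra(x, y):
    """sum(|x - y| / (|x| + |y|)), skipping 0/0 terms (assumption)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total

def braycurtis(x, y):
    """sum(|x - y|) / (sum(|x|) + sum(|y|))."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    return num / (sum(abs(a) for a in x) + sum(abs(b) for b in y))

h = hamming([1, 0, 1, 1], [1, 1, 0, 1])   # 2 of 4 coordinates differ
c = canberra([1, 2], [3, 2])              # 2/4 + 0/4
b = braycurtis([1, 2], [3, 2])            # 2 / (3 + 5)
```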
Metrics intended for boolean-valued vector spaces:
Any nonzero entry is evaluated to "True". In the listings below, the following abbreviations are used:
- N : number of dimensions
- NTT : number of dims in which both values are True
- NTF : number of dims in which the first value is True, second is False
- NFT : number of dims in which the first value is False, second is True
- NFF : number of dims in which both values are False
- NNEQ : number of non-equal dimensions, NNEQ = NTF + NFT
- NNZ : number of nonzero dimensions, NNZ = NTF + NFT + NTT
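The abbreviations above can be made concrete with a short pure-Python sketch. The helper boolean_counts is a hypothetical name introduced for this example; it only tallies the counts and then evaluates three of the boolean distances from the listing that follows.

```python
def boolean_counts(x, y):
    """Return the counts used by the boolean metrics for vectors x and y.
    Any nonzero entry is treated as True."""
    xb = [bool(v) for v in x]
    yb = [bool(v) for v in y]
    ntt = sum(a and b for a, b in zip(xb, yb))
    ntf = sum(a and not b for a, b in zip(xb, yb))
    nft = sum(b and not a for a, b in zip(xb, yb))
    nff = sum(not a and not b for a, b in zip(xb, yb))
    return {
        "N": len(xb), "NTT": ntt, "NTF": ntf, "NFT": nft, "NFF": nff,
        "NNEQ": ntf + nft, "NNZ": ntf + nft + ntt,
    }

c = boolean_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])

jaccard = c["NNEQ"] / c["NNZ"]             # 2 / 4
matching = c["NNEQ"] / c["N"]              # 2 / 5
dice = c["NNEQ"] / (c["NTT"] + c["NNZ"])   # 2 / (2 + 4)
```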
identifier         class name               distance function
"jaccard"          JaccardDistance          NNEQ / NNZ
"matching"         MatchingDistance         NNEQ / N
"dice"             DiceDistance             NNEQ / (NTT + NNZ)
"kulsinski"        KulsinskiDistance        (NNEQ + N - NTT) / (NNEQ + N)
"rogerstanimoto"   RogersTanimotoDistance   2 * NNEQ / (N + NNEQ)
"russellrao"       RussellRaoDistance       (N - NTT) / N
"sokalmichener"    SokalMichenerDistance    2 * NNEQ / (N + NNEQ)
"sokalsneath"      SokalSneathDistance      NNEQ / (NNEQ + 0.5 * NTT)
User-defined distance:
identifier   class name       args
"pyfunc"     PyFuncDistance   func
Here func is a function which takes two one-dimensional numpy arrays and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric, i.e. it must satisfy the following properties:
- Non-negativity: d(x, y) >= 0
- Identity: d(x, y) = 0 if and only if x == y
- Symmetry: d(x, y) = d(y, x)
- Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
Because of the Python object overhead involved in calling the Python function, this will be fairly slow, but it will have the same scaling as other distances.
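Before passing a custom function as func, it can help to spot-check the four properties on a handful of sample points. The sketch below is one way to do that in pure Python; looks_like_metric is a hypothetical helper, not part of the library, and passing the check on a finite sample is necessary but not sufficient for being a true metric.

```python
import itertools

def l1_distance(x, y):
    """Candidate user-defined distance: sum of absolute differences."""
    return float(sum(abs(a - b) for a, b in zip(x, y)))

def squared_euclidean(x, y):
    """Not a true metric: it violates the triangle inequality."""
    return float(sum((a - b) ** 2 for a, b in zip(x, y)))

def looks_like_metric(func, points, tol=1e-12):
    """Check the four metric properties on every triple of sample points.
    Points are tuples here so that x == y is a plain equality test."""
    for x, y, z in itertools.product(points, repeat=3):
        dxy = func(x, y)
        if dxy < 0:                                  # non-negativity
            return False
        if (dxy == 0) != (x == y):                   # identity
            return False
        if abs(dxy - func(y, x)) > tol:              # symmetry
            return False
        if dxy + func(y, z) < func(x, z) - tol:      # triangle inequality
            return False
    return True

points = [(0.0,), (1.0,), (2.0,)]
l1_ok = looks_like_metric(l1_distance, points)
sq_ok = looks_like_metric(squared_euclidean, points)  # d(0,1) + d(1,2) = 2 < d(0,2) = 4

# Per the table above, such a function would then be supplied through the
# "pyfunc" identifier with the func argument, e.g.
# DistanceMetric.get_metric('pyfunc', func=l1_distance)
```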
Methods
dist_to_rdist   Convert the true distance to the reduced distance.
get_metric      Get the given distance metric from the string identifier.
pairwise        Compute the pairwise distances between X and Y.
rdist_to_dist   Convert the reduced distance to the true distance.
__init__()
x.__init__(...) initializes x; see help(type(x)) for signature.
dist_to_rdist()
Convert the true distance to the reduced distance.
The reduced distance, defined for some metrics, is a computationally more efficient measure which preserves the rank of the true distance. For example, in the Euclidean distance metric, the reduced distance is the squared Euclidean distance.
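To see why the reduced distance is useful, the following pure-Python sketch works through the Euclidean case described above. The functions are stand-alone illustrations with names chosen for this example; they are not the library methods themselves. Because squaring is monotonic for non-negative values, ranking neighbours by the squared distance gives the same order while avoiding a square root per comparison.

```python
import math

def euclidean_rdist(x, y):
    """Reduced distance for the Euclidean metric: the squared distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def dist_to_rdist(d):
    """Euclidean case only: true distance -> reduced distance."""
    return d * d

def rdist_to_dist(r):
    """Euclidean case only: reduced distance -> true distance."""
    return math.sqrt(r)

query = [0.0, 0.0]
points = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]

# Rank neighbours of the query by reduced and by true distance.
order_by_rdist = sorted(range(len(points)),
                        key=lambda i: euclidean_rdist(query, points[i]))
order_by_dist = sorted(range(len(points)),
                       key=lambda i: rdist_to_dist(euclidean_rdist(query, points[i])))

# The true distance is only needed for the final answer.
nearest_dist = rdist_to_dist(euclidean_rdist(query, points[order_by_rdist[0]]))
```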
get_metric()
Get the given distance metric from the string identifier.
See the docstring of DistanceMetric for a list of available metrics.
Parameters:
metric : string or class name
    The distance metric to use.
**kwargs :
    Additional arguments will be passed to the requested metric.
pairwise()
Compute the pairwise distances between X and Y.
This is a convenience routine for the sake of testing. For many metrics, the utilities in scipy.spatial.distance.cdist and scipy.spatial.distance.pdist will be faster.
Parameters:
X : array_like
    Array of shape (Nx, D), representing Nx points in D dimensions.
Y : array_like (optional)
    Array of shape (Ny, D), representing Ny points in D dimensions. If not specified, then Y=X.
Returns:
dist : ndarray
    The shape (Nx, Ny) array of pairwise distances between points in X and Y.
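The shape convention can be illustrated with a small pure-Python stand-in for the Euclidean case. This sketch only mirrors the documented behaviour (Y defaults to X, result has shape (Nx, Ny)); it returns nested lists rather than an ndarray and is not the library routine.

```python
import math

def pairwise(X, Y=None):
    """Euclidean pairwise distances as an Nx-by-Ny nested list.
    If Y is not given, distances are computed between the rows of X."""
    if Y is None:
        Y = X
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) for y in Y]
            for x in X]

X = [[0, 1, 2], [3, 4, 5]]

D = pairwise(X)                   # shape (2, 2), zeros on the diagonal
D_origin = pairwise(X, [[0, 0, 0]])   # shape (2, 1), distance of each row to the origin
```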
rdist_to_dist()
Convert the reduced distance to the true distance.
The reduced distance, defined for some metrics, is a computationally more efficient measure which preserves the rank of the true distance. For example, in the Euclidean distance metric, the reduced distance is the squared Euclidean distance.
2017-01-15 04:24:34