Calculate the similarity weight for a set of observations

Calculate the similarity weight for a set of observations, based on their distance from some arbitrary points in data space. Observations identical to the point under consideration are given weight 1, weights decrease as observations become less similar to the point, and observations farther from the point than the threshold distance are given weight zero.

similarityweight(
  x,
  data,
  threshold = 1,
  distance = "euclidean",
  lambda = NULL,
  scale = TRUE
)

Arguments

x: A data frame of arbitrary points in the space of the data, i.e., with the same column names as data.
data: A data frame of observed data.
threshold: Numeric threshold distance, which should be > 0, beyond which observations are assigned similarity weight zero. Defaults to 1.
distance: The type of distance measure to use: one of three Minkowski distances, "euclidean" (default), "maxnorm" or "manhattan", or else "gower" for the Gower distance.
lambda: A constant that is multiplied by the number of categorical mismatches and added to the Minkowski distance to give a general dissimilarity measure. If NULL (the default), the function behaves as though lambda were larger than threshold, so a single factor mismatch guarantees zero weight.
scale: Logical. If TRUE (the default), numeric variables are scaled to unit standard deviation.
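To make the distance and lambda arguments concrete, the base R sketch below computes the three Minkowski distances for a toy vector of already-scaled differences and then adds a lambda penalty for one categorical mismatch. The difference values and lambda = 0.25 are hypothetical, and the final weight assumes the form max(0, 1 - d / threshold), which is consistent with the example output below but is not stated in this documentation.

```r
# Hypothetical scaled differences between one observation and the point
diffs <- c(0.3, -0.4)

euclidean <- sqrt(sum(diffs^2))  # Minkowski distance with p = 2
manhattan <- sum(abs(diffs))     # Minkowski distance with p = 1
maxnorm   <- max(abs(diffs))     # limit of the Minkowski distance as p -> Inf

# Hypothetical lambda: each categorical mismatch adds lambda to the distance
lambda <- 0.25
n_mismatch <- 1
d_total <- euclidean + lambda * n_mismatch

# Assumed weight form: 1 at distance 0, falling linearly to 0 at the threshold
threshold <- 1
weight <- max(0, 1 - d_total / threshold)
print(c(euclidean = euclidean, manhattan = manhattan, maxnorm = maxnorm,
        d_total = d_total, weight = weight))
```

With lambda left NULL, the same mismatch would instead force the weight to zero regardless of the numeric distance.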

Value

A numeric vector (when x has one row) or matrix (with one row per row of x) of similarity weights between 0 and 1, with one entry or column for each observation in data.

Details

Similarity weight is assigned to each observation based on its distance from a given point. For observations whose categorical variables match those of the point, the distance is the Minkowski distance between their numeric variables; each categorical mismatch adds lambda to this distance (see lambda). Alternatively, distance = "gower" uses the Gower distance.
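The computation described above can be sketched in base R as follows. sim_weight_sketch() is a hypothetical helper, not the function provided by the package: it scales numeric columns by their standard deviation in data, computes a Minkowski distance, adds lambda per categorical mismatch (or forces zero weight when lambda is NULL), and assumes weight = max(0, 1 - d / threshold). That weight form is an assumption, chosen because it reproduces the mtcars example output; the Gower option is not sketched.

```r
# Hypothetical sketch, NOT the package implementation. Assumes
# weight = max(0, 1 - d / threshold), which matches the mtcars examples.
sim_weight_sketch <- function(x, data, threshold = 1, distance = "euclidean",
                              lambda = NULL, scale = TRUE) {
  stopifnot(threshold > 0, all(names(data) %in% names(x)))
  x <- x[, names(data), drop = FALSE]               # align columns with data
  is_num <- vapply(data, is.numeric, logical(1))
  num <- as.matrix(data[, is_num, drop = FALSE])
  xnum <- as.matrix(x[, is_num, drop = FALSE])
  if (scale && ncol(num) > 0) {
    s <- apply(num, 2, sd)                          # sd of each numeric column in data
    s[is.na(s) | s == 0] <- 1                       # leave constant columns unscaled
    num <- sweep(num, 2, s, "/")
    xnum <- sweep(xnum, 2, s, "/")
  }
  p <- switch(distance, euclidean = 2, manhattan = 1, maxnorm = Inf,
              stop("this sketch only covers the Minkowski distances"))
  one_point <- function(i) {
    if (ncol(num) == 0) {
      d <- numeric(nrow(data))                      # no numeric variables
    } else {
      diffs <- abs(sweep(num, 2, xnum[i, ], "-"))
      d <- if (is.infinite(p)) apply(diffs, 1, max) else rowSums(diffs^p)^(1 / p)
    }
    mism <- numeric(nrow(data))                     # categorical mismatch count
    for (v in names(data)[!is_num]) {
      mism <- mism + (as.character(data[[v]]) != as.character(x[[v]][i]))
    }
    # lambda = NULL: any mismatch forces zero weight; else add lambda per mismatch
    d <- if (is.null(lambda)) ifelse(mism > 0, Inf, d) else d + lambda * mism
    pmax(0, 1 - d / threshold)
  }
  # One row of weights per point in x, one column per observation in data
  w <- matrix(vapply(seq_len(nrow(x)), one_point, numeric(nrow(data))),
              nrow = nrow(x), byrow = TRUE,
              dimnames = list(rownames(x), rownames(data)))
  if (nrow(x) == 1) w[1, ] else w                   # vector for a single point
}
```

For example, sim_weight_sketch(mtcars[1, ], mtcars)[["Mazda RX4 Wag"]] comes out at about 0.592, in line with the first example below.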

References

O'Connell M, Hurley CB and Domijan K (2017). "Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R." Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.

Examples

## Say we want to find observations similar to the first observation.
## The first observation is identical to itself, so it gets weight 1. The
## second observation is similar, so it gets some weight. The rest are less
## similar, and so get zero weight.

data(mtcars)
similarityweight(x = mtcars[1, ], data = mtcars)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>           1.0000000           0.5924101           0.0000000           0.0000000 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>           0.0000000           0.0000000           0.0000000           0.0000000 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>           0.0000000           0.0000000           0.0000000           0.0000000 

## Increasing the threshold gives non-zero weight to observations that are
## only approximately similar to the first row. Note that the weight of the
## second observation rises from 0.59 to 0.92, much closer to 1, so a larger
## threshold reduces our ability to discern how similar observations are.

similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>         1.000000000         0.918482015         0.351387122         0.119726982 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>         0.223929163         0.031252099         0.162084244         0.200548806 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>         0.016452494         0.372445765         0.341439906         0.228739296 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>         0.254706565         0.228247461         0.000000000         0.000000000 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>         0.000000000         0.195133880         0.189331752         0.131099997 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>         0.133932725         0.178208426         0.247953822         0.176162810 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>         0.165570572         0.277782788         0.481028156         0.288120874 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>         0.275217274         0.556533264         0.004848661         0.418874538 
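The printed weights for Mazda RX4 Wag (0.5924101, 0.8641367 and 0.918482015 at thresholds 1, 3 and 5) are consistent with a weight of the form max(0, 1 - d / threshold). The short calculation below rests on that assumed form, not on anything stated in this documentation: it recovers d from the threshold = 1 weight and predicts the other two.

```r
# Weight of Mazda RX4 Wag at threshold = 1, copied from the first example
w_t1 <- 0.5924101

# Assumed weight form: w = max(0, 1 - d / threshold), so d = (1 - w) * threshold
d <- (1 - w_t1) * 1

# Predicted weights at larger thresholds under the same assumption
w_t3 <- max(0, 1 - d / 3)
w_t5 <- max(0, 1 - d / 5)
print(round(c(d = d, w_t3 = w_t3, w_t5 = w_t5), 7))
```

Because the weight falls linearly in d / threshold, a larger threshold compresses the weights of nearby observations toward 1, which is the loss of discrimination noted in the comment above the call.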

## Several points can be supplied to 'x' at once, giving one row of weights
## per point. Here we see that the Mazda RX4 Wag is more similar to the
## Merc 280 than the Mazda RX4 is.

similarityweight(mtcars[1:2, ], mtcars, threshold = 3)
#>               Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> Mazda RX4     1.0000000     0.8641367          0              0
#> Mazda RX4 Wag 0.8641367     1.0000000          0              0
#>               Hornet Sportabout Valiant Duster 360 Merc 240D Merc 230
#> Mazda RX4                     0       0          0         0        0
#> Mazda RX4 Wag                 0       0          0         0        0
#>                  Merc 280 Merc 280C Merc 450SE Merc 450SL Merc 450SLC
#> Mazda RX4     0.000000000         0          0          0           0
#> Mazda RX4 Wag 0.003922046         0          0          0           0
#>               Cadillac Fleetwood Lincoln Continental Chrysler Imperial Fiat 128
#> Mazda RX4                      0                   0                 0        0
#> Mazda RX4 Wag                  0                   0                 0        0
#>               Honda Civic Toyota Corolla Toyota Corona Dodge Challenger
#> Mazda RX4               0              0             0                0
#> Mazda RX4 Wag           0              0             0                0
#>               AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9 Porsche 914-2
#> Mazda RX4               0          0                0         0     0.1350469
#> Mazda RX4 Wag           0          0                0         0     0.1136252
#>               Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> Mazda RX4                0              0    0.2608888             0 0.03145756
#> Mazda RX4 Wag            0              0    0.2297539             0 0.07066548