R/similarityweight.R
similarityweight.Rd
Calculate the similarity weight for a set of observations, based on their distance from some arbitrary points in data space. Observations which are very similar to the point under consideration are given weights close to 1, while observations which are dissimilar to the point are given weight zero.
similarityweight(
  x,
  data,
  threshold = 1,
  distance = "euclidean",
  lambda = NULL,
  scale = TRUE
)
x: A dataframe describing arbitrary points in the space of the data (i.e., with the same colnames as data).
data: A dataframe representing observed data.
threshold: Threshold distance outside which observations will be assigned similarity weight zero. This is numeric and should be > 0. Defaults to 1.
distance: The type of distance measure to be used: one of three types of Minkowski distance, "euclidean" (default), "maxnorm" or "manhattan", or alternatively "gower".
lambda: A constant to multiply by the number of categorical mismatches, before adding to the Minkowski distance, to give a general dissimilarity measure. If left NULL, behaves as though lambda is set larger than threshold, meaning that one factor mismatch guarantees zero weight.
scale: Defaults to TRUE, in which case numeric variables are scaled to unit standard deviation.
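The way the distance and lambda arguments combine can be sketched in a few lines of R. This is only an illustration of the description above, not the package's code: sketch_dissimilarity is a hypothetical helper, it works on two one-row dataframes, it skips the scaling step, it does not cover "gower", and treating a NULL lambda as threshold + 1 is an assumed way of making lambda "larger than threshold".

```r
# Hypothetical helper, not part of condvis: dissimilarity between two
# one-row dataframes `a` and `b` that share the same columns.
sketch_dissimilarity <- function(a, b, distance = "euclidean",
                                 lambda = NULL, threshold = 1) {
  is_num <- vapply(a, is.numeric, logical(1))
  diffs <- abs(unlist(a[is_num]) - unlist(b[is_num]))
  # The three Minkowski distances named under `distance`
  d_numeric <- switch(distance,
    euclidean = sqrt(sum(diffs^2)),
    maxnorm   = max(diffs),
    manhattan = sum(diffs),
    stop("distance not covered by this sketch: ", distance)
  )
  # Count the categorical columns whose values differ
  n_mismatch <- sum(vapply(
    names(a)[!is_num],
    function(nm) as.character(a[[nm]]) != as.character(b[[nm]]),
    logical(1)
  ))
  # Assumption: a NULL lambda is emulated by any value above the threshold,
  # so a single mismatch already exceeds the threshold
  if (is.null(lambda)) lambda <- threshold + 1
  d_numeric + lambda * n_mismatch
}

a <- data.frame(wt = 2.6, hp = 110, gear = factor("4"))
b <- data.frame(wt = 3.0, hp = 113, gear = factor("3"))
sketch_dissimilarity(a, b, distance = "manhattan", lambda = 0.5)
sketch_dissimilarity(a, b)  # lambda = NULL: exceeds the threshold of 1
```

With lambda left NULL, the one mismatch in gear is enough to push the result past threshold, which is the "one factor mismatch guarantees zero weight" behaviour described for lambda.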
A numeric vector or matrix, with values from 0 to 1. The similarity weights for the observations in data, arranged in rows for each row in x.
Similarity weight is assigned to observations based on their distance from a given point. The distance is calculated as Minkowski distance between the numeric elements for the observations whose categorical elements match, or else the Gower distance.
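For all-numeric data with the default settings, the printed weights in the examples are consistent with a weight that falls linearly from 1 at distance zero to 0 at the threshold, i.e. max(0, 1 - d / threshold) with d the Euclidean distance on variables scaled to unit sd. For instance, the weights of the Mazda RX4 Wag at thresholds 1, 3 and 5 (0.5924101, 0.8641367, 0.918482015) all correspond to a distance of about 0.4076. The sketch below is written under that assumption. It is inferred from the example output, not taken from the package source, and sketch_weight is a hypothetical name that handles a single point and numeric columns only.

```r
# Hypothetical helper, not part of condvis: linear-kernel similarity weights
# for all-numeric data and a single query point `x` (a one-row dataframe).
sketch_weight <- function(x, data, threshold = 1) {
  # Scale every column to unit standard deviation, using the sd of `data`,
  # and apply the same scaling to the query point
  sds <- vapply(data, sd, numeric(1))
  data_s <- sweep(as.matrix(data), 2, sds, "/")
  x_s <- as.numeric(as.matrix(x[1, , drop = FALSE])) / sds
  # Euclidean distance from each observation to the query point
  d <- sqrt(rowSums(sweep(data_s, 2, x_s, "-")^2))
  # Assumed kernel: linear decay, truncated at zero beyond the threshold
  pmax(1 - d / threshold, 0)
}

w <- sketch_weight(mtcars[1, ], mtcars)
round(head(w, 3), 4)
```

A larger threshold stretches the same distances over a wider range, so every nonzero weight moves closer to 1 and more observations receive a nonzero weight.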
O'Connell M, Hurley CB and Domijan K (2017). "Conditional Visualization for Statistical Models: An Introduction to the condvis Package in R." Journal of Statistical Software, 81(5), pp. 1-20. <URL:http://dx.doi.org/10.18637/jss.v081.i05>.
## Say we want to find observations similar to the first observation.
## The first observation is identical to itself, so it gets weight 1. The
## second observation is similar, so it gets some weight. The rest are more
## different, and so get zero weight.
data(mtcars)
similarityweight(x = mtcars[1, ], data = mtcars)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
#>           1.0000000           0.5924101           0.0000000           0.0000000
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
#>           0.0000000           0.0000000           0.0000000           0.0000000
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
#>           0.0000000           0.0000000           0.0000000           0.0000000
## By increasing the threshold, we can find observations which are more
## approximately similar to the first row. Note that the second observation
## now has a weight close to 1 (about 0.92), so we lose some ability to
## discern how similar observations are by increasing the threshold.
similarityweight(x = mtcars[1, ], data = mtcars, threshold = 5)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
#>         1.000000000         0.918482015         0.351387122         0.119726982
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D
#>         0.223929163         0.031252099         0.162084244         0.200548806
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE
#>         0.016452494         0.372445765         0.341439906         0.228739296
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
#>         0.254706565         0.228247461         0.000000000         0.000000000
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla
#>         0.000000000         0.195133880         0.189331752         0.131099997
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28
#>         0.133932725         0.178208426         0.247953822         0.176162810
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa
#>         0.165570572         0.277782788         0.481028156         0.288120874
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E
#>         0.275217274         0.556533264         0.004848661         0.418874538
## We can also provide several points to 'x'. Here we see that the Mazda RX4
## Wag is more similar to the Merc 280 than the Mazda RX4 is.
similarityweight(mtcars[1:2, ], mtcars, threshold = 3)
#>               Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
#> Mazda RX4     1.0000000     0.8641367          0              0
#> Mazda RX4 Wag 0.8641367     1.0000000          0              0
#>               Hornet Sportabout Valiant Duster 360 Merc 240D Merc 230
#> Mazda RX4                     0       0          0         0        0
#> Mazda RX4 Wag                 0       0          0         0        0
#>                  Merc 280 Merc 280C Merc 450SE Merc 450SL Merc 450SLC
#> Mazda RX4     0.000000000         0          0          0           0
#> Mazda RX4 Wag 0.003922046         0          0          0           0
#>               Cadillac Fleetwood Lincoln Continental Chrysler Imperial Fiat 128
#> Mazda RX4                      0                   0                 0        0
#> Mazda RX4 Wag                  0                   0                 0        0
#>               Honda Civic Toyota Corolla Toyota Corona Dodge Challenger
#> Mazda RX4               0              0             0                0
#> Mazda RX4 Wag           0              0             0                0
#>               AMC Javelin Camaro Z28 Pontiac Firebird Fiat X1-9 Porsche 914-2
#> Mazda RX4               0          0                0         0     0.1350469
#> Mazda RX4 Wag           0          0                0         0     0.1136252
#>               Lotus Europa Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
#> Mazda RX4                0              0    0.2608888             0 0.03145756
#> Mazda RX4 Wag            0              0    0.2297539             0 0.07066548