We recently added visualizations based on Self-Organizing Maps (SOM)
and Uniform Manifold Approximation and Projection (UMAP) for objects in
the `Mercator` package. This addition relies on the core
implementations in the `kohonen` and `umap` packages,
respectively. The main challenge that we faced was that both of those
implementations want to receive the raw data as inputs, while
`Mercator` objects only store a distance matrix computed from the
raw data. The purpose of this vignette is, first, to describe how we
worked around this restriction, and second, to illustrate how to use
these methods in the package.

First we load the package.

`suppressMessages( suppressWarnings( library(Mercator) ) )`

Next, we load a “fake” set of synthetic continuous data that comes with the Mercator package.

```
set.seed(36766)
data(fakedata)
ls()
```

```
## [1] "C" "Cl4" "Delta" "J" "N" "P"
## [7] "SV" "X" "fakeclin" "fakedata" "filename" "hclass"
## [13] "jacc.Vis" "kk" "mercury" "my.binmat" "my.clust" "my.data"
## [19] "neptune" "pts" "sokal.Vis" "tab" "temp.Vis" "venus"
```

`dim(fakedata)`

`## [1] 776 300`

`dim(fakeclin)`

`## [1] 300 4`

Let’s create a UMAP visualization directly from the raw data.

```
library(umap)
<- umap(t(fakedata))
udirect plot(udirect$layout, main="Original UMAP")
```

We aren’t going to do anything fancy with labels or colors in this plot; we just want an idea of the main structure in the data. In particular, it looks like there are seven clusters, divided into one group three and two groups of two each.

Next, we are going to create a `Meractor` object with a
multi-dimensional scaling (MDS) visualization. Inspired by the first
plot, we will arbitrarily assign seven groups. (This assignment uses the
PAM algorithm internally.)

```
<- Mercator(dist(t(fakedata)), "euclid", "mds", 7)
mercury plot(mercury, main = "MDS")
```

Interestingly, here we see a possibly different number of groups. But that’s a separate issue, and we don’t want to distracted from our main thread.

We want to confirm that the actual raw data is not contained in
`mercury`, just the distance matrix.

`summary(mercury)`

```
## An object of the 'Mercator' class, using the ' euclid ' metric, of size
## [1] 300 300
## Contains these visualizations: mds
```

`slotNames(mercury)`

`## [1] "metric" "distance" "view" "palette" "symbols" "clusters"`

`@metric mercury`

`## [1] "euclid"`

`@palette mercury`

```
## NC1 NC2 NC3 NC4 NC5 NC6 NC7 NC8
## "#2E91E5" "#E15F99" "#1CA71C" "#FB0D0D" "#DA16FF" "#222A2A" "#B68100" "#750D86"
## NC9 NC10 NC11 NC12 NC13 NC14 NC15 NC16
## "#EB663B" "#511CFB" "#00A08B" "#FB00D1" "#FC0080" "#B2828D" "#6C7C32" "#778AAE"
## NC17 NC18 NC19 NC20 NC21 NC22 NC23 NC24
## "#862A16" "#A777F1" "#620042" "#1616A7" "#DA60CA" "#6C4516" "#0D2A63" "#AF0038"
```

`@symbols mercury`

`## [1] 16 15 17 18 10 7 11 9`

`table(mercury@clusters)`

```
##
## 1 2 3 4 5 6 7
## 44 68 38 38 37 39 36
```

`class(mercury@view)`

`## [1] "list"`

`names(mercury@view)`

`## [1] "mds"`

`class(mercury@view[["hclust"]])`

`## [1] "NULL"`

`class(D <- mercury@distance)`

`## [1] "dist"`

Next, we add a “umap” visualization to this object.

```
<- addVisualization(mercury, "umap")
mercury plot(mercury, view = "umap", main="UMAP from distance matrix")
```

Notice that the
`plot` method knows that we want to see the internal
“`layout`” component of the `umap` object. It also
automatically assigns axis names that (subliminally?) remind us that we
computed this visualization with UMAP. And it colors the points using
the same values from the initial PAM clustering assignments.

As an aside, we point out that the PAM clustering doesn’t really match the implicit UMAP clustering of the data.

We obviously constructed this plot just from the distance matrix, not from the raw data. It neverheless has a structure essentially the same as the original one. But how?

Well, internally, we use this function:

`:::makeEuclidean Mercator`

```
## function (D)
## {
## M <- as.matrix(D)
## X <- scale(t(scale(t(M^2), scale = FALSE)), scale = FALSE)
## E <- eigen(-X/2, symmetric = TRUE)$values
## R <- min(sum(E > 1e-10), nrow(M) - 1)
## cmdscale(D, k = R)
## }
## <bytecode: 0x000001b8d32067f0>
## <environment: namespace:Mercator>
```

The first four lines of code inside this function are basically the
algorithm used by the `cmdscale` function to compute the
eigenvalues need to create its dimension reduction for visualization.
The count of the number of nonzero eigenvalues (which defines
`R`) is careful to stay away from zero. We do this to avoid
round-off errors. If we miss by one or two eignevalues, than the final
call to `cmdscale` will generate a confusing warning. That final
line, by the way, creates an embedding into Euclidean space that should
preserve almost all of the distances between pairs of points in the
original data set.

To see that, let’s carry out that step explicitly.

```
<- Mercator:::makeEuclidean(D)
X dim(X)
```

`## [1] 300 299`

Now we compute distances between pairs of points in this new embedding.

```
<- dist(X)
D2 <- as.matrix(D) - as.matrix(D2)
delta summary(as.vector(delta))
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.386e-13 -1.421e-14 0.000e+00 2.740e-17 1.421e-14 9.237e-14
```

We see that the absolute errors between any pair of points in the two distance computations are less than \(10^{-12}\). And that is the fundamental reason that we expect equivalent structures whether we apply UMAP to the original data or to the re-embedded data arising from the distance matrix.

The same approach works to compute self-organizing maps from a distance matrix. Here is the result using the data matrix.

```
<- addVisualization(mercury, "som",
mercury grid = kohonen::somgrid(6, 5, "hexagonal"))
plot(mercury, view = "som")
```

And here is the corresponding result using the original data.

```
library(kohonen)
<- som(t(fakedata), grid = somgrid(6, 5, "hexagonal"))
mysom plot(mysom, type = "mapping")
```

This time, we will work a little harder to color the plot in the same way.

`plot(mysom, type = "mapping", pchs=16, col=Mercator:::colv(mercury))`

Qualitatively, most of the SOM plots should be similar. The fundamental exception, however, is the “codes” plot. This plot shows the patterns of intensities in each of the underlying dimensions (averaged over the assigned objects in each node). Because we have performed a multi-dimensional scaling analysis, we have changed the underlying coordinate system. Here are the corresponding plots.

`plot(mysom, type = "codes", main="Original", shape = "straight")`

`plot(mercury, view = "som", type="codes", main="After MDS")`

As we have seen, when we start with a Euclidean distance matrix, we can get qualitatively equivalent results from the original data or from creating a “realization” of the distance matrix in a large Euclidean space. The major advantage, however, arises when we start with a completely different distance metric. In that case, we cannot apply SOM or UMAP at all. But we can still embed that distance matrix into a Eucldean space and then run SOM or UMAP.

This analaysis was performed in the following environment:

`sessionInfo()`

```
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kohonen_3.0.12 umap_0.2.10.0 Mercator_1.1.5
## [4] Thresher_1.1.4 PCDimension_1.1.13 ClassDiscovery_3.4.4
## [7] oompaBase_3.2.9 cluster_2.1.6
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 xfun_0.46 bslib_0.8.0
## [4] ggplot2_3.5.1 lattice_0.22-6 vctrs_0.6.5
## [7] tools_4.4.1 generics_0.1.3 stats4_4.4.1
## [10] flexmix_2.3-19 Polychrome_1.5.1 tibble_3.2.1
## [13] fansi_1.0.6 highr_0.11 pkgconfig_2.0.3
## [16] Matrix_1.7-0 KernSmooth_2.23-24 scatterplot3d_0.3-44
## [19] lifecycle_1.0.4 compiler_4.4.1 munsell_0.5.1
## [22] clue_0.3-65 movMF_0.2-8 htmltools_0.5.8.1
## [25] sass_0.4.9 yaml_2.3.10 pillar_1.9.0
## [28] jquerylib_0.1.4 MASS_7.3-60.2 openssl_2.2.0
## [31] cachem_1.1.0 viridis_0.6.5 mclust_6.1.1
## [34] RSpectra_0.16-2 cpm_2.3 tidyselect_1.2.1
## [37] digest_0.6.36 Rtsne_0.17 slam_0.1-52
## [40] dplyr_1.1.4 kernlab_0.9-32 changepoint_2.2.4
## [43] ade4_1.7-22 fastmap_1.2.0 grid_4.4.1
## [46] oompaData_3.1.3 colorspace_2.1-1 cli_3.6.3
## [49] magrittr_2.0.3 utf8_1.2.4 scales_1.3.0
## [52] rmarkdown_2.27 skmeans_0.2-16 igraph_2.0.3
## [55] nnet_7.3-19 reticulate_1.38.0 gridExtra_2.3
## [58] png_0.1-8 askpass_1.2.0 zoo_1.8-12
## [61] modeltools_0.2-23 evaluate_0.24.0 knitr_1.48
## [64] viridisLite_0.4.2 rlang_1.1.4 Rcpp_1.0.13
## [67] dendextend_1.17.1 glue_1.7.0 jsonlite_1.8.8
## [70] R6_2.5.1
```