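This section walks through the algorithm for constructing a model from 2-D embedding data and visualising it alongside the high-dimensional data. The process involves two major steps: building the model in the 2-D embedding space, and lifting that model into the high-dimensional space.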
To begin, we preprocess the 2-D embedding and create hexagonal bins over the layout.
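The binning code itself is not shown in this section. As a hedged illustration only, the base-R sketch below assigns each embedding point to its nearest centroid on a regular hexagonal lattice, which is the same as assigning it to the hexagon that contains it. The function `hex_bin_sketch()` and its arguments are hypothetical, not the package API. The lattice geometry (horizontal spacing `a1`, row spacing `a1 * sqrt(3) / 2`, alternate rows shifted by `a1 / 2`) is an assumption, although it agrees with the centroid coordinates printed below.

```r
# Hypothetical sketch (not the package's binning function): hexagonal binning
# of 2-D embedding points by nearest lattice centroid.
hex_bin_sketch <- function(emb1, emb2, a1) {
  a2 <- a1 * sqrt(3) / 2  # vertical distance between centroid rows

  # Lattice rows and columns extend one spacing beyond the data on every side
  ys <- seq(min(emb2) - a2, max(emb2) + a2, by = a2)
  xs <- seq(min(emb1) - a1, max(emb1) + a1, by = a1)
  grid <- expand.grid(col = seq_along(xs), row = seq_along(ys))

  # Every second row is shifted right by half a hexagon width
  centroids <- data.frame(
    h   = seq_len(nrow(grid)),
    c_x = xs[grid$col] + ifelse(grid$row %% 2 == 0, a1 / 2, 0),
    c_y = ys[grid$row]
  )

  # On a hexagonal lattice, the nearest centroid is the hexagon containing the point
  d2 <- outer(emb1, centroids$c_x, "-")^2 + outer(emb2, centroids$c_y, "-")^2
  h <- max.col(-d2, ties.method = "first")

  centroids$n_h <- tabulate(h, nbins = nrow(centroids))  # points per bin
  list(
    centroids  = centroids,
    data_hb_id = data.frame(emb1 = emb1, emb2 = emb2, ID = seq_along(emb1), h = h)
  )
}
```

Because every point falls inside its hexagon, no point should lie further than the hexagon's circumradius, `a1 / sqrt(3)`, from its assigned centroid.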
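Next, we extract the centroid coordinates and standardised counts of every bin. These identify the populated regions of the 2-D space: bins whose count exceeds `benchmark_highdens` are kept, and with a benchmark of 0 this means every non-empty bin.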
```r
## To extract all bin centroids with bin counts
df_bin_centroids <- merge_hexbin_centroids(centroids_data = all_centroids_df,
                                           counts_data = counts_df)
benchmark_highdens <- 0
## To extract high-density bins
model_2d <- df_bin_centroids |>
  dplyr::filter(n_h > benchmark_highdens)
glimpse(model_2d)
#> Rows: 251
#> Columns: 5
#> $ h <int> 58, 68, 69, 70, 71, 72, 73, 78, 79, 90, 91, 92, 93, 94, 95, 96, 98…
#> $ c_x <dbl> 0.7804473, 0.1641342, 0.2228307, 0.2815272, 0.3402237, 0.3989202, …
#> $ c_y <dbl> -0.01401484, 0.03681781, 0.03681781, 0.03681781, 0.03681781, 0.036…
#> $ n_h <dbl> 4, 1, 5, 6, 9, 5, 3, 3, 6, 7, 5, 2, 1, 5, 9, 1, 1, 5, 6, 10, 1, 4,…
#> $ w_h <dbl> 0.004, 0.001, 0.005, 0.006, 0.009, 0.005, 0.003, 0.003, 0.006, 0.0…
```
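`merge_hexbin_centroids()` is the package's own function. The sketch below is a hypothetical base-R stand-in that shows the idea: every centroid is kept, bins that received no points get `n_h = 0`, and `w_h` is taken to be `n_h` divided by the total number of points. That definition of the standardised count is an assumption, but it is consistent with the output above, where `n_h = 4` corresponds to `w_h = 0.004` for the 1,000-point embedding.

```r
# Hypothetical stand-in for merge_hexbin_centroids() plus the benchmark filter.
merge_counts_sketch <- function(centroids, counts, n_total) {
  # Left join: keep every centroid, including bins that received no points
  out <- merge(centroids, counts, by = "h", all.x = TRUE, sort = TRUE)
  out$n_h[is.na(out$n_h)] <- 0
  # Assumption: standardised count = share of all embedding points in the bin
  out$w_h <- out$n_h / n_total
  out
}

centroids <- data.frame(h   = 1:5,
                        c_x = c(0, 1, 2, 0.5, 1.5),
                        c_y = c(0, 0, 0, 0.87, 0.87))
counts <- data.frame(h = c(2L, 4L, 5L), n_h = c(3, 6, 1))

bins <- merge_counts_sketch(centroids, counts, n_total = 10)
benchmark <- 0
model_2d_sketch <- bins[bins$n_h > benchmark, ]  # drops the empty bins 1 and 3
```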
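We then triangulate all hexagon centroids, including those of empty bins, to build a wireframe of neighbourhood relationships. The bin counts are carried along with the triangulation so that edges touching empty bins can be dropped in the next step.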
```r
## Wireframe
tr_object <- tri_bin_centroids(centroids_data = df_bin_centroids)
str(tr_object)
#> List of 2
#> $ trimesh_object:List of 11
#> ..$ n : int 588
#> ..$ x : num [1:588] -0.1 -0.0413 0.0174 0.0761 0.1348 ...
#> ..$ y : num [1:588] -0.116 -0.116 -0.116 -0.116 -0.116 ...
#> ..$ nt : int 1106
#> ..$ trlist: int [1:1106, 1:9] 1 23 43 22 22 3 45 23 23 64 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:9] "i1" "i2" "i3" "j1" ...
#> ..$ cclist: num [1:1106, 1:5] -0.0707 -0.0413 -0.1293 -0.0707 -0.0413 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:5] "x" "y" "r" "area" ...
#> ..$ nchull: int 68
#> ..$ chull : int [1:68] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ narcs : int 1693
#> ..$ arcs : int [1:1693, 1:2] 2 22 1 2 23 22 43 44 22 23 ...
#> .. ..- attr(*, "dimnames")=List of 2
#> .. .. ..$ : NULL
#> .. .. ..$ : chr [1:2] "from" "to"
#> ..$ call : language tri.mesh(x = centroids_data[["c_x"]], y = centroids_data[["c_y"]])
#> ..- attr(*, "class")= chr "triSht"
#> $ n_h : num [1:588] 0 0 0 0 0 0 0 0 0 0 ...
```
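The triangulation above comes from a `tri.mesh()` call (see the `call` element). As a rough, hypothetical illustration of why it makes a useful wireframe: the centroids sit on a regular hexagonal lattice, and in the interior of such a lattice each centroid's triangulation neighbours are the (up to six) centroids exactly one hexagon width `a1` away. The sketch below finds those pairs directly from pairwise distances. It is a lattice shortcut, not a Delaunay algorithm, and it omits any longer edges a real triangulation adds along the convex hull; `lattice_edges_sketch()` is a hypothetical name.

```r
# Hypothetical lattice shortcut (not a Delaunay triangulation): connect
# centroids that are exactly one hexagon width `a1` apart.
lattice_edges_sketch <- function(centroids, a1, tol = 1e-6) {
  d <- as.matrix(dist(centroids[, c("c_x", "c_y")]))
  # upper.tri() keeps each undirected pair once (row index < column index)
  is_edge <- abs(d - a1) < tol * a1 & upper.tri(d)
  idx <- which(is_edge, arr.ind = TRUE)
  data.frame(from = centroids$h[idx[, "row"]],
             to   = centroids$h[idx[, "col"]])
}

# Two offset rows of three centroids with a hexagon width of 1
a1 <- 1
lattice <- data.frame(h   = 1:6,
                      c_x = c(0, 1, 2, 0.5, 1.5, 2.5),
                      c_y = rep(c(0, sqrt(3) / 2), each = 3))
edges_sketch <- lattice_edges_sketch(lattice, a1)
# 4 edges within the two rows plus 5 edges between them
```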
Using the triangulation object, we generate the edges between centroids and keep only those whose two endpoints both lie in bins with counts above the benchmark.
```r
trimesh_data <- gen_edges(tri_object = tr_object, a1 = hb_obj$a1) |>
  dplyr::filter(from_count > benchmark_highdens,
                to_count > benchmark_highdens)
## Re-index the edge endpoints as consecutive integers starting from 1
trimesh_data <- update_trimesh_index(trimesh_data)
glimpse(trimesh_data)
#> Rows: 655
#> Columns: 10
#> $ from <int> 69, 68, 70, 69, 90, 90, 110, 131, 131, 111, 91, 91, 111…
#> $ to <int> 90, 69, 91, 70, 111, 91, 132, 152, 132, 112, 92, 112, 1…
#> $ x_from <dbl> 0.2228307, 0.1641342, 0.2815272, 0.2228307, 0.1934824, …
#> $ y_from <dbl> 0.03681781, 0.03681781, 0.03681781, 0.03681781, 0.08765…
#> $ x_to <dbl> 0.1934824, 0.2228307, 0.2521789, 0.2815272, 0.2228307, …
#> $ y_to <dbl> 0.08765046, 0.03681781, 0.08765046, 0.03681781, 0.13848…
#> $ from_count <dbl> 5, 1, 6, 5, 7, 7, 4, 1, 1, 6, 5, 5, 6, 6, 6, 7, 3, 3, 5…
#> $ to_count <dbl> 7, 5, 5, 6, 6, 5, 7, 3, 7, 6, 2, 6, 3, 1, 9, 3, 7, 3, 1…
#> $ from_reindexed <int> 3, 2, 4, 3, 10, 10, 22, 35, 35, 23, 11, 11, 23, 24, 4, …
#> $ to_reindexed <int> 10, 3, 11, 4, 23, 11, 36, 49, 36, 24, 12, 24, 37, 25, 5…
```
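A hypothetical base-R sketch of the two steps in this chunk: look up the count of each edge's endpoint bins, keep the edges whose two counts both exceed the benchmark, then renumber the surviving bin IDs as consecutive integers. Renumbering against the retained bins (rather than against the edge endpoints) is an assumption, chosen because it agrees with the output above: bin 69 becomes 3 and bin 90 becomes 10, their positions in `model_2d$h`. `filter_reindex_edges_sketch()` is not part of the package.

```r
# Hypothetical sketch: filter edges by endpoint counts, then re-index them.
filter_reindex_edges_sketch <- function(edges, bins, benchmark = 0) {
  # Look up the bin count at each end of every edge
  edges$from_count <- bins$n_h[match(edges$from, bins$h)]
  edges$to_count   <- bins$n_h[match(edges$to, bins$h)]
  keep <- edges$from_count > benchmark & edges$to_count > benchmark
  edges <- edges[which(keep), , drop = FALSE]  # which() also drops NA lookups

  # Assumption: new indices are positions among the retained bins, in h order
  kept_h <- sort(bins$h[bins$n_h > benchmark])
  edges$from_reindexed <- match(edges$from, kept_h)
  edges$to_reindexed   <- match(edges$to, kept_h)
  rownames(edges) <- NULL
  edges
}

bins <- data.frame(h = 1:5, n_h = c(0, 3, 0, 6, 1))
edges <- data.frame(from = c(1L, 2L, 2L, 4L), to = c(2L, 4L, 3L, 5L))
trimesh_sketch <- filter_reindex_edges_sketch(edges, bins)
# Keeps edges 2-4 and 4-5; bins 2, 4 and 5 become 1, 2 and 3
```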
To lift the model into the high-dimensional space, we begin by extracting the 2-D embedding points together with their assigned hexagonal bin IDs (`h`).
```r
nldr_df_with_hex_id <- hb_obj$data_hb_id
glimpse(nldr_df_with_hex_id)
#> Rows: 1,000
#> Columns: 4
#> $ emb1 <dbl> 0.27708147, 0.69717161, 0.77934921, 0.17323121, 0.21793445, 0.593…
#> $ emb2 <dbl> 0.91343544, 0.53767948, 0.39861033, 0.95285002, 0.98320848, 1.048…
#> $ ID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…
#> $ h <int> 427, 287, 226, 446, 468, 495, 132, 292, 415, 229, 289, 91, 155, 4…
```
Next, we average the high-dimensional coordinates of the points in each bin and keep only the bins that appear in the 2-D model.
```r
model_highd <- avg_highd_data(highd_data = scurve, scaled_nldr_hexid = nldr_df_with_hex_id)
model_highd <- model_highd |>
  dplyr::filter(h %in% model_2d$h)
glimpse(model_highd)
#> Rows: 251
#> Columns: 8
#> $ h <int> 58, 68, 69, 70, 71, 72, 73, 78, 79, 90, 91, 92, 93, 94, 95, 96, 98,…
#> $ x1 <dbl> -0.37105188, 0.95812568, 0.85450337, 0.73126005, 0.47416208, 0.2648…
#> $ x2 <dbl> 1.90644261, 0.08539250, 0.09171102, 0.12907776, 0.10794141, 0.12362…
#> $ x3 <dbl> 1.922623, 1.286348, 1.507735, 1.676232, 1.877526, 1.962058, 1.99642…
#> $ x4 <dbl> -0.0082718681, 0.0026502232, 0.0051225053, -0.0043335180, -0.002598…
#> $ x5 <dbl> 0.0018876029, 0.0170739819, 0.0003245598, 0.0021079776, 0.000127695…
#> $ x6 <dbl> 0.016977853, 0.087550004, -0.013016748, -0.035630709, 0.007849772, …
#> $ x7 <dbl> 0.0028050043, -0.0024864689, -0.0039496434, -0.0024049571, 0.001697…
```
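`avg_highd_data()` performs this step in the package. As a hedged illustration of the computation, the base-R sketch below joins each point's high-dimensional coordinates to its bin ID through a shared `ID` column, averages each coordinate within a bin, and then keeps only the bins of the 2-D model. The shared `ID` column in the high-dimensional data and the name `avg_highd_sketch()` are assumptions.

```r
# Hypothetical sketch of per-bin averaging of high-dimensional coordinates.
avg_highd_sketch <- function(highd_data, nldr_hexid) {
  # Attach each point's bin ID (h) to its high-dimensional row via ID
  joined <- merge(highd_data, nldr_hexid[, c("ID", "h")], by = "ID")
  coord_cols <- setdiff(names(highd_data), "ID")
  # Mean of every coordinate within each bin; one output row per bin
  aggregate(joined[, coord_cols, drop = FALSE], by = list(h = joined$h), FUN = mean)
}

highd <- data.frame(ID = 1:4, x1 = c(1, 3, 10, 20), x2 = c(0, 2, 4, 6))
hexid <- data.frame(ID = 1:4, h = c(7L, 7L, 9L, 9L))
centres <- avg_highd_sketch(highd, hexid)
model_highd_sketch <- centres[centres$h %in% c(7L), ]  # keep bins in the 2-D model
```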
We now combine all components (the high-dimensional data, the 2-D model, the lifted high-dimensional centroids, and the triangulation) and render the model using an interactive tour.
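The tour itself is rendered by a viewer not shown in this section. As a hedged sketch of what the lifted wireframe contains, each 2-D edge can be mapped to a segment between the averaged high-dimensional coordinates of its two endpoint bins. The function `lift_edges_sketch()` is hypothetical, and it assumes edges are keyed by the original bin IDs `h`; the package's own rendering may use the re-indexed IDs instead.

```r
# Hypothetical sketch: turn 2-D wireframe edges into high-dimensional segments.
lift_edges_sketch <- function(edges, model_highd) {
  coord_cols <- setdiff(names(model_highd), "h")
  # Look up the averaged high-D coordinates of each edge's two endpoint bins
  from_x <- model_highd[match(edges$from, model_highd$h), coord_cols, drop = FALSE]
  to_x   <- model_highd[match(edges$to,   model_highd$h), coord_cols, drop = FALSE]
  names(from_x) <- paste0(coord_cols, "_from")
  names(to_x)   <- paste0(coord_cols, "_to")
  data.frame(edges[, c("from", "to")], from_x, to_x, row.names = NULL)
}

model_highd_toy <- data.frame(h = c(2L, 4L, 5L), x1 = c(0, 1, 2), x2 = c(5, 6, 7))
edges_toy <- data.frame(from = c(2L, 4L), to = c(4L, 5L))
segments <- lift_edges_sketch(edges_toy, model_highd_toy)
```

Each row of `segments` is one line segment in the high-dimensional space; a tour then projects these segments together with the original points.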