Title: | Functions for Pre-Processing Data for Multivariate Data Visualisation using Tours |
---|---|
Description: | This is a companion to the book "Interactively Exploring High-Dimensional Data and Models in R" by Cook, D. and Laa, U. (2023) <https://dicook.github.io/mulgar_book/>. It contains useful functions for processing data in preparation for visualising with a tour. There are also several sample data sets. |
Authors: | Dianne Cook [aut, cre] , Ursula Laa [aut] |
Maintainer: | Dianne Cook <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.2 |
Built: | 2024-11-17 03:54:46 UTC |
Source: | https://github.com/dicook/mulgar |
This is data from the 2021 Women's Australian Football League. These are average player statistics across the season, with game statistics provided by the fitzRoy package. If you are new to the game of AFL, there is a nice explanation on Wikipedia. The primary analysis is to summarise the variation using principal component analysis, which gives information about relationships between the statistics, or the skill sets common among players. One might also be tempted to cluster the players, but there are no obvious clusters, so this could be frustrating. At best one could partition the players into groups, while recognising that there are no absolutely distinct and separated groups.
A dataset with 381 rows and 35 columns
player identification details
player statistics for the match
require(dplyr)
data(aflw)
glimpse(aflw)
This data is simulated to use for testing. It has three dimensions of variability and two of noise. It is created from a 3 factor model. All variables are linearly associated.
A dataset with 200 rows and 5 columns
five numeric variables
plane
box_pca <- prcomp(box)
ggscree(box_pca)
This data was collated by Weihao (Patrick) Li as part of his Honours research at Monash University. It contains fire ignitions as detected from satellite hotspots, processed using the spotoroo package, and augmented with measurements on weather, vegetation, and proximity to human activity. The cause variable is predicted based on historical fire ignition data collected by Country Fire Authority personnel.
A dataset with 1021 rows and 60 columns
unique id, and spatiotemporal information for each fire ignition
vegetation variables
average rainfall, on that day, and over last 7, ..., 720 days
solar exposure, on that day, and over last 7, ..., 720 days
max temperature, on that day, and over last 7, ..., 720 days
min temperature, on that day, and over last 7, ..., 720 days
average wind speed, on that day, and for last 1-24 months
distance to nearest road
distance to nearest Country Fire Authority facility
distance to nearest camp site
predicted ignition cause: accident, arson, burning_off, or lightning
require(dplyr)
data(bushfires)
glimpse(bushfires)
Simulated data with different structure
Datasets with differing numbers of rows and columns
numeric variables
require(ggplot2)
ggplot(c1, aes(x=x1, y=x2)) +
  geom_point() +
  theme(aspect.ratio=1)
For a data matrix, compute the sample variance-covariance matrix, and use it to compute the Mahalanobis distance of each observation.
calc_mv_dist(x)
x |
multivariate data set |
This is useful for checking whether the distances arise from a multivariate normal sample.
vector of length n
require(ggplot2)
require(tibble)
data(aflw)
aflw_std <- apply(aflw[,7:35], 2,
  function(x) (x-mean(x, na.rm=TRUE))/sd(x, na.rm=TRUE))
d <- calc_mv_dist(aflw_std[,c("goals","behinds","kicks","disposals")])
d <- as_tibble(d, .name_repair="minimal")
ggplot(d, aes(x=value)) + geom_histogram()
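As a cross-check of the multivariate normal reading of these distances: squared Mahalanobis distances of a p-dimensional normal sample should follow, approximately, a chi-squared distribution with p degrees of freedom. A base-R sketch of this check (using stats::mahalanobis rather than calc_mv_dist, so it runs standalone):

```r
# Squared Mahalanobis distances of a p-dimensional normal sample follow
# (approximately, with estimated mean and covariance) a chi-squared
# distribution with p degrees of freedom.
set.seed(42)
p <- 4
x <- matrix(rnorm(500 * p), ncol = p)
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))
# Rough agreement of the points with the y = x line indicates the
# distances are consistent with multivariate normality.
qqplot(qchisq(ppoints(nrow(x)), df = p), d2,
       xlab = "Chi-squared quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)
```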
Returns the square root of the sum of squares of a vector
calc_norm(x)
x |
numeric vector |
numeric value
x <- rnorm(5)
calc_norm(x)
This data is simulated to use for testing. It has three elliptical clusters, mostly in variables 2 and 4. They are not equidistant.
A dataset with 300 rows and 6 columns
five numeric variables
class variable
simple_clusters
clusters_pca <- prcomp(clusters[,1:5])
ggscree(clusters_pca)
This data is simulated to use for testing. It has two small spherical clusters, a curved cluster, and a sine wave cluster.
A dataset with 300 rows and 6 columns
five numeric variables
clusters
require(ggplot2)
ggplot(clusters_nonlin, aes(x=x1, y=x2)) +
  geom_point() +
  theme(aspect.ratio=1)
Take an array of a projection sequence, and turn it into a tibble with numbered projections.
convert_proj_tibble(t1)
t1 |
tour projection sequence |
tbl1 tibble
require(tourr)
t1 <- interpolate(save_history(flea[, 1:6], grand_tour(4), max = 2))
tbl1 <- convert_proj_tibble(t1)
This function generates points by transforming points on the surface of a sphere.
gen_vc_ellipse(vc, xm = rep(0, ncol(vc)), n = 500)
vc |
symmetric square matrix describing the variance-covariance matrix which defines the shape of the ellipse. |
xm |
center of the ellipse, a vector of length equal to the dimension of vc |
n |
number of points to generate |
matrix of size n x p
require(ggplot2)
require(tibble)
ell2d <- gen_vc_ellipse(vc = matrix(c(4, 2, 2, 6), ncol=2, byrow=TRUE),
                        xm = c(1, 1))
ell2d <- as_tibble(ell2d)
ggplot(ell2d, aes(x = V1, y = V2)) +
  geom_point() +
  theme(aspect.ratio=1)
This function generates points on the surface of an ellipse with the same center and variance-covariance of the provided data.
gen_xvar_ellipse(x, n = 100, nstd = 1)
x |
multivariate data set. |
n |
number of points to generate |
nstd |
scale factor for size of ellipse, in terms of number of standard deviations |
This is useful for checking the equal variance-covariance assumption from linear discriminant analysis.
matrix of size n x p
data(aflw)
aflw_vc <- gen_xvar_ellipse(aflw[,c("goals","behinds","kicks","disposals")],
                            n=500)
require(ggplot2)
ggplot(aflw_vc, aes(x=goals, y=behinds)) +
  geom_point() +
  theme(aspect.ratio=1)
if (interactive()) {
  require(tourr)
  animate_slice(aflw_vc, rescale=TRUE, v_rel=0.02)
  aflw_all <- rbind(aflw_vc, aflw[,c("goals","behinds","kicks","disposals")])
  clrs <- c(rep("orange", 500), rep("black", nrow(aflw)))
  animate_xy(aflw_all, col=clrs)
}
Takes data returned by mclustBIC(), and converts it to a tibble for plotting.
ggmcbic(mc, cl = 1:nrow(mc), top = ncol(mc))
mc |
mclustBIC object |
cl |
subset of clusters to show |
top |
number of models to show, default is all of them
mc_bic a ggplot object
require(mclust)
data(clusters)
clusters_BIC <- mclustBIC(clusters[,1:5], G=2:6)
ggmcbic(clusters_BIC)
ggmcbic(clusters_BIC, top=4)
data(simple_clusters)
clusters_BIC <- mclustBIC(simple_clusters[,1:2])
ggmcbic(clusters_BIC, cl=2:5, top=3)
Takes a PCA object returned by prcomp(), extracts the standard deviations of the principal components (PCs), and plots these against the PC number. The guidance line assumes that all of the variables have been standardised prior to PCA.
ggscree(pc, q = 2, guide = TRUE, cumulative = FALSE)
pc |
PCA object |
q |
number of principal components to show, default 2 (you will likely need to change this)
guide |
logical; whether to compute and add a line showing the typical variance if the data were full-dimensional
cumulative |
logical whether to draw cumulative variance |
scree a ggplot object
data(aflw)
aflw_std <- apply(aflw[,7:35], 2,
  function(x) (x-mean(x, na.rm=TRUE))/sd(x, na.rm=TRUE))
aflw_pca <- prcomp(aflw_std[,c("goals","behinds","kicks","disposals")])
ggscree(aflw_pca, q=3)
Following the slice definition available in tourr, this function returns a ggplot2 display of a slice defined via the projection onto two of the variables. Note that because the underlying function works with any projection, the axis labels need to be set by the user.
ggslice(data, h, v1 = 1, v2 = 2, center = NULL, col = NULL)
data |
data frame containing only variables used for the display |
h |
slice thickness |
v1 |
column number of variable mapped to x-axis |
v2 |
column number of variable mapped to y-axis |
center |
center point vector used for anchoring the slice, if NULL the mean of the data is used |
col |
grouping vector mapped to color in the display |
ggplot2 object showing the sliced data
ggslice_projection
d <- geozoo::sphere.hollow(4, 1000)$points
ggslice(d, 0.3, 1, 2)
ggslice(d, 0.3, 1, 2, center = c(0, 0, 0.7, 0))
Generate slice display
ggslice_projection(data, h, proj, center = NULL, col = NULL)
data |
data frame containing only variables used for the display |
h |
slice thickness |
proj |
projection matrix from p to 2 dimensions |
center |
center point vector used for anchoring the slice, if NULL the mean of the data is used |
col |
grouping vector mapped to color in the display |
ggplot2 object showing the sliced data
ggslice
d <- geozoo::sphere.hollow(4, 1000)$points
ggslice_projection(d, 0.3, tourr::basis_random(4))
ggslice_projection(d, 0.3, tourr::basis_random(4),
                   center = c(0.4, 0.4, 0.4, 0.4))
Supplements a data set with information needed to draw a dendrogram. Intermediate cluster nodes are added as needed, and positioned at the centroid of the combined clusters. Note that categorical variables need to be factors.
hierfly(data, h = NULL, metric = "euclidean", method = "ward.D2", scale = TRUE)
data |
data set |
h |
an hclust object |
metric |
distance metric to use |
method |
cluster distance measure to use |
scale |
logical value whether to scale data or not, default TRUE |
list with data and edges and segments
data(clusters)
cl_dist <- dist(clusters[,1:5])
cl_hw <- hclust(cl_dist, method="ward.D2")
require(ggdendro)
ggdendrogram(cl_hw, type = "triangle", labels = FALSE)
clusters$clw <- factor(cutree(cl_hw, 3))
cl_hfly <- hierfly(clusters, cl_hw, scale=FALSE)
if (interactive()) {
  require(tourr)
  glyphs <- c(16, 46)
  pch <- glyphs[cl_hfly$data$node+1]
  require(colorspace)
  clrs <- heat_hcl(length(unique(cl_hfly$data$clw)))
  pcol <- clrs[cl_hfly$data$clw]
  ecol <- clrs[cl_hfly$data$clw[cl_hfly$edges[,1]]]
  animate_xy(cl_hfly$data[,1:5], edges=cl_hfly$edges,
             col=pcol, pch=pch, edges.col=ecol, axes="bottomleft")
}
Takes data returned by Mclust(), extracts the parameter estimates, and computes points on ellipses.
mc_ellipse(mc, npts = 100)
mc |
Mclust object |
npts |
Number of points to simulate for each cluster, default 100 |
mc_ellipses data frame
require(mclust)
data(simple_clusters)
clusters_mc <- Mclust(simple_clusters[,1:2], G=2, modelNames="EEI")
mce <- mc_ellipse(clusters_mc, npts=400)
require(ggplot2)
sc <- simple_clusters
sc$cl <- factor(clusters_mc$classification)
ggplot() +
  geom_point(data=sc, aes(x=x1, y=x2, colour=cl)) +
  geom_point(data=mce$ell, aes(x=x1, y=x2, colour=cl), shape=4) +
  geom_point(data=mce$mn, aes(x=x1, y=x2, colour=cl), shape=3) +
  theme(aspect.ratio=1, legend.position="none")
This data is originally from http://ifs.tuwien.ac.at/dm/download/multiChallenge-matrix.txt, and was provided as a challenge for non-linear dimension reduction. It was used as an example in Lee, Laa, Cook (2023) https://doi.org/10.52933/jdssv.v2i3.
A dataset with 400 rows and 11 columns
cluster label
numeric variables
clusters
require(ggplot2)
ggplot(multicluster, aes(x=x1, y=x2)) +
  geom_point() +
  theme(aspect.ratio=1)
Returns the normalised vector, where the sum of squares is equal to 1
norm_vec(x)
x |
numeric vector |
numeric vector
x <- rnorm(5)
norm_vec(x)
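The relationship between norm_vec() and calc_norm() can be seen with a base-R equivalent (an illustrative sketch, shown here without the package so it runs standalone):

```r
# norm_vec() rescales a vector to unit length: dividing by its norm
# (the square root of the sum of squares, as calc_norm() computes).
x <- c(3, 4)
nx <- sqrt(sum(x^2))   # calc_norm(x) equivalent; here 5
unit_x <- x / nx       # norm_vec(x) equivalent
sum(unit_x^2)          # sums to 1 by construction
```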
This function takes the PCA and produces a wire frame of the model to examine with the data in a tour. The purpose is to see how well the variance is explained. The model will be centered at the mean, and extend 3 SDs towards the edge of the data, which assumes that the data is standardised.
pca_model(pc, d = 2, s = 1)
pc |
PCA object |
d |
number of dimensions to use, default=2 |
s |
scale model, default=1 |
a list of points and edges
data(plane)
plane_pca <- prcomp(plane)
plane_m <- pca_model(plane_pca)
plane_m_d <- rbind(plane_m$points, plane)
if (interactive()) {
  require(tourr)
  animate_xy(plane_m_d, edges=plane_m$edges, axes="bottomleft")
}
This is data from the 2018 PISA testing, available from https://webfs.oecd.org/pisa2018/SPSS_STU_QQQ.zip. It is a subset of the data containing only Australia and Indonesia, and the simulated scores for math, reading and science.
A data set with 26371 rows and 31 columns
Country (Australia, Indonesia)
simulated scores for math, reading and science
require(dplyr)
data(pisa)
pisa %>% count(CNT)
This data is simulated to use for testing. It has two dimensions of variability and three of noise. It is created from a 2 factor model, where all variables are related.
A data set with 100 rows and 5 columns
five numeric variables
box
plane_pca <- prcomp(plane)
ggscree(plane_pca)
This data is simulated to use for testing. It has three dimensions of variability and two of noise. It is created from a 2 factor non-linear model. All variables are associated.
A dataset with 100 rows and 5 columns
five numeric variables
plane, box
plane_nonlin_pca <- prcomp(plane_nonlin)
ggscree(plane_nonlin_pca)
This function computes the group variance-covariance matrices, and produces a weighted average. It is useful for examining the linear discriminant analysis model.
pooled_vc(x, cl, prior = rep(1/length(unique(cl)), length(unique(cl))))
x |
multivariate data set, matrix. |
cl |
class variable |
prior |
prior probability for each class, must sum to 1, default all equal |
matrix
data(clusters)
pooled_vc(clusters[,1:5], clusters$cl)
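The weighted average that pooled_vc() computes can be sketched in base R; this illustration assumes equal priors, matching the default:

```r
# Pooled variance-covariance: average the per-class covariance
# matrices, weighting each class by its prior (equal weights here).
set.seed(1)
x <- rbind(matrix(rnorm(60), ncol = 3),
           matrix(rnorm(60, mean = 3), ncol = 3))
cl <- rep(c("a", "b"), each = 20)
vcs <- lapply(split(as.data.frame(x), cl), cov)  # per-class covariances
pooled <- Reduce(`+`, vcs) / length(vcs)         # equal-prior average
```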
This function generates a sample of size n from a multivariate normal distribution.
rmvn(n = 100, p = 5, mn = rep(0, p), vc = diag(rep(1, p)))
n |
number of points to generate |
p |
dimension |
mn |
mean of the distribution, a vector of length equal to the dimension of vc |
vc |
symmetric square matrix describing the variance-covariance matrix which defines the shape of the distribution. |
matrix of size n x p
require(ggplot2)
d <- mulgar::rmvn(n=100, p=2, mn = c(1,1),
                  vc = matrix(c(4, 2, 2, 6), ncol=2, byrow=TRUE))
ggplot(data.frame(d), aes(x = x1, y = x2)) +
  geom_point() +
  theme(aspect.ratio=1)
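One common way generators like rmvn() are implemented, and checked, is by transforming standard normal draws with a Cholesky factor of vc; a sketch under that assumption:

```r
# Transform iid standard normals so the sample has covariance vc:
# if t(R) %*% R = vc (Cholesky factorisation), then cov(z %*% R) ~ vc.
set.seed(7)
vc <- matrix(c(4, 2, 2, 6), ncol = 2, byrow = TRUE)
z <- matrix(rnorm(5000 * 2), ncol = 2)
d <- z %*% chol(vc)
round(cov(d), 1)  # close to vc for large n
```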
This data is simulated to use for testing. It has two spherical clusters, and two variables.
A dataset with 137 rows and 3 columns
two numeric variables
class variable
clusters
require(ggplot2)
ggplot(simple_clusters, aes(x=x1, y=x2)) +
  geom_point() +
  theme(aspect.ratio=1)
This data is a subset of images from https://quickdraw.withgoogle.com The subset was created using the quickdraw R package at https://huizezhang-sherry.github.io/quickdraw/. It has 6 different groups, including banana, boomerang, cactus, flip flops and kangaroo. Each image is 28x28 pixels.
A data frame with 1200 rows and 786 columns
grey scale 0-255
all NA, you need to predict this
unique id for each sketch
sketches_train
require(ggplot2)
data("sketches_test")
x <- sketches_test[sample(1:nrow(sketches_test), 1), ]
xm <- data.frame(gry=t(as.matrix(x[,1:784])),
                 x=rep(1:28, 28),
                 y=rep(28:1, rep(28, 28)))
ggplot(xm, aes(x=x, y=y, fill=gry)) +
  geom_tile() +
  scale_fill_gradientn(colors = gray.colors(256, start = 0, end = 1,
                                            rev = TRUE)) +
  theme_void() +
  theme(legend.position="none")
This data is a subset of images from https://quickdraw.withgoogle.com The subset was created using the quickdraw R package at https://huizezhang-sherry.github.io/quickdraw/. It has 6 different groups, including banana, boomerang, cactus, flip flops and kangaroo. Each image is 28x28 pixels. This data would be used to train a classification model.
A data frame with 5998 rows and 786 columns
grey scale 0-255
what the person was asked to draw
unique id for each sketch
require(ggplot2)
data("sketches_train")
x <- sketches_train[sample(1:nrow(sketches_train), 1), ]
# print(x$word)
xm <- data.frame(gry=t(as.matrix(x[,1:784])),
                 x=rep(1:28, 28),
                 y=rep(28:1, rep(28, 28)))
ggplot(xm, aes(x=x, y=y, fill=gry)) +
  geom_tile() +
  scale_fill_gradientn(colors = gray.colors(256, start = 0, end = 1,
                                            rev = TRUE)) +
  theme_void() +
  theme(legend.position="none")
This function generates a grid of points to match the nodes from the self-organising map (SOM), and jitters points from the data so they can be seen relative to the grid. This allows the clustering of points by SOM to be inspected.
som_model(x_som, j_val = 0.5)
x_som |
object returned by kohonen::som |
j_val |
amount of jitter, should range from 0-1, default 0.5
data: this object contains
the original variables from the data
map1, map2: location of observations in the 2D som map, jittered
distance: distance between each observation and its closest node
id: row id of data
net: this object contains
the values of the nodes in the high-dimensional space
map1, map2: nodes of the som net
distance: distance between each observation and its closest node
id: row id of net
edges: from, to specifying row ids of net to connect with lines
edges_s: x, xend, y, yend for segments to draw lines forming the 2D map
require(kohonen)
data(clusters)
c_grid <- kohonen::somgrid(xdim = 5, ydim = 5, topo = 'rectangular')
c_som <- kohonen::som(as.matrix(clusters[,1:5]), grid = c_grid)
c_data_net <- som_model(c_som)
require(ggplot2)
ggplot() +
  geom_segment(data=c_data_net$edges_s,
               aes(x=x, xend=xend, y=y, yend=yend)) +
  geom_point(data=c_data_net$data, aes(x=map1, y=map2),
             colour="orange", size=2, alpha=0.5)