EmbedSOM vs. archeology

Miroslav Kratochvíl

2020-01-18

The Smithsonian Institution provides a whole load of interesting data, including 3D models of a woolly mammoth skeleton and of a T-rex skeleton eating a triceratops skeleton!

In this vignette, we use EmbedSOM to flatten them into 2D.

Getting the data

The data is available as STL models. You should be able to extract a list of 3D point coordinates from the STL either directly or with one of the available command-line tools (e.g. stl2gts). The result should be a 3-column matrix of point coordinates. I saved the points to mammoth.points and trex.points and loaded them as follows. The models have hundreds of thousands of individual points!

mammoth <- read.table('mammoth.points', header=F)
trex <- read.table('trex.points', header=F)
print(dim(mammoth))
## [1] 999778      3
print(dim(trex))
## [1] 499470      3
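
If the STL file happens to be in the ASCII variant of the format, the "directly" route can be sketched in base R. This is only a minimal sketch and not the procedure I used for the data above: the helper name stl_ascii_points is made up for illustration, binary STL files are not handled, and removing duplicated vertices is an optional choice.

```r
# Hypothetical helper: collect vertex coordinates from an ASCII STL file.
# Assumes each vertex is on its own line of the form "vertex x y z".
stl_ascii_points <- function(path) {
  lines <- readLines(path)
  # keep only the vertex lines and strip the leading keyword
  vertex_lines <- grep("^[[:space:]]*vertex[[:space:]]+", lines, value = TRUE)
  numbers <- sub("^[[:space:]]*vertex[[:space:]]+", "", vertex_lines)
  # parse all numbers at once and fold them into a 3-column matrix
  pts <- matrix(scan(text = numbers, quiet = TRUE), ncol = 3, byrow = TRUE)
  # neighbouring triangles share vertices, so drop repeated rows (optional)
  unique(pts)
}

# Tiny demo: two triangles that share an edge
stl_file <- tempfile(fileext = ".stl")
writeLines(c(
  "solid demo",
  " facet normal 0 0 1",
  "  outer loop",
  "   vertex 0 0 0",
  "   vertex 1 0 0",
  "   vertex 0 1 0",
  "  endloop",
  " endfacet",
  " facet normal 0 0 1",
  "  outer loop",
  "   vertex 1 0 0",
  "   vertex 1 1 0",
  "   vertex 0 1 0",
  "  endloop",
  " endfacet",
  "endsolid demo"
), stl_file)

pts <- stl_ascii_points(stl_file)
print(dim(pts))
```

The resulting matrix can be saved with write.table(pts, 'mammoth.points', row.names=FALSE, col.names=FALSE) so that the read.table calls above can load it.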

Plotting two of the three coordinates confirms that the data is organized as expected:

par(mar=rep(0,4))
EmbedSOM::PlotEmbed(mammoth[,c(2,3)])

par(mar=rep(0,4))
EmbedSOM::PlotEmbed(-trex[,c(1,2)])

A flat woolly mammoth

We may embed the mammoth using the “standard” approach: train a SOM on the points, then embed the points using that SOM. I use an extra large grid to get extra detail, but smaller grids usually suffice (and may compute much faster).

set.seed(1)
print(system.time(
e <- EmbedSOM::EmbedSOM(mammoth, parallel=T,
       map=EmbedSOM::SOM(mammoth, xdim=32, ydim=32, parallel=T, batch=T))
)[3])
## elapsed 
##  20.873

Let us color the mammoth points by their original 3D coordinates (columns 1, 2 and 3 go to the red, green and blue channels), so that we see which leg belongs where:

par(mar=rep(0,4))
EmbedSOM::PlotEmbed(e, data=mammoth, red=1, green=2, blue=3, alpha=.1)
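
To make the coloring idea concrete, here is a rough base-R stand-in that maps three coordinate columns to RGB colors. I am not claiming that PlotEmbed computes its colors exactly this way; the helpers rescale01 and coords_to_rgb are hypothetical names used only for this illustration.

```r
# Rescale a numeric vector to [0, 1]; constant vectors map to 0.5.
rescale01 <- function(x) {
  r <- range(x)
  if (r[1] == r[2]) return(rep(0.5, length(x)))
  (x - r[1]) / (r[2] - r[1])
}

# Turn the first three columns of a point table into semi-transparent colors:
# column 1 drives red, column 2 green, column 3 blue.
coords_to_rgb <- function(points, alpha = 0.1) {
  rgb(rescale01(points[, 1]),
      rescale01(points[, 2]),
      rescale01(points[, 3]),
      alpha = alpha)
}

# Three demo points; the third column is constant on purpose
demo <- cbind(x = c(0, 1, 2), y = c(0, 5, 10), z = c(3, 3, 3))
cols <- coords_to_rgb(demo, alpha = 1)
print(cols)
```

Points that are close in the original 3D space get similar colors, so a leg that lands in an unexpected place in the embedding can still be recognized by its hue.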

We can observe several things:

Let’s try to fix some of the problems:

set.seed(1)
mammoth[,3] <- mammoth[,3]*0.5 # pretend the mammoth is not that tall
print(system.time(
e <- EmbedSOM::EmbedSOM(mammoth, parallel=T,
       map=EmbedSOM::SOM(mammoth, xdim=24, ydim=24, rlen=20, parallel=T, batch=T))
)[3])
## elapsed 
##  19.078
par(mar=rep(0,4))
EmbedSOM::PlotEmbed(e, data=mammoth, red=1, green=2, blue=3, alpha=.1)

Almost good.
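
Why would squashing the height help? A small sketch, assuming the SOM compares points by Euclidean distance (I take that to be the default here, but treat it as an assumption): halving one coordinate halves that coordinate's contribution to every distance, so the map spends less of its area on height differences. The helpers euclid and scale_z are made up for this illustration.

```r
# Euclidean distance between two 3D points
euclid <- function(p, q) sqrt(sum((p - q)^2))

# Scale only the third (height) coordinate by factor f
scale_z <- function(p, f = 0.5) p * c(1, 1, f)

a     <- c(0, 0, 0)
above <- c(0, 0, 10)  # differs from a only in height
aside <- c(3, 4, 0)   # differs from a only horizontally

print(euclid(a, above))                     # 10
print(euclid(scale_z(a), scale_z(above)))   # 5
print(euclid(scale_z(a), scale_z(aside)))   # 5, unchanged by the scaling
```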

A flat T-rex eating a flat triceratops

The breakfast scene is slightly overcrowded if embedded by plain SOMs:

set.seed(1)
print(system.time(
e <- EmbedSOM::EmbedSOM(trex, parallel=T,
       map=EmbedSOM::SOM(trex, xdim=32, ydim=32, rlen=20, parallel=T, batch=T))
)[3])
## elapsed 
##   17.29
par(mar=rep(0,4))
EmbedSOM::PlotEmbed(e, data=trex, blue=3, alpha=.2)

SOM-less embedding with random landmarks alleviates this problem. We use UMAP from the uwot package to organize 1000 randomly chosen landmarks, and project the rest of the dataset there:

set.seed(1)
print(system.time(
e <- EmbedSOM::EmbedSOM(trex, parallel=T,
       map=EmbedSOM::RandomMap(trex, 1000, coordsFn=EmbedSOM::uwotCoords(min_dist=2)))
)[3])
## elapsed 
##   5.373
par(mar=rep(0,4))
EmbedSOM::PlotEmbed(e, data=trex, blue=3, alpha=.2)
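
For intuition about what "random landmarks" means, the following base-R sketch picks a few random points as landmarks and finds the nearest landmark of every point. It is an illustration only: the data is a random stand-in for the T-rex points, the variable names are mine, and the actual projection in EmbedSOM is more involved than a nearest-landmark lookup.

```r
set.seed(1)

# Random stand-in for a 3-column point matrix such as as.matrix(trex)
points <- matrix(rnorm(3000), ncol = 3)

# Choose k random rows as landmarks
k <- 50
landmark_idx <- sample(nrow(points), k)
landmarks <- points[landmark_idx, , drop = FALSE]

# Squared Euclidean distances from every point to every landmark,
# using |p - l|^2 = |p|^2 + |l|^2 - 2 * p.l
d2 <- outer(rowSums(points^2), rowSums(landmarks^2), "+") -
  2 * points %*% t(landmarks)

# Index of the closest landmark for each point
nearest <- max.col(-d2, ties.method = "first")

# How many points each landmark "owns"
print(summary(tabulate(nearest, nbins = k)))
```

Because only the landmarks have to be organized in 2D (by UMAP in the code above), that step runs on 1000 points instead of the whole dataset, which fits the shorter run time observed above.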