EmbedSOM vs. mESC cell differentiation in time

Miroslav Kratochvíl, Abhishek Koladiya

2020-01-18

Authors of FLOWMAP have used a nice dataset of differentiating stem cells that is nice for showing how to embed pseudotime.

The article that describes the dataset and experiment has been published in Nature Protocols: Ko, M.E., Williams, C.M., Fread, K.I. et al. FLOW-MAP: a graph-based, force-directed layout algorithm for trajectory mapping in single-cell time course datasets. Nat Protoc (2020) doi:10.1038/s41596-019-0246-3.

The data (FCS files) can be downloaded from CytoBank: https://community.cytobank.org/cytobank/experiments/71953 .

Here, we mainly aim to reproduce the FLOWMAP visualization as seen in Figure 6a in the article, without downsampling, and in a fraction of original reported time (the authors report tens of minutes).

Preparing data

Let’s first aggregate the data set and extract some useful information for later plotting (mainly sample type and the timestamps). Assuming the data reside in current directory, we can load them using FlowSOM:

set.seed(1)
files <- dir(pattern='.*\\.fcs')

ff <- FlowSOM::AggregateFlowFrames(files, cTotal=1000000)

(We have suppressed several warnings that report breakage in CytoBank-originating FCS files.)

Extract times and sample categories:

filesTime <- as.numeric(sapply(files, function(x) {l <- nchar(x); substr(x, start=7, stop=l-4)}))
filesSample <- unname(factor(sapply(files, function(x) substr(x,1,2))))
cellFile <- ff@exprs[,'File']

Pick out interesting parameters and transform the data:

d <- asinh(0.2 * ff@exprs[,c(11:13,15,17:20,22:26,28:30,32:42)])
print(dim(d)) #see how large the data is
## [1] 731924     27

Make a scrambling vector (used later to avoid the overplotting artifacts):

scramble <- sample(nrow(d),nrow(d))

Finally, the pseudotime. The original range is roughly from 0 to 11 (in days), we add a bit of normal “smudge” to avoid accumulation of data at discrete points.

dtime <- cbind(d, filesTime[cellFile]+rnorm(nrow(d)))
colnames(dtime)[ncol(dtime)] <- "Time"
dtime[,'Time'] <- 3*scale(dtime[,'Time'])

Embedding with normal EmbedSOM

Let’s see how the data look through a self-organizing map (SOM). This may be interesting if you wanted to cluster the data later. Propertiess of SOM-based embedding make it extremely easy to see all populations, but not very simple for a human to observe any trajectories in the data (especially if the data has not been extensively cleaned).

We can construct the SOM and run the embedding right on it:

set.seed(1)
time <- system.time(
  e <- EmbedSOM::EmbedSOM(
         data=dtime,
         map=EmbedSOM::SOM(
           dtime,
           xdim=24, ydim=24, rlen=20,
           batch=T, parallel=T), 
         parallel=T)
)
print(time[3])
## elapsed 
##  25.058

We can reproduce the rainbow-like coloring used in FLOWMAPR with help of colorspace package:

timeColor <- function(n, alpha=1)
  colorspace::rainbow_hcl(n=n, alpha=alpha, start=10, end=330, c=250, l=60)

The time-development plot then looks as such (notice the scrambling):

plotPseudotime <- function(e) {
  par(mar=rep(0,4))
  EmbedSOM::PlotEmbed(
    e[scramble,],
    data=dtime[scramble,],
    alpha=.1, expression.colors=timeColor,
    'Time')
  legend('topleft',
    legend=rep('',13), col=timeColor(13),
    pch=15, pt.cex=2, cex=.6,
    title="Time progress", horiz=T)
}

plotPseudotime(e)

The sample-origin plot can be done in a similar way:

plotOrigin <- function(e) {
  par(mar=rep(0,4))
  EmbedSOM::PlotEmbed(
    e[scramble,],
    alpha=.1, cluster.colors=colorspace::qualitative_hcl,
    clust=filesSample[cellFile[scramble]])
  legend('topleft',
    c(AE="endoderm", `B4`="mesoderm", `N2`="ectoderm")[levels(filesSample)],
    col=colorspace::qualitative_hcl(nlevels(filesSample)),
    pch=15, pt.cex=2, cex=.6,
    title="Sample group", horiz=T)
}

plotOrigin(e)