Methodology of spatial pattern comparisons and subsequent hierarchical clustering


  • This method allows data from images of in situ gene expression patterns to be computationally compared for spatial similarities and then clustered into groups that show similar expression patterns.
  • Domains depicting regions of gene expression from input photographs are annotated to the correct spatial positions within the EMAP 3D Digital Atlas by EMAGE staff and housed in the EMAGE database. The following example images are all wholemount data at Theiler Stage 17:
Raw data images
Corresponding spatial annotations

  • The different coloured domains in the spatial annotation images can be selected for use in the analysis. These denote regions of strongest, moderate and weakest signal intensity and are shown in red, yellow and blue respectively in the spatial annotations above.


  • Similarity measurements are then calculated between every image pair in the selected dataset. Currently we use the Jaccard Index (V), which is defined as the ratio of the features shared by two entities (d1 and d2; in this case the spatial annotation domains of two assays) to the total number of distinct features across the two entities. It can be expressed as:

    V(d1, d2) = |d1 ∩ d2| / |d1 ∪ d2|
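
The ratio described above can be sketched in a few lines of Python. This is only an illustration, assuming each spatial annotation domain is available as a set of (x, y, z) voxel coordinates; the function name, the toy domains and the handling of two empty domains are assumptions made for the example, not part of the EMAGE pipeline.

```python
def jaccard_index(d1, d2):
    """Return the shared voxels of d1 and d2 divided by all distinct voxels in either."""
    d1, d2 = set(d1), set(d2)
    union = d1 | d2
    if not union:
        # Two empty domains: returning 0 is a convention chosen for this sketch.
        return 0.0
    return len(d1 & d2) / len(union)


# Hypothetical toy domains, each a set of (x, y, z) voxels.
domain_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
domain_b = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}

# Two voxels are shared and four distinct voxels exist in total.
print(jaccard_index(domain_a, domain_b))
```

Comparing a domain with itself gives 1, and two domains with no voxel in common give 0.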

For example, Jaccard Index similarity values of the patterns denoting combined strongest, moderate and weakest signal intensities for the Pax6, Sox10 and Dlx5 examples above would be:

  • The output files from the Jaccard Index pair-wise comparisons are tab-delimited .txt files in which both the columns and the rows correspond to the IDs of the assays. In the example below, when a pattern is compared to itself (e.g. EMAGE:1024 vs EMAGE:1024) the Jaccard value is 1 because the two input spatial regions are identical. Where it is compared to another pattern, the Jaccard Index will be less than 1. If the Jaccard Index is 0, the two patterns do not intersect. The closer a Jaccard Index value is to 1, the more similar the two patterns are.
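
As a rough illustration of such a file, the sketch below builds a tab-delimited matrix with assay IDs as both row and column labels. The assay IDs, the toy domains, the empty top-left header cell and the three-decimal formatting are all assumptions made for the example; the actual EMAGE files may be laid out differently.

```python
import csv
import io


def jaccard_index(d1, d2):
    """Shared features divided by all distinct features of the two sets."""
    union = d1 | d2
    return len(d1 & d2) / len(union) if union else 0.0


def pairwise_matrix_tsv(domains):
    """Return tab-delimited text with assay IDs labelling both rows and columns."""
    ids = sorted(domains)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + ids)  # header row; empty corner cell is an assumed layout
    for row_id in ids:
        values = [jaccard_index(domains[row_id], domains[col_id]) for col_id in ids]
        writer.writerow([row_id] + [f"{v:.3f}" for v in values])
    return buffer.getvalue()


# Hypothetical assay IDs mapped to toy domains (sets of voxel indices).
domains = {
    "assay_A": {1, 2, 3, 4},
    "assay_B": {3, 4, 5, 6},
    "assay_C": {7, 8},
}
print(pairwise_matrix_tsv(domains))
```

The diagonal of the resulting matrix is always 1, and the matrix is symmetric because the Jaccard Index does not depend on the order of the two patterns.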

  • The next step of the process is to perform hierarchical clustering of the numerical Jaccard Index values. This places two assays near each other in the output files if they have similar Jaccard Index values against all the other assays in the set. Clustering is done using a version of a program (Cluster) that was originally developed by Michael Eisen for analysing microarray expression data. There are three output files for each comparison, with the file extensions .cdt, .gtr and .atr.

    In our current set-up, we have pre-calculated the clustering using the following parameters: un-centred correlation similarity metric followed by complete linkage clustering.

    If you want to perform the clustering using other parameters, you can install the open source version of the Cluster program from here and then download and read in the appropriate .txt Jaccard Index comparison file (which can be downloaded from the relevant EMAGE cluster analysis page).
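
To make the two parameters concrete, here is a minimal pure-Python sketch of agglomerative clustering that uses 1 minus the un-centred correlation as the distance between rows and complete linkage between clusters. It is not the Cluster program and does not write .cdt, .gtr or .atr files; the function names and the 4 x 4 matrix are hypothetical and only illustrate the idea.

```python
import math
from itertools import combinations


def uncentred_correlation(x, y):
    """Correlation computed without subtracting the means (same as cosine similarity)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def complete_linkage(rows):
    """Cluster matrix rows; return a list of merges (members_a, members_b, distance)."""
    n = len(rows)
    # Distance between two rows is 1 - un-centred correlation of their value profiles.
    dist = {
        frozenset((i, j)): 1.0 - uncentred_correlation(rows[i], rows[j])
        for i, j in combinations(range(n), 2)
    }

    def linkage(a, b):
        # Complete linkage: clusters are as far apart as their most distant members.
        return max(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical Jaccard matrix: assays 0 and 1 are alike, as are assays 2 and 3.
jaccard = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
]
for members_a, members_b, distance in complete_linkage(jaccard):
    print(members_a, members_b, round(distance, 3))
```

Note that the rows being compared are whole profiles of Jaccard values, so two assays are joined early when they resemble each other in how they relate to every other assay, not only when their own pairwise Jaccard value is high.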

  • Visualisation of the clustered results can be done in three ways. These are presented as viewing options 1, 2 and 3.

    Viewing Option 1

    A Java applet can be started straight from the web browser. The applet is based on Java TreeView (see below) and has been enhanced to allow viewing of image data on the cluster tree. You can click on the tree and view heat map representations of sites in the embryo that express the genes on each branch. The contributing raw data images for the selected branch may also be viewed. A slider tool is available to select multiple branches from the tree, and the tree can also be searched for a gene of interest or for EMAGE IDs.

    You may have to increase the amount of memory available to Java on your system to load large trees in this viewer.

    Viewing Option 2

    The images are arranged (left to right) in rows according to the order (top to bottom) that they appear in the tree described above. This places the images into blocks that display similarities of expression pattern (but provides no information as to the tree structure).
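
The row layout described for this option amounts to splitting the tree-ordered list of assays into fixed-width rows. A minimal sketch, assuming the top-to-bottom tree order is already available as a list of IDs; the function name, the IDs and the row width are hypothetical.

```python
def arrange_in_rows(ordered_ids, images_per_row):
    """Split tree-ordered assay IDs into rows that are filled left to right."""
    if images_per_row < 1:
        raise ValueError("images_per_row must be at least 1")
    return [
        ordered_ids[start:start + images_per_row]
        for start in range(0, len(ordered_ids), images_per_row)
    ]


# Hypothetical IDs listed in the order in which they appear in the tree.
tree_order = ["assay_1", "assay_2", "assay_3", "assay_4", "assay_5"]
for row in arrange_in_rows(tree_order, 2):
    print(row)
```
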

    Viewing Option 3

    You can load the appropriate .cdt, .gtr and .atr files (these can be downloaded from the relevant EMAGE cluster analysis page) into the original version of Java TreeView. This was developed by Michael Eisen for viewing the output of clustered microarray data and can be downloaded here.

    This image shows the dendrogram and the identities of the corresponding branches in JavaTreeView.

    The left panel displays the dendrogram and the corresponding colour matrix, which represents the Jaccard Index values (ranging from intense red for a value of 1 to black for a value of 0). A branch has been selected and is shown in red.

    The middle panel displays a close-up of the matrix for the selected branch of the dendrogram.

    The right panel lists the genes which correspond to each branch.

    TreeView can be formatted such that the gene names link to the EMAGE database to show data images.