# Introduction to PairViz

#### Catherine B. Hurley and R.W. Oldford

#### 2022-08-12

`PairVizIntroduction.Rmd`

PairViz is an R package which provides orderings of objects for visualisation purposes. The problem of constructing an ordering can be regarded as traversing possibly weighted graphs. In this vignette, we focus on graphs and their traversal. Applications to data visualisation are given in accompanying vignettes.

## PairViz installation

PairViz is installed from CRAN in the usual way.

`install.packages("PairViz")`

PairViz uses the graph data structure provided by the graph package which is located on the Bioconductor repository rather than CRAN. To install this, use

```
if (!requireNamespace("graph", quietly = TRUE)){
install.packages("BiocManager")
BiocManager::install("graph")
}
```

In this vignette, we will also show plots of the graph structure, which requires package igraph from CRAN. The following utility function will be helpful:

```
requireNamespace("igraph")
#> Loading required namespace: igraph
if (!requireNamespace("igraph", quietly = TRUE)){
install.packages("igraph")
}
igplot <- function(g,weights=FALSE,layout=igraph::layout_in_circle,
vertex.size=60, vertex.color="lightblue",...){
g <- igraph::graph_from_graphnel(as(g, "graphNEL"))
op <- par(mar=c(1,1,1,1))
if (weights){
ew <- round(igraph::get.edge.attribute(g,"weight"),3)
igraph::plot.igraph(g,layout=layout,edge.label=ew,vertex.size=vertex.size,vertex.color=vertex.color,...)
}
else
igraph::plot.igraph(g,layout=layout,vertex.size=vertex.size,vertex.color=vertex.color,...)
par(op)
}
```

Alternatively, graphs may be plotted using package Rgraphviz which is installed from Bioconductor:

```
if (!requireNamespace("Rgraphviz", quietly = TRUE)){
install.packages("BiocManager")
BiocManager::install("Rgraphviz")
}
# For a graph g use
plot(g)
```

## Graphs

A graph is a mathematical structure made up of nodes, and edges joining those nodes. We consider only graphs where edges are undirected. In some cases, edges are assigned weights, reflecting some measure of importance.

A complete graph is a graph where there is an edge connecting every pair of nodes. Here are two such complete graphs, k4 with four nodes and k5 with five nodes.

```
suppressPackageStartupMessages(library(PairViz))
k4 <- mk_complete_graph(4)
k5 <- mk_complete_graph(5)
igplot(k4)
igplot(k5)
```

## Traversing graph edges

We will first construct graph traversals that visit every edge of the graph at least once. The PairViz function `eulerian()`

gives such traversals:

```
eulerian(k5)
#> [1] "1" "2" "3" "1" "4" "2" "5" "3" "4" "5" "1"
```

Mathematically speaking, an Eulerian path of a graph is a path that visits every edge of a graph exactly once. An Eulerian tour of a graph is an Eulerian path that is closed, i.e. ends up at the starting node. For the graph `k5`

, one such Eulerian tour goes from 1 ->2 -> 3 -> 1 and so on until it ends back at node 1, as given by `eulerian(k5)`

. It is well-known that a graph has an Eulerian tour if every node has an even number of edges. This condition holds for a complete graph with an odd number of nodes, such as `k5`

.

Similarly, a graph has an Eulerian path if all but two nodes have an even number of edges. If we remove one edge, say that connecting nodes 1 and 2 from `k5`

, these two nodes have an odd number of edges while other nodes are even.

```
k5a <- removeEdge("1", "2", k5)
igplot(k5a)
eulerian(k5a)
#> [1] "1" "3" "2" "4" "1" "5" "3" "4" "5" "2"
```

The Eulerian for `k5a`

starts at one of the odd nodes (here “1”) and visits all edges ending at “2”, the other odd node.

Most graphs are not Eulerian, that is they do not meet the conditions for an Eulerian path to exist. The graph `k4`

for instance, has four nodes and all have three edges. In this case, any path visiting all edges must visit some edges more than once. This is what `eulerian(k4)`

does:

```
eulerian(k4)
#> [1] "1" "3" "2" "4" "3" "4" "1" "2"
```

If you look closely you will see the edge connecting nodes “3” and “4” is visited twice. An other way to think of it is that extra edges need to be added to a graph so that all but two nodes become even. For `k4`

only one extra edge is needed, and in the example above that extra edge connects nodes “3” and “4”.

In summary, for any graph, the function `eulerian()`

returns a path visiting all edges at least once. If the graph is Eulerian, the returned path visits each edge exactly once. Otherwise, some edges are visited twice. For more details of this algorithm, see Hurley and Oldford (2010, 2011).

## Edge traversal of a weighted graph

First construct a graph with weighted edges.

```
n <- LETTERS[1:5]
g <- new("graphNEL",nodes=n)
efrom <- n[c(1,1,2,2,2,4)]
eto <- n[c(2:3,3:5,5)]
ew <- c(8,9,5:7,1)
g <- addEdge(efrom, eto, g, ew)
```

To plot the graph use

`igplot(g, weights=TRUE,edge.label.color="black")`

```
eulerian(g)
#> [1] "E" "D" "B" "C" "A" "B" "E"
```

As the graph `g`

is weighted, the eulerian algorithm starts at the edge with the lowest weight, and then proceeds to visit every edge, preferring lower weight edges whenever there is a choice of edge to be visited next. If the `eulerian()`

function is called with the option `weighted=FALSE`

, then weights are ignored.

```
eulerian(g, weighted=FALSE)
#> [1] "A" "B" "D" "E" "B" "C" "A"
```

In this case, the Eulerian path visits edges going next to the first available node in `nodes(g)`

.

## Edge traversal of a complete graph

For complete graphs, PairViz includes some alternative constructions of Eulerian (or nearly Eulerian) paths.

The function

`eseq()`

uses a recursive algorithm, and forms a path on 1,..,n by appending extra edges on to the tour on 1,…,(n-2). For more details of the`eseq`

algorithm, see Hurley and Oldford (2011).The function

`eseqa()`

gives a different sequence. For odd n, the tour starts at 1, then takes steps of size 1,2,…,m repeatedly, where m is (n-1)/2. For even n, the path constructed is formed as`eseqa(n+1)`

, followed by dropping node n+1. (Note that any permulation of the step sizes 1,2,…,m would do just as well).

Two other functions are

`kntour_drop()`

takes an Euler tour on the complete graph with nodes 1,2,..n as input for odd n, and removes all occurences of n from the sequence. The result is a nearly Eulerian path on the complete graph with n-1 nodes.`kntour_add()`

takes an Euler tour on the complete graph with nodes 1,2,..n as input for odd n, and appends a path joining node n+1 to each of nodes 1,2,…,n . The result is a nearly Eulerian path on the complete graph with n-1 nodes.

## Edge traversal of a complete graph via Hamiltonian decompositions

A Hamiltonian path of a graph is a path that visits every node of a graph exactly once. For a general graph, determining whether such a path exists is difficult, and in fact is an NP-hard problem. For a complete graph of n nodes, the path 1,2,…,n is a Hamiltonian path.

Consider a complete graph with an odd number of nodes such as `k5`

.

```
ew <- rep(1, length(edgeNames(k5)))
s1 <- c(1,5,9,10)
ew[s1]<- 5
ec <- rep("grey40", length(ew))
ec[s1]<- "cyan"
igplot(k5, edge.width=ew, edge.color=ec)
ew[]<- 1
s2 <- c(2,6:8)
ew[s2]<- 5
ec[] <- "grey40"
ec[s2]<- "magenta"
igplot(k5, edge.width=ew, edge.color=ec)
```

The blue hamiltonian path visits the nodes in order 1,2,3,5,4. The pink hamiltonian path visits the nodes in order 1,3,4,2,5. For a complete graph with an odd number of nodes such as `k5`

, it is possible to construct an Eulerian tour by concatenating together a number of Hamiltonians. This construction is known as a Hamiltonian decomposition.

```
ew[]<- 5
ec[s1]<- "cyan"
s3 <- 3:4
ec[s3]<- "rosybrown1"
igplot(k5, edge.width=ew, edge.color=ec)
igplot(k5, edge.width=ew, edge.color=ec)
```

In the figure above, the light red edge from 4 to 1 connects the two hamiltonians, and the remaining light red edge from 5 to 1 connects the last node of the pink path to the first node from the blue path to complete the tour.

In PairViz, the two hamiltonian paths above are constructed by

```
hpaths(5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 3 5 4
#> [2,] 1 3 4 2 5
```

If you want the eulerian composed of the two hamiltonians, use

```
hpaths(5, matrix=FALSE)
#> [1] 1 2 3 5 4 1 3 4 2 5 1
```

The above constructions produce paths on complete graphs with an even number of nodes which are nearly eulerian. Look at

```
hpaths(6)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 2 6 3 5 4
#> [2,] 2 3 1 4 6 5
#> [3,] 3 4 2 5 1 6
hpaths(6, matrix=FALSE)
#> [1] 1 2 6 3 5 4 2 3 1 4 6 5 3 4 2 5 1 6
```

Each row of the result of hpaths produces a hamiltonian on k6 (a complete graph with 6 nodes). When the hamiltonians are concatenated, all edges on k6 are visited. With a close inspection, we see the edges connecting the rows (here 4-2 and 5-3) are visted twice. There is no need to connect the first node of row 1 to the last node of row 3, as this edge has already been visited.

Other isomorphic decompositions are obtained by supplying the first Hamiltonian as an argument to hpaths. Then all node labels are rearranged accordingly, as in the code snippet below.

```
hpaths(1:5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 1 2 3 4 5
#> [2,] 1 3 5 2 4
```

## Edge traversal of a weighted complete graph via hamiltonian decompositions

Consider the complete graph `k7`

whose edge weights are given by

```
set.seed(123)
k7 <- mk_complete_graph(7)
ew <- sample(numEdges(k7),numEdges(k7)) # a vector of edgeweights
```

The easiest way to assign weights to edges is to use a distance matrix:

```
d7 <- matrix(0,7,7)
d7[lower.tri(d7)] <- ew
d7[upper.tri(d7)]<- t(d7)[upper.tri(d7)]
d7
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 0 15 19 14 3 10 2
#> [2,] 15 0 6 11 5 4 21
#> [3,] 19 6 0 9 13 18 8
#> [4,] 14 11 9 0 1 12 20
#> [5,] 3 5 13 1 0 17 7
#> [6,] 10 4 18 12 17 0 16
#> [7,] 2 21 8 20 7 16 0
# or using the shortcut function edge2dist from PairViz
#d7 <- as.matrix(edge2dist(ew))
```

Now `d7`

is a symmetric matrix of distances, where the values in `ew`

specify the distances in order 1-2, 1-3, …., 1-7, 2-7,…, 3-7, …,6-7.

```
k7 <- mk_complete_graph(d7)
igplot(k7, weights=TRUE,edge.label.color="black", vertex.label.cex=2,vertex.size=30)
```

```
# Unfortunately, plot.igraph does not show graph edge weights automatically, you have to
# input them as above. You might want to check that the igraph
# matches that of ew.
igraph::E(igraph::graph_from_graphnel(k7))
#> + 21/21 edges from 67e0cca (vertex names):
#> [1] 1--2 1--3 1--4 1--5 1--6 1--7 2--3 2--4 2--5 2--6 2--7 3--4 3--5 3--6 3--7
#> [16] 4--5 4--6 4--7 5--6 5--7 6--7
```

Like `hpaths()`

, the function `weighted_hpaths()`

finds Hamiltonians which when concatenated visit all edges at least once. However `weighted_hpaths()`

uses a greedy algorithm to make the first Hamiltonian low weight with weights increasing for successive Hamiltonians.

```
weighted_hpaths(d7)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 6 2 5 4 1 7 3
#> [2,] 6 5 1 2 3 4 7
#> [3,] 6 1 3 5 7 2 4
# this version returns the eulerian
weighted_hpaths(d7, matrix=FALSE)
#> [1] 4 5 7 1 6 2 3 4 1 5 2 7 3 6 4 2 1 3 5 6 7 4
```

We can easily calculate the edge weights and their sum for the first Hamiltonian.

```
o <- weighted_hpaths(d7, matrix=FALSE)
o1 <- o[1:8] # include the 8th to form the tour
d7e <- dist2edge(d7)
# d7e is a vector giving edge weights in order (1,1)... (1,7), (2,3),.. (2,7) etc
h1weights <- path_weights(d7e, o1) # the edge weights for o1
# the same as
d7[cbind(head(o1,-1), o1[-1])]
#> [1] 1 3 2 8 6 4 12
h1weights
#> [1] 1 3 2 8 6 4 12
sum(h1weights)
#> [1] 36
```

The second and third Hamiltonians have edges whose sums are

```
o2 <- o[8:15]
sum(path_weights(d7e, o2))
#> [1] 84
o3 <- o[15:22]
sum(path_weights(d7e, o3))
#> [1] 111
```

The `weighted_hpaths()`

function uses `order_tsp()`

, a TSP solver (by default nearest insertion) to come up with the first Hamiltonian. This algorithm uses heuristics, and may not find the overall winner. As the graph has just 7 nodes, it is possible to use brute force evaluation to find the shortest hamiltonian tour.

```
order_best(d7,cycle=TRUE)
#> [1] 4 5 1 7 3 2 6
order_tsp(d7,cycle=TRUE)
#> [1] 4 5 7 1 6 2 3
```

In this example, we can confirm that `order_tsp`

finds the shortest Hamiltonian tour. As `order_best`

is computationally highly demanding, if you try it for graphs with more that `maxexact`

(defaults to 9) nodes, only a sample of permutations is evaluated.

## Comparisions of edge traversals on an unweighted complete graph

Consider the graph `k7`

. At his stage, we have four algorithms for forming Eulerians which do not use weights. These are

```
e1 <- eseq(7)
e2 <- eseqa(7)
e3 <- eulerian(7) # same path as eulerian(k7, weighted=FALSE)
h1 <- hpaths(7, matrix=FALSE)
```

The plot of `eseq(7)`

is coloured to show its recursive construction. `eseq(3)`

is in tan, the additional edges added for `eseq(5)`

are in blue, while those for `eseq(7)`

are in pink.

The plot of `eseqa(7)`

shows its construction; edges are a distance 1 (in tan), 2 (in blue) and 3 (in pink) apart repeatedly, considering the last node 7 and the first to be a distance 1 apart.

In `eulerian(7)`

, edges connecting to node 1 are in tan, edges from nodes 2 and 3 are in blue excepting those connecting node 1, while remaining edges involving nodes 4,5, 6 and 7 only are in pink. This display illustrates that the eulerian algorithm always moves to the lowest available node.

In the plot of `hpaths(7)`

the three concatenated hamiltonians are coloured blue, pink and tan.

From these displays, if you want an eulerian visiting the low numbered nodes first, use `eseq()`

or `eulerian()`

. If you want an Eulerian where visits to a node are spread out, pick `eseqa()`

or `hpaths()`

. And of course, if the Hamiltonian property is important, then `hpaths()`

is the best choice.

## Comparisions of edge traversals on an weighted complete graph

We have two different Eulerians for an edge-weighted `k7`

. These are

```
e4 <- eulerian(d7) # same path as eulerian(k7)
h2 <- weighted_hpaths(d7, matrix=FALSE)
```

which are plotted below. The colouring scheme used here is the same as that for the plots of `eulerian(7)`

and `hpaths(7)`

.

Recall that the goal of our traversals of edge-weights graphs was to visit low-weight edges early in the path. To check this, we can find the edges weights for the paths using

```
d7e <- dist2edge(d7)
path_weights(d7e, e4) # the edge weights for e4
#> [1] 1 3 2 7 5 4 10 14 9 6 11 12 16 8 13 17 18 19 15 21 20
path_weights(d7e, h2) # the edge weights for h2
#> [1] 1 7 2 10 4 6 9 14 3 5 21 8 18 12 11 15 19 13 17 16 20
```

Plotting these, we see that the edge weights for both `e4`

and `h2`

increase as the index increases. The path based on Hamiltonians is less successful at ordering the edge weights as it must visit a sequence of Hamiltonians. The edge weights for the first Hamiltonian are in blue, for the second in magenta and the third in tan.

## References

C.B. Hurley and R.W. Oldford, Pairwise display of high dimensional information via Eulerian tours and Hamiltonian decompositions. Journal of Computational and Graphical Statistics. 19(10), pp. 861–886, 2010.

C.B. Hurley and R.W. Oldford, Eulerian tour algorithms for data visualization and the PairViz package. Computational Statistics, 26(4), pp 613–633, 2011.