2022-11-11

Find ancestor (R function)

We have a pedigree and want to find a sub-pedigree that lists the ancestors of some individuals. Here is an R function for this, but first, lets show an example:

library(package = "pedigreemm")
library(package = "graph")
library(package = "Rgraphviz")

ped <- data.frame( id = c(  1,   2,   3,   4,   5,   6,   7,   8,   9,  10),
                  fid = c( NA,  NA,   2,   2,   4,   2,   5,   5,  NA,   8),
                  mid = c( NA,  NA,   1,  NA,   3,   3,   6,   6,  NA,   9))
ped2 <- with(ped, pedigree(sire = fid, dam = mid, label = id))
g <- as(t(as(ped2, "sparseMatrix")), "graph")
plot(g)

Now the function

traceAncestors <- function(ids, ped, missing = NA) {
  # ids - a vector of individuals, possibly not unique
  # ped - data.frame of global pedigree with id, father, and mother columns
  
  # Take pedigree rows for ids
  sel <- ped[[1]] %in% ids
  ret <- ped[sel, ]
  # Find their parents (new ids)
  ids <- c(ped[[2]][sel], ped[[3]][sel])
  # ... that are unique and known
  ids <- unique(ids[!ids %in% missing])
  # ... that are not ids already
  ids <- ids[!ids %in% ret[[1]]]
  
  # Loop
  while (length(ids) > 0) {
    # Take pedigree rows for new ids
    sel <- ped[[1]] %in% ids
    ret <- rbind(ped[sel, ], ret)
    # Find their parents (new ids)
    ids <- c(ped[[2]][sel], ped[[3]][sel])
    # ... that are unique and known
    ids <- unique(ids[!ids %in% missing])
    # ... that are not ids already
    ids <- ids[!ids %in% ret[[1]]]
  }
  return(ret)
}

And a few examples

> traceAncestors(ids = 4, ped = ped)
  id fid mid
2  2  NA  NA
4  4   2  NA
> 
> traceAncestors(ids = 6, ped = ped)
  id fid mid
1  1  NA  NA
2  2  NA  NA
3  3   2   1
6  6   2   3
> 
> traceAncestors(ids = c(4, 6), ped = ped)
  id fid mid
1  1  NA  NA
2  2  NA  NA
3  3   2   1
4  4   2  NA
6  6   2   3
> 
> traceAncestors(ids = c(4, 6, 10), ped = ped)
   id fid mid
1   1  NA  NA
5   5   4   3
2   2  NA  NA
3   3   2   1
8   8   5   6
9   9  NA  NA
4   4   2  NA
6   6   2   3
10 10   8   9 

1 comment:

Gorjanc Gregor said...

We could probably cut out the first part by allocating an empty data.frame and run just the loop!