Read line by line of a file in R

Are you using R for data manipulation for later use with other programs, i.e., a workflow something like this:
  1. read data sets from a disk,
  2. modify the data, and
  3. write it back to a disk.
All fine, but of data set is really big, then you will soon stumble on memory issues. If data processing is simple and you can read only chunks, say only line by line, then the following might be useful:

## File
file <- "myfile.txt"
## Create connection
con <- file(description=file, open="r")
## Hopefully you know the number of lines from some other source or
com <- paste("wc -l ", file, " | awk '{ print $1 }'", sep="")
n <- system(command=com, intern=TRUE)
## Loop over a file connection
for(i in 1:n) {
  tmp <- scan(file=con, nlines=1, quiet=TRUE)
  ## do something on a line of data 
Created by Pretty R at inside-R.org


Parse arguments of an R script

R can be used also as a scripting tool. We just need to add shebang in the first line of a file (script):


and then the R code should follow.

Often we want to pass arguments to such a script, which can be collected in the script by the commandArgs() function. Then we need to parse the arguments and conditional on them do something. I came with a rather general way of parsing these arguments using simply these few lines:

## Collect arguments
args <- commandArgs(TRUE)
## Default setting when no arguments passed
if(length(args) < 1) {
  args <- c("--help")
## Help section
if("--help" %in% args) {
      The R Script
      --arg1=someValue   - numeric, blah blah
      --arg2=someValue   - character, blah blah
      --arg3=someValue   - logical, blah blah
      --help              - print this text
      ./test.R --arg1=1 --arg2="output.txt" --arg3=TRUE \n\n")
## Parse arguments (we expect the form --arg=value)
parseArgs <- function(x) strsplit(sub("^--", "", x), "=")
argsDF <- as.data.frame(do.call("rbind", parseArgs(args)))
argsL <- as.list(as.character(argsDF$V2))
names(argsL) <- argsDF$V1
## Arg1 default
if(is.null(args$arg1)) {
  ## do something
## Arg2 default
if(is.null(args$arg2)) {
  ## do something
## Arg3 default
if(is.null(args$arg3)) {
  ## do something
## ... your code here ...
Created by Pretty R at inside-R.org

It is some work, but I find it pretty neat and use it for quite a while now. I do wonder what others have come up for this task. I hope I did not miss some very general solution.


Kmečka pamet

Zdrava kmečka pamet lahko nadomesti skoraj vsak nivo izobrazbe, Vendar noben nivo izobrazbe ne more nadomestiti zdrave kmečke pameti. Arthur Schopenha


The contribution of dominance and inbreeding depression in estimating variance components for litter size in Pannon White rabbits

Our paper on analysis of dominance in Pannon White rabbits has been accepted and will appear in Journal of Animal breeding and Genetics. Abstract says:

In a synthetic closed population of Pannon White rabbits, additive (VA), dominance (VD) and permanent environmental (VPe) variance components as well as doe (bFd) and litter (bFl) inbreeding depression were estimated for the number of kits born alive (NBA), number of kits born dead (NBD) and total number of kits born (TNB). The data set consisted of 18,398 kindling records of 3883 does collected from 1992 to 2009. Six models were used to estimate dominance and inbreeding effects. The most complete model estimated VA and VD to contribute 5.5 ± 1.1% and 4.8 ± 2.4%, respectively, to total phenotypic variance (VP) for NBA; the corresponding values for NBD were 1.9 ± 0.6% and 5.3 ± 2.4%, for TNB, 6.2 ± 1.0% and 8.1 ± 3.2% respectively. These results indicate the presence of considerable VD. Including dominance in the model generally reduced VA and VPe estimates, and had only a very small effect on inbreeding depression estimates. Including inbreeding covariates did not affect estimates of any variance component. A 10% increase in doe inbreeding significantly increased NBD (bFd = 0.18 ± 0.07), while a 10% increase in litter inbreeding significantly reduced NBA (bFl = −0.41 ± 0.11) and TNB (bFl = −0.34 ± 0.10). These findings argue for including dominance effects in models of litter size traits in populations that exhibit significant dominance relationships.

The contribution of dominance and inbreeding depression in estimating variance components for litter size i... by

Genomic evaluations using similarity between haplotypes

Our paper on using haplotypes to build relationships between individuals has been accepted and will appear in Journal of Animal breeding and Genetics. Abstract says:

Long-range phasing and haplotype library imputation methodologies are accurate and efficient methods to provide haplotype information that could be used in prediction of breeding value or phenotype. Modelling long haplotypes as independent effects in genomic prediction would be inefficient due to the many effects that need to be estimated and phasing errors, even if relatively low in frequency, exacerbate this problem. One approach to overcome this is to use similarity between haplotypes to model covariance of genomic effects by region or of animal breeding values. We developed a simple method to do this and tested impact on genomic prediction by simulation. Results show that the diagonal and off-diagonal elements of a genomic relationship matrix constructed using the haplotype similarity method had higher correlations with the true relationship between pairs of individuals than genomic relationship matrices built using unphased genotypes or assumed unrelated haplotypes. However, the prediction accuracy of such haplotype-based prediction methods was not higher than those based on unphased genotype information.

Genomic evaluations using similarity between haplotypes by