Comments on Gregor Gorjanc (gg): Functions dim, nrow, and ncol for shell

David Duffy (2013-06-11):

awk 'BEGIN{ getline; print NF }'
may be pretty quick too ;)

I compiled my earlier Awk program for this using Awka (http://awka.sourceforge.net), as it reads right through to check for any unequal lines.

My newer, beautiful Fortran code gives output like:

file wc beagle_chr21.genotypes_chr21.gz.gprobs.gz

Field counts for "beagle_chr21.genotypes_chr21.gz.gprobs.gz":

L 1 Len 86548 NFields 9801: "l.4977 col.4979 col.4979 col.4979 col.4981 col.498"

Number of lines = 33826
Length of longest line = 86548 chars
Total number of words = 331528626
Maximum words per line = 9801
Constant word count per line? = T
Length of longest word = 10 chars

Gorjanc Gregor (2012-12-29):

Francois, I did some testing on a file with 550 rows and 100001 columns, with 100 repetitions (the code is below), and behold:

"My" approach:
- nrow ~6.8 sec
- ncol ~1.3 sec

"Pure awk" approach:
- nrow ~17.7 sec
- ncol ~19.0 sec

So the "pure awk" approach is way slower. Still, a few seconds either way hardly matter unless this is truly mission critical.
Perhaps there is a way to tell awk, in the nrow case, to read just the first column, and in the ncol case, to read only the first line.

time for i in $(seq 1 1 100); do wc -l F2Chip9Genotype.txt > tmp; done

time for i in $(seq 1 1 100); do head -n 1 F2Chip9Genotype.txt | awk '{ print NF }' > tmp; done

time for i in $(seq 1 1 100); do awk 'END{ print NR }' F2Chip9Genotype.txt > tmp; done

time for i in $(seq 1 1 100); do awk 'END{ print NF }' F2Chip9Genotype.txt > tmp; done

Gorjanc Gregor (2012-12-28):

I like your stuff - it might actually be more efficient. I knew people would propose better ways to make this even faster and neater, which is one of the reasons I bothered posting this on the blog! Thanks!!!
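[Editor's note: regarding the early-exit wish in the timing comment above, POSIX awk can stop after the first record with exit, so the ncol case need not scan the whole file. A minimal sketch; the demo file below stands in for the original test file, which is not available here:]

```shell
# Create a small demo file (illustrative only; the original
# F2Chip9Genotype.txt is not available here).
printf 'a b c\n1 2 3 4\nx y z w v\n' > /tmp/demo.txt

# ncol without scanning the whole file: print NF for the first
# record, then exit so awk never reads the remaining lines.
awk 'NR == 1 { print NF; exit }' /tmp/demo.txt
```

For the nrow case, wc -l already reads only newlines and never splits fields, so it is hard to beat.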
Anonymous (2012-12-28):

Hi Gregor,

a shorter and simpler version of:

head -n 1 filename | awk '{ print NF }'

could be:

awk 'END{ print NF }' filename

Likewise, the number of lines could be:

awk 'END{ print NR }' filename

But I admit these may be less obvious.
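[Editor's note: putting the fast variants from these comments together, a minimal sketch of the post's dim/nrow/ncol idea as shell functions. The function names are mine, not from the post:]

```shell
# nrow: line count only; wc never splits fields.
# tr strips the padding some wc implementations add.
nrow() { wc -l < "$1" | tr -d ' '; }

# ncol: field count of the first line; awk exits after record 1,
# so the rest of the file is never read.
ncol() { awk 'NR == 1 { print NF; exit }' "$1"; }

# dim: both at once, in the spirit of R's dim().
dim() { printf '%s %s\n' "$(nrow "$1")" "$(ncol "$1")"; }

# Demo on a throwaway file:
printf 'a b c\n1 2 3\n' > /tmp/dim_demo.txt
dim /tmp/dim_demo.txt
```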