Comments on Gregor Gorjanc (gg): Functions dim, nrow, and ncol for shell

David Duffy (2013-06-11):

awk 'BEGIN{ getline; print NF }'
may be pretty quick too ;)

I compiled my earlier Awk program for this using Awka (http://awka.sourceforge.net), as it reads right through to check for any unequal lines.

My newer, beautiful Fortran code gives output like:

file wc beagle_chr21.genotypes_chr21.gz.gprobs.gz

Field counts for "beagle_chr21.genotypes_chr21.gz.gprobs.gz":

L 1 Len 86548 NFields 9801: "l.4977 col.4979 col.4979 col.4979 col.4981 col.498"

Number of lines = 33826
Length of longest line = 86548 chars
Total number of words = 331528626
Maximum words per line = 9801
Constant word count per line? = T
Length of longest word = 10 chars

Gorjanc Gregor (2012-12-29):

Francois, I did some testing on a file with 550 rows and 100001 columns, with 100 repetitions (the code is below), and behold:

"My" approach:
- nrow ~6.8 sec
- ncol ~1.3 sec

"Pure awk" approach:
- nrow ~17.7 sec
- ncol ~19.0 sec

So the "pure awk" approach is way slower. Still, a few seconds either way hardly matter unless this is truly mission critical.
Perhaps there is a way to tell awk, in the nrow case, to read just the first column, and in the ncol case, to read only the first line.

time for i in $(seq 1 1 100); do wc -l F2Chip9Genotype.txt > tmp; done

time for i in $(seq 1 1 100); do head -n 1 F2Chip9Genotype.txt | awk '{ print NF }' > tmp; done

time for i in $(seq 1 1 100); do awk 'END{ print NR }' F2Chip9Genotype.txt > tmp; done

time for i in $(seq 1 1 100); do awk 'END{ print NF }' F2Chip9Genotype.txt > tmp; done

Gorjanc Gregor (2012-12-28):

I like your stuff - it might actually be more efficient. I knew people would propose better ways to make this even faster and neater, which is one of the reasons I bothered posting this on the blog! Thanks!!!
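[Editor's note: regarding the early-exit wish in the timing comment above, POSIX awk can stop after the first record with exit, so the ncol case need not scan the whole file. A minimal sketch; the demo file below stands in for the original test file, which is not available here:]

```shell
# Create a small demo file (illustrative only; the original
# F2Chip9Genotype.txt is not available here).
printf 'a b c\n1 2 3 4\nx y z w v\n' > /tmp/demo.txt

# ncol without scanning the whole file: print NF for the first
# record, then exit so awk never reads the remaining lines.
awk 'NR == 1 { print NF; exit }' /tmp/demo.txt
```

For the nrow case, wc -l already reads only newlines and never splits fields, so it is hard to beat.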
Anonymous (2012-12-28):

Hi Gregor,

a shorter and simpler version of:

head -n 1 filename | awk '{ print NF }'

could be:

awk 'END{ print NF }' filename

Likewise, the number of lines could be:

awk 'END{ print NR }' filename

But I admit these may be less obvious.
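[Editor's note: putting the fast variants from these comments together, a minimal sketch of the post's dim/nrow/ncol idea as shell functions. The function names are mine, not from the post:]

```shell
# nrow: line count only; wc never splits fields.
# tr strips the padding some wc implementations add.
nrow() { wc -l < "$1" | tr -d ' '; }

# ncol: field count of the first line; awk exits after record 1,
# so the rest of the file is never read.
ncol() { awk 'NR == 1 { print NF; exit }' "$1"; }

# dim: both at once, in the spirit of R's dim().
dim() { printf '%s %s\n' "$(nrow "$1")" "$(ncol "$1")"; }

# Demo on a throwaway file:
printf 'a b c\n1 2 3\n' > /tmp/dim_demo.txt
dim /tmp/dim_demo.txt
```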