Gregor Gorjanc (gg): 03/01/2008

2008-03-31

Calculation of genotype probabilities

I am calculating genotype probabilities for the PrP gene in our local sheep breeds. I am using iterative allelic peeling with incomplete penetrance (Thallman et al., 2001a, 2001b) as implemented in the GenoProb program, written by Thallman. Yes, I am using his method because he was kind enough to share the software with me, and of course, the method does what I need. I did a bit of research on the available methods and, from what I can see, the following can be used for livestock pedigrees:

iterative peeling (see van Arendonk et al., 1989 and papers citing it!),
Elston-Stewart iterative peeling (ESIP) sampler (Fernández et al., 2001),
descent graphs samplers (see Henshall and Tier, 2003) and
allelic peeling (Thallman et al., 2001a, 2001b).

Get ready for some heavy reading if you decide to look at the papers, especially Fernández et al. (2001). Their method really seems great, but I got lost in the details. Huh, and I almost always complain that authors do not put enough details in their papers. It is probably just that I have a hard time following their paper. There is a nice overview of methods for calculating genotype probabilities at the beginning of their paper!

Fernández, S. A., Fernando, R. L., Guldbrandtsen, B., Totir, L. R. and Carriquiry, A. L. Sampling genotypes in large pedigrees with loops. Genetics Selection Evolution, 2001, 33:337-367. http://dx.doi.org/10.1051/gse:2001122

Henshall, J. M. and Tier, B. An algorithm for sampling descent graphs in large complex pedigrees efficiently. Genetical Research, 2003, 81(3):205-212. http://dx.doi.org/10.1017/S0016672303006232

Thallman, R. M., Bennett, G. L., Keele, J. W. and Kappes, S. M. Efficient computation of genotype probabilities for loci with many alleles: I. Allelic peeling. Journal of Animal Science, 2001a, 79(1):26-33. http://jas.fass.org/cgi/reprint/79/1/26

Thallman, R. M., Bennett, G. L., Keele, J. W. and Kappes, S. M. Efficient computation of genotype probabilities for loci with many alleles: II. Iterative method for large, complex pedigrees. Journal of Animal Science, 2001b, 79(1):34-44. http://jas.fass.org/cgi/reprint/79/1/34

van Arendonk, J. A. M., Smith, C. and Kennedy, B. W. Method to estimate genotype probabilities at individual loci in farm livestock. Theoretical and Applied Genetics, 1989, 78:735-740.

Livestock numbers, detailed data, Slovenia, 1 December 2007

SURS (Statistical Office of the Republic of Slovenia) has published the livestock census as of 1 December 2007. For me the essential part is this: "The number of sheep and goats did not change substantially in 2007." There are some larger changes in the number of breeding ewes mated for the first time, namely in the "non-dairy" breeds. Overall, the number of sheep is estimated at 131,180 (~86,500 "non-dairy" breeding ewes and ~4,000 dairy breeding ewes) and the number of goats at 28,228 (~14,000 "non-dairy" breeding does and ~5,000 dairy breeding does).

Here are the links to the tables; I am noting them here so that I do not fill up my bookmarks again:

And here is the link to the complete list.

Multigenome projects

Daniel Gianola has posted this on the AGDG list. Interesting. The pace of gene studies is increasing every day.

Nature 2008, 451: 234

Next-generation human genomics has arrived. The first large-scale whole-genome sequencing project has now begun in China, and an international multi-genome sequencing programme is hot on its heels. The Yanhuang Project, which will sequence the entire genomes of 100 Chinese individuals over 3 years was announced by the Beijing Genomics Institute (BGI) on 8 January. Ye Jia, a spokeswoman for the project, said that once it is completed, the BGI aims to sequence the genomes of thousands more people, including ethnic groups from other Asian countries. And a large international project, which aims to sequence the genomes of close to 1,000 individuals, is expected to be formally unveiled by the US National Institutes of Health in Bethesda, Maryland, and the Wellcome Trust Sanger Institute in Cambridge, UK, later this week. As yet it doesn’t have a name, but is informally called the ‘1,000 genomes’ project and the ‘Multigenome project’. It will probably include the hundreds of individuals who participated in the International HapMap Project — an ongoing study of genetic diversity — as well as hundreds of other individuals. The BGI will also participate in the 1,000 genomes project, says director Yang Huanming. However, only participants who meet the ethics and consent rules decided on by the international collaboration will be able to join that study, he says. The projects usher in what many scientists think will be a new era of large-scale genomics — made possible with rapid-sequencing technologies — that will lead to more powerful comparisons between and within populations. Last year, scientists Craig Venter and James Watson became the first to release their complete individual DNA sequences. And a team led by George Church at Harvard University in Cambridge, Massachusetts, has begun the ‘Personal Genome Project’ that will examine portions of DNA from ten individuals who have agreed to share their information with the rest of the world. 
But the Yanhuang Project — named after two emperors thought to be the ancestors of China’s largest ethnic group — is the first to examine the entire genomes of private individuals. The first individual sequenced in the Yanhuang Project was a researcher; the second paid 10 million yuan (about US$1.4 million) to have his genome sequenced, Yang says. It is unclear whether such people will qualify for the international project, whose rules on confidentiality of data and the informed consent of participants may differ from China’s. Whole-genome sequencing studies are expected to deepen our scientific understanding of populations such as the Chinese, whose genetics have not been studied in great detail. The findings will inform medical research specific to those populations, and improve our understanding of human history, says Rasmus Nielsen of the University of California, Berkeley. “One of the exciting things about having so many sequences from Chinese individuals is that we will be able to say how much genetic exchange there has been between continents since [early humans migrated] out of Africa. That’s been very hotly debated.” The sequencing will allow scientists to add more detail to their maps of human diversity. The last large study of diversity, the HapMap, analysed only single-nucleotide polymorphisms, or SNPs — places in which DNA differs between two individuals by just one letter of the genetic code. This approach allows scientists to hunt for relatively common genetic variants. But the evidence linking disease to rare variants is growing, says Richard Myers, director of the Stanford Human Genome Center in Palo Alto, California. Whole-genome sequencing will improve detection of these rare variants, and offer a more complete understanding of the genetics of many human traits, he predicts. “It’s going to be very useful to sequence genomes from all populations and have large enough numbers so you can do comparisons between populations,” Myers says. 
“Even if you don’t care about disease, it’s going to help us look at human population history and phenotypes not relevant to disease, such as craniofacial structure, eye colour, hair colour and other fascinating things.”

Jane Qiu and Erika Check Hayden

2008-03-29

Molly is posting photos of goat kids on her blog. Here is a link to the latest photo.

Carcass traits of suckling kids and lambs

Carcass composition and meat quality of equally mature kids and lambs
V. A. C. Santos, S. R. Silva, J. M. T. Azevedo
http://dx.doi.org/10.2527/jas.2007-0780

2008-03-27

Animal Breeding Pedigree

http://www.animalgenome.org/lush/

2008-03-25

A huge collection of links for various areas of statistics

Statlink

On the history of statistics and statisticians

Statistics index

Millions saved by unifying the IT infrastructure on Red Hat Linux 5

SpaceTime

SpaceTime really rocks! Unfortunately, it does not work on Linux;)

2008-03-21

Multiple comparison tests

Have you ever run several multiple comparison tests and found that their results differ? I have, several times, and since then I have a bit of an aversion to these tests. The differences arise because the tests serve different purposes. And why do we do this at all? So that we do not claim differences where there are none. We therefore want the parameter estimates and their standard errors (SE) to be as appropriate as possible, that is, conservative. But how can we ensure that the parameter estimates and SEs are conservative?

Andrew Gelman recommends a hierarchical model, which "makes sure" that parameters with little information are "pulled" toward the mean, giving more conservative estimates.
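To see the "pulling toward the mean" in numbers, here is a minimal sketch of partial pooling in a normal hierarchical model. It assumes, purely for illustration, that group effects are normal around a common mean `mu` with between-group variance `tau2`, that observations within a group have residual variance `sigma2`, and that all three are known; a full analysis would estimate them. The function name `shrunken_means` is hypothetical.

```python
def shrunken_means(group_means, group_sizes, mu, tau2, sigma2):
    """Posterior mean of each group effect in a normal-normal model with
    known mu, tau2 and sigma2 (an assumption made to keep this short).

    Each estimate is a precision-weighted average of the group's own
    sample mean and the overall mean mu."""
    result = []
    for ybar, n in zip(group_means, group_sizes):
        data_precision = n / sigma2       # information in the group's own data
        prior_precision = 1.0 / tau2      # information from the population of groups
        weight = data_precision / (data_precision + prior_precision)
        result.append(weight * ybar + (1.0 - weight) * mu)
    return result


# Two groups with the same raw mean but very different amounts of data.
est = shrunken_means([12.0, 12.0], [2, 200], mu=10.0, tau2=1.0, sigma2=4.0)
print([round(e, 3) for e in est])  # [10.667, 11.961]
```

The group with only two observations is pulled most of the way toward `mu`, while the group with 200 observations keeps nearly its raw mean, so apparent differences backed by little data are damped.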

2008-03-15

Inbreeding coefficient and multilocus heterozygosity

I just came across this article:

Slate et al. 2004. Understanding the relationship between the inbreeding coefficient and multilocus heterozygosity: theoretical expectations and empirical data. Heredity, 93: 255–265. doi:10.1038/sj.hdy.6800485

And I was surprised to read in the abstract that:

"Multilocus heterozygosity was only weakly correlated with inbreeding coefficient, and heterozygosity was not positively correlated between markers more often than expected by chance. Inbreeding coefficient, but not multilocus heterozygosity, detected evidence of inbreeding depression for morphological traits"

Hmm. I am missing something... I always thought these two measures should be correlated.

2008-03-14

"Relationship" coefficients

I have often been confused myself about the so-called coefficients of "relationship". Some time ago I took the time to sort out for myself what they are about. Since I see that others mix things up just as I did, I decided to write this post.

In the English literature (at least the part I read), the following terms appear most often to describe relationship (in no particular order):

coefficient of relationship
coefficient of relatedness
coefficient of kinship
coefficient of parentage
coefficient of coancestry
coefficient of consanguinity
coefficient of identity
coefficient of inbreeding

Good grief. So many terms for one and the same thing. But is it really one and the same thing? If I look in my dictionary (ASP32, v1.40), I get the following Slovenian translations (rendered back into English here):

relationship - kinship, relatedness, connection, relation, ratio, neighbourhood; degree of ~: degree of kinship
relatedness - relatedness, kinship relation
kinship - (blood) kinship, relatedness
parentage - origin, lineage, source
coancestry - ... no match ...
consanguinity - blood kinship, relatedness
identity - identity, sameness, equality
inbreeding - mating between related species

Going by the dictionary, one could say that all of these are one and the same thing, except for identity. OK, I can set "identity" aside, and the rest is the same thing. Great! But when you read one paper, then another, plus a book, or preferably two or three, you see that the authors talk about similar, yet different concepts. What are they talking about?

For "coefficient of relationship" we can safely use the Slovenian translation "koeficient sorodstva"; it is the most logical one. For this coefficient most authors cite Wright's 1922 paper, in which he derived the coefficient of relationship as the correlation between breeding values. The breeding value, also called the additive genetic value, is the sum of the average effects of alleles. Everybody knows that diploid organisms receive half of their (nuclear) genome from the father and half from the mother. Because this division is random, a brother and a sister, for example, have half of their alleles (forms of a gene) in common; to be precise, they have on average half of their alleles in common. Because the breeding value is the sum of the average effects of alleles, it is logical that the correlation between the breeding value of the brother and that of the sister is 0.5: they share half of their alleles, so half of their breeding values is "the same". Fisher obtained exactly the same result in 1918 (see the table on the seventh page of the paper; Fisher, 1918).

Let us also show this with an example. Suppose we have two individuals that are unrelated and not inbred, i.e., they are not the result of mating between relatives. Then at a given locus each of them must carry two alleles that differ in origin. The father has genotype AB and the mother CD, where A, B, C and D are alleles of different origin. If these two individuals have offspring, the offspring can have genotypes AC, AD, BC and BD (these are all possible combinations of the parents' alleles). Below I deliberately list two offspring, to show how the coefficients are computed. Suppose they are a brother and a sister. Each of them can have any of the four genotypes:


Father: AB		Mother: CD

Son: AC		Daughter: AC
Son: AD		Daughter: AD
Son: BC		Daughter: BC
Son: BD		Daughter: BD

How many alleles identical by descent do the brother and the sister share? Supposedly one half. Let us check. I will make a table with the possible genotypes of one offspring along the top and the possible genotypes of the other offspring down the left side. If both alleles match (as in the top-left cell), I enter 1.0. If only one allele matches, I enter 0.5. If no allele matches, I enter 0.0.

	AC	AD	BC	BD

AC	1.0	0.5	0.5	0.0
AD	0.5	1.0	0.0	0.5
BC	0.5	0.0	1.0	0.5
BD	0.0	0.5	0.5	1.0

Now I sum all the values and get 8. There are 16 equally likely combinations, so on average the two offspring share 8/16 = 1/2 (half) of their alleles identical by descent. We can therefore say that the coefficient of relationship equals the expected proportion of alleles that two individuals share identical by descent at a given locus. Note! Here ALL alleles are considered.

What about the other coefficients? I would dare to say that "coefficient of relatedness" really is the same thing as "coefficient of relationship".
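The counting in the table can be reproduced by brute-force enumeration. This is a minimal sketch under the same assumptions as the example (non-inbred, unrelated parents AB and CD, all four alleles distinct by origin); the helper `ibd_share` is a hypothetical name introduced here.

```python
from fractions import Fraction
from itertools import product

father = ("A", "B")  # founder alleles, all distinct by origin
mother = ("C", "D")

# One allele from each parent: AC, AD, BC and BD, all equally likely.
offspring = [(p, m) for p in father for m in mother]


def ibd_share(g1, g2):
    """Proportion of the two alleles shared identical by descent (1, 1/2 or 0)."""
    return Fraction(len(set(g1) & set(g2)), 2)


# One cell per (son, daughter) genotype pair, exactly as in the table.
cells = [ibd_share(son, daughter) for son, daughter in product(offspring, repeat=2)]
total = sum(cells)
print(total, len(cells), total / len(cells))  # 8 16 1/2
```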

In the same paper, Wright (1922) defined the "coefficient of inbreeding" as the correlation between the gametes that unite to form a new individual. In diploid organisms gametes carry a single (haploid) set of chromosomes and therefore only one allele from a given locus. So we can say that Wright defined the inbreeding coefficient as the correlation between two randomly chosen alleles: the first from the first individual and the second from the second individual. If the two individuals are related, there is a chance that the chosen pair of alleles is identical by descent. The inbreeding coefficient (of their offspring) is therefore the probability that two randomly chosen alleles, one from each of the two individuals, are identical by descent. Note! Here only two alleles are involved, one from each individual.

Let us look at our example. The son and the daughter can each produce gametes carrying one of the alleles A, B, C or D, each with probability 1/4. I make all possible combinations of these alleles (rows: allele in the son's gamete, columns: allele in the daughter's gamete) and get this:

	A	B	C	D

A	AA	AB	AC	AD
B	AB	BB	BC	BD
C	AC	BC	CC	CD
D	AD	BD	CD	DD

There are 16 combinations in total, and in 4 of them the two alleles are identical by descent. So the inbreeding coefficient equals 4/16 = 1/4.
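The gamete table assumes each of A, B, C and D is transmitted with probability 1/4. A slightly more explicit sketch (same assumptions as the example) enumerates the sibs' genotypes and which of its two alleles each sib passes on, so every one of the 4 × 4 × 2 × 2 = 64 outcomes is equally likely:

```python
from fractions import Fraction
from itertools import product

father = ("A", "B")
mother = ("C", "D")
offspring = [(p, m) for p in father for m in mother]  # AC, AD, BC, BD

outcomes = 0
ibd = 0
# son genotype x daughter genotype x allele index transmitted by each sib
for son, daughter, i, j in product(offspring, offspring, (0, 1), (0, 1)):
    outcomes += 1
    if son[i] == daughter[j]:  # same founder allele => identical by descent
        ibd += 1

f_offspring = Fraction(ibd, outcomes)
print(ibd, outcomes, f_offspring)  # 16 64 1/4
```

The result, 1/4, is the probability that a gamete from the son and a gamete from the daughter carry IBD alleles, i.e., the inbreeding coefficient of their offspring.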

What about "coefficient of kinship", "coefficient of parentage", "coefficient of coancestry" and "coefficient of consanguinity"? My dictionary suggests they translate the same way as the coefficient of relationship above. But when I read the literature, I notice that these names are tied to the work of Malécot (1948), who defined "Les coefficients de parenté". I do not speak French and I have not seen his book (apart from a few excerpts on Google Books). The value of this coefficient for a brother and a sister is said to be 1/4. Hmm. So it really is not the same thing. Or is it? No way, since everybody knows that a brother and a sister share half of their genes (alleles). So these coefficients are something else. What?

Epperson (1999) described Malécot's work and, among other things, says that Malécot later referred to "Les coefficients de parenté" as "identity by descent" (in Slovenian we can safely say "identično po poreklu"). Nagylaki (1989) also described Malécot's work and clearly indicated that Malécot's coefficient is the probability that two randomly chosen alleles are identical by descent. If we take the two alleles of one individual, we get the inbreeding coefficient of that individual; if we take one allele from each of two individuals, we get the inbreeding coefficient of their offspring. Therefore "Malécot's coefficient", "coefficient of kinship", "coefficient of parentage", "coefficient of coancestry" and "coefficient of consanguinity" are really just other names for the inbreeding coefficient (of the offspring) and not for the coefficient of relationship. As we saw in the example, the inbreeding coefficient depends on the relationship between the parents. If the parents are unrelated, the offspring is not inbred. If the parents are related and not inbred, the inbreeding coefficient of the offspring equals half of the coefficient of relationship between the parents.
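The link between the two coefficients can also be checked with the tabular method for the additive (numerator) relationship matrix, a standard animal-breeding tool that this post itself does not use. The sketch below assumes a hypothetical five-animal pedigree (the parents, the son, the daughter, and an offspring of the son and daughter) listed with parents before offspring; the function name `relationship_matrix` is also hypothetical.

```python
from fractions import Fraction


def relationship_matrix(pedigree):
    """Tabular method for the additive relationship matrix A.

    pedigree: list of (sire, dam) per animal as 0-based indices, None if
    unknown; parents must be listed before their offspring."""
    n = len(pedigree)
    zero = Fraction(0)
    a = [[zero] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Off-diagonal: average relationship of animal j with i's parents.
        for j in range(i):
            a_js = a[j][sire] if sire is not None else zero
            a_jd = a[j][dam] if dam is not None else zero
            a[i][j] = a[j][i] = (a_js + a_jd) / 2
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents.
        a_sd = a[sire][dam] if sire is not None and dam is not None else zero
        a[i][i] = 1 + a_sd / 2
    return a


# 0 = father, 1 = mother, 2 = son, 3 = daughter, 4 = offspring of son x daughter
ped = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
a = relationship_matrix(ped)
r_sibs = a[2][3]           # equals Wright's r here because the sibs are not inbred (a_ii = 1)
coancestry = r_sibs / 2    # Malécot's coefficient (kinship, coancestry) of the sibs
f_offspring = a[4][4] - 1  # inbreeding coefficient of their offspring
print(r_sibs, coancestry, f_offspring)  # 1/2 1/4 1/4
```

The relationship of the sibs is 1/2, their coancestry is 1/4, and the inbreeding coefficient of their offspring is also 1/4, i.e., half of the relationship between the parents.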

Epperson, B. K. 1999. Gustave Malécot, 1911–1998: Population Genetics Founding Father. Genetics, 152: 477-484.
http://www.genetics.org/cgi/content/full/152/2/477

Fisher, R. A. 1918. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh, 52: 399-433.
http://digital.library.adelaide.edu.au/coll/special/fisher/9.pdf

Malécot, G. 1948. Les mathématiques de l'hérédité. Masson et Cie., Paris.
http://books.google.com/books?id=S5oKAAAAMAAJ&q=%22Les+math%C3%A9matiques+de+l'h%C3%A9r%C3%A9dit%C3%A9%22&dq=%22Les+math%C3%A9matiques+de+l'h%C3%A9r%C3%A9dit%C3%A9%22&hl=sl&pgis=1

Nagylaki, T. 1989. Gustave Malécot and the transition from classical to modern population genetics. Genetics, 122: 253-268.
http://www.genetics.org/cgi/reprint/122/2/253

Wright, S. 1922. Coefficients of inbreeding and relationship. American Naturalist, 56: 330-339.
http://dx.doi.org/10.1086/279872

2008-03-08

Blue-eye and red-eye removal with Picnik

I often get blue-eyes when taking photos of animals. There are many tutorials on red-eye removal, but they do not solve my blue-eye problems, and sometimes not even my red-eye problems ;) I recently discovered Picnik, a web application for image editing. It is free and you can even start using it without registration. Well, the same free features are also available in tools like Picasa or GIMP, but Picnik's {red,blue}-eye removal is really fancy. Instead of tweaking the red channel, it places a fake eye over the {red,blue}-eye area. I guess the algorithm works with color intensity, but I am not really interested in the details, since the tool generally does what I want. Some additional brushing in GIMP may be needed to remove the fake look.

2008-03-06

Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000008

2008-03-01

Paying people or getting paid for your job!

Excellent post by Torjo at

http://torjo.blogspot.com/2008/03/paying-your-people.html