Oak Ridge National Laboratory

Physics Division

Physics Division Seminars

Physics Division Seminars bring us speakers on a variety of physics-related subjects. Seminars are usually held in the Building 6008 large Conference Room at 3:00 p.m. on the chosen day, but times and locations may vary. For more information, contact our seminar chairman,

Alfredo Galindo-Uribarri
Tel (Office): (865) 574-6124  (FAX): (865) 574-1268


Thu., June 08, 2006, at 3:00 p.m. (refreshments at 2:40 p.m.)

We Are All Africans, The Genographic Project and Inferring Common Origins from mtDNA

Gyan Bhanot, IBM Research and Institute for Advanced Study, Princeton
Building 6008 Conference Room

Human migratory events inferred from observed variations in mtDNA and the nr-Y chromosome suggest an African origin for all extant humans. Mutations define markers for population events. Coalescence theory suggests that most present-day polymorphisms are recent and carry no useful information about deep ancestry. Only mutations that robustly distinguish large clusters of individuals represent ancient population events.

The Genographic Project is a 5-year collaboration between IBM and National Geographic to collect data from indigenous populations and so discover the story of human migrations in detail. The "public" aspect of the project allows anyone to submit a sample to find out where they fit on the mtDNA or Y tree of migrations, and so learn a part of their own story "Out of Africa". The present talk gives a description of the basic biology of such analysis, including a discussion of the current state of the art in inferring migration phylogeny.

Next, I will present new results from an analysis of 1737 complete mtDNA sequences from public databases using PCA and unsupervised consensus ensemble k-clustering. Our analysis clearly shows that the purely African clades L0/L1, L2 and L3 are the oldest. It also suggests that the M and N clades share a common ancestry with L3 and emerged in two separate migrations, the first of which gave rise to the M clade and the second to the N clade. A major result is that in the N clade, the genetic distance between the A and B/R5 haplogroups is much smaller than that between B/R5 and J/T/H/V/U. As a result, our migration tree places the B/R5 groups in close proximity to A rather than to the T, J, H, V or U haplogroups. We find a detailed substructure for the M clade tree with 14 additional branches for the MD haplogroup.

For each of the L0/L1, L2 and L3 clades we find several subgroups. We also provide detailed protocols for classification of each haplogroup with a predictive accuracy exceeding 90%.
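As a rough illustration of the general idea behind combining PCA with consensus clustering, the sketch below projects a small synthetic binary "variant" matrix onto its leading principal components, runs k-means on many random subsamples, and records how often each pair of samples lands in the same cluster. It is not the speaker's pipeline: the toy data, the function names (`pca`, `kmeans`, `consensus_matrix`, `toy_sequences`) and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns an integer label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair of rows co-clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times a pair shared a cluster
    sampled = np.zeros((n, n))   # times a pair was drawn in the same subsample
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return together / np.maximum(sampled, 1.0)

def toy_sequences(seed=1):
    """Synthetic 40 x 30 binary matrix: two groups with shared marker sites."""
    rng = np.random.default_rng(seed)
    X = (rng.random((40, 30)) < 0.05).astype(float)  # sparse "private" variants
    X[:20, :5] = 1.0     # markers shared by the first group
    X[20:, 5:10] = 1.0   # markers shared by the second group
    return X

if __name__ == "__main__":
    X = toy_sequences()
    C = consensus_matrix(pca(X, n_components=2), k=2)
    print("mean within-group consensus :", C[:20, :20].mean())
    print("mean between-group consensus:", C[:20, 20:].mean())
```

Pairs of samples with consensus values near 1 are grouped together regardless of which subsample was drawn, which is one simple way to read "robust" clusters off an ensemble of clusterings.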

If time permits, I will discuss an application of these methods to the detection of cancer progression. Analysis of a dataset from Ma et al. clearly shows several distinct disease subtypes with separate and clearly identifiable pathways. It also suggests that the specific pathways that lead to the varieties of grades and types of disease may have been initiated very early in the development of the disease phenotype, contrary to the current view of cancer progression.