D.Sc.,
Washington University in St. Louis
Tel: (617) 552-3571
E-mail: marth@bc.edu
Dr. Marth's Website
Fields of Interest
DNA sequence variation, genome data mining and informatics, population genetics,
medical genetics
Academic Profile
The genetic blueprint of our species, the sequence of human DNA, is nearly identical
from person to person. Genetic variations, the features of our genome that do
differ among individuals, result from mutation events and
are passed down from generation to generation. Genetic variations are important
because they carry the patterns imprinted by the inheritance of these mutations
and hence permit the reconstruction of our origins and demographic past. Equally
important, a subset of genetic variations alters gene expression or function
and results in phenotypic variation, such as differences in height. In some cases,
a variant form (or allele) causes disease. My laboratory is interested in several
aspects of sequence variation research.
- Discovery of single-nucleotide polymorphisms (SNPs) in DNA sequence
data: Efficient polymorphism detection requires that sequences representing
the same loci from multiple individuals are correctly clustered and accurately
aligned in a base-to-base fashion. Apparent sequence differences are then
examined to determine if they represent true polymorphisms as opposed to sequencing
errors. We have developed a set of algorithms and a corresponding software
package, PolyBayes (http://genome.wustl.edu/gsc/polybayes), that implements
these steps. PolyBayes is one of the primary methods used in genome-scale
as well as locus-scale SNP discovery. Current work focuses on polymorphism
mining in model organisms, and building pipelines for public use.
- Population genetic theory development: Random genetic drift,
the mutation process, recombination, long-term demography, and selection act
collectively to shape the landscape of human polymorphism structure. Higher
mutation rates lead to more polymorphisms; random drift drives most novel
mutations to extinction while preserving some; recombination shuffles mutations
that originally arose on the same chromosome and breaks down allelic association;
population bottlenecks reduce genetic diversity; selection promotes the spread
of an advantageous allele, allowing non-functional alleles in close proximity
to “hitchhike” with it. Based on the powerful methodology termed
the “coalescent”, we have developed mathematical models and simulation
procedures to describe the shape of two characteristic SNP distributions,
marker density and the allele frequency spectrum, under complex scenarios of
demographic history and realistic recombination rates. Current research is
aimed at refining these models, and at describing other characteristic distributions
such as the distribution of inter-marker spacing.
- Reconstruction of human demographic history: Changes in
long-term population size, such as population expansion, collapse, or bottleneck,
leave an imprint on genome-wide SNP distributions. For example, expansion gives rise
to many rare alleles, whereas a collapse preferentially weeds out rare alleles, leading
to an over-representation of high-frequency or common alleles. Human polymorphism and
genotype data available on the genome scale are now sufficient to infer these
patterns for large world populations. Our own results show that long-term
demographic history was different for some of these populations: European
and Asian groups have undergone a population bottleneck, an event that was
not observed in African samples. Current research aims at a better understanding
of these population-specific differences, and at describing the spatial aspects
of human variation structure as observed along the human chromosomes.
- Human haplotype structure and the HapMap: A haplotype is
a combination of alleles at adjacent marker locations, co-inherited from generation
to generation. Co-inheritance can be disrupted by recombination events that
occur between markers. Recent results indicate that human haplotype structure
is characterized by long regions (tens of kilobases) where allelic association,
and hence haplotypes, are preserved. These regions, termed “haplotype blocks”,
are interrupted by regions of minimal allelic association. Haplotypes within
blocks can be described by a small subset of the markers that define them, permitting
substantial savings in genotyping cost. This fact prompted the HapMap initiative,
a project aimed at describing human haplotype structure at a fine scale, in
multiple populations. Haplotypes are governed by the same forces that give
rise to polymorphism structure, hence the same principles can be used in their
analysis. Current research in this lab focuses on understanding how general
haplotype blocks are, how uniform they are across different human population
groups, and how deep a sampling is required to find them in a stable fashion. Answering
these questions is critical to finding the right experimental design for the
HapMap project, and to ensuring the generality and utility of this costly resource.
- Medical genetics: The main driving force behind public and private investments into variation resources is the promise that these resources will be useful in tracking down the genetic causes of heritable diseases. The goal is either to find the specific functional mutations that cause disease, or to find molecular markers that are predictive of disease susceptibility, response to treatment, and possible side effects. The difficulty is that common diseases affecting millions of people are thought to be multifactorial, i.e., susceptibility depends on a possibly very large number of genes, each of which may have only a very modest individual effect. This means that the effects of a given locus are very difficult to measure. This lab is interested in discovering those features in genome variation data that can be interpreted as signatures of causative loci. We are also interested in developing tools that bring the fruits of the HapMap project to the specialized laboratory involved in hunting down disease genes.
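As a rough illustration of the allele frequency spectrum discussed above, the following Python sketch computes the textbook expectation under the standard neutral coalescent with constant population size, where the expected number of sites carrying i copies of the derived allele in a sample of n chromosomes is proportional to 1/i. This is only a baseline for comparison, not the laboratory's models, which are described as handling complex demographic histories and recombination. The function names and the "observed" counts are hypothetical and invented for illustration.

```python
def neutral_spectrum(n, theta=1.0):
    """Expected site counts for derived-allele counts i = 1..n-1.

    Assumes the standard neutral model with constant population size,
    under which the expectation for count i is theta / i.
    """
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    return [theta / i for i in range(1, n)]


def rare_fraction(spectrum, max_count=1):
    """Fraction of variable sites whose derived-allele count is <= max_count."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must contain at least one site")
    return sum(spectrum[:max_count]) / total


# Sample of 10 chromosomes: neutral baseline versus made-up observed counts.
n = 10
expected = neutral_spectrum(n)
observed = [60, 18, 9, 5, 3, 2, 1, 1, 1]  # hypothetical data, not real results

print("singleton fraction, neutral baseline:", round(rare_fraction(expected), 3))
print("singleton fraction, observed:", round(rare_fraction(observed), 3))
```

An excess of rare alleles relative to this baseline points in the direction the text associates with population expansion, and a deficit points toward a collapse. A real analysis would also need to account for sampling noise and SNP ascertainment.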
Representative Publications
Marth, G.T., Czabarka, E., Murvai, J., and Sherry, S.T. 2004. The allele frequency
spectrum reveals differential demographic histories in three large world populations.
Genetics 166: 351–372. (link to PubMed abstract)
Marth, G.T., Schuler, G., Yeh, R., Davenport, R., Agarwala, R., Church, D., Wheelan, S., Baker, J., Ward, M., Kholodov, M., Phan, L., Czabarka, E., Murvai, J., Cutler, D., Wooding, S., Rogers, A., Chakravarti, A., Harpending, H.C., Kwok, P.Y., and Sherry, S.T. 2003. Sequence variations in the public human genome data reflect a bottlenecked population history. Proceedings of the National Academy of Sciences USA 100: 376–381. (link to PubMed abstract)
Marth, G.T. 2003. Computational SNP discovery in DNA sequence data. In: Single Nucleotide Polymorphisms: Methods and Protocols (Ed. Kwok, P.Y.), Humana Press. (link to PubMed abstract)
Weber, J.L., David, D., Heil, J., Fan, Y., Zhao, C., and Marth, G.T. 2002. Human diallelic insertion/deletion polymorphisms. American Journal of Human Genetics 71: 854–62. (link to PubMed abstract)
Marth, G., Yeh, R., Minton, M., Donaldson, R., Li, Q., Duan, S., Davenport, R., Miller, R.D., and Kwok, P.Y. 2001. Single-nucleotide polymorphisms in the public domain: how useful are they? Nature Genetics 27: 371–2. (link to PubMed abstract)
Sachidanandam, R., Weissman, D., Schmidt, S.C., Kakol, J.M., Stein, L.D., Marth, G., Sherry, S., Mullikin, J.C., Mortimore, B.J., Willey, D.L., Hunt, S.E., Cole, C.G., Coggill, P.C., Rice, C.M., Ning, Z., Rogers, J., Bentley, D.R., Kwok, P.Y., Mardis, E.R., Yeh, R.T., Schultz, B., Cook, L., Davenport, R., Dante, M., Fulton, L., Hillier, L., Waterston, R.H., McPherson, J.D., Gilman, B., Schaffner, S., Van Etten, W.J., Reich, D., Higgins, J., Daly, M.J., Blumenstiel, B., Baldwin, J., Stange-Thomann, N., Zody, M.C., Linton, L., Lander, E.S., Altshuler, D.; The International SNP Map Working Group. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409: 928–33. (link to PubMed abstract)
Marth, G.T., Yandell, M.D., Korf, I., Gu, Z., Yeh, R.T., Zakeri, H., Stitziel, N.O., Hillier, L., Kwok, P.Y., and Gish, W. 1999. A general approach to single-nucleotide polymorphism discovery. Nature Genetics 23: 452–456. (link to PubMed abstract)
Dear, S., Durbin, R., Hillier, L., Marth, G., Thierry-Mieg, J., and Mott, R. 1998. Sequence assembly with CAFTOOLS. Genome Research 8: 260–7. (link to PubMed abstract)