P7: An improved method and software for analyzing metagenomic data

Author: Joseph Paulson, Advisor: Mihai Pop (CS and CBCB)


Problem Statement Presentation

Project Proposal

Abstract

This document outlines the project proposal for the 2011-2012 AMSC 663/664 course series. The project is to develop Metastats 2.0, a software package for analyzing metagenomic data. We propose two major improvements to the Metastats software and its underlying statistical methods.

The first extension of Metastats is a mixed-model zero-inflated Gaussian distribution. It lets Metastats account for a common characteristic of metagenomic data: many features have zero counts because the community is undersampled. The number of "missing" features (zero counts) is correlated with the amount of sequencing performed. This biases abundance measurements and the differential abundance statistics derived from them.

The second extension describes new approaches to data normalization. The traditionally used ratio-based normalization implicitly introduces covariance between individual features. By reducing that covariance, the new approaches enable a more accurate assessment of differential abundance.

We introduce the project and then outline its implementation, validation, and deliverables. A timeline of major milestones is also provided.
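To make the zero-inflated Gaussian idea concrete, here is a minimal sketch of fitting such a model to a single feature by expectation-maximization (EM). The sketch rests on these assumptions, none of which come from the proposal:

- The feature's values are already log-transformed (for example log2(count + 1)), so a zero count maps to exactly 0.
- The zero-inflation probability pi is a single constant. The proposal instead ties the chance of a zero to the amount of sequencing.
- The names `normal_pdf` and `fit_zig` are hypothetical and are not part of Metastats.

Each value y is modelled as a mixture: with probability pi it is a zero from undersampling, and otherwise it is drawn from a Gaussian with mean mu and variance sigma2.

```python
import math


def normal_pdf(y: float, mu: float, sigma2: float) -> float:
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at y."""
    return math.exp(-((y - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def fit_zig(y: list[float], n_iter: int = 100, tol: float = 1e-8) -> tuple[float, float, float]:
    """Hypothetical helper: fit a zero-inflated Gaussian to one feature by EM.

    y holds log-transformed abundances, so zero counts appear as exactly 0.0.
    Returns (pi, mu, sigma2): the estimated zero-inflation probability and the
    mean and variance of the Gaussian (abundance) component.
    """
    nonzero = [v for v in y if v != 0.0]
    if not nonzero:
        raise ValueError("need at least one non-zero observation")

    # Initialise from the observed zero fraction and the non-zero values.
    pi = (len(y) - len(nonzero)) / len(y)
    mu = sum(nonzero) / len(nonzero)
    sigma2 = max(sum((v - mu) ** 2 for v in nonzero) / len(nonzero), 1e-6)

    for _ in range(n_iter):
        # E-step: only exact zeros can come from the zero component. For a zero,
        # its responsibility compares pi with the Gaussian density at 0.
        if pi > 0.0:
            p0 = pi / (pi + (1.0 - pi) * normal_pdf(0.0, mu, sigma2))
        else:
            p0 = 0.0
        z = [p0 if v == 0.0 else 0.0 for v in y]

        # M-step: update pi from the responsibilities, and mu and sigma2 from
        # observations weighted by their probability of being Gaussian.
        w = [1.0 - zj for zj in z]
        sw = sum(w)  # at least len(nonzero) > 0, so no division by zero
        new_pi = sum(z) / len(y)
        new_mu = sum(wj * v for wj, v in zip(w, y)) / sw
        new_sigma2 = max(sum(wj * (v - new_mu) ** 2 for wj, v in zip(w, y)) / sw, 1e-6)

        change = abs(new_pi - pi) + abs(new_mu - mu) + abs(new_sigma2 - sigma2)
        pi, mu, sigma2 = new_pi, new_mu, new_sigma2
        if change < tol:
            break
    return pi, mu, sigma2


if __name__ == "__main__":
    # Illustrative values: four zeros plus six abundances near 5 on a log scale.
    print(fit_zig([0.0] * 4 + [5.1, 4.8, 5.3, 4.9, 5.0, 5.2]))
```

With abundances well away from zero, essentially every zero is attributed to the zero component. In the example above, pi comes out at the observed zero fraction of 0.4. When the Gaussian component sits closer to zero, some zeros are instead credited to it, and pi falls below the raw zero fraction. That behaviour is the point of modelling the zeros explicitly rather than treating every zero as a true absence.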



Mid-Year Progress Report and Presentation

Final Presentation, Final Report