Abstract:
|
BACKGROUND: The falling cost of genomic assays has generated
large amounts of biomedical data. As a result, current studies
perform multiple experiments on the same subjects. While the
methods and classes implemented in various Bioconductor
packages manage individual experiments, there is no standard
class for managing different omic datasets from the same
subjects. In addition, most R/Bioconductor packages designed to
integrate and visualize biological data use basic data
structures that lack clear general methods, such as those for
subsetting or selecting samples.
RESULTS: To address this need, we have developed MultiDataSet, a
new R class based on Bioconductor standards and designed to
encapsulate multiple data sets. MultiDataSet handles the usual
difficulties of managing multiple, incomplete data sets while
offering a simple and general way of subsetting features and
selecting samples. We illustrate the use of
MultiDataSet in three common situations: 1) performing
integration analysis with third-party packages; 2) creating new
methods and functions for omic data integration; and 3)
encapsulating new, not-yet-implemented data types from any
biological experiment. CONCLUSIONS: MultiDataSet is a suitable
class for data integration within the R/Bioconductor framework. |