Metadata Management and Semantics in Microarray Repositories
The number of microarray and other high-throughput experiments in primary repositories keeps increasing, as do the size and complexity of the results produced by biomedical investigations. Initiatives have been started to standardize content, object models, exchange formats and ontologies. However, backlogs persist and data cannot be exchanged between microarray repositories, which indicates a great need for a standard format and for better data management.
We have introduced a metadata framework comprising a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework); they can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits of the framework through a case study on a selected microarray repository. We concluded that, with this metadata framework, the backlogs can be reduced and that exchanging information and asking knowledge discovery questions become possible.
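To make the idea of a queryable metadata card concrete, here is a minimal sketch in Python. It is an illustration only, not the framework's actual encoding. It assumes a card can be modelled as a set of subject-predicate-object triples using Dublin Core term URIs. The helper names (`make_card`, `query`, `to_ntriples`), the `example.org` identifiers and the field values are all hypothetical.

```python
# Illustrative sketch only: the card fields, experiment URIs and helper
# names are hypothetical and not taken from the framework itself.

DC = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def make_card(experiment_uri, title, creator, subject):
    """Return a metadata card as a set of (subject, predicate, object) triples."""
    return {
        (experiment_uri, DC + "title", title),
        (experiment_uri, DC + "creator", creator),
        (experiment_uri, DC + "subject", subject),
    }


def query(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a plain literal."""
    lines = []
    for s, p, o in sorted(triples):
        literal = (
            o.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Two made-up cards, as if they came from two different repositories.
    card_a = make_card("http://example.org/exp/1", "Example A", "Lab X", "myogenesis")
    card_b = make_card("http://example.org/exp/2", "Example B", "Lab Y", "myogenesis")

    merged = card_a | card_b  # integrating cards is a set union of triples
    for s, _, _ in query(merged, p=DC + "subject", o="myogenesis"):
        print(s)
    print(to_ntriples(card_a))
```

Because each card is just a set of triples, integrating cards from different sources reduces to a set union, and a question such as "which experiments concern a given subject" becomes a pattern match over the merged set. A real deployment would more likely use an RDF store and SPARQL than this hand-rolled matcher.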