2008
Poster for ISMB 2008 presenting the notion of building various types of applications out of AIDA components that were created by diverse experts represented in the AID collaboration.
[Abstract]- Applications of AIDA knowledge management components
By Marco Roos, M. Scott Marshall, Piter T. de Boer, Kasper van den Berg, Sophia Katrenko, Edgar Meij, Willem R. van Hage, Pieter W. Adriaans
Given the important role of knowledge in biology, knowledge in a machine-readable form can be an important asset for bioinformatics. We present two applications of AIDA (Adaptive Information Disclosure Application), a collection of knowledge management components. One is a workflow that extends a semantic model with putative relations between proteins and diseases extracted from literature by machine learning techniques. The other extends vBrowser, a virtual resource browser tool, with the ability to find relevant biological resources (e.g. data, workflows, documents) via semantic relationships.
Central to our semantic web approach is the separation of a ‘virtual knowledge space’ from its applications. In other words, knowledge is disclosed and accessed in a knowledge space rather than being coded into the application. The workflow adds knowledge to this space with knowledge extraction, while vBrowser accesses the knowledge resources for use during search. We use RDF and OWL to represent knowledge and Sesame (http://openRDF.org) to store RDF and OWL representations of knowledge.
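The separation between knowledge space and applications can be illustrated with a minimal sketch. It uses a toy in-memory triple store as a stand-in for the repository; it is not the Sesame API, and the class, function, and predicate names as well as the disease label are hypothetical. One function plays the role of the workflow that adds knowledge, the other plays the role of a browser that only reads it.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class KnowledgeSpace:
    """Toy in-memory triple store (hypothetical stand-in, not the Sesame API)."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        # None acts as a wildcard for that position.
        return sorted(
            t
            for t in self._triples
            if subject in (None, t.subject)
            and predicate in (None, t.predicate)
            and obj in (None, t.obj)
        )


def extraction_workflow(space: KnowledgeSpace) -> None:
    """Writer: adds knowledge to the space (illustrative values only)."""
    space.add("EZH2", "rdf:type", "Protein")
    space.add("EZH2", "putativelyLinkedTo", "ExampleDisease")


def browse(space: KnowledgeSpace, entity: str) -> List[str]:
    """Reader: knows nothing about how the triples were produced."""
    return [f"{t.predicate} {t.obj}" for t in space.match(subject=entity)]


space = KnowledgeSpace()
extraction_workflow(space)
print(browse(space, "EZH2"))
```

Neither function refers to the other; both depend only on the shared store, which is the design point made above.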
The workflow contains the following steps: (i) add the ontology that you want to extend to Sesame (e.g. a model that contains the protein EZH2), (ii) extract the entities of interest from the ontology (e.g. EZH2), (iii) retrieve abstracts from Medline for these entities, (iv) extract proteins and protein-protein relationships from the abstracts, (v) add a ranking score to the discoveries, (vi) query OMIM with the extracted proteins and retrieve the disease labels (service from the National Institute of Genetics in Japan), (vii) add the discoveries and their interrelationships to the repository, (viii) export the enriched ontology to the knowledge space where for instance vBrowser can be used to explore the results. Future work includes metrics to more effectively retrieve biologically interesting suggestions from semantic data.
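As a rough sketch, steps (iii) to (vi) could be chained as follows. Everything here is an assumption made for illustration: the abstracts, protein names other than EZH2, and disease labels are toy data, the Medline and OMIM services are replaced by dictionary lookups, and naive co-occurrence counting is swapped in for the machine learning extraction used in the actual workflow.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical toy data standing in for Medline and OMIM; none of it is real.
TOY_ABSTRACTS: Dict[str, List[str]] = {
    "EZH2": [
        "EZH2 interacts with PROT_A in toy cells.",
        "PROT_A and EZH2 were both detected; PROT_B was not.",
    ],
}
TOY_DISEASE_LABELS: Dict[str, List[str]] = {"PROT_A": ["toy disease 1"]}
KNOWN_PROTEINS: Set[str] = {"EZH2", "PROT_A", "PROT_B"}


def retrieve_abstracts(entity: str) -> List[str]:
    """Step (iii): stub for a literature query."""
    return TOY_ABSTRACTS.get(entity, [])


def extract_protein_pairs(abstract: str) -> List[Tuple[str, str]]:
    """Step (iv): naive co-occurrence of known protein names in one abstract."""
    tokens = {tok.strip(".,;") for tok in abstract.split()}
    found = sorted(tokens & KNOWN_PROTEINS)
    return list(combinations(found, 2))


def run_workflow(
    seed_entities: List[str],
) -> Tuple[List[Tuple[str, str, float]], Dict[str, List[str]]]:
    pair_counts: Counter = Counter()
    n_abstracts = 0
    for entity in seed_entities:
        for abstract in retrieve_abstracts(entity):
            n_abstracts += 1
            pair_counts.update(extract_protein_pairs(abstract))
    # Step (v): score each pair by the fraction of abstracts mentioning it.
    relations = [
        (a, b, count / n_abstracts)
        for (a, b), count in sorted(pair_counts.items())
    ]
    # Step (vi): look up disease labels for every extracted protein.
    proteins = sorted({p for a, b, _ in relations for p in (a, b)})
    diseases = {
        p: TOY_DISEASE_LABELS[p] for p in proteins if p in TOY_DISEASE_LABELS
    }
    return relations, diseases


relations, diseases = run_workflow(["EZH2"])
for a, b, score in relations:
    print(a, b, score)
print(diseases)
```

In a real setting each stub would be a web service call, and the returned relations and disease labels would be written to the repository as in step (vii).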
We show how the vBrowser can be used to browse both data resources and knowledge resources from the same basic interface. We show how vBrowser uses an AIDA thesaurus service to improve finding resources such as Medline documents and workflows on myExperiment.org. We found thesaurus terms effective for search and advocate SKOS for its intuitive ‘broader/narrower-than’ relationships. We further show that the protein-disease relationships resulting from our knowledge capture workflow as well as the documents that contained these relationships can be accessed as knowledge resources from the vBrowser. We think OWL can adequately represent the knowledge in many biological cartoon models and have used it to represent the workflow provenance in our knowledge capture workflow.
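To illustrate how ‘broader/narrower-than’ relationships can help search, the sketch below expands a query term with its narrower terms before matching documents. It is not the AIDA thesaurus service: the thesaurus is a plain dictionary, the terms and documents are illustrative, and following narrower links transitively is an assumption of this sketch rather than something the abstract specifies.

```python
from typing import Dict, List, Set

# Toy thesaurus: each concept maps to its directly narrower concepts.
# Terms are illustrative only.
NARROWER: Dict[str, List[str]] = {
    "disease": ["neoplasm", "infection"],
    "neoplasm": ["carcinoma"],
}


def expand_term(term: str, narrower: Dict[str, List[str]]) -> List[str]:
    """Return the term plus all transitively narrower terms, breadth first."""
    seen: Set[str] = {term}
    queue = [term]
    order: List[str] = []
    while queue:
        current = queue.pop(0)
        order.append(current)
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order


def search(documents: Dict[str, str], term: str) -> List[str]:
    """Return ids of documents mentioning the term or any narrower term."""
    terms = expand_term(term, NARROWER)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(t in text.lower() for t in terms)
    )


docs = {"d1": "A carcinoma case report", "d2": "Protein folding"}
print(expand_term("disease", NARROWER))
print(search(docs, "disease"))
```

A query for the broad term finds the document that only mentions a narrower term, which a plain keyword match would miss.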
- Applications of AIDA knowledge management components
2007

Posters for the UK e-Science All Hands meeting in 2007 and ISMB/ECCB 2007 in Vienna, Austria, presenting the AIDA toolbox, a text mining workflow that uses Web services from the toolbox built by experts from different fields in the AID collaboration, and first semantic results.
[AIDA demonstration Abstract]- My BioAID: personalised text mining with web services from the AIDA toolbox
By Marco Roos, Sophia Katrenko, Willem R. van Hage, Edgar Meij, Frans Verster, M. Scott Marshall, and Pieter Adriaans
The AIDA toolbox is a suite of web services for ontology-supported information extraction on the Grid. AIDA has routines for ontology alignment, ontology-supported query construction, named entity recognition and concept learning. It is built on the basis of indexes as used by Lucene (an information retrieval engine), is embedded in a standard workflow environment, and supports semantic web standards such as RDF and OWL. The AIDA toolbox is a modular platform for dynamic adaptive information extraction in an e-Science environment.
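For readers unfamiliar with the kind of index an information retrieval engine relies on, here is a minimal inverted index sketch. It is a conceptual illustration only, not Lucene's API or data structures; the function names and sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased token to the ids of documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token.strip(".,;:()")].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


docs = {"a": "EZH2 and chromatin.", "b": "Chromatin remodelling"}
idx = build_index(docs)
print(search(idx, "chromatin"))
print(search(idx, "ezh2 chromatin"))
```

Looking up a term is a dictionary access rather than a scan over all documents, which is what makes index-based retrieval practical for large abstract collections.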
We demonstrate bioinformatics applications of the AIDA toolbox with workflows created in Taverna (http://taverna.sourceforge.net). The workflows extend a ‘seed’ of biological knowledge with knowledge extracted from MEDLINE abstracts. For instance, we can ask whether the information presented in a review that links epigenetic factors to human diseases is complete. We show how we turn such information into a small ‘proto-ontology’ using the ontology editor Protégé/OWL (http://protege.stanford.edu/), and how we extend that with diseases discovered in MEDLINE abstracts by a workflow that links information retrieval, information extraction, and ontology handling services from the AIDA toolbox, and a number of services provided by Taverna. The workflow produces an enriched ontology from the proto-ontology that was used as input. It shows that we can acquire additional information about a topic that would otherwise be impossible to obtain by reading any one review. The resulting knowledge is not as comprehensive as that which a human reader can achieve, but we obtain an unprejudiced overview of previously acquired knowledge that can potentially cut through boundaries of biological subdomains. Because input and results are in machine-readable forms, they can be stored and used in future applications.
Another important aspect of our demonstration is the versatility that a web service/workflow implementation provides. Instead of a single monolithic application, AIDA consists of components by which personalized workflows (applications) can be built. These workflows may also include services or workflows provided by others. We make no assumptions about how a bioinformatics study using AIDA components should appear. For instance, AIDA contains machine learning components that allow one to discover the biological concepts of one’s own choosing (e.g. diseases). Parts of workflows can be reused, for instance to discover bioinformatics resources on the web. We demonstrate this versatility, or ‘personalization’, by showing a number of alternative workflows.
- My BioAID: personalised text mining with web services