The BioJava Library


BioJava: An Open-Source Java Library for Bioinformatics

(taken from M. Pocock, BioJava Consulting LTD, presentation)

What is BioJava?

● Java code (Java2 required – 1.2 and higher)
● Open-Source
● Bioinformatics
● Library for building Applications
● Sequence Centric
● Part of the Open Bioinformatics Foundation (OBF)


Where is BioJava?

● http://www.biojava.org
● mailto:biojava-l@biojava.org
● #biojava on irc.openprojects.net

Who is BioJava?

● 35+ developers across most continents and time zones
● Core team of more than 5 individuals


A look at some API Stuff

● BioJava is a collection of ~40 packages organized in 9 main categories
● Each category contains classes and interfaces devoted to particular tasks, such as:
– sequence handling
– running external programs
– graphical interfaces
– utilities
– sequence alignment

What’s Been There for a While?

● Sequences with hierarchical features
● Sequence databases
● Sequence IO
– Various sequence formats (EMBL, GenBank, GFF, Swiss-Prot…)
– Object model can be bypassed for high-performance scanning
● Probability distributions over symbols and a dynamic programming toolkit


What’s Reasonably New?

● TagValue parser API
● Sequence Search APIs
– Interoperable with BioJava XML-based parsers for many common sequence search algorithms
● Pure-Java SSAHA implementation
● Bit-packed sequence storage
● Taxonomies
● Literature References
● Phred
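To make the idea of bit-packed sequence storage concrete, here is a minimal sketch that stores unambiguous DNA at 2 bits per base, 32 bases per `long`. The class `PackedDna` and its methods are hypothetical names invented for this illustration; it is not BioJava's own implementation and it deliberately ignores ambiguity symbols and gaps.

```java
// Illustrative sketch only: PackedDna is a hypothetical class, not a BioJava API.
final class PackedDna {
    // The position of a base in this string is its 2-bit code: a=0, c=1, g=2, t=3.
    private static final String ALPHABET = "acgt";
    private static final int BASES_PER_WORD = 32; // 64 bits / 2 bits per base

    private final long[] words;
    private final int length;

    PackedDna(String seq) {
        length = seq.length();
        words = new long[(length + BASES_PER_WORD - 1) / BASES_PER_WORD];
        for (int i = 0; i < length; i++) {
            int code = ALPHABET.indexOf(Character.toLowerCase(seq.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException(
                        "Not an unambiguous DNA base: " + seq.charAt(i));
            }
            // Place the 2-bit code at this base's slot inside its word.
            words[i / BASES_PER_WORD] |= ((long) code) << ((i % BASES_PER_WORD) * 2);
        }
    }

    int length() {
        return length;
    }

    // Number of 64-bit words used to hold the sequence.
    int wordCount() {
        return words.length;
    }

    char baseAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        int code = (int) ((words[i / BASES_PER_WORD] >>> ((i % BASES_PER_WORD) * 2)) & 3L);
        return ALPHABET.charAt(code);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(baseAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PackedDna packed = new PackedDna("GATTACA");
        System.out.println(packed + " stored in " + packed.wordCount() + " word(s)");
    }
}
```

Compared with one `char` per base, this layout uses an eighth of the memory, at the cost of a shift and a mask on every access.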

What’s Recently Improved?

● Gap handling
– Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
● DAS Client is now very robust
– Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
● More ‘framey’ annotation bundles
● Sequence Rendering
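One way to picture a single algebra covering ambiguities and gaps is to treat every symbol as the set of atomic bases it can stand for. The sketch below does that with the standard IUPAC codes `r` (a or g), `y` (c or t) and `n` (any base). Modelling the gap as the empty set is an assumption made for this illustration, and `AmbiguityDemo`, `symbol` and `matches` are hypothetical names rather than BioJava classes or methods.

```java
import java.util.EnumSet;

// Illustrative sketch only: these names are hypothetical, not BioJava APIs.
final class AmbiguityDemo {
    enum Base { A, C, G, T }

    // Map a character to the set of atomic bases it may represent.
    static EnumSet<Base> symbol(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return EnumSet.of(Base.A);
            case 'c': return EnumSet.of(Base.C);
            case 'g': return EnumSet.of(Base.G);
            case 't': return EnumSet.of(Base.T);
            case 'r': return EnumSet.of(Base.A, Base.G);   // IUPAC purine
            case 'y': return EnumSet.of(Base.C, Base.T);   // IUPAC pyrimidine
            case 'n': return EnumSet.allOf(Base.class);    // any base
            case '-': return EnumSet.noneOf(Base.class);   // gap: assumed to match nothing
            default:
                throw new IllegalArgumentException("Unsupported symbol: " + c);
        }
    }

    // Two symbols are compatible when they share at least one atomic base.
    static boolean matches(char x, char y) {
        EnumSet<Base> shared = symbol(x);   // fresh set, safe to modify
        shared.retainAll(symbol(y));
        return !shared.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println("n vs a: " + matches('n', 'a'));
        System.out.println("r vs c: " + matches('r', 'c'));
        System.out.println("- vs n: " + matches('-', 'n'));
    }
}
```

With this representation, atomic symbols, ambiguity codes and gaps are all handled by the same set operations, so no special cases are needed in the comparison code.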


Java 1.4-reliant Source

● Java 1.4 offers APIs that are really useful for Bioinformatics
– Logging
– NIO interfaces for fast IO and raw data access
– Regular expressions
– Cascading Exceptions
● BioJava code relying on 1.4 APIs is conditionally built
– SSAHA implementation
– Some parsers and handlers for TagValue
– Restriction enzyme digests
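The four Java 1.4 additions listed above can be seen together in a small sketch that uses only the JDK: `java.util.logging`, `java.nio` buffers and charsets, `java.util.regex`, and exception chaining through a `cause` argument. The class `Java14Demo`, its methods and the FASTA-like header layout (`>` followed by an identifier and an optional description) are assumptions made for illustration; none of this is BioJava code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: Java14Demo is a hypothetical class, not a BioJava API.
final class Java14Demo {
    private static final Logger LOG = Logger.getLogger("Java14Demo");

    // Assumed header layout: '>' + identifier + optional free-text description.
    private static final Pattern HEADER = Pattern.compile("^>(\\S+)\\s*(.*)$");

    // NIO + regular expressions: decode raw bytes and extract the identifier.
    static String parseId(byte[] rawHeader) {
        CharBuffer chars = Charset.forName("US-ASCII").decode(ByteBuffer.wrap(rawHeader));
        Matcher m = HEADER.matcher(chars);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a header line: " + chars);
        }
        // Logging: FINE messages are hidden unless the logger is configured to show them.
        LOG.fine("Parsed identifier " + m.group(1));
        return m.group(1);
    }

    // Cascading exceptions: keep the low-level failure as the cause.
    static int parseLength(String field) {
        try {
            return Integer.parseInt(field.trim());
        } catch (NumberFormatException e) {
            throw new RuntimeException("Bad length field: '" + field + "'", e);
        }
    }

    public static void main(String[] args) {
        byte[] header = ">AB000001 example record".getBytes(Charset.forName("US-ASCII"));
        System.out.println(parseId(header));
        System.out.println(parseLength(" 1500 "));
    }
}
```

Passing the original exception as the cause means a caller sees both the domain-level message and the underlying `NumberFormatException` in one stack trace.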

OBDA Support

● OBDA is a joint project of the various Open-Bio projects that aims to establish a unified access route for sequences in local and remote sequence databases.
● BioCORBA – CORBA sequence interfaces
● BioSQL – relational tables and standard semantics for storing sequences
● BioFetch – cgi-bin-based sequence fetching
● XEMBL – XML-based sequence fetching

● Bio Directories – configuration file for resolving


Things We’d Like To Do in the Near Future

● Support non-DNA areas of Bioinformatics
– Cladistics, evolutionary trees, clusters
– Expression data
– Proteomics
– Networks/pathways
– Biochemical reactions
● Integrate pre- and post-1.4 exception systems
● Modify the change notification system
– Better synchronization and transaction support
– Easier to optimize events that don’t have listeners
– More robust handling of event cascades
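The goal of optimizing events that have no listeners can be sketched as follows: the source accepts a factory for the event instead of a ready-made event, and only calls the factory when at least one listener is registered. `ChangeHub`, `Listener` and `fire` are hypothetical names chosen for this illustration and do not describe BioJava's actual change notification API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Illustrative sketch only: ChangeHub is a hypothetical class, not a BioJava API.
final class ChangeHub {
    interface Listener {
        void changed(String description);
    }

    // CopyOnWriteArrayList lets listeners be added while an event is being delivered.
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private int eventsBuilt = 0;

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    // The event is built lazily, so a source without listeners pays almost nothing.
    void fire(Supplier<String> eventFactory) {
        if (listeners.isEmpty()) {
            return;
        }
        String event = eventFactory.get();
        eventsBuilt++;
        for (Listener listener : listeners) {
            listener.changed(event);
        }
    }

    // How many events were actually constructed (for demonstration purposes).
    int eventsBuilt() {
        return eventsBuilt;
    }

    public static void main(String[] args) {
        ChangeHub hub = new ChangeHub();
        hub.fire(() -> "feature added");            // no listeners: factory never runs
        hub.addListener(d -> System.out.println("changed: " + d));
        hub.fire(() -> "feature removed");          // delivered to the listener
        System.out.println("events built: " + hub.eventsBuilt());
    }
}
```

This sketch addresses only the no-listener case; synchronization, transactions and event cascades from the list above would need additional machinery.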

What Will We See in BioJava 2?

● Pervasive use of Ontologies
– Storing annotating data
– Definition of processing pipelines (e.g. customizing parsers)
– Bindings between BioJava interfaces and external data sources (DAS, BioSQL, BioCORBA)
– Pervasive querying making any BioJava application an Object Data Store with easy routes for data-providers to optimize searches
● Much more code generation
– Push most repetitive code into code generators
– Auto-generate much of the event notification web


And the Biggest Change of All?

● Make the library accessible to casual developers writing throw-away scripts as well as to system architects
– Documentation
– Tutorials
– Training
– Utility classes (e.g. SeqIOTools)

Some Contributors

Simon Brocklehurst, Samiul Hasan, Ron Kuhn, Nimesh Singh, Mike Jones, Michael Jones, Matthew Pocock, Mathieu Wiepert, Martin Senger, Mark Schreiber, Lei Lai, Kim Rutherford, Keith James, Kalle Näslund, Jason Stajich, Hanning Ni, Greg Cox, Gerald Loeffler, David Waring, David Huen, David H. Klatte, Colin Hardman, Brian Osborne, Brian King, Brian Gilman