The BioJava Library

Academic year: 2021

BioJava

An Open-Source Java Library for Bioinformatics

(taken from a presentation by M. Pocock, BioJava Consulting LTD)

What is BioJava?

Java code (Java 2 required, version 1.2 or higher)

Open-Source

Bioinformatics

Library for building Applications

Sequence Centric

Part of the Open Bioinformatics Foundation (OBF)

Where is BioJava?

http://www.biojava.org

mailto:biojava-l@biojava.org

#biojava on irc.openprojects.net

Who is BioJava?

35+ developers across most continents and time zones

Core team of more than 5 individuals

A look at some API Stuff

BioJava is a collection of ~40 packages organized in 9 main categories.

Each category contains classes and interfaces devoted to particular tasks, such as:

– sequence handling
– running external programs
– utilities
– graphical interfaces
– sequence alignment

What’s Been There for a While?

● Sequences with hierarchical features
● Sequence databases
● Sequence IO
– Various sequence formats (EMBL, GenBank, GFF, Swiss-Prot…)
– Object model can be bypassed for high-performance scanning
● Probability distributions over symbols and a dynamic programming toolkit
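
The idea of bypassing the object model for high-performance scanning can be sketched with a callback-style reader: records are streamed to a listener and no sequence objects are built. This is a minimal illustration using only the standard library. The names FastaListener, FastaScanner and countGc are hypothetical and are not BioJava classes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical callback interface (an assumption for this sketch, not BioJava API). */
interface FastaListener {
    void startSequence(String header);
    void residues(String line);
    void endSequence();
}

/** Streams FASTA records to a listener without building sequence objects. */
final class FastaScanner {
    private FastaScanner() {}

    static void scan(Reader in, FastaListener listener) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        boolean open = false; // true while inside a record
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.charAt(0) == '>') {
                if (open) {
                    listener.endSequence();
                }
                listener.startSequence(line.substring(1).trim());
                open = true;
            } else if (open) {
                listener.residues(line); // raw residue text, never stored
            }
        }
        if (open) {
            listener.endSequence();
        }
    }

    /** Example scan: count G and C residues across all records. */
    static long countGc(String fasta) {
        final long[] gc = {0};
        try {
            scan(new StringReader(fasta), new FastaListener() {
                public void startSequence(String header) {}
                public void residues(String line) {
                    for (int i = 0; i < line.length(); i++) {
                        char c = Character.toUpperCase(line.charAt(i));
                        if (c == 'G' || c == 'C') {
                            gc[0]++;
                        }
                    }
                }
                public void endSequence() {}
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return gc[0];
    }
}
```

Because the listener only sees one line at a time, memory use stays constant regardless of sequence length, which is the point of skipping the object model.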

What’s Reasonably New?

● TagValue parser API
● Sequence Search APIs
– Interoperable with BioJava XML-based parsers for many common sequence search algorithms
● Pure-Java SSAHA implementation
● Bit-packed sequence storage
● Taxonomies
● Literature References
● Phred
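
To make "bit-packed sequence storage" concrete: unambiguous DNA has four symbols, so each nucleotide fits in 2 bits and 32 of them fit in one 64-bit word. The sketch below shows that general technique under this assumption. PackedDna is a hypothetical class written for illustration, and it does not reproduce BioJava's own packing scheme.

```java
/** Illustrative 2-bit packing of unambiguous DNA (hypothetical, not BioJava's implementation). */
final class PackedDna {
    private static final char[] DECODE = {'a', 'c', 'g', 't'};

    private final long[] words; // 32 nucleotides per 64-bit word
    private final int length;

    PackedDna(String dna) {
        length = dna.length();
        words = new long[(length + 31) / 32];
        for (int i = 0; i < length; i++) {
            long code = encode(dna.charAt(i));
            // word index = i / 32, bit offset = (i % 32) * 2
            words[i >>> 5] |= code << ((i & 31) << 1);
        }
    }

    private static long encode(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return 0L;
            case 'c': return 1L;
            case 'g': return 2L;
            case 't': return 3L;
            default:
                throw new IllegalArgumentException("Not an unambiguous DNA symbol: " + c);
        }
    }

    int length() {
        return length;
    }

    int wordCount() {
        return words.length;
    }

    char symbolAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int code = (int) ((words[index >>> 5] >>> ((index & 31) << 1)) & 3L);
        return DECODE[code];
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(symbolAt(i));
        }
        return sb.toString();
    }
}
```

A 40-base sequence occupies two long values here, compared with 40 chars in a String. Ambiguity symbols and gaps do not fit in 2 bits and would need a wider code.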

What’s Recently Improved?

● Gap handling
– Consistent algebra for representing ambiguities (e.g. n), compound symbols (e.g. codons) and gaps
● DAS Client is now very robust
– Distributed sequence API allows DAS-like distributed sequence databases to be easily built and implemented
● More ‘framey’ annotation bundles
● Sequence Rendering
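
One way to picture a consistent algebra over ambiguities, compound symbols and gaps is to treat every symbol as the set of atomic symbols it can stand for: an atomic base is a one-element set, n is the full set, and a gap is the empty set. The sketch below works under that assumption. Base, SymbolSet and codonExpansions are hypothetical names, and the code is not a copy of BioJava's symbol classes.

```java
import java.util.EnumSet;
import java.util.Set;

enum Base { A, C, G, T }

/** A symbol modelled as the set of atomic bases it may represent (illustrative only). */
final class SymbolSet {
    /** Gap: matches no atomic base. */
    static final SymbolSet GAP = new SymbolSet(EnumSet.noneOf(Base.class));
    /** n: matches every atomic base. */
    static final SymbolSet N = new SymbolSet(EnumSet.allOf(Base.class));

    private final Set<Base> matches;

    private SymbolSet(EnumSet<Base> matches) {
        this.matches = matches;
    }

    static SymbolSet of(Base first, Base... rest) {
        return new SymbolSet(EnumSet.of(first, rest));
    }

    boolean isGap() {
        return matches.isEmpty();
    }

    boolean isAtomic() {
        return matches.size() == 1;
    }

    boolean isAmbiguous() {
        return matches.size() > 1;
    }

    /** True if every base this symbol can stand for is also allowed by other. */
    boolean isSubsetOf(SymbolSet other) {
        return other.matches.containsAll(matches);
    }

    /** Number of concrete codons that a triplet of symbols expands to. */
    static int codonExpansions(SymbolSet first, SymbolSet second, SymbolSet third) {
        return first.matches.size() * second.matches.size() * third.matches.size();
    }
}
```

With this model the same set operations answer questions about all three kinds of symbol: the compound symbol (g, c, n) expands to four concrete codons, and any triplet containing a gap expands to none.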

Java 1.4-reliant Source

● Java 1.4 offers APIs that are really useful for Bioinformatics
– Logging
– NIO interfaces for fast IO and raw data access
– Regular expressions
– Cascading Exceptions
● BioJava code relying on 1.4 APIs is conditionally built
– SSAHA implementation
– Some parsers and handlers for TagValue
– Restriction enzyme digests
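
Three of the APIs listed above (java.util.logging, java.util.regex and exception chaining) can be combined in a small restriction digest. The sketch below is a standalone illustration, not BioJava's restriction enzyme code. RestrictionDigest is a hypothetical class, and the only biological fact it relies on is that EcoRI recognises GAATTC and cuts after the first G.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Regex-based restriction digest (hypothetical sketch, not BioJava API). */
final class RestrictionDigest {
    private static final Logger LOG = Logger.getLogger("RestrictionDigest");

    private RestrictionDigest() {}

    /**
     * Cuts dna at every occurrence of site.
     * cutOffset is the cut position relative to the start of the site
     * and is assumed to lie between 0 and the site length.
     */
    static List<String> digest(String dna, String site, int cutOffset) {
        Pattern pattern;
        try {
            pattern = Pattern.compile(site, Pattern.CASE_INSENSITIVE);
        } catch (PatternSyntaxException e) {
            // Exception chaining: keep the low-level failure as the cause.
            throw new IllegalArgumentException("Bad recognition site: " + site, e);
        }
        List<String> fragments = new ArrayList<String>();
        Matcher matcher = pattern.matcher(dna);
        int start = 0;
        while (matcher.find()) {
            int cut = matcher.start() + cutOffset;
            LOG.fine("site at " + matcher.start() + ", cutting at " + cut);
            fragments.add(dna.substring(start, cut));
            start = cut;
        }
        fragments.add(dna.substring(start)); // trailing fragment
        return fragments;
    }

    /** EcoRI: recognises GAATTC and cuts after the first G. */
    static List<String> ecoRI(String dna) {
        return digest(dna, "GAATTC", 1);
    }
}
```

For example, RestrictionDigest.ecoRI("ttGAATTCaaGAATTCg") yields the fragments ttG, AATTCaaG and AATTCg. The NIO interfaces mentioned above are not shown here.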

OBDA Support

● OBDA is a joint project between the various Open-Bio projects which is attempting to establish a unified access route for sequences in local and remote sequence databases.
● BioCORBA – CORBA sequence interfaces
● BioSQL – relational tables and standard semantics for storing sequences
● BioFetch – cgi-bin-based sequence fetching
● XEMBL – XML-based sequence fetching
● Bio Directories – configuration file for resolving
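
As a rough picture of what cgi-bin-based sequence fetching looks like from the client side, the sketch below builds a request URL from a database name, an identifier and a format. The query parameter names (db, id, format, style) follow the BioFetch convention as I understand it and should be checked against the server being used. The base URL and the accession are placeholders, and BioFetchUrl is a hypothetical helper rather than a BioJava class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds a BioFetch-style request URL (hypothetical helper; parameter names are assumptions). */
final class BioFetchUrl {
    private BioFetchUrl() {}

    static String build(String baseUrl, String db, String id, String format) {
        return baseUrl
                + "?db=" + enc(db)
                + "&id=" + enc(id)
                + "&format=" + enc(format)
                + "&style=raw"; // plain text rather than an HTML page
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }
}
```

Calling BioFetchUrl.build("http://example.org/cgi-bin/biofetch", "embl", "X00000", "fasta") produces a URL that could then be opened with java.net.URL and read as a stream.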

Things We’d Like To Do in the Near Future

● Support non-DNA areas of Bioinformatics
– Cladistics, evolutionary trees, clusters
– Expression data
– Proteomics
– Networks/pathways
– Biochemical reactions
● Integrate pre- and post-1.4 exception systems
● Modify the change notification system
– Better synchronization and transaction support
– Easier to optimize events that don’t have listeners
– More robust handling of event cascades
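
One possible reading of "easier to optimize events that don’t have listeners" is that the event object should not be constructed at all when nobody is listening. The sketch below shows that pattern with a deferred event factory. It assumes Java 8 or later for lambdas, and NameChange, NameChangeListener, ChangeHub and NamedFeature are hypothetical names, not BioJava's change-notification classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

/** Hypothetical event type for this sketch. */
final class NameChange {
    final String oldName;
    final String newName;

    NameChange(String oldName, String newName) {
        this.oldName = oldName;
        this.newName = newName;
    }
}

interface NameChangeListener {
    void nameChanged(NameChange event);
}

/** Dispatches events, building them only when at least one listener exists. */
final class ChangeHub {
    private final List<NameChangeListener> listeners =
            new CopyOnWriteArrayList<NameChangeListener>();
    private int eventsBuilt = 0;

    void addListener(NameChangeListener listener) {
        listeners.add(listener);
    }

    void removeListener(NameChangeListener listener) {
        listeners.remove(listener);
    }

    void fire(Supplier<NameChange> eventFactory) {
        if (listeners.isEmpty()) {
            return; // fast path: no event object is allocated
        }
        NameChange event = eventFactory.get();
        eventsBuilt++;
        for (NameChangeListener listener : listeners) {
            listener.nameChanged(event);
        }
    }

    int eventsBuilt() {
        return eventsBuilt;
    }
}

/** A changeable object that reports renames through a ChangeHub. */
final class NamedFeature {
    private final ChangeHub hub = new ChangeHub();
    private String name;

    NamedFeature(String name) {
        this.name = name;
    }

    ChangeHub hub() {
        return hub;
    }

    String getName() {
        return name;
    }

    void setName(String newName) {
        final String old = name;
        name = newName;
        hub.fire(() -> new NameChange(old, newName));
    }
}
```

The copy-on-write listener list lets listeners be added or removed while an event is being dispatched, which touches on the synchronization point above. Transactions and event cascades are outside the scope of this sketch.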

What Will We See in BioJava 2?

● Pervasive use of Ontologies
– Storing and annotating data
– Definition of processing pipelines (e.g. customizing parsers)
– Bindings between BioJava interfaces and external data sources (DAS, BioSQL, BioCORBA)
– Pervasive querying, making any BioJava application an Object Data Store with easy routes for data-providers to optimize searches
● Much more code generation
– Push most repetitive code into code generators
– Auto-generate much of the event notification web

And the Biggest Change of All?

Make the library accessible to casual developers writing throw-away scripts as well as to system architects

– Documentation
– Tutorials
– Training
– Utility classes (e.g. SeqIOTools)

Some Contributors

Simon Brocklehurst, Samiul Hasan, Ron Kuhn, Nimesh Singh, Mike Jones, Michael Jones, Matthew Pocock, Mathieu Wiepert, Martin Senger, Mark Schreiber, Lei Lai, Kim Rutherford, Keith James, Kalle Näslund, Jason Stajich, Hanning Ni, Greg Cox, Gerald Loeffler, David Waring, David Huen, David H. Klatte, Colin Hardman, Brian Osborne, Brian King, Brian Gilman
