I’ve been at the Kasabi hack-day today. Kasabi is a start-up, currently in beta, set up by the team at semantic web company Talis to create a platform for easy publishing of linked data, and to provide end-users with APIs and tools to dig into linked datasets.
Given some linked data, Kasabi takes care of providing linked data URIs (identifiers) that you can look up (dereference) to get a human- or machine-readable view of the data, and of providing a SPARQL endpoint for running queries against the dataset. In addition it offers helpful search and lookup APIs, and an out-of-the-box Google Refine reconciliation service. Plus you can define your own APIs, using stored SPARQL queries or Linked Data API syntax, packaging up common queries or tasks for end-users and transforming the results into many different formats.
Converting IATI to Linked Data
To explore the potential of the Kasabi platform I decided to write some XSLT that would transform IATI data coming from the IATI Explorer Toolkit into the RDF/XML format. Given the limited time, I didn’t worry too much about how the data was modeled, nor about the long-term identifiers that elements in the dataset might have. Instead, using the IATIStandard.org activity standard documentation, a bit of intuition, and some tips from Linked Data Patterns co-author Leigh Dodds, I’ve mocked up a very basic transformation from IATI into linked data.
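To give a feel for the kind of mapping involved, here is a minimal Python sketch of turning a single iati-activity element into RDF/XML. It is not the XSLT from GitHub: it swaps in Python's ElementTree for XSLT, and the vocabulary namespace, base URI, property names and sample identifiers are all hypothetical, chosen only for illustration. It assumes IATI's iati-identifier, title and participating-org elements, and simply skips values that are missing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical vocabulary namespace and base URI, invented for this sketch.
IATI = "http://example.org/iati/def#"
BASE = "http://example.org/iati/"

# Ask ElementTree to serialise with readable prefixes.
for prefix, ns in (("rdf", RDF), ("rdfs", RDFS), ("iati", IATI)):
    ET.register_namespace(prefix, ns)


def activity_to_rdfxml(activity_xml):
    """Convert one <iati-activity> XML string into an RDF/XML string."""
    activity = ET.fromstring(activity_xml)

    # Without an identifier there is nothing to mint a URI from.
    identifier = (activity.findtext("iati-identifier") or "").strip()
    if not identifier:
        raise ValueError("activity has no iati-identifier")

    root = ET.Element("{%s}RDF" % RDF)
    node = ET.SubElement(
        root,
        "{%s}Activity" % IATI,
        {"{%s}about" % RDF: BASE + "activity/" + quote(identifier, safe="")},
    )

    # Only emit a label when the title is present and non-empty.
    title = (activity.findtext("title") or "").strip()
    if title:
        ET.SubElement(node, "{%s}label" % RDFS).text = title

    # Link to each participating organisation that carries a ref attribute.
    for org in activity.findall("participating-org"):
        ref = (org.get("ref") or "").strip()
        if ref:
            ET.SubElement(
                node,
                "{%s}participatingOrg" % IATI,
                {"{%s}resource" % RDF: BASE + "organisation/" + quote(ref, safe="")},
            )

    return ET.tostring(root, encoding="unicode")


# Made-up sample data; the identifiers are placeholders, not real IATI codes.
SAMPLE = """<iati-activity>
  <iati-identifier>XX-EXAMPLE-0001</iati-identifier>
  <title>Example activity</title>
  <participating-org ref="XX-ORG-1" role="Funding">Example Org</participating-org>
  <participating-org role="Implementing">Org with no ref</participating-org>
</iati-activity>"""

if __name__ == "__main__":
    print(activity_to_rdfxml(SAMPLE))
```

Organisations without a ref are dropped here rather than given a blank node, which is the simplest choice but loses information; a fuller model would probably want to keep them.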
The XSLT I’ve used is now up on GitHub, and is open to modification and updates – so if you’ve got ideas for how an IATI RDF model could be improved, do check out a copy and get updating. The current model is particularly in need of more links to other datasets, and could benefit from greater re-use of existing vocabularies (and a lot more commenting and annotation). It also needs to handle missing values a lot better.
With a few custom PHP scripts I’ve been able to grab all 14,000 or so currently published iati-activities and upload them into the Kasabi data store.
Using the Linked Data
Before you can access the linked data in the beta you will need to sign up and get an API key, and subscribe to all the APIs for IATI data. Once subscribed, any time you need access to something, just make sure your API key is passed via the ?apikey= query string parameter.
With an API key, any URI within the data.kasabi.com domain can be dereferenced. Look one up with a browser, and you get a human-accessible view of the data. Look it up with an application that supports RDF and you should get back machine-readable data.
From the dataset page (http://beta.kasabi.com/dataset/iati/) on the main Kasabi site you’ll find a number of standard APIs you can use too. The Search API lets you find countries, organizations or terms within the data. The Reconciliation API can be used from Google Refine. The Augmentation API would, if we had enough links articulated in our store, make it possible to combine knowledge from different datasets.
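As a rough sketch of what that lookup might involve from code, the Python below builds a request for a URI with the ?apikey= parameter attached. Two things here are assumptions rather than anything confirmed above: that an RDF format is selected with a standard HTTP Accept header, and the example resource path, which is made up. The request is only built, not sent.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def build_lookup(uri, api_key, accept="application/rdf+xml"):
    """Return a Request for `uri` with the API key in the query string."""
    parts = urlsplit(uri)

    # Keep any existing parameters, but replace a stale apikey if present.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "apikey"]
    query.append(("apikey", api_key))

    url = urlunsplit(parts._replace(query=urlencode(query)))
    # Assumption: the server picks HTML or RDF based on the Accept header.
    return Request(url, headers={"Accept": accept})


if __name__ == "__main__":
    # Hypothetical resource path and placeholder key, for illustration only.
    req = build_lookup("http://data.kasabi.com/dataset/iati/example", "YOUR_KEY")
    print(req.full_url)
    # To actually fetch it: urllib.request.urlopen(req).read()
```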
The SPARQL API is particularly powerful, as it lets us run queries across the whole dataset. For example, the SPARQL query below allows us to find all the projects with UNICEF as a participating organization.
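A query of this kind might look roughly like the sketch below, wrapped in a little Python that builds the request URL. This is an illustration under assumptions, not the query or model used in the dataset: the iati: vocabulary namespace, the participatingOrg property, the use of rdfs:label for names, and the endpoint URL are all hypothetical. The query parameter name follows the standard SPARQL protocol, and apikey is the parameter described earlier.

```python
import re
from urllib.parse import urlencode

# Hypothetical vocabulary and property names; a real model may differ.
QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX iati: <{vocab}>

SELECT DISTINCT ?activity ?title
WHERE {{
  ?activity iati:participatingOrg ?org .
  ?org rdfs:label ?orgName .
  OPTIONAL {{ ?activity rdfs:label ?title . }}
  FILTER regex(str(?orgName), "{name}", "i")
}}
LIMIT {limit}
"""


def participating_org_query(org_name, vocab="http://example.org/iati/def#", limit=50):
    """Build a SPARQL query for activities with a matching participating org."""
    # Only allow plain names, so nothing can break out of the string literal.
    if not re.fullmatch(r"[A-Za-z0-9 ]+", org_name):
        raise ValueError("organisation name must be letters, digits and spaces")
    return QUERY_TEMPLATE.format(vocab=vocab, name=org_name, limit=int(limit))


def sparql_url(endpoint, query, api_key):
    """Build a GET URL for a SPARQL endpoint that takes an apikey parameter."""
    return endpoint + "?" + urlencode({"query": query, "apikey": api_key})


if __name__ == "__main__":
    query = participating_org_query("UNICEF")
    print(query)
    # Hypothetical endpoint URL and placeholder key.
    print(sparql_url("http://example.org/iati/sparql", query, "YOUR_KEY"))
```

Matching on the organisation's name with a regex is the least precise option; matching on an organisation URI or code would be more reliable once the model links organisations consistently.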
As our IATI data model becomes richer, and we are able to add more data to an IATI data store, such queries can become more advanced.
Linked data has a steep learning curve – and it’s still not an easily accessible end-user technology. But platforms like Kasabi are making it easier to work with, and some of the other projects demonstrated at today’s hack-day hold impressive potential, including one using the Views modules in Drupal to query remote SPARQL datasets and make the data available for exploration and visualization from within the Drupal interface.
We’ve got some way to go before there is a clear IATI linked data model, but we’re also not far off being able to get some really interesting prototypes working to explore the benefits of doing so and to help the process on its way.