G4HE and Open Data – what we have learned

Much of the Jisc-led G4HE project has involved developing tools that provide management reports for universities, based on data from the Research Councils’ Gateway to Research (GtR). This has been illuminating in many ways, and the experience may be of interest to others looking to build services over open data.

Once we had an agreed set of requirements for the reports, in March 2013, the development team at Cottage Labs started creating initial, alpha versions of the reports to share with our testers group, a group of research managers. This was part of the agile development methodology we used, which meant that we regularly needed new iterations of the tools to show testers. At the same time, however, we were working closely with the Research Councils to understand the GtR data. These data are collated from seven Research Councils, from a number of source systems within those Councils, and the GtR project has put considerable effort into normalising them for presentation to GtR users. Nevertheless, because of their provenance, it has been challenging to understand the data well enough to develop iterations of the tools for our testers to explore. We have often had to attach caveats to those tools, asking testers to focus on the functionality rather than the outputs, as we were not confident that those outputs were “accurate”.

The word “accurate” needs some explanation, however. The data from GtR are “reliable”, in the sense that they accurately reflect the activities recorded by the Research Councils during the process of awarding grants and recording the outcomes from those grants. From a user’s perspective, however, those data could be misinterpreted, in the sense that they do not represent the phenomena that users think they do. A key example for us has been the concept of “collaboration”, which has both precise and vague definitions for users and Research Councils (both individually and collectively). That has made it quite hard to develop tools that report to research managers on the research collaborations in which their university is involved. The GtR project is refining its data dictionary to minimise such misinterpretations by users in future.

These experiences have made us much more aware of the importance of understanding data provenance. Documentation is often not enough on its own: there is also a need to speak with the source data architect, who knows many of the nuances of the data being made available. We have also become more aware of the challenges in making public data that were originally collected mainly for internal purposes. While some of the data on GtR have been openly available before, through other Research Council systems, they were collected mainly to help the Councils monitor and report on the research they fund, and to inform their planning. Now that these data are intended to be used externally, for example by SMEs interested in working with academic researchers, their limitations with respect to the needs of other external users, such as research managers in universities, have become increasingly apparent.

Exposing the data does, however, make it possible for them to be supplemented by other data in services by and for third parties. This is only possible where widely adopted keys and vocabularies are used to ensure semantic interoperability. ORCID is one such key, for identifying researchers, and the Research Councils are looking at how to exploit the ORCID initiative in GtR. This is part of a wider acceptance among data providers and users that, wherever possible, commonly accepted definitions and protocols should be used for exchanging information. Organisations can then structure their data internally to meet their specific needs, while presenting those data to the “outside” world using a common structure and standards, so that the data can be correctly interpreted and consumed by “external” users. While there is a wider acceptance that adopting common standards is desirable, data providers do need well-evidenced business cases to persuade their parent organisations to commit to this approach.
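To make the point about shared keys concrete, here is a minimal Python sketch of supplementing open grant records with a university's own staff data by joining on ORCID iDs. It is an illustration only, not the G4HE or GtR implementation: the record layouts, field names (`grant_ref`, `pi_name`, `orcid`, `department`) and sample values are hypothetical, and the two iDs are the sample identifiers used in ORCID's own documentation. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid) -> bool:
    """Return True if the value looks like an ORCID iD with a correct check digit."""
    if not isinstance(orcid, str):
        return False
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


def join_on_orcid(grants, staff):
    """Attach staff details to grants that share a valid ORCID iD.

    Returns (matched, unmatched). Unmatched grants are kept rather than
    dropped, so that gaps in the source data remain visible.
    """
    staff_by_orcid = {p["orcid"]: p for p in staff if is_valid_orcid(p.get("orcid"))}
    matched, unmatched = [], []
    for grant in grants:
        person = staff_by_orcid.get(grant.get("orcid"))
        if person is None:
            unmatched.append(grant)
        else:
            matched.append({**grant, "department": person["department"]})
    return matched, unmatched


# Hypothetical sample records, invented for illustration.
GRANTS = [
    {"grant_ref": "EXAMPLE/001", "pi_name": "J. Carberry", "orcid": "0000-0002-1825-0097"},
    {"grant_ref": "EXAMPLE/002", "pi_name": "A. N. Other", "orcid": None},
]
STAFF = [
    {"orcid": "0000-0002-1825-0097", "department": "Example Department A"},
    {"orcid": "0000-0001-5109-3700", "department": "Example Department B"},
]

if __name__ == "__main__":
    matched, unmatched = join_on_orcid(GRANTS, STAFF)
    print("matched:", [g["grant_ref"] for g in matched])
    print("unmatched:", [g["grant_ref"] for g in unmatched])
```

The second grant has no iD, so it cannot be linked without falling back on name matching, which is exactly the kind of ambiguity a shared key is meant to remove.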
