
G4HE architecture overview

The diagram above shows the architecture of the software system we are building for the G4HE project. The core parts are:

Get data from remote sources

We have developed processes (and will develop more) that run on our server to retrieve relevant data from suitable remote sources. This section is built in a plugin style: when another suitable remote is identified, all we have to do is write a new ingest method and add it to the list. Once these processes have run, we hold a local copy of all relevant data, and we can re-run them whenever we need to re-sync with the remotes.
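The plugin style described above can be sketched as a simple registry of ingest functions. This is an illustrative sketch, not the project's actual code: the source name "gtr" and the placeholder fetch logic are assumptions.

```python
# A minimal sketch of a plugin-style ingest layer: each remote source
# registers an ingest function, and a sync step re-runs them all.

INGESTERS = {}

def ingester(name):
    """Register a function as the ingest plugin for a named remote."""
    def register(func):
        INGESTERS[name] = func
        return func
    return register

@ingester("gtr")  # hypothetical remote source name
def ingest_gtr():
    # The real process would call the remote API and write records
    # into the local copy; placeholder data stands in for that here.
    return [{"id": "gtr-1", "title": "Example project"}]

def sync_all():
    """Re-run every registered ingester to re-sync the local copy."""
    local_copy = {}
    for name, fetch in INGESTERS.items():
        local_copy[name] = fetch()
    return local_copy
```

Adding a new remote then means writing one more decorated function; nothing else in the pipeline changes.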

Build our own index

We then build a series of processor methods that read from our local copy and transform it as required to meet the use cases of the G4HE project. The resulting data is stored in our own index for use by our own API and UI. Whenever we need to tweak the data, we can re-run these processes to rebuild the index from the local copy alone, so we never need to poll the remotes just to rebuild our index.
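One way to picture the rebuild step is as a chain of processor functions run over the local copy. The processor names and the dict standing in for the index are illustrative assumptions, not the project's real implementation.

```python
# A sketch of the processor pipeline: each step takes the record list
# and returns a transformed one; rebuilding never touches the remotes.

def drop_empty(records):
    """Discard records with no usable title (example processor)."""
    return [r for r in records if r.get("title", "").strip()]

def normalise_titles(records):
    """Normalise titles for consistent matching (example processor)."""
    return [{**r, "title": r["title"].strip().lower()} for r in records]

PROCESSORS = [drop_empty, normalise_titles]

def rebuild_index(local_copy):
    """Rebuild the index from the local copy alone."""
    records = list(local_copy)
    for process in PROCESSORS:
        records = process(records)
    # The real system would write to a search index; a dict keyed by
    # record id stands in for it here.
    return {r["id"]: r for r in records}
```

Tweaking the data then amounts to editing or reordering `PROCESSORS` and re-running `rebuild_index`.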

Regular tidying and updating

After building our own index, another set of processes runs at regular intervals to clean errors in the data and to identify entities that should be the same but for some reason are not. These processes will become more complex as the user interface (UI) provides better ways for the user community to interact with the data. Whenever we rebuild our index, we will either apply these fixes during the rebuild itself, or re-run the tidying processes afterwards, using a store of the changes submitted by users to bring the index back to the same state.
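The "store of changes submitted by users" idea can be sketched as a fix log that is replayed against a freshly rebuilt index. The fix actions ("merge", "set") and record shapes here are assumptions chosen to illustrate the technique.

```python
# A sketch of replaying stored user fixes after a rebuild, so the
# index returns to the same tidied state each time.

def apply_fixes(index, fix_log):
    """Replay user-submitted changes against a rebuilt index."""
    for fix in fix_log:
        if fix["action"] == "merge":
            # Two entities identified as the same: fold the duplicate
            # into the canonical record and drop it from the index.
            dup = index.pop(fix["duplicate"], None)
            if dup is not None:
                index[fix["canonical"]].setdefault("aliases", []).append(fix["duplicate"])
        elif fix["action"] == "set":
            # A simple field correction submitted by a user.
            index[fix["id"]][fix["field"]] = fix["value"]
    return index
```

Because the log is replayable, the same corrections survive any number of rebuilds without users having to resubmit them.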


Expose the data via our own API

Penultimately (or finally, depending on your requirements), we make all the processed data in our index available via our own customised API. The HE community can then use this data directly or, of course, interact with it via our own UI.
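A query endpoint over the index might look like the following. This is a hedged sketch only: the `q` title-search parameter and the JSON response shape are assumptions, not the API's actual contract.

```python
import json

def api_query(index, q=None):
    """Sketch of an API query endpoint: filter the index by a search
    term and return a JSON payload for consumers and the UI."""
    records = list(index.values())
    if q:
        records = [r for r in records if q.lower() in r.get("title", "").lower()]
    return json.dumps({"total": len(records), "results": records})
```

In the real system this function would sit behind an HTTP route; the point is that everything it serves comes from our own index, never from the remotes directly.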


Provide our own UI

To demonstrate the value of our data processing and API exposures, we provide our own UI to the data, built on top of our own API. As per current requirements, this will offer collaboration reports via a simple interface, including a download-as-CSV method, and benchmarking reports with a similar download option. As we build more functionality into the UI, for example the ability to specify groups of researchers, we will be able to send such specifications back via our API into our own index, providing ongoing improvement to the data.
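The download-as-CSV method mentioned above could be as simple as serialising report rows with the standard library. The field names ("partner", "projects") are illustrative assumptions about what a collaboration report might contain.

```python
import csv
import io

def report_as_csv(rows, fields):
    """Sketch of a download-as-CSV method for collaboration or
    benchmarking reports served by the UI."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells rather than errors.
        writer.writerow({f: row.get(f, "") for f in fields})
    return buf.getvalue()
```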

