The diagram above shows the architecture of the software system we are building for the G4HE project. The core parts are:
Get data from remote sources
We have developed (and will continue to develop) processes that run on our server to retrieve relevant data from suitable remote sources. This section is built in a plugin style, so when another suitable remote is identified all we have to do is write a new ingest method and add it to the list. Once these processes have run we have a local copy of all relevant data, and we can re-run them whenever we need to re-sync with the remotes.
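The plugin style described above can be sketched as a simple registry: each remote gets its own ingest function, and adding a new source means adding one more registered function. This is a minimal illustration, not the project's actual code — the source name and fetch logic here are hypothetical.

```python
# Registry of ingest plugins, one per remote source.
INGEST_PLUGINS = {}

def ingest_plugin(name):
    """Decorator that registers an ingest function under a source name."""
    def register(func):
        INGEST_PLUGINS[name] = func
        return func
    return register

@ingest_plugin("example_remote")  # hypothetical remote, for illustration
def ingest_example_remote():
    # In practice this would call the remote's API; here we return a stub.
    return [{"id": 1, "title": "Example record"}]

def sync_all():
    """Re-run every registered ingest to refresh the local copy."""
    return {name: fetch() for name, fetch in INGEST_PLUGINS.items()}
```

Re-syncing with the remotes is then just a matter of calling `sync_all()` again; new remotes require no change to the sync logic itself.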
Build our own index
We then run a series of processor methods that read from our local copy and transform it as required to meet the use cases of the G4HE project. The result is stored in our own index, for use by our own API and UI. Any time we need to tweak the data, we can re-run these processes and rebuild the index relying only on our local copy of the remotes – so we do not need to poll the remotes every time we rebuild.
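The rebuild step might look something like the following sketch: a pipeline of processor functions applied to every record in the local copy, producing the index. The processor shown (title normalisation) and the source name are illustrative assumptions, not the project's actual transformations.

```python
def normalise_titles(record):
    # Example processor (assumed): tidy the title field.
    record = dict(record)
    record["title"] = record.get("title", "").strip().lower()
    return record

# Ordered list of processors; tweaking the data means editing this list
# and re-running the rebuild against the local copy only.
PROCESSORS = [normalise_titles]

def rebuild_index(local_copy):
    """Apply every processor to every record and return the new index."""
    index = []
    for source, records in local_copy.items():
        for record in records:
            for process in PROCESSORS:
                record = process(record)
            record["source"] = source  # track which remote it came from
            index.append(record)
    return index
```

Because `rebuild_index` reads only the local copy, rebuilds are cheap and never touch the remotes.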
Regular tidying and updating
After building our own index, another set of processes runs at regular intervals to do things like clean errors in the data and identify entities that should be the same but, for some reason, are not. These processes will become more sophisticated as the user interface (UI) gives the user community better ways to interact with the data. Whenever we rebuild our index, we will either apply these fixes during the rebuild itself, or re-run the tidying processes afterwards (using a store of the changes submitted by users to bring the index back to the same state).
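Replaying the stored user changes after a rebuild could work along these lines. The storage format assumed here — a list of (record id, field, new value) tuples — is a guess for illustration, not the project's actual schema.

```python
def apply_fixes(index, fixes):
    """Replay stored user corrections onto a freshly rebuilt index,
    bringing it back to the state it had before the rebuild."""
    by_id = {record["id"]: record for record in index}
    for record_id, field, new_value in fixes:
        if record_id in by_id:  # skip fixes for records no longer present
            by_id[record_id][field] = new_value
    return index
```

Keeping the fixes as a separate, replayable log means a full rebuild never loses the community's corrections.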
The G4HE API
Penultimately (or finally, depending on your requirements) we make all the processed data in our index available via our own customised API. This data can then be used by the HE community or, of course, interacted with via our own UI.
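The shape of an API query over the index might look like this sketch — a read-only lookup returning JSON, of the kind an endpoint such as /record?institution=A could wrap. The endpoint name, parameter, and fields are hypothetical.

```python
import json

# Stand-in for the index built by the earlier processing steps.
INDEX = [
    {"id": 1, "institution": "A", "collaborators": ["B", "C"]},
    {"id": 2, "institution": "B", "collaborators": ["A"]},
]

def api_query(institution):
    """Return matching index records as a JSON string, as the API
    would serve them to the HE community or to our own UI."""
    matches = [r for r in INDEX if r["institution"] == institution]
    return json.dumps(matches)
```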
Our G4HE UI
To demonstrate the value of our data processing and API exposure, we provide our own UI on top of our own API. As per current requirements, this will offer collaboration reports via a simple interface, including a download-as-CSV option, and benchmarking reports with a similar download option. As we build more functionality onto the UI – for example the ability to specify groups of researchers – we will be able to send such specifications back via our API into our own index, providing ongoing improvement to the data.
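The download-as-CSV option amounts to serialising a report into CSV text that the UI offers as a file. A minimal sketch, with illustrative field names (the real report columns are not specified here):

```python
import csv
import io

def report_as_csv(rows):
    """Turn a list of report dicts into CSV text for download."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```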