Data Engineering
Data Engineering works with scientific communities to make the best use of the data they collect and use. Data is a key ingredient of modern science, and as data sizes and complexity increase, the need for well-engineered data infrastructure to keep data usable and reusable becomes ever more important.
Data Engineering designs and develops tools and services to support the good management of research data so that it can be safely captured, catalogued and stored, and then accessed by scientists for further analysis.
Expertise and technologies:
- Data annotation, metadata and ontologies
- Data catalogues and search tools
- Data integration and exchange
- Data integrity and security
- Data repositories
- Data representation and formatting
- Data transformation and aggregation
- Workflow management systems
Groups
DAFNI
This group provides the technical expertise for the DAFNI Programme [DAFNI – Scientific Computing, Science and Technology Facilities Council (STFC)].
The group provides operational support to the DAFNI Platform [Data & Analytics Facility for National Infrastructure – DAFNI], and undertakes a programme of development to extend the capability of the web-based user environment to support research into the Engineering of National Infrastructure Systems.
Further, this group works closely with the DAFNI user community, both directly and within research projects, to enable users to make the best use of DAFNI to further their research aims.
The DAFNI Group Lead is Jens Jensen, with Tom Kirkham as the DAFNI Science Lead.
More information on DAFNI can be found on its dedicated website.
The group works collaboratively with UKCRIC, EPSRC and JASMIN.
Data & Software Engineering
This group develops high-quality software enabling science facilities and researchers in different domains to manage their data, while using and promoting policies and best practices around research software and data.
Led by Antony Wilson, this group’s activities centre around the design, implementation and support of a wide range of software enabling researchers to catalogue and discover their experimental data; capture analysis workflows; make links between the different research outputs such as data, publications and software; support the research software development lifecycle; and use, promote and develop best practices around research software.
Current work includes the Diamond Data Store, Facilities Data Pipeline, FAIR Impact, and ALC projects.
Data Systems & Services
This group has expertise in all parts of the stack, which it uses to develop and manage services while also maintaining and monitoring the performance and security of the underlying systems.
Led by Gemma Poulter, this group provides services for the deposit, management and distribution of the research that STFC supports (e.g. data, software, documents). These systems maintain the integrity of the research and make it available to the research community to enable reusable and open science, in essence ensuring the research satisfies the principles of FAIR (Findability, Accessibility, Interoperability, and Reusability). To achieve this, the group adopts the most appropriate tools and processes, which involves researching and developing new products in addition to evolving and maintaining existing ones.
Data Technology
This group unites IT specialists with a background in data intensive science and data intensive business applications.
Led by Vasily Bunakov, the group contributes to nationally and internationally funded projects focussed on building a digital research infrastructure and digital twins in physical sciences.
It works collaboratively with the University of Southampton, the UK Catalysis Hub and the Henry Royce Institute.
Data Engineering provides the expertise and tools to collect, catalogue, and manage the data from research.
By keeping data accessible and reusable, researchers can get the best science from data, now and into the future.
Dr Brian Matthews, Theme Lead, Data Engineering
Current Projects
Anvil
Anvil is a research software testing platform that is available and free to use for the following user groups:
- STFC internal staff
- UK academics working on CCP/HEC projects
- Existing (prior to 2019) Anvil users
BatCAT
BatCAT is a research and innovation project funded by the EU’s Horizon Europe programme. The project is a collaboration between 18 partner organisations from 9 European countries, coordinated by NMBU.
The project aims to create a digital twin for battery manufacturing by developing a cross-chemistry data space for two technologies: Li-ion and Li-S coin cells, and redox flow batteries. The project will also address three challenges in digital manufacturing: design, operation, and trust.
BatCAT is closely connected to BIG-MAP, BATTERY 2030+, EOSC, EMMC, and OntoCommons, ensuring community and industry uptake of the results.
CCP-WSI Catalogue of Research Outputs
CAT-WSI is a rich resource for the WSI community that currently connects Projects, Test Cases and Software, with further research outputs to be added soon. Its search capability improves discoverability, helps to identify gaps in the community's work and reduces potential repetition. Catalogue users can find and compare codes, discover relevant Projects and Test Cases, form new and productive collaborations, and increase the visibility of their work, resulting in higher impact. All this will help researchers and industry partners alike to achieve more in the field of Wave Structure Interaction.
DAFNI - Centre for Greening Finance and Investment
For more information, please visit UK Centre for Greening Finance and Investment (CGFI)
DAFNI - CReDo
For more information on the Climate Resilience Demonstrator programme, please visit Climate Resilience Demonstrator (CReDo) – Connected Places Catapult
DAFNI - CROSSEU
For more information, please visit CROSSEU – Advancing climate resilience
DAFNI - NERC Digital Solutions
For more information, please visit £8m NERC Digital Solutions Programme – NERC Digital Solutions Programme
DAFNI - OpenLAND
OpenLAND is a three-year project supported by £4 million of funding from UK Research and Innovation (UKRI) and government partners, through the Land Use for Net Zero, Nature and People (LUNZ) programme.
OpenLAND is led by the Tyndall Centre at the University of East Anglia (UEA) and funded by the Biotechnology and Biological Sciences Research Council within UKRI.
This project will provide decision makers with the insights urgently needed to put the UK on a path to deliver net zero emissions by 2050, while also delivering climate-resilient soil health, food security, and biodiversity net gain.
Building on the existing spatially explicit modelling framework, OpenCLIM, developed under previous UKRI funding, OpenLAND will re-use existing model connections on DAFNI to enhance or create new workflows.
DAFNI underpinned OpenCLIM and is a legacy space, accessible by stakeholders and researchers alike beyond the end of the project. OpenLAND will build on this legacy framework to solve further complex problems in climate resilience. Models and data not appropriate for DAFNI will be hosted on JASMIN.
eData
eData is the digital archive, developed in collaboration with Open Science, that collects, preserves, and makes available research data produced or collected by STFC staff.
ePubs
ePubs, developed in collaboration with Open Science, is the open archive for STFC research publications.
ICAT
The ICAT project provides a metadata catalogue and related components to support experimental data management for large-scale facilities, linking all aspects of the research lifecycle from proposal through to data and article publication. The ICAT catalogue originated at STFC, but is also used at ESRF, HZB, SESAME, ALBA and SIRIUS, which together form the ICAT collaboration.
InfraPortal
InfraPortal is a UKRI-funded catalogue that contains information on hundreds of research and innovation infrastructures available to UK researchers and innovators. These include major equipment, resources such as collections, archives or scientific data, e-infrastructures such as data and computing systems, and communication networks.
PSDS
PSDS is an EPSRC-funded National Research Facility provided by the University of Southampton and the Science and Technology Facilities Council. Its purpose is to provide a common access point to data resources within the Physical Sciences to all staff, students and other members of UK academic institutions. By providing a common point of access, free at the point of use, the service aims to benefit the research community by maximising the use of resources via common academic licensing, and to add value as a common hub for aggregating and integrating data resources for the Physical Sciences.
The online platform currently provides access to a number of state-of-the-art chemistry databases and tools for the benefit of the research community.
UrbanAIR
A newly funded project.
Past Projects
DAFNI - OpenCLIM
For more information, please visit OpenCLIM – Tyndall Centre for Climate Change Research
DOME
For more information, please visit DOME 4.0 – EU Project
EOSCPilot
For more information, please visit The European Open Science Cloud for Research Pilot Project (EOSCpilot) fact sheet on CORDIS, European Commission
ExPaNDS
For more information, please visit expands.eu
NERC-NetZero-DRI
For more information, please visit UKRI Net Zero Digital Research Infrastructure Scoping Project (net-zero-dri)
PaNOSC
For more information, please visit The Photon and Neutron Open Science Cluster (PaNOSC)