1 Introduction
1.1 Motivation
The FAIR principles are formulated in generic terms and without reference to any particular scientific discipline [1]. According to Jacobsen et al. (2020), ”[t]his has likely contributed to [their] broad adoption […], because individual stakeholder communities can implement their own FAIR solutions. However, it has also resulted in inconsistent interpretations that carry the risk of leading to incompatible implementations. Thus, […] for true interoperability we need to support convergence in implementation choices that are widely accessible and (re)-usable.” [2]
Communities who wish to develop their own FAIR conventions may greatly benefit from successful real-world examples that allow researchers and practitioners to identify, evaluate, and select best practices for research data management (RDM). However, reports on the practical application of RDM are still scarce in the literature. Many approaches and models exist in theory, but their value to the community remains low if their practice is not systematically examined.
This paper presents lessons learned from the creation and management of FAIR data and metadata in two recent robotics field research projects, RoBivaL and DeeperSense, conducted at the Robotics Innovation Center (RIC) of the German Research Institute for Artificial Intelligence (DFKI) [3] [4]. The paper is an extended and revised version of a presentation given at the NFDI4Ing Conference 2023 [5].
We chose to present RoBivaL and DeeperSense together for two complementary reasons. On the one hand, their commonalities allow us to generalize RDM principles to some degree in certain circumstances: Both projects are field studies, conducted over extended periods of time by diverse research teams from multiple institutions and disciplines. Their differences, on the other hand, allow us to examine the application of RDM principles in different research scenarios: RoBivaL is a terrestrial robotics project studying the performances of different hardware systems at a single location in the context of agriculture. DeeperSense is an underwater robotics project developing an AI for the translation of sonar outputs into camera-like images based on training data collected both at a laboratory and at multiple field locations.
Openly available datasets are crucial to advancing the field of robotics. One reason is that robotics research often requires access to specialized hardware, sensors, and environments. Making high-quality data accessible to more researchers, including those without the means to collect such data on their own, fosters innovation from a wider range of perspectives. Furthermore, open datasets provide common ground for the community to develop and benchmark algorithms collaboratively on standardized data.
Our discussion of RDM, however, is not meant to apply to robotics alone. In principle, our findings can be applied to any collaborative (field) research project employing humans and technical systems for data acquisition in multiple steps and iterations. The purpose of our study is to derive requirements and strategies for the creation and management of “rich” metadata in the FAIR sense.
1.2 Outline
Section 2 presents related work on FAIR RDM in general, on open data in robotics in particular, and on formal knowledge representation in robotics.
Section 3 features brief summaries of RoBivaL and DeeperSense. We present their overall project objectives and contrast their base data requirements from a high-level perspective.
The main body of the paper is divided into three parts (Sections 4 - 6). Specific desiderata and additional related work are introduced at the beginning of each part if necessary. The models and concepts presented in these parts were extracted from the experience in RoBivaL and DeeperSense and shall be applied in future projects.
The first main part (Section 4) discusses the content dimension of FAIR RDM. We distinguish between executive metadata necessary for producers to achieve their project goals, and reusable metadata necessary for reusers to satisfy the FAIR principles. We introduce this distinction into the FAIR data debate, because we believe that data producers are more likely to adopt FAIR principles if the specific needs of producers are taken into account by the FAIR community. Both metadata types are illustrated with examples from RoBivaL and DeeperSense. Further, this section introduces base elements for a model of the metadata creation process at the micro level in the context of collaborative metadata management. We relate our model to the ”processing step” class of the Metadata4Ing (M4I) ontology. Though M4I acknowledges the existence of metadata, it does not appear to address the process of metadata creation.
The second main part (Section 5) expands the distinction between different stakeholder groups from the previous section and explores the social dimension of collaborative FAIR RDM more broadly. We argue that a FAIR research data manager acts as a link between three social domains where they perform different primary tasks. We are not aware of a discussion about the social implications of FAIR RDM, but we believe such a discussion to be indispensable for a definition of the FAIR data manager role.
The third main part (Section 6) examines the time dimension of collaborative and iterative FAIR RDM at the macro level. Based on a critical appraisal of prominent data lifecycle models, we suggest a model of a self-improving data lifecycle geared towards collaborative and iterative RDM. We introduce a data provision phase which is necessary for internal collaboration. Further, we introduce an evaluation phase at the end of the lifecycle complementing the planning phase at its beginning, to foster iterative improvement of the data management system. Lastly, we recognize that planning and evaluation are different kinds of activities than data creation, provision, processing, publishing, etc., which gives rise to a lifecycle model with two nested loops. Our model is illustrated with lessons learned from RoBivaL and DeeperSense.
2 Related work
This section presents some recent developments in open data and FAIR RDM with applications in robotics, and in formal knowledge representation in robotics.
The need for large-scale, real-world datasets in robotics is highlighted by the rise of Robotics Foundation Models (RFMs) [6]. Extensive multimodal training data can lead to high-level task performance in many different scenarios, as illustrated by the RT model class presented by Brohan et al. (2023) [7]. The RT-1-X model was trained on the Open X-Embodiment dataset, developed and published by Google DeepMind in 2024 in collaboration with over 20 research institutions [8] [9]. This initiative demonstrates the benefits of shared open data. Still, according to Firoozi et al. (2024), the scarcity of robot-relevant training data remains a major open research challenge in the improvement of RFMs [10].
Robotics-enabled marine research has seen some advancements towards FAIR RDM. Schoening et al. (2022) observe that published marine image datasets have been lacking metadata to describe their high technical heterogeneity; the authors propose a concept for image FAIR digital objects (iFDOs) as a remedy [11]. Motta et al. (2023) present a method for the creation of FAIR marine robotic telemetry data and metadata about marine robotic missions; they observe a general lack of controlled vocabularies in robotics [12]. In the context of space robotics, Dominguez et al. (2020) developed a modular framework for multisensor data fusion including a suite of data management tools; their approach for describing complex data processing systems might feed into FAIR metadata components for robotics [13]. Arundel et al. (2023) offer a data management case study focused on conveyance of big data over multiple stages; though motivated by geospatial data processing, their methods seem applicable to many domains, including robotics [14].
Several initiatives in science and industry are working to represent robotics knowledge using formal ontologies and terminologies. Olivares et al. (2019) review five ontology-based approaches to autonomous robotics and quote four additional ontological efforts in robotics that either do not address autonomy or lack relevant qualities [15]. The IEEE 1872 group of standards comprises six ontologies which were released between 2015 and 2024 [16], [17], [18]. They define more than 100 terms addressing general robotics concepts, robot parts, pose, tasks, and autonomy. An additional IEEE ontology about reasoning on multiple robots is in development [19]. A parallel standardization effort at the terminological level is represented by the ISO vocabularies for robotics [20] and for mobile robots [21]. Though ISO 8373 is a normative reference in IEEE 1872, there are considerable differences which must be navigated by practitioners. The ROS middleware offers the URDF specification for a formal description of kinematic and dynamic aspects of individual robots that consist of rigid links connected by joints [22]. A proposal to cure some shortcomings and limitations of URDF has not yet been addressed [23]. Considering applications of ontologies and terminologies in robotics research, Jorge et al. (2015) evaluate the POS ontology in a use case where heterogeneous robots and humans interact in a manufacturing task [24]. Neto et al. (2019) apply the CORA ontology in a simulated robotics reconnaissance mission with interaction between autonomous aerial and ground robots [25]. Yüksel (2023) explores the application of ontologies at the robot component level for the automated design of robotic systems in relation to robot tasks and capabilities; the work introduces the Korcut ontology family [26].
DeeperSense and RoBivaL did not employ any formal ontologies or terminologies for development or data management. This is in line with the usual practice in the respective research teams. Applying ontologies on top of the primary project requirements would have posed a major challenge exceeding the available resources. Formal knowledge representation with ontologies and terminologies will therefore be explored in future work.
3 Project summaries
This section gives brief summaries of the projects RoBivaL and DeeperSense, focusing on general project objectives and the base data that was created.
3.1 RoBivaL
The project RoBivaL [3] [27] was conducted between August 2021 and October 2023 by an interdisciplinary and multi-institutional team of roboticists and agriculture researchers in Germany. The project compared different robot locomotion concepts both from space research and agricultural applications on the basis of experiments conducted under agricultural conditions. The goal was to promote knowledge and technology transfer between space and agriculture research. While the experiment designs were inspired by the standards ISO 18646-1 [28] and ISO 18646-2 [29], the environmental properties were adapted to the agricultural context, and the main evaluation focus was on soil interaction. Four robots were used: Two having their origins in space applications, the other two developed for agriculture. The robots were subjected to six experiments addressing different agricultural challenges and requirements. Soil conditions were controlled and varied on the two dimensions moisture (dry, moist, wet) and density (tilled, compacted). Figure 1 gives an impression of selected experiments and robots in the field.
Field conditions and robot behavior were monitored with various sensors and measuring devices, partly on the robots and partly in the field, in order to document the experiment execution and to determine the robot performance. The data capturing devices, their roles and deployments are summarized in Table 1. (Video camera and Lidar on the system are greyed out, because, although available, they were not used in the project.) The entire dataset including comprehensive metadata is publicly available on the Zenodo platform [30].
Table 1: RoBivaL data capturing devices by purpose and deployment
Purpose | Device on System | Device on System and in Field | Device in Field |
System Monitoring | | | |
System and Field Monitoring | | | |
Field Monitoring | | | |
3.2 DeeperSense
The project DeeperSense [4] [31] was conducted between January 2021 and December 2023 by an international, interdisciplinary, and multi-institutional team of researchers and domain experts in Germany, Spain, and Israel. This paper focuses on the German use case, which employed roboticists, sensor experts, and technical divers. The objective was to improve the safety of the divers, who work under dangerous conditions and therefore require constant monitoring and assistance. Existing safety systems rely on cameras, which is a problem in turbid water that limits visibility – just when the divers most need outside support. Sonars are more robust to turbidity, but conventional sonar output is difficult to interpret. DeeperSense therefore developed a neural network which translates sonar output into images that appear camera-like, thus combining the best aspects of both modalities. Figure 2 illustrates the sonar-to-image translation.
To gather training data, divers performing typical work tasks were recorded underwater with sonar and camera simultaneously. Figure 3 shows the training data collection setup schematically.
For the neural network to be able to handle different types and degrees of turbidity, the training data had to be varied accordingly. Since this is difficult to establish and control efficiently at a single time and location, data was captured during six sessions at four different locations, covering inside and outside conditions, natural and artificial water bodies, and different seasons. Figure 4 gives an impression of the field locations.
Selected parts of the sensor data were published on the Zenodo platform [32] [33]. Due to the size of the sensor data, it is currently impractical to make the entire corpus available online. Instead, the metadata was published as a standalone database [34], allowing researchers to select portions relevant for their use cases, which are made available on demand. This is an effort to comply with the FAIR principle A2 to make metadata accessible independently of the base data.
3.3 Comparison
Table 2 summarizes and contrasts the data-related properties of RoBivaL and DeeperSense as presented in Sections 3.1 and 3.2. An immediate takeaway is that the scope and form of the base data, its purpose and handling can be quite different between projects even at a single institute. A data management solution should be flexible enough to accommodate such variance.
Table 2: Summary of data-related project properties of RoBivaL and DeeperSense
Property | RoBivaL | DeeperSense |
Objective and method | | |
Base data | | |
Data acquisition in the field | | |
4 Content dimension: Executive metadata and rich reusable metadata
This section discusses the content dimension of metadata creation and management in RoBivaL and DeeperSense from the perspectives of data producers on the one hand, and potential reusers as characterized by the FAIR principles on the other. Subsection 4.1 introduces necessary background about high-level purposes of metadata, metadata semantics in the context of robotics and engineering in general, and the concept of metadata “richness” according to the FAIR principles. Subsection 4.2 lays out a collection of metadata topics from RoBivaL and DeeperSense for different purposes, and divides it into executive metadata relevant for producers and reusable metadata for public consumers based on the different motives of both parties. This analysis foreshadows the discussion of social aspects of metadata management in Section 5. Subsection 4.3 attempts to model the process of metadata creation abstractly and at the micro level for production purposes. We illustrate our model with examples from DeeperSense and RoBivaL, and compare it to the communication-oriented ”processing step” class from the Metadata4Ing (M4I) ontology.
4.1 Metadata purposes, semantics, and richness
Virtually every general metadata definition starts with the assertion that metadata is ”data about data” [35], [36], [37], [38], [39]. A common purpose-based classification distinguishes at a high level between descriptive, administrative, and structural metadata [40], [41], [42]: Descriptive metadata ”enables discovery, identification, and selection of resources”, administrative metadata ”facilitates the management of resources”, structural metadata ”describes relationships among various parts of a resource”, and is ”generally used in machine processing” [42].
There are many domain-specific approaches to model metadata semantics. For the communication of metadata in engineering disciplines including robotics, the NFDI4Ing community has developed the Metadata4Ing (M4I) ontology [43], [44], [45]. It features a generalized process model, centered around the ”processing step” class. This is an attempt at communicating multi-stage data processing to satisfy the FAIR principle R1.2 of detailed provenance tracking. We compare the M4I processing step class with our own metadata creation process model in Section 4.3.
The FAIR principles [1] (see Table 3) require metadata to include the identifier of the base data (principle F3) and to be independently accessible (A2). Special emphasis is put on “rich” metadata. The term is associated with findability (F2), but defined in the context of reusability (R1). In fact, from the formulation of R1 it appears that rich metadata is the essence of reusability. Its definition is left vague, which is likely intentional to allow the concept to be applied in various domains. Richness implies ”a plurality of accurate and relevant attributes”. The only specific attributes mentioned are a data usage license (R1.1) and provenance (R1.2). Further attributes must ”meet domain-relevant community standards” (R1.3). In our view, this means it is both possible and necessary to develop community-specific interpretations of metadata richness. The focus on reusability implies that richness must be explained from a user perspective.
Table 3: The FAIR principles [1]
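Findable | F1. (Meta)data are assigned a globally unique and persistent identifier. F2. Data are described with rich metadata (defined by R1 below). F3. Metadata clearly and explicitly include the identifier of the data they describe. F4. (Meta)data are registered or indexed in a searchable resource. |
Accessible | A1. (Meta)data are retrievable by their identifier using a standardized communications protocol. A1.1. The protocol is open, free, and universally implementable. A1.2. The protocol allows for an authentication and authorization procedure, where necessary. A2. Metadata are accessible, even when the data are no longer available. |
Interoperable | I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (Meta)data use vocabularies that follow FAIR principles. I3. (Meta)data include qualified references to other (meta)data. |
Reusable | R1. (Meta)data are richly described with a plurality of accurate and relevant attributes. R1.1. (Meta)data are released with a clear and accessible data usage license. R1.2. (Meta)data are associated with detailed provenance. R1.3. (Meta)data meet domain-relevant community standards. |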
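To make the attributes named above concrete, the following minimal Python sketch checks a metadata record for the three attributes that the FAIR principles mention explicitly (F3, R1.1, R1.2). The key names `data_identifier`, `license`, and `provenance`, the function `missing_fair_attributes`, and all example values are hypothetical; the FAIR principles do not prescribe any particular field names, and neither project is known to use this structure.

```python
# Hypothetical key names; the FAIR principles do not prescribe them.
REQUIRED_KEYS = {
    "data_identifier": "F3: identifier of the described base data",
    "license": "R1.1: data usage license",
    "provenance": "R1.2: detailed provenance",
}


def missing_fair_attributes(record: dict) -> list:
    """Return the required keys that are absent or empty in a metadata record."""
    return [key for key in REQUIRED_KEYS if not record.get(key)]


# Illustrative record with placeholder values (not taken from either project).
record = {
    "data_identifier": "doi:10.0000/example",  # placeholder, not a real DOI
    "license": "CC-BY-4.0",
    "soil_moisture": "dry",  # a domain-specific attribute in the sense of R1.3
}

print(missing_fair_attributes(record))
```

Such a check only covers the explicitly named attributes; which further attributes make a record "rich" remains a community-specific decision under R1.3.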
4.2 Executive metadata and reusable metadata
While the FAIR principles promote the development of metadata for data reusers, data producers already create and manage metadata routinely for their own purposes. This does not imply that they would describe their own practice in these terms or use specific tools and methods. It means that some forms of metadata creation and management are just an innate part of being an effective researcher. Examples will be given below. What is the relationship between the executive metadata necessary for data production and the rich FAIR metadata supporting, enabling, or facilitating data reuse? This question has a content aspect and a form aspect: Which metadata topics are relevant for producers or reusers? And which formal requirements are demanded by either group? Since these questions address two different stakeholders, they foreshadow the discussion of social aspects of FAIR RDM in Section 5.
Table 4 presents the metadata topics of RoBivaL and DeeperSense categorized by project and by relevance for producers or reusers. Intersections are possible on both dimensions. Each topic is labeled with its dominant purpose(s), i.e., descriptive (D), administrative (A), or structural (S).
Table 4: Metadata topics relevant for producers or reusers in RoBivaL and DeeperSense. Predominant purposes: descriptive (D), administrative (A), or structural (S).
Relevant for | Both projects | RoBivaL | DeeperSense |
Producer | | | |
Producer and Reuser | | | |
Reuser | | | |
The assignment of topics to producers or reusers is guided by the assumption that each group has a different primary motive: Producers want a correct execution of their project plan to achieve their primary research goal. Reusers want a sufficient understanding of the base data to assess its utility, and to integrate it into their own workflow. Our assumption about producers is primarily based on our personal experience, i.e., it reflects the motives and requirements prevalent in the two examined projects and within our institutions more generally. We believe that these assumptions are neither surprising nor uncommon. The point here is to observe the contrast between producers and reusers. Our assumption about reusers is based on our interpretation of the FAIR principles. Both assumptions are further substantiated in Section 5.
The different motives also affect the formal requirements. Data producers care less whether all metadata is specified and captured explicitly and formally; they tolerate tacit expert knowledge, code logic, informal communication, etc. For the sake of efficiency and expediency, they may limit content and form of metadata to what is essential to their needs. Reusers on the other hand require all metadata to be explicit, since they lack the immediate access to the creation context that producers have. To support efficient machine processing, metadata must be formalized. In order to cover a broad range of possible reuse cases, it must be rich in the FAIR sense.
4.3 Base elements of the metadata creation process
This section analyzes the process of metadata creation and derives some process-related metadata categories. The matter is treated abstractly and at the micro level, i.e., with regard to individual data elements; the big picture of the data lifecycle is discussed in Section 6. The analysis yields elements for the design of metadata production workflows. This is useful in a collaborative setting with a division of labor, where responsibilities must be communicated effectively.
The data flow diagram in Figure 5 illustrates the first order of metadata creation on a single data processing stage. Base data processing is represented vertically from top to bottom, metadata processing horizontally from left to right. The Output represents a piece of base data which is generated by some Procedure. Output and Procedure are the subjects of metadata. For both, metadata creation has two phases: Before the subject exists, it is designed; after it exists, it may be documented. The design is metadata that is injected into the Procedure; the documentation is metadata that is extracted either from the Procedure or from the Output.
The entire model can be stacked vertically to represent multi-stage data transformation, i.e., the Procedure may receive output of a previous stage as its input, the Output may serve as input to another procedure on a subsequent stage. This model facilitates division of labor by modularizing metadata both in the content domain (distinguishing metadata subjects Procedure and Output) and across time (distinguishing design and documentation phase).
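The first order model can be sketched as a small data structure. The following Python sketch is an illustration under our own assumptions, not an implementation used in either project: the class `Stage`, its field names, the function `chain`, the second stage, and all metadata values are hypothetical. Only the procedure and output names of the first stage are taken from the DeeperSense example in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One data processing stage with its four first order metadata slots."""
    procedure: str
    output: str
    # Injected metadata (design), written before the subject exists.
    procedure_design: dict = field(default_factory=dict)
    output_design: dict = field(default_factory=dict)
    # Extracted metadata (documentation), written after the subject exists.
    procedure_documentation: dict = field(default_factory=dict)
    output_documentation: dict = field(default_factory=dict)


def chain(stages):
    """Stack stages vertically: each output is the input of the next procedure."""
    return [(up.output, down.procedure) for up, down in zip(stages, stages[1:])]


capture = Stage(
    procedure="Capture camera and sonar images of a diver",
    output="Logfiles with raw camera and sonar data",
    procedure_design={"checklist": "dive_plan.md"},        # hypothetical value
    output_design={"container": "logfile"},                # hypothetical value
    procedure_documentation={"deviations": "none noted"},  # hypothetical value
    output_documentation={"file_count": 12},               # hypothetical value
)
# Hypothetical follow-up stage, added only to show vertical stacking.
extract = Stage(procedure="Extract image pairs", output="Synchronized image pairs")

for data, consumer in chain([capture, extract]):
    print(f"{data!r} -> {consumer!r}")
```

Keeping the four slots separate mirrors the division of labor: different team members can own the design and the documentation of the same procedure or output.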
Table 5 lists examples for each of the four first order metadata categories taken from the DeeperSense project. They are related to the same Procedure (”Capture camera and sonar images of a diver”) and corresponding Output (”Logfiles with raw camera and sonar data”).
Table 5: Examples of the four first order metadata categories from the DeeperSense project
Metadata | Procedure-related | Output-related |
Injected | | |
Extracted | | |
The assertion that metadata is data, as mentioned in Section 4.1, implies that metadata creation may be recursive: Higher orders of metadata can treat metadata of lower orders as their base data. Visually, this means we can stack the first order metadata creation model not just vertically, but also horizontally. This is illustrated in Figure 6. It contains a condensed version of Figure 5: The process ”Create MD” represents all four metadata creation processes of the first order, which are applied to the base Procedure and Output. ”First order MD” represents all four first order metadata types. The recursion recognizes that ”Create MD” and ”First order MD” themselves are a procedure-and-output pair, hence they become subjects of meta-metadata creation.
Table 6 gives two sets of generic examples for higher order metadata on multiple levels. The first example features pieces of literal base data and metadata: A speed measurement is taken at a certain time; the time stamp formatting is expressed in C string format notation; syntax and semantics of this formatting are governed by an ISO standard. The second example has a similar application pattern, but references files, which support structured data and semantic networking.
Table 6: Generic examples of higher order metadata
Base data | Metadata | Meta-Metadata | Meta-Meta-Metadata |
5.3 m/s | 2023-09-27 09:37:51 | %Y-%m-%d %H:%M:%S | ISO 8601 |
camera.mp4 | metadata.json | schema.json | https://json-schema.org |
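The first row of Table 6 can be reproduced with the Python standard library. In this minimal sketch, the variable names are our own; the values are those of the table. The format string (meta-metadata) is what allows a machine to interpret the time stamp (metadata) that belongs to the speed measurement (base data).

```python
from datetime import datetime

base_data = "5.3 m/s"
metadata = "2023-09-27 09:37:51"     # first order: when the speed was measured
meta_metadata = "%Y-%m-%d %H:%M:%S"  # second order: how the time stamp is formatted
meta_meta_metadata = "ISO 8601"      # third order: the standard named in Table 6

# Without the format string, the time stamp is just an opaque string.
timestamp = datetime.strptime(metadata, meta_metadata)

print(timestamp.year, timestamp.month, timestamp.day)
# Formatting the parsed value with the same meta-metadata restores the original string.
print(timestamp.strftime(meta_metadata) == metadata)
```

The third order entry is not evaluated by the code; it is the kind of information that participants usually treat as common knowledge.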
How does our metadata process model compare to the processing step class of the Metadata4Ing (M4I) ontology depicted in Figure 7? The M4I model acknowledges that each data output is generated by a process, and that data processing may be chained, which corresponds to the vertical direction of our model. But the M4I model does not appear to cover the process of metadata creation itself, i.e., our model’s horizontal direction (injected vs. extracted metadata, higher order metadata). We assume this absence is at least partly a result of the purpose of the M4I model, which is communication of metadata to data consumers after the base data and metadata have been created. As mentioned above, the modularization of metadata in our model serves to design workflows for metadata creation by a data production team during project execution. A further difference between the M4I processing step and our model is that the former specifies a fixed set of attributes, while the latter is agnostic in this regard. Finally, the M4I processing step model provides the opportunity to encapsulate multiple substeps into a single step of larger scale. So far, our model does not feature a similar means of abstraction.
Figure 7: Processing step class of the Metadata4Ing ontology [43]. © Metadata4Ing Workgroup. License: CC BY 4.0 International
5 Social dimension: Collaborative FAIR data management in field research
This section discusses the social dimension of metadata creation and management from the perspective of a research data manager who follows FAIR principles. We argue that a FAIR manager acts as a link between three social domains, where they perform different primary tasks.
5.1 Collaboration with the data production team
The first social domain is the data production team. Here, the primary task of any data manager (irrespective of FAIRness considerations) is collaboration.
Collaborative research in general is challenging, because it involves a multitude of people who must be coordinated and accommodated. If they come from different disciplines and institutions, they may have different motivations, goals, expertise, responsibilities, standards, and practices. These individual attributes may not be equally transparent for everybody, and not be equally present in everyone’s mind, which can complicate intra-group communication.
Collaborative field research is particularly challenging: The pressure to perform is very high, because there are limited opportunities to go into the field; field conditions can be difficult and unpredictable, which often leads to unforeseen problems; equipment and people are put to unusual stress. The main priority is to get all people and systems to work at all at the designated time and place, and to capture the primary data that serves the project goal. This often requires improvisation and adaptation, because prototype systems may break or deviate from specification, and the captured data may not match earlier expectations. Figure 8 illustrates these notions in the context of DeeperSense.
Figure 8: Field team work in DeeperSense. On the last day of data collection, the team is on a boat on a lake, gathering a critical piece of data necessary for the final demonstration event. The underwater system keeps failing. Error messages on the computer screens are difficult to read due to the glaring sun. © DFKI, Christian Backe. License: CC BY 4.0 International
This assessment has two immediate implications for effective RDM in field research: First, RDM must be reliable and unobtrusive. A field research team wants their RDM to ease the effort, not stand in the way or cause extra concerns. Second, RDM must capture unforeseen events, so they can be factored into the preparation of future field missions.
5.2 Mediation between producers and reusers
The previous section dealt with RDM in general. For a FAIR data manager in particular, there is a second social domain, namely the larger research domain. Here, their primary task is mediation between conflicting requirements of their data production team on the one hand, and potential data reusers (as characterized by the FAIR principles) on the other.
We hinted at this conflict in Section 4.2 and can fully express it in light of Section 5.1: Reusers require explicit, formal, rich metadata to thoroughly understand the data that is foreign to them, easily interface with it using machines, and have it serve a broad spectrum of potential use cases. But this demands extra effort from the producers, who not only have the privilege of being more implicit, informal, and brief in their internal communication, but who may actually be forced to cut corners, especially under field conditions, in order to reach their primary research goal.
Table 7 summarizes this proposition and adds two aspects derived from experience in RoBivaL and DeeperSense: In a collaborative setting with division of labor, executive metadata may be distributed over many places convenient for different contributors; to become reusable, it must be consolidated. While research is ongoing, the executive metadata design may need to evolve to adapt to changing circumstances; reusers prefer reliable APIs.
Table 7: Different priorities and requirements of data producers and reusers
Data producers | Data reusers |
Research execution | Data understanding and interoperation |
Tacit common knowledge | Explicit metadata files |
Ad-hoc communication | Formal specification, Ontologies |
Single actual use case | Several potential use cases |
Distributed information | Coherent information |
Flexible, evolving designs | Static APIs (keywords, structures) |
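The step from distributed to coherent information can be illustrated with a minimal Python sketch. It merges metadata fragments from several places into one record and reports keys on which the sources disagree, so that a data manager can resolve them before publication. The function `consolidate`, the source names, and all keys and values are hypothetical and do not describe the actual tooling of either project.

```python
def consolidate(sources):
    """Merge metadata fragments; report keys on which two sources disagree.

    `sources` maps a source name to a dict of metadata key-value pairs.
    The first value seen for a key is kept; later conflicting values are
    reported as (key, first_source, conflicting_source).
    """
    merged, origin, conflicts = {}, {}, []
    for source_name, fragment in sources.items():
        for key, value in fragment.items():
            if key not in merged:
                merged[key] = value
                origin[key] = source_name
            elif merged[key] != value:
                conflicts.append((key, origin[key], source_name))
    return merged, conflicts


# Hypothetical fragments kept in places convenient for different contributors.
sources = {
    "logger_config": {"sonar_range_m": 10, "camera_fps": 25},
    "field_notes": {"sonar_range_m": 15, "water_body": "lake"},
    "wiki_page": {"camera_fps": 25, "operator": "team A"},
}

merged, conflicts = consolidate(sources)
print(merged)
print(conflicts)
```

Reporting conflicts instead of silently overwriting values is a deliberate choice in this sketch: disagreement between sources is itself information that the production team must clarify.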
The conflicting priorities and requirements of data producers and reusers have two implications for a FAIR research data manager: First, they must motivate their team to apply the extra effort. One possible incentive may be that today’s producers are their own reusers tomorrow, so the investment in more elaborate metadata will pay off directly towards themselves. There is an indirect version of this: By creating metadata they would be happy to receive if they were reusers, producers influence the standards of their community to their own benefit. Another incentive may be increased impact of their research if the underlying data is broadly adopted in the community. A second implication is that FAIR research data managers must design the workflow of their team such that the extra effort necessary to satisfy reuser requirements does not coincide with peak effort towards the primary research goal, because the latter will always have precedence.
5.3 Standardization in the FAIR RDM community
The third social domain for a FAIR research data manager is the FAIR RDM community. Here, their primary task is to participate in the standardization of FAIR practices in a particular research domain and maybe across domains. We believe the outcome of this activity can be conceptualized as higher order metadata.
From a purely theoretical perspective, the metadata recursion could go on to unlimited orders. But in practice, of course, a cut-off is made, from which on the participants (i.e., data producers either among themselves or in relation to reusers) regard all higher metadata orders as common knowledge to be inferred from context or prior convention. Still, the communication relies in principle on the assumption that all higher metadata orders could be delivered explicitly. One core role of the FAIR RDM community is to underwrite this assumption, i.e., to work towards a codification of common knowledge (including standards and open vocabularies) to which all participants can refer in their communication.
6 Time dimension: A self-improving data lifecycle
This section divides FAIR research data management into different tasks and organizes them across time. Subsection 6.1 discusses the concept of a data lifecycle and proposes some modifications to the type of lifecycle used by NFDI4Ing and similar parties. The two main modifications are the introduction of an internal data provision phase necessary for collaborative research, and the introduction of an evaluation phase to drive an iterative improvement of the RDM system. Subsection 6.2 presents some lessons learned from RoBivaL and DeeperSense in each phase.
6.1 Model of a self-improving data lifecycle
There is no consensus on which phases constitute a data lifecycle or how the phases should be ordered. In their survey of 76 data lifecycles, Shah et al. identify at least 14 phases [46]. NFDI4Ing uses a model with six phases, named Planning, Production, Analysis, Storage, Access, and Re-Use [47]. It is similar to other six-phase models prevalent in the FAIR RDM community [48], [49], [50], but these models still differ in the naming and ordering of the phases. They have two shortcomings regarding their application to collaborative and iterative research.
First, while there is a phase in these models near the end of the cycle for making data available externally to the public (called “Publication”, “Access”, “Sharing”, or “Disclosure”), there is no equivalent phase dedicated to making the data available internally to the research team immediately after creation. In our experience, such a phase is necessary in collaborative research, and it has different requirements than the publication phase. We propose to call it Provision.
Second, almost all phases are actions that apply to data (data is produced, analyzed, …), except for Planning, which is the only phase that applies to other actions (production is planned, analysis is planned, …). Another oddity about Planning is that it has no corresponding phase for looking into the past. In iterative research, comparing how things were planned to how they turned out would enable an iterative improvement of the RDM system. Since our research is in fact iterative, such a self-improving data lifecycle would be welcome. Therefore, we propose two additional phases called Execution and Evaluation. Together with Planning, they apply to each data-related action and thus form a separate loop nested within the data-related loop.
In summary, our proposed model has six data-related phases: Creation, Provision, Processing, Publication, Reuse, and Archiving. Each of these is divided into three process-related phases: Planning, Execution, and Evaluation. The model is illustrated in Figure 9.
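The nesting of the two loops can also be written down in a few lines of Python. This is only an illustrative sketch: the phase names are taken from the model above, while the identifiers `DATA_PHASES`, `PROCESS_PHASES`, and `lifecycle_steps` are hypothetical names introduced for this example.

```python
from itertools import product

# Outer loop: actions that apply to data (phase names from the proposed model).
DATA_PHASES = ["Creation", "Provision", "Processing",
               "Publication", "Reuse", "Archiving"]
# Inner loop: actions that apply to each data-related phase.
PROCESS_PHASES = ["Planning", "Execution", "Evaluation"]


def lifecycle_steps():
    """Return all (data phase, process phase) pairs in lifecycle order."""
    # product() varies the last argument fastest, so every data phase is
    # planned, executed, and evaluated before the next data phase begins.
    return list(product(DATA_PHASES, PROCESS_PHASES))


steps = lifecycle_steps()
for data_phase, process_phase in steps:
    print(f"{data_phase}: {process_phase}")
```

Enumerating the pairs in this order yields 18 steps in total, three for each of the six data-related phases.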
The term Creation is chosen over Collection or Acquisition to emphasize the designed and fabricated nature of data and metadata. The broader term Processing is preferable to the narrow term Analysis, because we encounter a range of data processing activities in our practice, both primary (e.g., machine learning model development, which is synthesis rather than analysis) and secondary (e.g., data cleaning, fusion, performance tuning, or quality assurance). The term Planning is meant to include Preparation. A single word is used here for brevity, but one should be aware that it does not signify purely cerebral activity, but also, e.g., handling of hardware.
6.2 Lessons learned from RoBivaL and DeeperSense
This section serves to illustrate the data lifecycle model discussed in Section 6.1 by presenting lessons learned in the different lifecycle phases of RoBivaL and DeeperSense. Many of the lessons are derived from failures, either to perform a task or to anticipate a challenge. Due to space constraints, we focus on consequences and leave specifics of the failures mostly implicit. Methods and strategies from Sections 4 and 5 are addressed where appropriate.
6.2.1 Creation
Planning
The planning of data creation deserves special care, because errors made during creation are typically difficult to repair. In the field, errors may not be repairable at all if the field conditions cannot be replicated or the cost of another deployment is prohibitive.
Data management needs to specify the scope and form of the metadata set, and provide tools and procedures for metadata creation. Workflows and responsibilities can be made transparent by sorting the planned metadata items into the classes discussed in Section 4, i.e., executive vs. reusable, injected vs. extracted, process- vs. output-related. The specification of first-order metadata items involves the creation of higher-order metadata content. In field research, the environmental conditions and the data creation process need to be documented more extensively than in the lab, because there are fewer means of control and more chances of surprise.
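As an illustration of such a sorting, the following Python sketch groups planned metadata items by the three class distinctions named above. The item names (`operator_name`, `image_resolution`, `weather`) and the identifiers `PlannedItem` and `sort_into_classes` are hypothetical and chosen only for this example; they do not describe the actual metadata sets of RoBivaL or DeeperSense.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedItem:
    name: str
    purpose: str     # "executive" or "reusable"
    origin: str      # "injected" or "extracted"
    relates_to: str  # "process" or "output"


def sort_into_classes(items):
    """Group item names by their (purpose, origin, relates_to) class."""
    classes = defaultdict(list)
    for item in items:
        classes[(item.purpose, item.origin, item.relates_to)].append(item.name)
    return dict(classes)


# Hypothetical planned items, for illustration only.
planned = [
    PlannedItem("operator_name", "executive", "injected", "process"),
    PlannedItem("image_resolution", "reusable", "extracted", "output"),
    PlannedItem("weather", "reusable", "injected", "process"),
]

for key, names in sort_into_classes(planned).items():
    print(key, names)
```

A table of this kind makes it visible at a glance which items must be entered by a person (injected) and which can be derived by tooling (extracted).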
Terms coined during planning will propagate through a growing corpus of communication, documentation, and implementation. To avoid costly changes later, it is advisable to stabilize the terminology early on. FAIR terminologies must reflect community practices in their domains. This can either facilitate planning if a communal terminology already exists, or it can complicate planning if a terminology first needs to be compiled from scholarly sources. FAIR data managers may need to advocate for the requirements of reusers as discussed in Section 5.2.
Terminology requirements may be different for humans and machines. Machines need more consistency, less ambiguity, and may accept only restricted token sets. Inconsistency and ambiguity may arise, e.g., in interdisciplinary settings when different communities use different terms for the same thing or the same term for different things. Since humans are more flexible, a machine-consumable version is preferable for co-processed information, e.g., file and directory names. In this case, human collaborators must be educated on machine requirements.
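A restricted token set for file names can be enforced with a simple check. The sketch below assumes a hypothetical naming convention `<project>_s<NN>_<sensor>.<ext>` and a hypothetical sensor vocabulary; neither is the convention actually used in the projects.

```python
import re

# Hypothetical controlled vocabulary; a real token set would come from the
# terminology agreed upon during planning.
ALLOWED_SENSORS = {"sonar", "camera", "imu"}

# Hypothetical pattern: <project>_s<two-digit session>_<sensor>.<extension>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z]+)_s(?P<session>\d{2})_(?P<sensor>[a-z]+)\.(?P<ext>[a-z0-9]+)$"
)


def check_name(filename):
    """Return a list of problems; an empty list means the name is accepted."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return ["name does not follow <project>_s<NN>_<sensor>.<ext>"]
    problems = []
    if match["sensor"] not in ALLOWED_SENSORS:
        problems.append(f"unknown sensor token: {match['sensor']}")
    return problems


# Illustrative file names, not taken from the published datasets.
for name in ["deepersense_s03_sonar.bag", "DeeperSense Session 3 Sonar.bag"]:
    print(name, "->", check_name(name) or "ok")
```

A human reader would understand both example names, but only the first one can be parsed reliably by a script.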
Concerning the tensions discussed in Sections 4.2 and 5.2, the creation of purely executive or purely reusable metadata is relatively easy: Producers are intrinsically motivated to fulfill their own needs, and pure reuse issues can be handled by the data manager alone. For metadata concerning both producers and reusers, however, the data manager again has to advocate for the reusers’ requirements and possibly bear the additional effort (or part of it) during execution.
Execution
In a collaborative setting, different people may observe different features of an object. This is an opportunity for the data manager to be a team player, as discussed in Section 5.1: Having one person responsible for recording all observations avoids misalignment and ensures consistency, completeness, and uniform compliance with standards, e.g., related to accuracy or measuring units. The information relay requires structured communication and routines to avoid, detect, and correct miscommunication.
A designated record keeper can also take note of problems and unforeseen events, which may help improve the planning of future data creation sessions. The record keeper may also act proactively by watching for errors and trying to prevent them in the first place. As discussed in Section 5.1, field data creation can be cognitively very taxing, so it is easy to miss, e.g., a critical failure of a single component. Therefore, having someone specifically focused on error detection is useful.
Evaluation
DeeperSense and RoBivaL each had multiple data creation sessions, so there was reason and occasion to improve the data creation system during the project, e.g., by capturing additional metadata items, or by simplifying the creation process. This was partially countered by the requirement to have data and metadata be compatible between all sessions. To avoid this tension, it is advisable to perform pre-trials where the data creation system can be tested.
If the metadata recording task is delegated, e.g., due to illness, the recording tools must be usable for the delegate, who may be unfamiliar with the task and have additional responsibilities. Mandatory and important metadata items must be indicated. Content requirements must be clearly communicated. The number and complexity of items should be kept to a minimum.
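One way to make mandatory items and content requirements explicit for a delegate is to encode them in the recording tool itself. The following Python sketch assumes a hypothetical three-item form; the item names, hints, and the helper `missing_mandatory` are illustrative and not part of the tools used in the projects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    mandatory: bool
    hint: str  # content requirement shown to the person recording


# Hypothetical recording form, kept deliberately short.
FORM = [
    Item("session_id", True, "identifier of the data creation session"),
    Item("weather", True, "short free-text description"),
    Item("remarks", False, "problems or unforeseen events"),
]


def missing_mandatory(record, form=FORM):
    """Return the names of mandatory items that are absent or empty."""
    missing = []
    for item in form:
        value = record.get(item.name)
        if item.mandatory and (value is None or not str(value).strip()):
            missing.append(item.name)
    return missing


# Show the delegate which items are required and what content is expected.
for item in FORM:
    marker = "*" if item.mandatory else " "
    print(f"{marker} {item.name}: {item.hint}")

print(missing_mandatory({"session_id": "s01"}))
```

The check can run at the end of a recording session, while the missing information can still be obtained in the field.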
6.2.2 Provision
Planning
Provision is dedicated to the needs of the original research team, in contrast to publication which caters to reusers. Therefore, provision seems to require executive metadata, while publication requires reusable metadata (see Section 4.2). Still, to ensure a smooth transition between the phases, it may be wise to gather reusable metadata already during provision.
The internal data repository must be laid out physically: How much data will be stored where and for which purpose? For example, there may be storage embedded in sensor platforms to collect raw data; file servers to consolidate, back up, and exchange data; database servers to validate, merge, filter, and aggregate data; workstations of different contributors to process and analyze data parts; high-performance servers for compute-intensive tasks.
Logically, the repository can be specified with different resolutions, on multiple layers and domains. Aspects to consider may be file trees, database schemas, and request APIs; encodings, types, and formats; sources and processing stages; separation of base data and metadata; auxiliary assets (e.g., documentation, specification, schemas, logs, errors). The terminology should be consistent between layers and domains, and be compatible with terminologies of the other phases. Again, in case of tensions between the needs of producers and reusers, the data manager may have to advocate for the reuser perspective (see Section 5.2).
Governance and administration of the internal repository as a shared resource must be specified. Who gets access to what? How are safety, security, availability, quality, and privacy established? Who is in charge of which procedures? Examples are consolidation of data from different sources, sessions, or processing stages; deduplication of redundant data; replication to prevent data loss; data removal to free resources; consistency checking and error management.
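As an example of one such procedure, deduplication can be based on content hashes. The sketch below uses SHA-256 from the Python standard library to find files with identical content under a directory tree. Hashing is one common technique and an assumption of this example, not necessarily the procedure used in the projects; the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; return groups with several files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Which copy of a duplicate group is kept, and who is allowed to remove the others, remains a governance decision that the code cannot make.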
Execution
An explicit specification of the physical and logical layout can improve team alignment. User onboarding is an opportunity to check if the specification is properly understood and reflects the actual requirements. The layout and its specification may need to be updated to account for, e.g., larger volumes, changing pipelines, or different data formats. Such adjustments during execution may never be fully prevented; still, they should be noted for evaluation.
DeeperSense and RoBivaL developed dedicated metadatabases to facilitate reporting (e.g., volume per data layer, sample count per sensor type and session, runs per experiment and robot). As standalone items, they can be transmitted separately from the large base data corpora. They provide information for decision-making both by the executing researchers (e.g., are there critical data gaps?) and reusers (e.g., is this dataset suitable for my use case?). Thus, the metadatabases are a further example of shared concerns as discussed in Section 4.2.
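The kind of report mentioned above can be sketched with an in-memory SQLite database. The single-table schema, the column names, and the sample rows below are assumptions made for this example and do not reproduce the actual metadatabases of the projects.

```python
import sqlite3


def build_metadatabase(samples):
    """Create an in-memory metadatabase from (session, sensor_type, n_bytes) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE sample (session TEXT, sensor_type TEXT, n_bytes INTEGER)"
    )
    con.executemany("INSERT INTO sample VALUES (?, ?, ?)", samples)
    return con


def samples_per_sensor_and_session(con):
    """Report the sample count per session and sensor type."""
    query = """
        SELECT session, sensor_type, COUNT(*)
        FROM sample
        GROUP BY session, sensor_type
        ORDER BY session, sensor_type
    """
    return con.execute(query).fetchall()


# Made-up rows, for illustration only.
rows = [
    ("s01", "sonar", 100),
    ("s01", "sonar", 120),
    ("s01", "camera", 50),
    ("s02", "sonar", 90),
]
con = build_metadatabase(rows)
for session, sensor_type, count in samples_per_sensor_and_session(con):
    print(session, sensor_type, count)
```

In this made-up example, the report shows that session `s02` contains no camera samples, which is the kind of data gap both producers and reusers would want to know about.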
Evaluation
Physical and logical layouts emerge even if they are not expressly designed. They are implemented by contributors out of necessity to accomplish particular tasks, and are reinforced by continued use. To achieve interoperability and consistency in a collaborative setting, a patchwork of individual approaches must be consolidated.
But there is tension: Research data processing must be flexible enough to adjust to new findings and changing views. In interdisciplinary research, practices from different domains must be accommodated. Too much specification too early or too rigidly may lower the acceptance and adoption of a layout. Further, writing a comprehensive, accurate, and understandable specification may be difficult and time-consuming, thus conflicting with other priorities. On the other hand, working with undocumented, inconsistent layouts that need to be reverse engineered and might change without notice lowers productivity and risks producing bad results. This dilemma shows that it may not always be obvious to a data manager how to follow the maxim expressed in Section 5.1 to ease the effort of the production team.
6.2.3 Processing
Planning
The processing phase typically comprises multiple stacked processing sub-stages, as discussed in Section 4.3 and depicted in Figure 5. Therefore, the processing phase may give rise to a lot of first-order metadata content. Injected metadata may already be created during planning, both related to the (sub-)processes and to their outputs. Extracted metadata will normally be created during execution. Some extracted metadata may be created during evaluation, e.g., if problems with the processes or their outputs must be documented.
Processing resources must be supplied for different tasks and stages. This includes individual workstations for all team members, and high-performance servers that are used as a shared resource. If there are multiple contributors, it is important to specify who is responsible for which processing job, and what the interfaces between consecutive steps in a processing pipeline are.
Execution
One core responsibility of the data manager is metadata processing. In RoBivaL and DeeperSense, this was done in the context of developing and maintaining a metadatabase, involving schema design, metadata extraction, fusion, and aggregation. (These tasks also require creation and provision, and are planned before execution; they are highlighted here to mark the metadatabase as a processing tool.) The data manager may also be tasked with (meta)data quality assurance, which affects all other processing jobs. This involves the conception of error cases, error logging, escalation of errors, and resolution management.
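The four quality assurance activities can be illustrated with a small Python sketch. The two error cases, the record fields (`id`, `sensor_type`, `sample_count`), and the severity levels are hypothetical examples and not the checks that were actually implemented in the projects.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("metadata_qa")


@dataclass
class Finding:
    check: str
    record_id: str
    severity: str  # "warning" or "critical"
    resolved: bool = False  # resolution management: set once the issue is fixed


def check_records(records):
    """Apply two hypothetical error cases to metadata records and log findings."""
    findings = []
    for rec in records:
        if not rec.get("sensor_type"):
            findings.append(Finding("missing_sensor_type", rec["id"], "critical"))
        if rec.get("sample_count", 0) == 0:
            findings.append(Finding("empty_recording", rec["id"], "warning"))
    for finding in findings:
        level = logging.ERROR if finding.severity == "critical" else logging.WARNING
        logger.log(level, "%s in record %s", finding.check, finding.record_id)
    return findings


def escalate(findings):
    """Return unresolved critical findings that should be raised with the team."""
    return [f for f in findings if f.severity == "critical" and not f.resolved]
```

Keeping the findings as data, rather than only as log lines, allows their resolution status to be tracked over time.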
If the results of a processing step need to be persisted for later consumption by other processing steps, this produces a feedback loop between processing and provision.
Evaluation
In case the input or output requirements of a processing step change, updates to the interface with its predecessor or successor steps may need to be negotiated.
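Whether an interface needs renegotiation can be detected mechanically if each step declares what it requires and what it provides. The following sketch assumes a strictly linear pipeline and hypothetical step and field names (`extract`, `fuse`, `timestamp`, `range`, `pose`, `track`).

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    requires: frozenset  # fields the step reads from its predecessor
    provides: frozenset  # fields the step passes on to its successor


def interface_gaps(pipeline):
    """Return (predecessor, successor, missing fields) for each broken interface."""
    gaps = []
    for prev, nxt in zip(pipeline, pipeline[1:]):
        missing = nxt.requires - prev.provides
        if missing:
            gaps.append((prev.name, nxt.name, sorted(missing)))
    return gaps


# Hypothetical two-step pipeline in which the second step's requirements grew.
pipeline = [
    Step("extract", frozenset(), frozenset({"timestamp", "range"})),
    Step("fuse", frozenset({"timestamp", "range", "pose"}), frozenset({"track"})),
]
print(interface_gaps(pipeline))
```

Each reported gap names the two contributors who need to talk to each other and the fields they need to talk about.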
6.2.4 Publication
Planning
The data repository where the data and metadata are to be published should match the given content. Where will the intended reusers be likely to look for data to match a certain use case? The journal Scientific Data recommends various data repositories geared towards particular natural and social sciences, as well as some generalist repositories [51]. For large datasets, size limits imposed by different repositories may have to be considered.
Execution
Data from RoBivaL and DeeperSense was published on Zenodo. The publisher requires filling out a form with platform-specific metadata, i.e., authors and contributors with affiliations and IDs, a summary description of the dataset, references to related publications, etc.
Evaluation
Typically, only a part of all data and metadata created during a project will be published. To facilitate the separation, it is advisable to store the parts dedicated for publication at a separate place from the beginning, or at least design the internal storage such that these parts are clearly marked and can be easily extracted.
6.2.5 Reuse
Reuse is different from the other data lifecycle phases, because its planning, execution, and evaluation are outside the purview of the data production team. We have not yet received any feedback from data reusers, so we currently cannot report any experiences about the reuse of data from RoBivaL or DeeperSense.
6.2.6 Archiving
The data from RoBivaL and DeeperSense has not been archived yet, so there is no experience to report.
7 Conclusion
This paper discussed the collaborative creation and management of rich FAIR metadata on three dimensions: the metadata content, the social relationships between metadata stakeholders, and the phases of metadata management over time. The discussion was illustrated with examples from the robotics field research projects RoBivaL and DeeperSense.
On the content dimension, we categorized metadata by different purposes, presented a broad spectrum of metadata topics, and discussed the relationship between executive metadata for data producers, and rich reusable metadata to satisfy the FAIR principles. We modeled the process of metadata creation at the micro level, introducing the concepts of injected and extracted metadata, and of higher order metadata.
One risk to consider here is the possibility of scope explosion in multiple directions: Firstly, since executive metadata covers many areas, metadata management for internal purposes might soon turn into general knowledge management. Secondly, since rich metadata lacks a comprehensive definition and is grounded in potential needs of data reusers, it is difficult to judge what must be included and what may be omitted. Thirdly, higher order metadata implies an infinite recursion which must be capped at a level that is reasonable for different stakeholders.
The purpose of higher order metadata is to create formal and accessible expressions of common knowledge and practice which may exist primarily in the heads of practitioners. This is difficult for multiple reasons, not least because it entails a social process: Who may contribute their expertise and how? Does everyone agree with an expression and how are conflicts resolved?
To get broadly adopted, FAIR RDM practices must make sense to data producers. We argued that this is more likely if producers see their own requirements and challenges taken into account. Still, caring for reusability may appear to many researchers as a burden that interferes with their primary goals. Therefore, we presented FAIR data production as team work where someone takes on the role of a dedicated FAIR RDM expert who at the same time provides immediate value to their research team. We attempted to contribute to a definition of this role and explain its competing demands, and we presented tools for the design and communication of FAIR RDM workflows that facilitate collaboration in data production teams.
Trust is a social aspect we omitted in our discussion, because it is a broad topic in itself and involves additional stakeholders. Data reuse depends on the assumption that the delivered data is not manufactured to deceive. Though not a FAIR principle, this is certainly a maxim of scientific fairness in a broader sense. But even if their intentions are pure, producers may deceive themselves in thinking their data is accurate and represents reality. This problem is compounded when data is processed by different people on multiple stages, or fused from multiple providers. At the end of the data supply chain are people who apply, consume, or are otherwise affected by products derived from data. For them, trustworthiness may literally be a life-and-death issue. The DeeperSense sonar-to-camera translation is an example from our own research. Diving companies have expressed their motivation to solve the trustworthiness problem in this case.
On the time dimension, we divided the prevalent image of a simple data lifecycle into an outer and an inner cycle: The phases of the outer cycle are actions that apply to data (i.e., creation, provision, etc.). The phases of the inner cycle are actions that apply to each outer phase, namely planning, execution, and evaluation. Evaluation allows the data management system to improve over multiple research iterations.
One important challenge here is to find the right balance between flexibility and stability of the data management system. Flexibility is necessary to eliminate errors and inefficiencies in the system itself, and to be able to adapt to new insights and requirements for the primary research. Stability of the system facilitates its adoption, provides backwards compatibility, and allows one to devote more energy to primary research. The trick is to know when the system is good enough, and to stop improving when the marginal benefit becomes too small.
Data availability
RoBivaL data corpus [30], Sonar-to-RGB Image Translation for Diver Monitoring in Poor Visibility Environments [32], Fusion of Underwater Camera and Multibeam Sonar for Diver Detection and Tracking [33], Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation [34]
8 Acknowledgements
The authors would like to thank the Federal Government and the Heads of Government of the Länder, as well as the Joint Science Conference (GWK), for their funding and support within the framework of the NFDI4Ing consortium. Funded by the German Research Foundation (DFG) - project number 442146713.
The RoBivaL project was funded by the Federal Ministry for Economic Affairs and Climate Action with grant number 50RP2150.
The DeeperSense project was funded by the European Union. Program: H2020-ICT-2020-2 ICT-47-2020. Project Number: 101016958.
9 Roles and contributions
Christian Backe: Conceptualization, Data curation, Investigation, Software, Visualization, Writing - original draft, Writing - review & editing
Veit Briken: Conceptualization, Writing - review & editing
Atefeh Gooran Orimi: Investigation, Project administration, Writing - review & editing
Rayen Hamlaoui: Investigation, Writing - review & editing
Malte Wirkus: Data curation, Funding acquisition, Investigation, Project administration, Software, Writing - review & editing
Bilal Wehbe: Data curation, Funding acquisition, Investigation, Project administration, Visualization, Writing - review & editing
Frank Kirchner: Funding acquisition, Supervision, Writing - review & editing
References
[1] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, et al., “The FAIR Guiding Principles for scientific data management and stewardship,” en, Scientific Data, vol. 3, no. 1, p. 160018, Mar. 2016, ISSN: 2052-4463. DOI: http://doi.org/10.1038/sdata.2016.18. [Online]. Available: https://www.nature.com/articles/sdata201618 (visited on 10/13/2023).
[2] A. Jacobsen, R. de Miranda Azevedo, N. Juty, et al., “FAIR Principles: Interpretations and Implementation Considerations,” pp. 10–29, Jan. 2020. DOI: http://doi.org/10.1162/dint_r_00024.
[3] DFKI GmbH, RoBivaL - Robot Soil Interaction Evaluation in Agriculture, en-US, website, 2021. [Online]. Available: https://robotik.dfki-bremen.de/en/research/projects/robival/ (visited on 12/10/2023).
[4] DFKI GmbH, DeeperSense - Deep-Learning for Multimodal Sensor Fusion, en-US, website, 2021. [Online]. Available: https://robotik.dfki-bremen.de/en/research/projects/deepersense/ (visited on 12/10/2023).
[5] C. Backe, A. Gooran Orimi, V. Briken, R. Hamlaoui, and H. Görner, Creating Rich Metadata for Collaborative Research: Case Studies and Challenges, eng, Sep. 2023. DOI: http://doi.org/10.5281/zenodo.8430753. [Online]. Available: https://zenodo.org/records/8430753 (visited on 12/10/2023).
[6] S. Yang, O. Nachum, Y. Du, J. Wei, P. Abbeel, and D. Schuurmans, Foundation Models for Decision Making: Problems, Methods, and Opportunities, 2023. arXiv: 2303.04129. [Online]. Available: https://arxiv.org/abs/2303.04129.
[7] A. Brohan, N. Brown, J. Carbajal, et al., RT-1: Robotics Transformer for Real-World Control at Scale, 2023. arXiv: 2212.06817 [cs.RO]. [Online]. Available: https://arxiv.org/abs/2212.06817.
[8] Q. Vuong and P. Sanketi. “Scaling up learning across many different robot types.” (Oct. 2023), [Online]. Available: https://deepmind.google/discover/blog/scaling-up-learning-across-many-different-robot-types/ (visited on 11/14/2024).
[9] Embodiment Collaboration, A. O’Neill, A. Rehman, et al., Open X-Embodiment: Robotic Learning Datasets and RT-X Models, 2024. arXiv: 2310.08864 [cs.RO]. [Online]. Available: https://arxiv.org/abs/2310.08864.
[10] R. Firoozi, J. Tucker, S. Tian, et al., “Foundation models in robotics: Applications, challenges, and the future,” The International Journal of Robotics Research, vol. 0, no. 0, p. 02783649241281508, 0. DOI: http://doi.org/10.1177/02783649241281508. eprint: https://doi.org/10.1177/02783649241281508. [Online]. Available: https://doi.org/10.1177/02783649241281508.
[11] T. Schoening, J. M. Durden, C. Faber, et al., “Making marine image data FAIR,” Scientific Data, vol. 9, no. 1, p. 414, Jul. 2022, ISSN: 2052-4463. DOI: http://doi.org/10.1038/s41597-022-01491-3. [Online]. Available: https://doi.org/10.1038/s41597-022-01491-3.
[12] C. Motta, S. Aracri, R. Ferretti, et al., “A framework for FAIR robotic datasets,” Scientific Data, vol. 10, no. 1, p. 620, Sep. 2023, ISSN: 2052-4463. DOI: http://doi.org/10.1038/s41597-023-02495-3. [Online]. Available: https://doi.org/10.1038/s41597-023-02495-3.
[13] R. Dominguez, M. Post, A. Fabisch, R. Michalec, V. Bissonnette, and S. Govindaraj, “Common Data Fusion Framework: An open-source Common Data Fusion Framework for space robotics,” International Journal of Advanced Robotic Systems, vol. 17, no. 2, p. 1729881420911767, 2020. DOI: http://doi.org/10.1177/1729881420911767. eprint: https://doi.org/10.1177/1729881420911767. [Online]. Available: https://doi.org/10.1177/1729881420911767.
[14] S. T. Arundel, K. G. McKeehan, B. B. Campbell, A. N. Bulen, and P. T. Thiem, “A guide to creating an effective big data management framework,” Journal of Big Data, vol. 10, no. 1, p. 146, Sep. 2023, ISSN: 2196-1115. DOI: http://doi.org/10.1186/s40537-023-00801-9. [Online]. Available: https://doi.org/10.1186/s40537-023-00801-9.
[15] A. Olivares-Alarcos, D. Beßler, A. Khamis, et al., “A review and comparison of ontology-based approaches to robot autonomy,” The Knowledge Engineering Review, vol. 34, e29, 2019. DOI: http://doi.org/10.1017/S0269888919000237.
[16] “IEEE Standard Ontologies for Robotics and Automation,” IEEE Std 1872-2015, pp. 1–60, Apr. 2015. DOI: http://doi.org/10.1109/IEEESTD.2015.7084073. [Online]. Available: https://ieeexplore.ieee.org/document/7084073.
[17] “IEEE Standard for Robot Task Representation,” IEEE Std 1872.1-2024, pp. 1–32, Jun. 2024. DOI: http://doi.org/10.1109/IEEESTD.2024.10557559. [Online]. Available: https://ieeexplore.ieee.org/document/10557559.
[18] “IEEE Standard for Autonomous Robotics (AuR) Ontology,” IEEE Std 1872.2-2021, pp. 1–49, May 2022. DOI: http://doi.org/10.1109/IEEESTD.2022.9774339. [Online]. Available: https://ieeexplore.ieee.org/document/9774339.
[19] “IEEE Standard Proposal for Ontology Reasoning on Multiple Robots,” IEEE Std P1872.3, Sep. 2022. [Online]. Available: https://standards.ieee.org/ieee/1872.3/11037/.
[20] ISO 8373:2021, “Robotics - Vocabulary,” International Organization for Standardization, Standard, 2021. [Online]. Available: https://www.iso.org/standard/75539.html.
[21] ISO 19649:2017, “Mobile robots - Vocabulary,” International Organization for Standardization, Standard, 2017. [Online]. Available: https://www.iso.org/standard/65658.html.
[22] Open Robotics, URDF XML Specifications, Sep. 2022. [Online]. Available: http://wiki.ros.org/urdf/XML (visited on 10/16/2023).
[23] S. Chitta, URDF 2.0: Update the ROS URDF Format, Jan. 2016. [Online]. Available: https://sachinchitta.github.io/urdf2 (visited on 01/16/2025).
[24] V. A. Jorge, V. F. Rey, R. Maffei, et al., “Exploring the IEEE ontology for robotics and automation for heterogeneous agent interaction,” Robotics and Computer-Integrated Manufacturing, vol. 33, pp. 12–20, 2015, Special Issue on Knowledge Driven Robotics and Manufacturing, ISSN: 0736-5845. DOI: http://doi.org/10.1016/j.rcim.2014.08.005. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0736584514000660.
[25] A. B. d. O. Neto, J. A. Silva, and M. E. Barreto, “Prototyping and Validating the CORA Ontology: Case Study on a Simulated Reconnaissance Mission,” in Latin American Robotics Symposium (LARS), 2019 Brazilian Symposium on Robotics (SBR) and 2019 Workshop on Robotics in Education (WRE), Oct. 2019, pp. 341–345. DOI: http://doi.org/10.1109/LARS-SBR-WRE48964.2019.00066.
[26] M. Yüksel, “An expert knowledge representation based on real world experiences in proof-of-concept robot system design,” Available at http://doi.org/10.26092/elib/2337, PhD thesis, Universität Bremen, Jun. 2023.
[27] M. Wirkus, S. Hinck, C. Backe, et al., “Comparative study of soil interaction and driving characteristics of different agricultural and space robots in an agricultural environment,” Journal of Field Robotics, pp. 1–34, DOI: http://doi.org/10.1002/rob.22347. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/rob.22347. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.22347.
[28] ISO 18646-1:2016, “Robotics — Performance criteria and related test methods for service robots. Part 1: Locomotion for wheeled robots,” International Organization for Standardization, Standard, 2016. [Online]. Available: https://www.iso.org/standard/63127.html.
[29] ISO 18646-2:2019, “Robotics — Performance criteria and related test methods for service robots. Part 2: Navigation,” International Organization for Standardization, Standard, 2019. [Online]. Available: https://www.iso.org/standard/69057.html.
[30] C. Backe, M. Wirkus, S. Hinck, et al., RoBivaL data corpus, Experiment Data, Zenodo, Jun. 2024. DOI: http://doi.org/10.5281/zenodo.12547116.
[31] B. Wehbe, N. Shah, M. Bande, and C. Backe, “Sonar-to-RGB Image Translation for Diver Monitoring in Poor Visibility Environments,” in OCEANS 2022, Hampton Roads, ISSN: 0197-7385, Oct. 2022, pp. 1–9. DOI: http://doi.org/10.1109/OCEANS47191.2022.9977024. [Online]. Available: https://ieeexplore.ieee.org/document/9977024 (visited on 12/10/2023).
[32] B. Wehbe, N. Shah, M. Bande, and C. Backe, Sonar-to-RGB Image Translation for Diver Monitoring in Poor Visibility Environments, Experiment Data, Zenodo, Mar. 2023. DOI: http://doi.org/10.5281/zenodo.7728089. [Online]. Available: https://zenodo.org/records/7728089 (visited on 12/10/2023).
[33] O. Köken and B. Wehbe, Fusion of Underwater Camera and Multibeam Sonar for Diver Detection and Tracking, Experiment Data, version 1, Zenodo, Dec. 2023. DOI: http://doi.org/10.5281/zenodo.10220989. [Online]. Available: https://doi.org/10.5281/zenodo.10220989.
[34] C. Backe, B. Wehbe, M. Bande, N. Shah, D. Cesar, and M. Pribbernow, Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation, Experiment Data, Zenodo, Dec. 2023. DOI: http://doi.org/10.5281/zenodo.10373154. [Online]. Available: https://doi.org/10.5281/zenodo.10373154.
[35] J. Furner, “Definitions of ‘Metadata’: A Brief Survey of International Standards,” en, Journal of the Association for Information Science and Technology, vol. 71, no. 6, Jun. 2020, ISSN: 2330-1635, 2330-1643. DOI: http://doi.org/10.1002/asi.24295. [Online]. Available: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24295 (visited on 10/11/2023).
[36] K. G. Jefferey and R. Koskela, “RDA Metadata Principles and their Use,” en, Research Data Alliance, Tech. Rep., Nov. 2014, p. 6. [Online]. Available: https://rd-alliance.org/metadata-principles-and-their-use.html (visited on 06/05/2023).
[37] RFII - Rat für Informationsinfrastrukturen, Leistung aus Vielfalt. Empfehlungen zu Strukturen, Prozessen und Finanzierung des Forschungsdatenmanagements in Deutschland, de. Göttingen, May 2016.
[38] OECD, Data and Metadata Reporting and Presentation Handbook, en. OECD, Jun. 2007, ISBN: 978-92-64-03032-9 978-92-64-03033-6. DOI: http://doi.org/10.1787/9789264030336-en. [Online]. Available: https://www.oecd-ilibrary.org/economics/data-and-metadata-reporting-and-presentation-handbook_9789264030336-en (visited on 10/11/2023).
[39] National Institute of Statistical Sciences, “Metadata and Paradata: Information Collection and Potential Initiatives,” en, National Institute of Statistical Sciences, Expert Panel Report, Nov. 2010, p. 36. [Online]. Available: https://www.niss.org/research/metadata-and-paradata-information-collection-and-potential-initiatives (visited on 10/11/2023).
[40] W3C, Data on the Web Best Practices, en, Jan. 2017. [Online]. Available: https://www.w3.org/TR/dwbp/ (visited on 02/20/2024).
[41] J. Riley, Understanding Metadata: What is Metadata, and What is it For? (NISO Primer Series), en-US. National Information Standards Organization (NISO), 2017, ISBN: 978-1-937522-72-8. [Online]. Available: https://www.niso.org/publications/understanding-metadata-2017 (visited on 02/20/2024).
[42] M. Cofield, Metadata Basics, en, Feb. 2024. [Online]. Available: https://guides.lib.utexas.edu/metadata-basics (visited on 02/20/2024).
[43] S. Arndt, B. Farnbacher, M. Fuhrmans, et al., “Metadata4Ing: An ontology for describing the generation of research data within a scientific activity,” en, Sep. 2023, Publisher: Zenodo, Version Number: 1.2.0. DOI: http://doi.org/10.5281/zenodo.5957103. [Online]. Available: https://zenodo.org/record/5957103 (visited on 10/11/2023).
[44] M. Fuhrmans and D. Iglezakis, “Metadata4Ing - Ansatz zur Modellierung interoperabler Metadaten für die Ingenieurwissenschaften,” en, Aug. 2020, Publisher: Zenodo. DOI: http://doi.org/10.5281/zenodo.3982367. [Online]. Available: https://zenodo.org/record/3982367 (visited on 10/11/2023).
[45] B. Schembera and D. Iglezakis, “EngMeta: Metadata for computational engineering,” en, International Journal of Metadata, Semantics and Ontologies, vol. 14, no. 1, p. 26, 2020, ISSN: 1744-2621, 1744-263X. DOI: http://doi.org/10.1504/IJMSO.2020.107792. [Online]. Available: http://www.inderscience.com/link.php?id=107792 (visited on 10/11/2023).
[46] S. I. H. Shah, V. Peristeras, and I. Magnisalis, “DaLiF: A data lifecycle framework for data-driven governments,” en, Journal of Big Data, vol. 8, no. 1, p. 89, Dec. 2021, ISSN: 2196-1115. DOI: http://doi.org/10.1186/s40537-021-00481-3. [Online]. Available: https://journalofbigdata.springeropen.com/articles/10.1186/s40537-021-00481-3 (visited on 10/13/2023).
[47] D. Schmitz and M. Politze, “Forschungsdaten managen – Bausteine für eine dezentrale, forschungsnahe Unterstützung,” de, o-bib. Das offene Bibliotheksjournal / Herausgeber VDB, vol. 5, no. 3, pp. 76–91, Sep. 2018, ISSN: 2363-9814. DOI: http://doi.org/10.5282/o-bib/2018H3S76-91. [Online]. Available: https://www.o-bib.de/bib/article/view/5339 (visited on 02/23/2024).
[48] NFDI4Ing, Trainings zum Datenlebenszyklus (Data Life Cycle, DLC). [Online]. Available: https://nfdi4ing.pages.rwth-aachen.de/education/education-pages/main/html_slides/startpage_dlc.html#/ (visited on 02/22/2024).
[49] NFDI4Chem, Research Data Life Cycle, en-GB. [Online]. Available: https://knowledgebase.nfdi4chem.de/knowledge_base/docs/data_life_cycle/ (visited on 02/22/2024).
[50] UK Data Service, Research data management, en-US. [Online]. Available: https://ukdataservice.ac.uk/learning-hub/research-data-management/ (visited on 02/22/2024).
[51] Scientific Data, Data repository guidance, en, 2021, ISSN: 2052-4463. [Online]. Available: https://www.nature.com/sdata/policies/repositories (visited on 04/01/2024).