<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2941-1300</journal-id>
<journal-title-group>
<journal-title>ing.grid</journal-title>
</journal-title-group>
<issn pub-type="epub">2941-1300</issn>
<publisher>
<publisher-name>Universit&#228;ts- und Landesbibliothek Darmstadt</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.48694/inggrid.4267</article-id>
<article-categories>
<subj-group>
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Data-Producing Methods in CRC 985: Recommendations for Research Data Management in Large Interdisciplinary Projects</article-title>
<subtitle>CRC 985: Functional Microgels and Microgel Systems</subtitle>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4695-8532</contrib-id>
<name>
<surname>Parks</surname>
<given-names>Nicole A.</given-names>
</name>
<xref ref-type="aff" rid="aff-au">au</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Kr&#246;ckert</surname>
<given-names>Konstantin W.</given-names>
</name>
<xref ref-type="aff" rid="aff-su">su</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0006-7311-5927</contrib-id>
<name>
<surname>Cla&#223;en</surname>
<given-names>Fabian</given-names>
</name>
<xref ref-type="aff" rid="aff-du">du</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4592-8171</contrib-id>
<name>
<surname>Richtering</surname>
<given-names>Walter</given-names>
</name>
<xref ref-type="aff" rid="aff-du">du</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-2545-5258</contrib-id>
<name>
<surname>M&#252;ller</surname>
<given-names>Matthias</given-names>
</name>
<xref ref-type="aff" rid="aff-au">au</xref>
</contrib>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0002-4354-4353</contrib-id>
<name>
<surname>Herres-Pawlis</surname>
<given-names>Sonja</given-names>
</name>
<email>sonja.herres-pawlis@ac.rwth-aachen.de</email>
<xref ref-type="aff" rid="aff-su">su</xref>
</contrib>
</contrib-group>
<aff id="aff-su"><label>su</label>Institute of Inorganic Chemistry, RWTH Aachen University, Aachen, Germany</aff>
<aff id="aff-au"><label>au</label>IT Center, RWTH Aachen University, Aachen, Germany</aff>
<aff id="aff-du"><label>du</label>Institute of Physical Chemistry, RWTH Aachen University, Aachen, Germany</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2025-05-16">
<day>16</day>
<month>05</month>
<year>2025</year>
</pub-date>
<pub-date pub-type="collection">
<year>2025</year>
</pub-date>
<volume>3</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>30</lpage>
<history>
<date date-type="received" iso-8601-date="2024-03-07">
<day>07</day>
<month>03</month>
<year>2024</year>
</date>
<date date-type="accepted" iso-8601-date="2025-03-20">
<day>20</day>
<month>03</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2025 The Author(s)</copyright-statement>
<copyright-year>2025</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>The text of this work is released under the Creative Commons license CC BY 4.0 International. You can find the contract text of the license at <uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>. The illustrations are excluded from this license, here the copyright lies with the respective rights holder.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://www.inggrid.org/articles/doi.org/10.48694/inggrid.4267/"/>
<abstract>
<p>Large, interdisciplinary projects produce various types of data underlying their published results. To gain a deeper understanding of the data produced, a survey was conducted in a project comprising the fields of chemistry, physics, engineering and life sciences, with the aim of improving its research data management.</p>
<p>Based on the collected information as well as feedback from researchers, we outline a holistic research data management approach, starting at the individual research group level. Here, we focus on data governance, documentation, and data exchange formats. We tie this together at the project level with a focus on data workflows for a collaborative data management and recommend data publication and archival solutions for this specific project. As a whole, this strives to provide researchers with the basic framework to efficiently work and manage their research data while producing understandable and reusable results in line with the FAIR principles.</p>
</abstract>
<kwd-group>
<kwd>Data</kwd>
<kwd>chemistry</kwd>
<kwd>microgels</kwd>
<kwd>research data management</kwd>
<kwd>collaborative projects</kwd>
<kwd>CRC 985</kwd>
<kwd>INF</kwd>
<kwd>data-producing methods</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="S1">
<title>1 Introduction</title>
<p>The collaborative research center (CRC)<xref ref-type="fn" rid="n1">1</xref> 985 <italic>Functional Microgels and Microgel Systems</italic> has studied microgels, soft colloidal macromolecular compounds that find applications in many different fields, for over two funding periods, the current third funding period being its final. The project brings together research groups from numerous chemical institutes, chemical engineering, physics, biotechnology, and the life sciences, with RWTH Aachen University, DWI - Leibniz Institute for Interactive Materials, the RWTH Aachen University Hospital (UKA), and Forschungszentrum J&#252;lich (FZJ) cooperating with each other. In total, roughly 40 groups, currently involving approx. 90 principal investigators (PIs), post-doctoral researchers, or doctoral researchers, have contributed or are actively contributing to the project. Over 300 scientific publications have been produced so far.</p>
<p>In the first funding period, which began in 2012, the research data management (RDM) structure included a Microsoft SharePoint, while Mattermost was introduced as an instant-message communication system. On this basis, information could be shared and communicated across research areas as well as internally in smaller groups. Furthermore, during the previous funding periods, a sample management system was integrated into SharePoint to track sample history, while implementing a universal naming system throughout the CRC and assigning persistent identifiers (PIDs) [<xref ref-type="bibr" rid="B1">1</xref>]. Before the third funding period, the INF project focused largely on establishing collaborative digital systems (first funding period) and on improving these to increase acceptance (second funding period). At this point, RDM consulting also increased.</p>
<p>General guidelines for data publication were established, yet most data were shared and stored in a manner that did not follow any specific standards. The researchers&#8217; best practice has thus been to document their work in the form of individually written texts, digital or analog, and to save raw and/or processed measurement data in an individual project folder. Storing data across projects with the same structure and making them accessible for future projects is challenging with this approach. One reason is that different templates would have to be developed individually for different tasks, or new software would have to be developed explicitly for this CRC. Other CRCs have described similar problems for projects of this scale [<xref ref-type="bibr" rid="B2">2</xref>], [<xref ref-type="bibr" rid="B3">3</xref>].</p>
<p>From today&#8217;s perspective, proficient RDM requires much more, e.g., the sharing and archiving of data according to the FAIR (findable, accessible, interoperable, reusable) principles that were introduced in 2016 [<xref ref-type="bibr" rid="B4">4</xref>], coinciding with the second funding period as well as the establishment of a central RDM team at RWTH Aachen University. At their core, these guiding principles build upon one another to ultimately ensure a dataset&#8217;s reusability. For research data, they carry implications both for those producing the data, e.g., researchers, and for those providing infrastructure, such as research data repositories [<xref ref-type="bibr" rid="B5">5</xref>]. Implementing practices and tools that enable FAIR throughout each stage of a research project also facilitates FAIR in the long run. Large, interdisciplinary projects can benefit from these practices as participants can efficiently find, access, and (re)use data produced by their collaborating partners or predecessors, e.g., from previous funding periods.</p>
<p>Fully functional RDM infrastructures and information standards are still a work in progress. The German National Research Data Infrastructure (NFDI; German: Nationale Forschungsdateninfrastruktur) and its discipline-specific consortia aim to move this progress along [<xref ref-type="bibr" rid="B6">6</xref>]. In the area of chemistry, NFDI4Chem strives to not only set up a system of repositories for data sharing and archival, but also to establish minimum information and format standards to ensure data remains reusable and interoperable [<xref ref-type="bibr" rid="B7">7</xref>]. These efforts should inform the research communities&#8217; RDM practices, while the consortia also require researchers&#8217; input to best suit their needs.</p>
<p>As part of the CRC 985 Information and Infrastructure (INF) project, we present an overview of the diversity in a research project of this magnitude in terms of the number of data-producing methods and the variety of associated data. A survey to gather relevant information lays the foundation of this work. Based on this information as well as on formal and informal exchange with CRC project members, we discuss how to deal with such a variety of data in future projects in terms of project preparation, recommended RDM practices regarding storage, publication, archival and the accompanying data formats, and communication and awareness among participating researchers. Furthermore, as a project which includes many chemical and chemistry-related disciplines, the information presented here can inform the efforts and goals within NFDI consortia such as NFDI4Chem.</p>
</sec>
<sec id="S2">
<title>2 Methodology</title>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> shows the general approach taken for this work. Stage 1 focused on gathering information within CRC 985. To this end, the INF project compiled a structured questionnaire [<xref ref-type="bibr" rid="B8">8</xref>] to survey the data-producing methods and workflows throughout the CRC. It then acquired contacts for RDM-related topics for the various research groups and subprojects by contacting the relevant PI. The first version of the questionnaire was then distributed to the supplied contacts. In most cases, the contacts named were PhD candidates working within CRC 985, yet, also included more senior research staff in some cases.</p>
<fig id="F1">
<caption>
<p><bold>Figure 1:</bold> Targeted incremental approach to provide an overview of the project&#8217;s data scope and set the basis for future RDM improvements.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g1.png"/>
</fig>
<p>The first version of the questionnaire focused on the methods themselves, aiming first and foremost to understand technical aspects such as device specifications, output data formats and volume, and frequency of use within the CRC and within the respective research group. Two issues soon became apparent: (1) The results lacked certain information that would be useful to the INF project, especially regarding current RDM practices such as data workflows and documentation, and (2) some terminology, such as metadata or controlled vocabulary (a term added to the second version), or the questions themselves were unclear to the participants.</p>
<p>Thus, the questionnaire underwent two revisions. The third and final version split the questionnaire into two parts: one regarding each method used, gathering details as described above, and a second regarding overall RDM practices such as the use of an electronic laboratory notebook (ELN), the implementation of the CRC 985 policy on data, and the use of the sample management system. Definitions of terminology were added as well. This granted participants the opportunity to answer the questions independently and gather information in advance of face-to-face exchanges. The first part now also included a question on data workflows, specifically, how data are transferred from the device computer to other servers or data management systems, aiming to determine if data workflows could benefit from automation.</p>
<p>The questionnaire versions were maintained using the central CRC 985 SharePoint. These surveys and exchanges took place from 2021 through 2023.</p>
<p>In the second stage, the INF project compiled an overview of the gathered information on data-producing methods. This serves as a resource on available methods and contacts for CRC 985 and was therefore published on the project&#8217;s SharePoint for easy reference.</p>
<p>The third stage, recommendations, employs the data collected and tabular overview created in the previous stage as well as general information and feedback collected in a rather informal manner in question and answer sessions as part of workshops or presentations. This informed the INF project on the needs of the researchers. By drawing on knowledge provided by FAIRsharing.org [<xref ref-type="bibr" rid="B9">9</xref>], re3data.org [<xref ref-type="bibr" rid="B10">10</xref>], and NFDI4Chem [<xref ref-type="bibr" rid="B11">11</xref>] as well as central solutions offered by RWTH Aachen University, recommendations for current and future projects on infrastructure options, e.g., working data storage, ELNs, and data publishing and archival services, are made. Furthermore, areas that require additional work by infrastructure providers are pinpointed.</p>
</sec>
<sec id="S3">
<title>3 Results and Discussion</title>
<sec id="S3.1">
<title>3.1 Stage 1: Gathering Information</title>
<p>The questionnaire created at the beginning of this study was used as a living document. Therefore, updates to the questions occurred throughout the first stage to better explain the questions and thus acquire more detailed information, as outlined in <xref ref-type="sec" rid="S2">Section 2</xref>. The questionnaire successfully gathered information in a structured manner and provided a baseline from which to gain more detailed information. This required close face-to-face exchange between the research project members and members of the INF project. In total, 16 interviews were conducted, involving 13 research groups working within the project.</p>
<fig id="F2">
<caption>
<p><bold>Figure 2:</bold> Successful information gathering through a questionnaire that was continuously improved through question and answer sessions and a close exchange with CRC 985 scientists.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g2.png"/>
</fig>
<p>In addition, the INF project held seminars for researchers to raise awareness with respect to RDM. Subsequent question and answer sessions gave a further overview of the methodological diversity as well as other RDM-related concerns, enabling the INF project to provide suggestions to facilitate RDM in CRC 985. Therefore, by combining a questionnaire as a living document with a close exchange with the data-producing researchers, the first stage was successfully completed (<xref ref-type="fig" rid="F2">Figure 2</xref>).</p>
<p>It should be noted that participation was voluntary and the knowledge of the participants regarding RDM varied greatly. Thus, obtaining a full and complete picture of RDM throughout the groups involved in the CRC proved difficult, resulting in possibly incomplete information. To gain such a picture for holistic RDM within such projects, INF projects must be better integrated into the individual research groups, with responsibilities and points of contact defined from the outset, as further discussed in <xref ref-type="sec" rid="S3.3">Section 3.3</xref>.</p>
<p>All versions of the questionnaire as well as the completed surveys can be found within the dataset published on RADAR4Chem [<xref ref-type="bibr" rid="B8">8</xref>]. The file naming convention includes the respective version for each completed survey. Additional notes on verbal exchanges are included in the individual documents.</p>
</sec>
<sec id="S3.2">
<title>3.2 Stage 2: Information Overview</title>
<p>The full content of the information gathered falls outside the scope of the results reported here, with the focus being placed on information regarding data-producing methods, the produced data volume, the generated data types, data documentation, and working data storage and organization.</p>
<fig id="F3">
<caption>
<p><bold>Figure 3:</bold> Successful information overview that tabulates all methods and resulting data volumes within CRC 985.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g3.png"/>
</fig>
<p>The questionnaires resulted in a tabular overview of the data-producing methods employed throughout CRC 985. <xref ref-type="fig" rid="F4">Figure 4</xref> provides an overview of these methods by research area, indicated by institute or department names. As shown, the wide variety of methods, from spectroscopy to microscopy to numerical methods, covers a broad range of disciplines. This rather coarse-grained depiction summarizes the methods into wider categories. It should be mentioned that the number of devices and setups employed throughout the CRC gives rise to a large variety of data, including differences in the data output sizes and file types, even within a specific method. In total, 40 method categories were reported throughout the project. As this reporting was primarily voluntary and researchers may acquire, develop, or even switch methods as a project progresses, this number is approximate.</p>
<fig id="F4">
<caption>
<p><bold>Figure 4:</bold> Methods reported according to their area of research in CRC 985. The employed or available methods range from spectroscopy to microscopy to numerical methods, representing the variety of disciplines involved in the project. Nevertheless, many methods are common to chemistry-related research. In total, 40 method categories were reported.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g4.png"/>
</fig>
<p><xref ref-type="fig" rid="F5">Figure 5</xref> exhibits the resulting multitude of data output sizes. The majority of the methods produce data at or below the 1 GB mark, while five methods, namely high-resolution microscopy methods, such as superresolution fluorescence microscopy or tensiometry, and numerical methods, cross or go far beyond that mark. This must be taken into account for recommendations on storage, publication, and archival.</p>
<fig id="F5">
<caption>
<p><bold>Figure 5:</bold> Methods and their output data sizes (logarithmic scale) reported in CRC 985. Most reported output sizes are smaller than 1 GB, with numeric and imaging methods far beyond that point and up to 1 TB. Where applicable, error bars indicate the standard deviation of the data output sizes reported for specific methods.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g5.png"/>
</fig>
<p>The survey results provide an overview of commonly used data formats for raw and exported data. This will be discussed in more detail in <xref ref-type="sec" rid="S3.3">Section 3.3</xref>, with reported data formats provided in <xref ref-type="table" rid="T1">Table 1</xref>. However, exchanges with researchers and the responses presented below made clear that standard formats were not necessarily well known; guidance on data formats is therefore required. This information was included in the shared overview table on the SharePoint for project members to reference and to create general awareness. An anonymized version of this table is also provided in the published dataset [<xref ref-type="bibr" rid="B8">8</xref>]. Furthermore, some information was added to the table without a specific survey having been carried out, in order to complete the central methods overview.</p>
<table-wrap id="T1">
<caption>
<p><bold>Table 1:</bold> Data exchange formats recommended by FAIRsharing, NFDI4Chem, and the Chemotion Repository for selected methods reported within CRC 985 and common data formats or file extensions reported throughout the project. Formats sourced from FAIRsharing.org are cited accordingly, while those listed on NFDI4Chem&#8217;s Knowledge Base and the Chemotion Repository Documentation are denoted accordingly. We recommend the adoption of formats printed in bold font.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top">method</td>
<td align="left" valign="top">data exchange format or file extension recommended by NFDI4Chm, FAIRsharing, and Chemotion Repository</td>
<td align="left" valign="top">data exchange formats within CRC 985</td>
</tr>
<tr>
<td align="left" valign="top">Chromatography</td>
<td align="left" valign="top"><bold>ANDI-MS</bold> [<xref ref-type="bibr" rid="B26">26</xref>], CSV<sup>a</sup>, TXT<sup>a</sup></td>
<td align="left" valign="top">CSV, PDF, .vdt, .gcd</td>
</tr>
<tr>
<td align="left" valign="top">Colorimetric or Fluorescence-based Assays</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.ruc (raw), ASCII (export including metadata)</td>
</tr>
<tr>
<td align="left" valign="top">Computational Chemistry</td>
<td align="left" valign="top">CHARMM Card File Format (CRD) [<xref ref-type="bibr" rid="B27">27</xref>]</td>
<td align="left" valign="top">ASCII, .log, .cosmo, .energy, .out, .gjf, .xyz, CSV (processed)</td>
</tr>
<tr>
<td align="left" valign="top">Cyclic Voltammetry (CV)</td>
<td align="left" valign="top">TXT<sup>a</sup></td>
<td align="left" valign="top">.nox</td>
</tr>
<tr>
<td align="left" valign="top">Electrophysiology (patch clamp)</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.dat</td>
</tr>
<tr>
<td align="left" valign="top">Electron Paramagnetic Resonance Spectroscopy (EPR)</td>
<td align="left" valign="top">TXT<sup>a</sup></td>
<td align="left" valign="top">.spe, TXT (export)</td>
</tr>
<tr>
<td align="left" valign="top">Elemental Analysis (EA)</td>
<td align="left" valign="top">TXT<sup>a</sup></td>
<td align="left" valign="top">TXT</td>
</tr>
<tr>
<td align="left" valign="top">Energy-dispersive X-ray spectroscopy (EDX)</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">TXT, JPEG (export), PNG (export)</td>
</tr>
<tr>
<td align="left" valign="top">Fluorescence spectroscopy</td>
<td align="left" valign="top"><bold>JCAMP-DX<sup>a</sup></bold></td>
<td align="left" valign="top">OPJ, FDS, TXT (export), PDF (export)</td>
</tr>
<tr>
<td align="left" valign="top">IR Spectroscopy (IR)</td>
<td align="left" valign="top"><bold>JCAMP-DX</bold> [<xref ref-type="bibr" rid="B28">28</xref>]<sup>a</sup>, AnIML [<xref ref-type="bibr" rid="B29">29</xref>]<sup>b</sup></td>
<td align="left" valign="top">.ispd, TXT (export), PDF (export)</td>
</tr>
<tr>
<td align="left" valign="top">Mass Spectrometry (MS)</td>
<td align="left" valign="top"><bold>JCAMP-DX</bold> [<xref ref-type="bibr" rid="B28">28</xref>], AnIML [<xref ref-type="bibr" rid="B29">29</xref>]<sup>b</sup>, <bold>mzML</bold> [<xref ref-type="bibr" rid="B30">30</xref>]<sup>a</sup></td>
<td align="left" valign="top">.d, .bad, Xcalibur Raw file, TXT, .jws</td>
</tr>
<tr>
<td align="left" valign="top">Mechanical Surface Analysis (nanoindentation)</td>
<td align="left" valign="top">-<break/>(standard data model: CWA 17552:2020 [<xref ref-type="bibr" rid="B31">31</xref>]</td>
<td align="left" valign="top">TXT</td>
</tr>
<tr>
<td align="left" valign="top">Microscopy</td>
<td align="left" valign="top"><bold>OME-TIFF</bold> [<xref ref-type="bibr" rid="B32">32</xref>]</td>
<td align="left" valign="top">.nid, .spm, .jpk-qi-image, .jpk-qi-data, <bold>TIFF<sup>e</sup></bold>, LIF, DM4 (TEM), JPEG (export), PNG (export), AVI (video), CSV, TXT</td>
</tr>
<tr>
<td align="left" valign="top">Nuclear Magnetic Resonance Spectroscopy (NMR)</td>
<td align="left" valign="top">NMR-STAR [<xref ref-type="bibr" rid="B33">33</xref>], CCPN [<xref ref-type="bibr" rid="B34">34</xref>], <bold>NMR-ML</bold> [<xref ref-type="bibr" rid="B35">35</xref>], <bold>NMReData</bold> [<xref ref-type="bibr" rid="B36">36</xref>] (assignments)<sup>a</sup>, AniML [<xref ref-type="bibr" rid="B29">29</xref>]<sup>b</sup>, <bold>JCAMP-DX</bold> (raw)<sup>a</sup></td>
<td align="left" valign="top">.mrnova, FID, PDF (export)</td>
</tr>
<tr>
<td align="left" valign="top">Raman Spectroscopy</td>
<td align="left" valign="top"><bold>JCAMP-DX<sup>a</sup></bold>, AniML [<xref ref-type="bibr" rid="B29">29</xref>]<sup>b</sup></td>
<td align="left" valign="top">.icRaman, .sps, TXT (export), CSV (export), .spc (export), .xlsx (export)</td>
</tr>
<tr>
<td align="left" valign="top">Rheometry</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.rdf, .tri, .iwp, CSV (export)</td>
</tr>
<tr>
<td align="left" valign="top">Dynamic Light Scattering</td>
<td align="left" valign="top">CSV<sup>b</sup></td>
<td align="left" valign="top"><bold>ASC<sup>d</sup></bold>, .dts, <bold>.zmes<sup>d</sup></bold>, CSV (export), TXT (export)</td>
</tr>
<tr>
<td align="left" valign="top">Static Light Scattering</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.d80, .txt (export, not all parameters included)</td>
</tr>
<tr>
<td align="left" valign="top">Small Angle X-Ray Scattering (SAXS)</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.mpa, .info, .edf, .dat</td>
</tr>
<tr>
<td align="left" valign="top">Spectroelectrochemistry</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.str8, TXT (export)</td>
</tr>
<tr>
<td align="left" valign="top">Tensiometry</td>
<td align="left" valign="top">PNG (contact angle measurements)<sup>a</sup></td>
<td align="left" valign="top">.krs, <bold>.zip (export, contains all .krs and XML)<sup>d</sup></bold>, XLSX (export or analysis results)</td>
</tr>
<tr>
<td align="left" valign="top">Thermal Analysis</td>
<td align="left" valign="top">-</td>
<td align="left" valign="top">.stad, .spp, TXT (export), CSV (export)</td>
</tr>
<tr>
<td align="left" valign="top">UV/Vis Spectroscopy</td>
<td align="left" valign="top">CSV<sup>a</sup>, <bold>JCAMP-DX<sup>c</sup></bold></td>
<td align="left" valign="top">.dsw, .bsk, .bkn, .str, .jws, .jwb, .ksd, .sre (ASCII), TXT (export), CSV (export)</td>
</tr>
<tr>
<td align="left" valign="top">X-Ray Diffraction Analysis (XRD)</td>
<td align="left" valign="top"><bold>CIF</bold> [<xref ref-type="bibr" rid="B37">37</xref>] (single crystal)<sup>a</sup>, <bold>.xyd</bold> (powder)<sup>a</sup></td>
<td align="left" valign="top">binary encoded frames (images), .p4p, .hkl, .res, CIF, .x</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p><sup>a</sup> = NFDI4Chem Knowledge Base</p>
<p><sup>b</sup> = under development according to FAIRsharing.org</p>
<p><sup>c</sup> = Chemotion Repository</p>
<p><sup>d</sup> = (meta)data accessible by common tools</p>
<p><sup>e</sup> = preferably method-specific TIFF-formats that include extended metadata</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The questionnaire also addressed data documentation, especially regarding (uniform) metadata. The responses reveal that, for most groups, very little uniform, machine-readable metadata is recorded unless it is contained directly in the output data files. However, this information may not always be contained in the exported version of the data, with which many members reported working. Relevant information is often included directly in the file name, recorded in analog notebooks or ELNs, or digitized in plain text, Microsoft Office, or Microsoft Excel files. Only one group mentioned using controlled vocabularies.</p>
<p>It should be noted that, in some cases, project members, especially doctoral students, expressed concerns about data storage best practices, which data should be stored, published, and archived at which stage (raw vs. exported or processed data), data organization, and data formats. These concerns were often expressed in informal conversations, workshops, or seminars.</p>
<p>Thus, the survey provided sufficient results to obtain an overview of the methodological diversity and generated data, which led to the successful completion of the second stage (<xref ref-type="fig" rid="F3">Figure 3</xref>). In addition to the data-producing methods, other foundational aspects and concerns regarding RDM were collected and will be addressed in the following.</p>
</sec>
<sec id="S3.3">
<title>3.3 Stage 3: Recommendations</title>
<p>Based on the knowledge gained from the presented results, we derived the recommendations outlined below. On the one hand, the data-producing method types and file sizes influence aspects such as data publication platforms and recommended file types. On the other hand, the project participants&#8217; accounts allow us to directly address the concerns and advise on research data management best practices accordingly.</p>
<p>The main concerns reported were:</p>
<list list-type="order">
<list-item><p>(Lack of) knowledge and implementation of data organization basics and best practices regarding working data storage and structure</p></list-item>
<list-item><p>Internal data reuse, e.g., the ability to easily build upon a predecessor&#8217;s work</p></list-item>
<list-item><p>Access to storage space for large amounts of (raw) data</p></list-item>
<list-item><p>Data exchange format standards</p></list-item>
<list-item><p>(Lack of) knowledge of data documentation best practices and minimum information (metadata) standards</p></list-item>
<list-item><p>Publishing data underlying a journal article publication, e.g., which repository best suits the research data and data access control (open access vs. closed access options)</p></list-item>
</list>
<p>These concerns were largely reported on a research group and not necessarily a project-specific level. Many are interlinked and can thus be grouped together. Therefore, in the following, we will discuss and make recommendations for data organization within a group, which involves working data governance, data documentation, and data formats, including minimum information (metadata) standards, as well as archival (covering points 1 to 5 above). Many of these aspects, especially data governance, fall into the <bold>planning</bold> section of the research data lifecycle, depicted in <xref ref-type="fig" rid="F6">Figure 6</xref>. Here, RDM practices are planned and documented in data management plans (DMPs) or data policies. They are then carried out and updated throughout the data <bold>production</bold> and <bold>analysis</bold> sections of the data lifecycle.</p>
<fig id="F6">
<caption>
<p><bold>Figure 6:</bold> The research data lifecycle depicts the typical stages of research data throughout a project. These include the planning of the project, which encompasses detailed planning on which research data will be generated or re-used as well as how it will be stored during and archived after the project. The active research phases include the data production and analysis phases, after which the data are preserved and access rights are determined, such as open-access in a public repository or closed access in an institutional archive. Those who have access to the data can then re-use it in the next project. At this point, the planning stage restarts the cycle [<xref ref-type="bibr" rid="B12">12</xref>].</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g6.png"/>
</fig>
<p>Together, these points ensure data can be reused by others within the group and also prepare data for publication and reuse by those outside of an organization or project. We then recommend repositories based on discipline and/or the data acquisition methods employed, and describe how to reference the data within a journal article (covering point 6 above). This allows others to <bold>access</bold> and <bold>reuse</bold> the data, restarting the data lifecycle (<xref ref-type="fig" rid="F6">Figure 6</xref>). Lastly, we outline how large, interdisciplinary projects can tie the RDM of individual groups together into collaborative data management.</p>
<p>To illustrate the recommendations in the following discussion, we use two use cases. These examples outline the status quo for specific methods within CRC 985 in the third funding phase:</p>
<boxed-text>
<sec>
<title>Case 1: Infrared Spectroscopy</title>
<p><bold>Status Quo</bold></p>
<list list-type="bullet">
<list-item><p>Small data output (<xref ref-type="table" rid="T5">Table 5</xref>)</p></list-item>
<list-item><p>Data processing only possible on device computer</p></list-item>
<list-item><p>Limited metadata captured when exported to an open format</p></list-item>
<list-item><p>ELN available (Chemotion ELN)</p></list-item>
<list-item><p>Networked to institute server</p></list-item>
</list>
<p><bold>Desired Outcome</bold></p>
<list list-type="bullet">
<list-item><p>Enable data processing and analysis on computers other than the device computer</p></list-item>
<list-item><p>Automatically link data to the digital sample documentation</p></list-item>
</list>
</sec>
</boxed-text>
<boxed-text>
<sec>
<title>Case 2: Superresolution Fluorescence Microscopy</title>
<p><bold>Status Quo</bold></p>
<list list-type="bullet">
<list-item><p>Large data output (<xref ref-type="table" rid="T5">Table 5</xref>)</p></list-item>
<list-item><p>Limited uniform metadata automatically generated</p></list-item>
<list-item><p>Predecessors&#8217; data not always understandable</p></list-item>
<list-item><p>ELN available (eLabFTW)</p></list-item>
</list>
<p><bold>Desired Outcome</bold></p>
<list list-type="bullet">
<list-item><p>Ensure complete data documentation/metadata record</p></list-item>
<list-item><p>Link data to digital documentation</p></list-item>
<list-item><p>Appropriate storage solution for large data volume</p></list-item>
</list>
</sec>
</boxed-text>
<p>These examples represent typical cases. Infrared spectroscopy (IR) produces relatively small data output (just over 10 MB, see <xref ref-type="fig" rid="F5">Figure 5</xref>), as does a large portion of the methods reported; storage space is therefore of little concern. The issue lies rather in ensuring data and full metadata are exported and linked to the sample documentation, while enabling data processing from anywhere, not just through the device computer. This case is fairly representative of spectroscopy in general.</p>
<p>Superresolution Fluorescence Microscopy (SRFM) imaging reaches the 150 GB mark per measurement (see optical microscopy in <xref ref-type="fig" rid="F5">Figure 5</xref>), which poses a challenge to the institutional storage solutions in the long run. Furthermore, the raw data does not include the full measurement parameters, such as the device setup and any specific accessories used. An ELN, eLabFTW, is available to manually enter these parameters. The full dataset cannot be directly attached to this type of documentation due to the file size limitations of the standard database storage. Therefore, it is desirable to ensure complete metadata and other documentation, to automatically transfer the data to an appropriate storage solution, and to link the metadata and documentation to the measurement and analysis data. Due to the output data size and the need for improved documentation, this case is representative not only of other imaging methods: certain RDM solutions may also be extended to computational chemistry, for example, where storage and uniform documentation of input parameters play an important role.</p>
<sec id="S3.3.1">
<title>3.3.1 Data Governance</title>
<p>A general uncertainty regarding which data to store, e.g., raw vs. processed files, and how to organize the stored data was reported, especially due to a lack of guidelines in this area. Thus, doctoral researchers often establish their own individual directory structure, documentation practices, software tools to use, file and sample naming conventions, and workflows. While this works for the individual in the short term, establishing holistic data governance within a research group during the planning phase enables wider collaboration, as it provides structure and guidance. Proper data organization, first and foremost, ensures that those currently working with the data can do so efficiently. Furthermore, it enables others to easily understand and therefore reuse or build upon the data, from future doctoral students in the same group to external researchers with whom the data may be shared.</p>
<p>Starting in the planning phase of research, it must be determined where to store data and how this should be structured. A common practice, observed during exchange with researchers, is for the individual to sort data in a folder bearing their name. However, creating common, structured folder templates for each project and storing data accordingly&#8212;instead of associating it with the person conducting the research&#8212;ensures the data can be correctly found in the years to come. Central, shared storage options, such as institutional servers or rented server space from the university&#8217;s central service providers, are recommended, while access to individual folders is controlled on an administrative level.</p>
<p>It must be clear to all group members at what stages research data should be saved. For example, as with the cases outlined in <xref ref-type="sec" rid="S3.3">Section 3.3</xref>, certain IR devices produce raw data in proprietary formats, while exported data may be used to continue work on the researcher&#8217;s computer. In such cases, raw data are often not transferred because they cannot be opened without the device software. However, best practice is to always store raw data, even if in a proprietary format, in read-only folders within the given directory structure.</p>
<p>These agreed-upon practices and structures should be documented in a group-wide DMP as well as in plain-text README files contained within the directory structure for easy reference. Further data policies and on- and off-boarding checklists ensure data are transferred smoothly from one researcher to the next.</p>
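<p>As a minimal sketch of how such a folder template, README file, and read-only raw data storage could be set up, the following Python example may serve as a starting point. The folder names, the README wording, and the function names are assumptions made for illustration; each group would replace them with its own agreed-upon conventions. The sketch removes write permission from individual raw files rather than from whole folders, which is a simplification of the folder-level access control described above.</p>
<code language="python"><![CDATA[
```python
import shutil
import stat
from pathlib import Path

# Hypothetical folder template; a group would agree on its own names.
TEMPLATE = ("raw", "processed", "analysis", "docs")


def create_project(root: Path, project: str) -> Path:
    """Create the agreed folder template and a plain-text README for a project."""
    base = root / project
    for name in TEMPLATE:
        (base / name).mkdir(parents=True, exist_ok=True)
    readme = base / "README.txt"
    if not readme.exists():
        lines = [f"Project: {project}", "Folders:"]
        lines += [f"  {name}/" for name in TEMPLATE]
        lines.append("Raw data in raw/ are stored read-only and never edited.")
        readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return base


def store_raw(base: Path, source: Path) -> Path:
    """Copy a raw file into raw/ and remove write permission from the copy."""
    target = base / "raw" / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; raw data are never overwritten")
    shutil.copyfile(source, target)
    # Read-only for owner, group, and others.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```
]]></code>
<p>Organizing by project name rather than by researcher name follows the recommendation above; on shared servers, the file permissions set here would normally be complemented by administrative access control.</p>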
<p>This planning and documentation does not stop with data organization and storage. It should also cover other aspects that arise during data production and analysis: data exchange formats for storage, preservation, and reuse; documentation tools and standards; and data archival and publication platforms that ensure preservation, access, and reuse. The specifics are discussed in the following.</p>
<p>In this phase, clear documentation of the processes and data-producing methods also proves useful to better understand where improvement may be required. For example, a group-level project can fully assess the status quo to determine where data workflows may be improved and where external help may be required.</p>
<p>These efforts not only aid in managing research and the corresponding data as a group, but also provide a reference for (external) data stewards or data managers, e.g., those involved in INF projects, while providing contextual information for data publication.</p>
</sec>
<sec id="S3.3.2">
<title>3.3.2 Data Documentation</title>
<p>As noted, doctoral researchers often establish documentation practices individually. In turn, respondents often mentioned that understanding a predecessor&#8217;s data and work proved difficult. This indicates that common, group-level documentation standards need to be established.</p>
<p>Using the above SRFM case as an example, the raw data obtained from the device does not necessarily contain all relevant measurement parameters. For IR, raw data files cannot be opened without the device software, while full metadata are not included in all available data exports. Thus, as a bare minimum, establishing templates or even metadata schemas in text-based formats such as YAML or JSON provides a simple, machine- and human-readable format that may be filled out for each dataset. Such files can then be stored directly alongside the data to give a digital metadata record. This practice may be extended to digitally record and document research, thereby documenting agreed-upon minimum information for an experiment, measurement, or sample, and by following existing community standards, where available. These templates should be established in the planning phase of the research data lifecycle and updated, when necessary, throughout the data production and analysis phases (see <xref ref-type="fig" rid="F6">Figure 6</xref>).</p>
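<p>A template-based metadata record of this kind could, for example, be checked and written next to the data file as sketched below in Python. The field names in the template, the <monospace>.meta.json</monospace> suffix, and the function names are hypothetical; an actual template would follow the minimum information agreed upon within the group or an existing community standard.</p>
<code language="python"><![CDATA[
```python
import json
from pathlib import Path

# Hypothetical minimum-information template: field name -> expected type.
TEMPLATE = {
    "sample_id": str,
    "operator": str,
    "method": str,
    "instrument": str,
    "measured_on": str,  # ISO 8601 date, e.g. "2024-05-17"
}


def missing_or_invalid(record: dict) -> list[str]:
    """Return the template fields that are absent, empty, or of the wrong type."""
    problems = []
    for field, expected in TEMPLATE.items():
        value = record.get(field)
        if not isinstance(value, expected) or value == "":
            problems.append(field)
    return problems


def write_sidecar(data_file: Path, record: dict) -> Path:
    """Write the record as '<data file name>.meta.json' next to the data file."""
    problems = missing_or_invalid(record)
    if problems:
        raise ValueError(f"incomplete metadata: {', '.join(problems)}")
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar
```
]]></code>
<p>Rejecting incomplete records at the moment of writing keeps gaps from accumulating unnoticed; the same template could equally be kept as a YAML file if the group prefers that format.</p>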
<p>Up until here, this and the <xref ref-type="sec" rid="S3.3.1">previous section</xref> cover very basic data storage and management that does not employ any specialized tools or infrastructure, besides a well-managed central storage, defined directory structure, and documentation using agreed-upon templates. This provides group members, especially junior scientists, with the basic framework to operate in an efficient and organized manner, while producing transparent results that are (re)usable by other current and future research group members. However, sophisticated tools and platforms exist, and are being continuously updated and improved, to further assist researchers in effective research data management.</p>
<p>In many natural sciences, the laboratory journal stands as the staple of research documentation. However, analog journals are not machine-readable and do not necessarily follow uniform documentation standards. Digital counterparts, ELNs, offer a powerful solution to documenting research in a digital and structured manner, while also managing and connecting the associated research data. These platforms exist with a wide variety of styles and target user groups, from the more synthetic chemistry focused Chemotion ELN [<xref ref-type="bibr" rid="B13">13</xref>], [<xref ref-type="bibr" rid="B14">14</xref>], [<xref ref-type="bibr" rid="B15">15</xref>] to the broadly customizable eLabFTW [<xref ref-type="bibr" rid="B16">16</xref>], [<xref ref-type="bibr" rid="B17">17</xref>]. One group within the CRC transitioned to Chemotion ELN after the survey had been conducted, while limited use of eLabFTW was reported, yet in a rather individualized manner. Proprietary solutions such as FURTHRmind and mbook were also employed. Many CRC members reported using analog journals or solutions such as MS Word and MS Excel files, as noted above.</p>
<p>For ELNs, it is important to continue to follow data organization and documentation best practices. While some ELNs, such as the Chemotion ELN, strive to adhere to minimum information standards for supported methods, highly customizable instances or unsupported methods require high-level organization from within the group or institute. As with the templates outlined above, groups or institutes should agree on the information to record for their experiments and create templates for the ELN. eLabFTW, for example, enables custom metadata and allows for the creation of experiment templates. Chemotion has recently also expanded to include LabIMotion [<xref ref-type="bibr" rid="B18">18</xref>], which enables custom modules for non-chemistry methods or methods not yet included. Therefore, an ELN must be centrally managed and documented within the group, analogous to the basic data organization and storage outlined above. This not only includes providing templates and usage guidelines, but also training group members on ELN use.</p>
<p>For the examples, the IR use case involves a research group that employs the Chemotion ELN. The ELN offers direct connections for many methods, including IR, which directly transfers data and attaches it to an experiment [<xref ref-type="bibr" rid="B19">19</xref>]. It also offers ChemSpectra to edit the analytical data [<xref ref-type="bibr" rid="B20">20</xref>]. These methods extract necessary metadata to complete the documentation, ensuring documentation, research data as well as the analysis are bundled in one place.</p>
<p>For the SRFM use case, eLabFTW is available, which allows for structured metadata templates to be established within experiment templates. Since not all relevant metadata are captured in a given measurement, researchers can employ such templates to document their research and manually enter any missing information. However, as opposed to IR, attaching SRFM data to experiments within the ELN is not viable due to size limitations. Therefore, creating meaningful links to the data within the documentation proves helpful.</p>
<p>For cases such as this, where increased storage is required while metadata management is at the forefront, the RWTH Aachen IT Center has developed Coscine (short for Collaborative Scientific Integration Environment) [<xref ref-type="bibr" rid="B21">21</xref>], [<xref ref-type="bibr" rid="B22">22</xref>]. This platform primarily aims to organize and manage working research data in ongoing projects. On a group level, Coscine offers various storage types, called resources, with a storage quota of up to 125 TB per project for participating universities or groups involved in NFDI-related projects. Custom metadata application profiles can be generated to fit group needs, which result in a fillable metadata form that includes metadata validation for input values. Data within a project or subproject is organized into resources, each of which has been assigned a specific application profile and a PID in the form of an ePIC [<xref ref-type="bibr" rid="B23">23</xref>], which leads to a contact page. Therefore, groups can customize their data documentation and storage structure to fit their needs and incorporate community-specific minimum information standards. Details pertaining to the collaborative aspects of this platform will be discussed in <xref ref-type="sec" rid="S3.3.4">Section 3.3.4</xref>.</p>
<p>Both eLabFTW and Coscine offer a Representational State Transfer Application Programming Interface (REST API). Such interfaces allow for information to be exchanged between the platforms in an automated manner. Therefore, to maintain the local documentation using the ELN while maintaining a connection to the associated raw and processed data, a Python script on the device computer can transfer the measurement data to Coscine, while a link is added within the ELN entry. Metadata from the ELN is then also mirrored in Coscine.</p>
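<p>The coupling described here can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used in the project: <monospace>upload</monospace> and <monospace>add_link</monospace> are placeholders for thin wrappers that a group would write around the respective REST APIs, and no actual Coscine or eLabFTW endpoints are shown. Recording a checksum on both sides is an addition made for this sketch so that the link can be verified later.</p>
<code language="python"><![CDATA[
```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Compute a checksum in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_and_link(
    data_file: Path,
    eln_metadata: dict,
    upload: Callable[[Path, dict], str],
    add_link: Callable[[str, dict], None],
) -> dict:
    """Upload a file with mirrored ELN metadata, then record the link in the ELN.

    `upload` and `add_link` are placeholders for user-written wrappers around
    the storage platform's and the ELN's REST APIs; they are not library calls.
    """
    mirrored = dict(eln_metadata, file_name=data_file.name, sha256=sha256_of(data_file))
    location = upload(data_file, mirrored)  # expected to return a URL or PID
    add_link(location, mirrored)            # attaches the link to the ELN entry
    return {"location": location, "metadata": mirrored}
```
]]></code>
<p>Passing the two platform-specific steps in as functions keeps the workflow logic independent of any one API version, which limits the changes needed when an interface is updated.</p>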
<p>Similar templates and workflows may be set up for different methods to ensure the datasets include complete documentation for all methods employed within the group. Working from a basis of well-structured and well-documented data organization, including governance and research data documentation, established during the planning phase and implemented during the data production and analysis phases of the research data lifecycle (<xref ref-type="fig" rid="F6">Figure 6</xref>), provides the foundation for RDM in collaborative projects. Maintenance of these practices and proper onboarding of group members ensures adherence and avoids uncertainty.</p>
</sec>
<sec id="S3.3.3">
<title>3.3.3 Data Formats</title>
<p>Vendor software typically dictates the formats of output data, which may be proprietary. Interoperable data requires open and standardized data formats, which do not (yet) exist for every method [<xref ref-type="bibr" rid="B24">24</xref>]. For many methods, open export formats such as TEXT and comma-separated values (CSV) were reported; however, the associated metadata may be lost or incomplete upon export, as indicated for IR, for example. Furthermore, while these formats may be machine-readable to a certain extent, they are not necessarily machine-<italic>understandable</italic> as they lack a defined structure and semantic annotation.</p>
<p>As standard open data exchange formats exist for certain analysis methods within the CRC and since many of them were not mentioned in the survey responses, we gathered recommendations and summarized these in <xref ref-type="table" rid="T1">Table 1</xref>, sourcing information from FAIRsharing [<xref ref-type="bibr" rid="B9">9</xref>] and NFDI4Chem&#8217;s Knowledge Base [<xref ref-type="bibr" rid="B11">11</xref>], as well as the Chemotion Repository documentation [<xref ref-type="bibr" rid="B25">25</xref>].</p>
<p>This information has also been shared on the CRC 985 SharePoint along with the method information outlined <xref ref-type="sec" rid="S3.2">above</xref>. The effort arose from discussions about the common misconception that data should always be stored and published as CSV or TEXT files. Other options exist and may even be supported by vendor software, but researchers are often unaware of them.</p>
<p>The existing standard data exchange formats listed in <xref ref-type="table" rid="T1">Table 1</xref> provide guidelines on formats to choose from, while recommended standards and common formats are highlighted in bold font. The exact format choice for each method will depend on the available software and export or conversion tools, as well as on the formats that specific repositories accept for publication (see, for example, Chemotion Repository requirements in [<xref ref-type="bibr" rid="B25">25</xref>], [<xref ref-type="bibr" rid="B38">38</xref>]).</p>
<p>Notably, many methods do lack specific standards, for which the above-mentioned practice of documenting data appropriately and sharing data along with the associated metadata in open, text-based formats is advised. As the various efforts such as the NFDI consortia continue their work, more standards will become available. Furthermore, minimum information standards will continue to direct how data should be formatted and documented, further guiding format standards. <xref ref-type="table" rid="T1">Table 1</xref> as well as the published overview [<xref ref-type="bibr" rid="B8">8</xref>] serve to inform the standards and infrastructure community on which formats researchers are employing in their day-to-day work and where standards are lacking.</p>
<p>For the IR example case, as a connection to Chemotion ELN can be made, the data should be exported to JCAMP-DX, as advised not only by the Chemotion Repository (see <xref ref-type="table" rid="T1">Table 1</xref>) but also by the Chemotion ELN, to allow for automatic data transfer. This format was not reported in the survey, yet it is supported by the vendor software. For SRFM, OME-TIFF may prove beneficial in combination with an OMERO instance set up at the institutional or university level [<xref ref-type="bibr" rid="B39">39</xref>]. Without this option, TIFF files are appropriate. Connecting the documentation and data management, as described, ensures full metadata annotation, especially since Coscine enables semantic metadata.</p>
<p>As with data organization and documentation, data exchange formats must be agreed upon as part of the planning stage of the data lifecycle (<xref ref-type="fig" rid="F6">Figure 6</xref>), communicated within the group, and updated as more standards become available.</p>
</sec>
<sec id="S3.3.4">
<title>3.3.4 Collaboration</title>
<p>Up until now, the discussion has focused on the group level. Having a well-documented approach to data organization, documentation, and the tools used helps in identifying how collaborative projects such as CRCs and the contained subprojects can best manage data.</p>
<p>The CRC 985 INF project addressed sample tracking throughout a collaborative project involving many different groups and institutes in previous funding periods [<xref ref-type="bibr" rid="B1">1</xref>], as described in <xref ref-type="sec" rid="S1">Section 1</xref>. This system aimed to solve a specific problem with sample traceability within the project, while enabling project members to directly attach associated data to a (digital) sample. In this funding period, the system was further improved. Metadata fields for better sample tracking were added, enabling users to define who initially created the sample and who is currently working with it. The main view was altered according to user feedback to only show the most relevant information. This enabled researchers to better find relevant samples and data.</p>
<p>However, as shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, some research within the CRC may not involve physical samples, for example, computational chemistry methods such as molecular dynamics. Furthermore, SharePoint relies on database storage that cannot accommodate larger datasets. It is therefore not suitable for methods with large (raw) data output, e.g., SRFM and numerical methods (see <xref ref-type="fig" rid="F5">Figure 5</xref>). For these cases, other systems can provide the necessary solutions. It should also be noted that the metadata describe the sample rather than any attached data, and therefore external documentation would still be required to fully describe the dataset belonging to the sample if it is not included directly within the files.</p>
<p>A central ELN instance used by all members of the CRC could provide one solution. However, this did not prove realistic in this CRC for multiple reasons, from varying user and group needs to the lack of a centralized solution offered by the university. As individual groups and institutes have indeed implemented ELNs, exchange formats between these could assist in collaborations in such projects. This is a central goal of the ELN Consortium [<xref ref-type="bibr" rid="B40">40</xref>], which currently involves ten ELN providers, including Chemotion ELN and eLabFTW.</p>
<p>The RDM platform Coscine, described in <xref ref-type="sec" rid="S3.3.2">Section 3.3.2</xref>, is intended for collaborative work. Role management occurs at the project level; members can therefore be given access to their respective subproject, with all data relevant to the project collected and documented in one place. As described, a REST API allows for automated data workflows, e.g., between local servers or ELNs and Coscine. As such, metadata, data, and identifiers may be mirrored between platforms, giving members a working-group-agnostic option. As outlined for SRFM, its large storage capacity assists researchers where institutional servers or systems that rely on a database structure, such as SharePoint and some ELNs, reach their limits. As such, it has been employed within CRC 985 not only for SRFM, but also for computational chemistry data and tensiometry.</p>
<p>An example of such an automated workflow would be transferring measurement data from a folder on an institutional machine, such as a device computer or research group server, to a central RDM platform such as Coscine. At a given time interval, a script would check for new data, parse each file for relevant metadata, and use Coscine&#8217;s API to transfer the individual files and assign metadata in a structured manner. Thus, the data becomes available to project members on one centralized system in an automated manner, while similar workflows can pull relevant data from Coscine to local storage and RDM solutions.</p>
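<p>The polling part of such a workflow might look like the following Python sketch. It assumes, purely for illustration, that exported text files begin with comment lines of the form <monospace># key: value</monospace>; real instrument exports differ and would need their own parser. The <monospace>transfer</monospace> argument is a placeholder for a user-written function that calls the platform API, which is deliberately not shown here.</p>
<code language="python"><![CDATA[
```python
import time
from pathlib import Path
from typing import Callable


def parse_header(path: Path, prefix: str = "#") -> dict:
    """Read 'key: value' pairs from the leading comment lines of a text export."""
    metadata = {}
    with path.open(encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if not line.startswith(prefix):
                break  # header ends at the first non-comment line
            key, sep, value = line[len(prefix):].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
    return metadata


def sync_once(folder: Path, seen: set, transfer: Callable[[Path, dict], None]) -> list:
    """Transfer every file not yet seen and return the files handled in this pass."""
    handled = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.name not in seen:
            transfer(path, parse_header(path))
            seen.add(path.name)
            handled.append(path)
    return handled


def watch(folder: Path, transfer: Callable[[Path, dict], None], interval_s: float = 60.0) -> None:
    """Poll the folder at a fixed interval; runs until interrupted."""
    seen: set = set()
    while True:
        sync_once(folder, seen, transfer)
        time.sleep(interval_s)
```
]]></code>
<p>In this sketch, the record of already transferred files is kept in memory only; a deployed version would persist it, for example in a small log file, so that a restart does not trigger duplicate transfers.</p>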
<p>Implementing solutions that employ such interfacing options requires scripts or programs, or even software development for more complex tasks. These should be maintained on a system such as RWTH Aachen University&#8217;s GitLab instance to facilitate access and collaboration. It should be clear what resources are available, aside from the API itself, such as networked computers and other available hardware, and who is responsible for deploying and maintaining these systems within the research group or institute. Staff with development skills may also be required, depending on the complexity of the solution. Due to updates in a given software&#8217;s API, updates to technical implementations may be required.</p>
</sec>
<sec id="S3.3.5">
<title>3.3.5 Data Publication and Archival</title>
<p>Aside from facilitating research within groups as well as large projects, the aim to make data reusable according to FAIR also includes making the (meta)data available to others while describing how to access the data (<xref ref-type="fig" rid="F6">Figure 6</xref>: Access and Re-Use). Therefore, a data policy was established during the second funding period [<xref ref-type="bibr" rid="B1">1</xref>], which stipulated that all data underlying a published journal article should be published as well.</p>
<fig id="F7">
<caption>
<p><bold>Figure 7:</bold> Several recommendations could be made for active data storage, including data formats, documentation, and archival for a project on the scale of CRC 985.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g7.png"/>
</fig>
<p>Various options exist for such publications, with the three common categories being: (1) institutional repositories, (2) general repositories, and (3) community-specific repositories. Where possible, community-specific repositories are preferred, as these are able to provide detailed metadata templates, enabling researchers to fully describe the published data. When using general or institutional repositories, best practice is to fill in as many (optional) metadata fields as possible and to provide plain-text files for additional metadata and context. As institutional repositories may be used for reporting purposes, importing published datasets is also important, analogous to text publications.</p>
<p>Within these categories, we make the following recommendations for data sharing and archival in CRC 985 and similar projects, outlined in <xref ref-type="table" rid="T2">Table 2</xref>, which completes the final objective of this study (<xref ref-type="fig" rid="F7">Figure 7</xref>). These were selected according to the methods reported in the survey and the institutes involved in the CRC, with preference given to recommendations by NFDI4Chem [<xref ref-type="bibr" rid="B11">11</xref>]. Information on file size limits has been included to indicate which repositories can accommodate methods that produce larger amounts of data.</p>
<table-wrap id="T2">
<caption>
<p><bold>Table 2:</bold> Repositories recommended for CRC 985 and projects with similar data types. Institutional repositories correspond to research institutes involved in the current project.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top">Repository (type)</td>
<td align="left" valign="top">Description [<xref ref-type="bibr" rid="B9">9</xref>]</td>
<td align="left" valign="top">Data Size Limits</td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">J&#252;lich DATA [<xref ref-type="bibr" rid="B41">41</xref>] (institutional)</td>
<td align="left" valign="top">A registry service to index all research data created at or in the context of Forschungszentrum J&#252;lich, which may also be used to publish research data and software.</td>
<td align="left" valign="top">10 GB per file (depends on Dataverse installation); prefers links to larger datasets [<xref ref-type="bibr" rid="B42">42</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">RWTH Publications Research Data [<xref ref-type="bibr" rid="B43">43</xref>] (institutional)</td>
<td align="left" valign="top">As part of the general RWTH Publications repository, data and software can be published by all RWTH Aachen University members and affiliates.</td>
<td align="left" valign="top">100 GB per file; 1 TB maximum over all files (gigamove) [<xref ref-type="bibr" rid="B44">44</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">Chemotion Repository [<xref ref-type="bibr" rid="B45">45</xref>] (discipline-specific)</td>
<td align="left" valign="top">The repository supports the storage of data related to chemical samples or reactions, with a focus on data from synthetic and analytical work. While not a requirement, data may be submitted directly via the Chemotion ELN.</td>
<td align="left" valign="top">None; might limit it to 50 MB in future [<xref ref-type="bibr" rid="B46">46</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">Cambridge Structural Database (CSD) [<xref ref-type="bibr" rid="B47">47</xref>] (discipline-specific)</td>
<td align="left" valign="top">Established in 1965, the Cambridge Structural Database (CSD) is a repository for small-molecule organic and metal-organic crystal 3D structures. Database records are automatically checked and manually curated by one of our expert in-house scientific editors. Every structure is enriched with chemical representations, as well as bibliographic, chemical and physical property information, adding further value to the raw structural data.</td>
<td align="left" valign="top">50 MB per file; 100 MB for the total size of files uploaded; exception for bigger files via email contact [<xref ref-type="bibr" rid="B48">48</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">Inorganic Crystal Structure Database (ICSD) [<xref ref-type="bibr" rid="B49">49</xref>] (discipline-specific)</td>
<td align="left" valign="top">The world&#8217;s largest database for fully determined inorganic crystal structures and contains the crystallographic data of published crystalline inorganic structures. Organometallic and theoretical structures have been added within the past years.</td>
<td align="left" valign="top">None; contact for file sizes &gt; 10 TB [<xref ref-type="bibr" rid="B50">50</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">ioChem-BD [<xref ref-type="bibr" rid="B51">51</xref>], [<xref ref-type="bibr" rid="B52">52</xref>] (discipline-specific)</td>
<td align="left" valign="top">IoChem-BD is a digital repository of Computational Chemistry and Materials results. A set of modules and tools aimed to manage large volumes of quantum chemistry results from a wide variety of broadly used simulation packages.</td>
<td align="left" valign="top">default 1 GB per upload; &gt; 100 MB not to be uploaded by web interface [<xref ref-type="bibr" rid="B53">53</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">NOMAD Repository &amp; Archive [<xref ref-type="bibr" rid="B54">54</xref>] (discipline-specific)</td>
<td align="left" valign="top">The NOMAD Repository and Archive stands for open access of scientific materials data. It enables the confirmatory analysis of materials data, their reuse, and repurposing. All data are available in their raw format as produced by the underlying code (Repository) and in a common, machine-processable, and well-defined data format (Archive).</td>
<td align="left" valign="top">32 GB per upload (maximum of 10 non-published uploads per user) [<xref ref-type="bibr" rid="B55">55</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">RADAR4Chem [<xref ref-type="bibr" rid="B56">56</xref>], [<xref ref-type="bibr" rid="B57">57</xref>] (chemistry: general)</td>
<td align="left" valign="top">A low-threshold and easy-to-use service for sustainable publication and preservation of research data from all disciplines of chemistry. Currently exclusive to publicly funded research institutions and universities in Germany.</td>
<td align="left" valign="top">10 GB per project [<xref ref-type="bibr" rid="B56">56</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">Suprabank [<xref ref-type="bibr" rid="B58">58</xref>] (discipline-specific)</td>
<td align="left" valign="top">Curated, open resource for intermolecular interaction.</td>
<td align="left" valign="top">10 GB per user (can be adapted) [<xref ref-type="bibr" rid="B59">59</xref>]</td>
</tr>
<tr>
<td align="left" valign="top">zenodo [<xref ref-type="bibr" rid="B60">60</xref>] (general)</td>
<td align="left" valign="top">EU discipline-agnostic repository for data and other research results.</td>
<td align="left" valign="top">50 GB per data set [<xref ref-type="bibr" rid="B61">61</xref>]</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Certain repositories are also tied to ELNs, thereby providing direct data and metadata workflows. Going a step further, data may also be converted to standard open formats, as is the case with Chemotion ELN and Chemotion Repository, as mentioned in <xref ref-type="sec" rid="F3.2">Section 3.2</xref>.</p>
<p>The published data should then be explicitly referenced via their DOI within the article using a data availability statement, which journals are increasingly requiring [<xref ref-type="bibr" rid="B62">62</xref>]. They may also be cited within the article itself. Especially in cases which involve multiple published datasets, this provides additional context for the reader.</p>
<p>As shown in <xref ref-type="fig" rid="F5">Figure 5</xref>, much of the data volume falls into smaller sizes, with imaging and numerical methods requiring larger storage if all data were to be published. For these, institutional repositories such as RWTH Publications Research data are the best option. For some methods, such as Atomic Force Microscopy, not all extracted data must be published, yet the scripts employed for the extraction could be. Hence, the data may be reproduced in the same manner when needed, while the published data volume is kept to a minimum in cases where repositories limit quota. Otherwise, much of the data produced can be published on subject-specific or general chemistry repositories without too much concern for data volume. Furthermore, repositories may offer more quota upon request.</p>
<p>In terms of data access control, most of the repositories mentioned offer embargo periods to ensure the creators&#8217; first rights to the data. In addition, zenodo allows restricted access in cases where data cannot be made public.</p>
<p>As shown in <xref ref-type="fig" rid="F8">Figure 8</xref>, RADAR4Chem has proven itself as a readily accepted data publication platform, which may be attributed to its ease of use, the ability for data stewards to add standard pre-filled metadata, and the recently added notification system, which allows the INF project to quickly respond to requests for dataset review. Institutional repositories found favor as well, as RWTH Publications is used for 34.5% of data publications. Again, ease of use, along with a certain trust in one&#8217;s own services, could be a strong factor here. For those using Chemotion ELN, the direct publishing workflow to the Chemotion Repository considerably assists authors in the publication process. In the example case for IR data, automated workflows from the Chemotion ELN to the Chemotion Repository exist and enable simple data publication. Both the Chemotion Repository and RADAR4Chem guarantee storage and accessibility for 10 years or more, conforming with German Research Foundation (DFG) requirements; the data therein can therefore be deemed archived, while it can also be accessed and reused in accordance with the research data lifecycle in <xref ref-type="fig" rid="F6">Figure 6</xref>. RWTH Publications does not specifically list a time span, but considers published items as archived as well. It should be noted that the Institut Laue-Langevin carried out measurements for CRC 985, the data for which is published on the associated data repository, as indicated in <xref ref-type="fig" rid="F8">Figure 8</xref>. This institutional data repository was omitted from <xref ref-type="table" rid="T2">Table 2</xref> because only institutional repositories of direct participants were included.</p>
<p>Typically, projects will amass more data than that which has been published. This therefore requires additional archive resources. For project members in CRC 985, the above-mentioned Coscine also serves as an archiving space and may also be used where data access must be controlled. It should be noted, however, that while the dataset PID may be used in a data availability statement, the access restrictions should be stated. Furthermore, as the data has not been published and has not received a DOI, it may not be cited.</p>
<fig id="F8">
<caption>
<p><bold>Figure 8:</bold> Research data repositories used to publish data underlying published articles in CRC 985. RADAR4Chem and RWTH Publications are widely used, followed by Chemotion and the institutional data repository for the Institut Laue-Langevin.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g8.png"/>
</fig>
<p>The entire SharePoint, including the sample management system, will be archived under the CRC&#8217;s Coscine project, while members can gain access to the system to archive their data as needed.</p>
</sec>
</sec>
<sec id="S3.4">
<title>3.4 Recommendations for Future CRCs and INF Projects</title>
<p>The overarching role of INF projects within the CRC has largely been left out of the discussion thus far. These central projects, however, can play a vital part in setting up and implementing the above aspects.</p>
<p>Three aspects were identified within the CRC 985 INF project that should be considered for future projects:</p>
<list list-type="order">
<list-item><p>Support for project-wide data management plans and guidelines during the project planning stage</p></list-item>
<list-item><p>End-of-life plan for implemented infrastructure solutions</p></list-item>
<list-item><p>Sustainability of software solutions</p></list-item>
</list>
<p>To elaborate on 1., many workflows within research groups evolve naturally to fit the needs of those carrying out much of the practical work, i.e., the individual doctoral researchers. However, these tend to be highly individualistic and can be difficult to alter in order to streamline data workflows. Therefore, providing clear guidelines on data organization and associated tools is vital both within the group and across the project, and these guidelines should be established in the planning phase. INF projects need to be involved at this stage and assist with infrastructure planning and selection. In this way, overarching solutions are available from the beginning of a project, which avoids introducing solutions and tools and altering workflows during ongoing work. Individual workflows can then be developed within a given framework that facilitates data storage, documentation, and exchange. This enables INF projects to focus on collaborative workflows as opposed to improving individualized workflows, which proved difficult in CRC 985.</p>
<p>In terms of 2., the selected solutions require detailed end-of-life management. It will not always be possible to foresee which services and dependencies may become outdated over the lifetime of a project. However, precautions and exit strategies must exist to safeguard, in a structured manner, any and all data managed by these services.</p>
<p>As for 3., the software solutions developed by the INF project, e.g., data workflow scripts, should be designed to outlive the project. This is where maintenance comes into play. Therefore, INF projects should directly include individuals within the groups who are able to maintain these solutions after the INF project is no longer available.</p>
<p>Overall, detailed, high-level planning for data management and the implementation of infrastructure solutions should involve INF projects at a very early stage of the project. Then, throughout the project, members must be onboarded and continuously informed on common practices, guidelines, and policies to ensure adherence.</p>
<p>It should be noted that a readiness to publish data underlying published results generally exists throughout CRC 985, especially in the third funding period. <xref ref-type="fig" rid="F9">Figure 9</xref> shows an increase in (text) publications which are linked to a published dataset, especially in 2023 and 2024, while archiving data in a non-public manner was preferred up until then. These figures are taken from RWTH Publications, which records both data and text publications within the CRC in addition to serving as a data repository. This increase in text publications with linked datasets is likely due to general changes in academic culture and awareness concerning data publication, but also to the availability of more platforms that make this easy. As noted in <xref ref-type="sec" rid="F3.3.5">Section 3.3.5</xref>, RADAR4Chem, a service which began in 2022, has been widely accepted. While its ease of use plays a role, the INF project also created awareness of the repository.</p>
<fig id="F9">
<caption>
<p><bold>Figure 9:</bold> Publications with linked datasets according to RWTH Publications. Initially, linking archived (non-public) datasets was favored in CRC 985, while publishing data became more common, especially in 2023 and 2024.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="inggrid-4267_herres-pawlis-g9.png"/>
</fig>
<p>For future INF projects, creating awareness of these platforms and workflows from the very beginning should prove helpful, stressing their ease of use and how they conform to DFG requirements on data publication and archival. INF members should maintain an exchange with infrastructure providers, both to stay up to date with developments and to communicate researchers&#8217; requirements and expectations. This aids in increasing usability and therefore acceptance, enabling researchers to make their data reusable.</p>
</sec>
</sec>
<sec id="S4">
<title>4 Conclusion</title>
<p>Information on the data-producing methods and the associated data formats and data sizes in CRC 985 was collected in order to gain an overview of the diversity and derive RDM concepts and structures for CRCs. The collected information is based on a structured survey, which captured most of the details on the methods themselves, while formal as well as informal discussions in various settings provided further feedback and deeper insight. Based on the information as a whole, recommendations for this ongoing project as well as future projects are made.</p>
<p>The gathered information paints a picture of the varied disciplines and the accordingly varied data types and sizes. This underlines the need for standardized open exchange formats, as many of the open export formats reported do not necessarily contain the required complete information in the form of structured metadata to fully understand the acquired data. In order to assist in this, tools from plain-text metadata templates to structured ELNs and data management platforms provide essential machine-readable solutions for data documentation, assisting in data interoperability and reuse.</p>
<p>The workflows and the RDM practices for each stage of the research data lifecycle (see <xref ref-type="fig" rid="F6">Figure 6</xref>) should be clearly defined and documented on a group level in advance. This information can then feed into large projects such as CRCs, enabling informed decisions regarding RDM and collaboration within the planning phase. In this way, data stewards within the INF project can then establish policies, workflows, and infrastructures for collaboration within these institutional frameworks while working closely with researchers.</p>
<p>For projects of the size of CRC 985, a one-size-fits-all solution, such as a uniform ELN and repository where all (meta)data can be recorded in a well-structured manner, does not exist due to the variety of analytical and experimental methods employed and the associated different data formats and size requirements. Therefore, discipline-specific solutions found on a group level require collaboration platforms that support RDM. Within CRC 985, Microsoft SharePoint serves as the collaboration platform; however, expectations regarding RDM evolved over the project duration. FAIR data requires more structured and defined metadata on various levels. More appropriate platforms for RDM have become available, including RWTH Aachen University&#8217;s Coscine as well as ELNs. This shows that, in addition to a minimum standard which should be defined prior to the data production phase of the research data lifecycle (see <xref ref-type="fig" rid="F6">Figure 6</xref>), a certain flexibility should also be implemented to meet evolving requirements in later funding periods.</p>
<p>With the requirement to publish all data underlying a text publication, ELNs and RDM platforms can greatly assist researchers&#8217; workflows in FAIR data publication and archival in subject-specific repositories by providing automated workflows. With much of this work still in progress at infrastructure providers, future research projects will be able to benefit greatly, while current work provides vital insight for these efforts.</p>
</sec>
</body>
<back>
<fn-group>
<fn id="n1"><p>CRCs are long-term yet temporary research projects funded by the German Research Foundation (DFG). They can run a total of 12 years, with individual funding periods of 4 years.</p></fn>
</fn-group>
<sec>
<title>Data availability</title>
<p>Data can be found here: <ext-link xmlns:xlink="http://www.w3.org/1999/xlink" ext-link-type="uri" xlink:href="https://dx.doi.org/10.22000/1793">https://dx.doi.org/10.22000/1793</ext-link></p>
</sec>
<sec id="S5">
<title>5 Acknowledgements</title>
<p>The authors acknowledge funding by the German Research Foundation (DFG) under the project numbers 191948804 (CRC 985) and 441958208 (NFDI4Chem), as well as funding and support within the framework of the DALIA project with the funding code 16DWWQP07B, funded by the Federal Ministry of Education and Research (BMBF) and the funding measure from the EU&#8217;s Capacity Building and Resilience Facility.</p>
</sec>
<sec id="S6">
<title>6 Roles and contributions</title>
<p><bold>Nicole A. Parks:</bold> Conceptualization, Investigation, Visualization, Data Curation, Writing &#8211; original draft</p>
<p><bold>Konstantin W. Kr&#246;ckert:</bold> Conceptualization, Investigation, Writing &#8211; original draft</p>
<p><bold>Fabian Cla&#223;en:</bold> Conceptualization, Writing &#8211; original draft</p>
<p><bold>Walter Richtering:</bold> Project Administration, Writing &#8211; review &amp; editing</p>
<p><bold>Matthias M&#252;ller:</bold> Project Administration, Writing &#8211; review &amp; editing</p>
<p><bold>Sonja Herres-Pawlis:</bold> Project Administration, Supervision, Writing &#8211; review &amp; editing</p>
</sec>
<ref-list>
<ref id="B1"><label>[1]</label><mixed-citation publication-type="journal"><string-name><given-names>F.</given-names> <surname>Claus</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kirchmeyer</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>M&#252;ller</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Richtering</surname></string-name>, <article-title>&#8220;Das INF-Projekt im SFB 985 Funktionelle Mikrogele und Mikrogelsysteme,&#8221;</article-title> <source>Bausteine Forschungsdatenmanagement</source>, no. <volume>2</volume>, pp. <fpage>104</fpage>&#8211;<lpage>111</lpage>, <month>Nov.</month> <year>2019</year>. DOI: <pub-id pub-id-type="doi">10.17192/bfdm.2019.2.8097</pub-id>. Accessed: Apr. 6, 2023.</mixed-citation></ref>
<ref id="B2"><label>[2]</label><mixed-citation publication-type="journal"><string-name><given-names>M.</given-names> <surname>Schr&#246;der</surname></string-name>, <string-name><given-names>H.</given-names> <surname>LeBlanc</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Spors</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Kr&#252;ger</surname></string-name>, <article-title>&#8220;Intra-consortia data sharing platforms for interdisciplinary collaborative research projects,&#8221;</article-title> <source>it - Information Technology</source>, vol. <volume>62</volume>, no. <issue>1</issue>, pp. <fpage>19</fpage>&#8211;<lpage>28</lpage>, <month>Feb.</month> <year>2020</year>, ISSN: 2196-7032, 1611-2776. DOI: <pub-id pub-id-type="doi">10.1515/itit-2019-0039</pub-id>. Accessed: Feb. 3, 2023.</mixed-citation></ref>
<ref id="B3"><label>[3]</label><mixed-citation publication-type="book"><string-name><given-names>H.-J.</given-names> <surname>G&#246;tze</surname></string-name> <italic>et al.</italic>, <chapter-title>&#8220;Data Management of the SFB 267 for the Andes &#8212; from Ink and Paper to Digital Databases,&#8221;</chapter-title> in <source>The Andes</source>, <string-name><given-names>O.</given-names> <surname>Oncken</surname></string-name> <italic>et al.</italic>, Eds., <publisher-name>Springer Berlin Heidelberg</publisher-name>, <year>2006</year>, pp. <fpage>539</fpage>&#8211;<lpage>556</lpage>, ISBN: 978-3-540-24329-8. DOI: <pub-id pub-id-type="doi">10.1007/978-3-540-48684-8_26</pub-id>. Accessed: Mar. 8, 2023.</mixed-citation></ref>
<ref id="B4"><label>[4]</label><mixed-citation publication-type="journal"><string-name><given-names>M. D.</given-names> <surname>Wilkinson</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;The FAIR Guiding Principles for scientific data management and stewardship,&#8221;</article-title> <source>Scientific Data</source>, vol. <volume>3</volume>, no. <issue>1</issue>, p. <fpage>160018</fpage>, <month>Mar.</month> <year>2016</year>, ISSN: 2052-4463. DOI: <pub-id pub-id-type="doi">10.1038/sdata.2016.18</pub-id>. Accessed: Jan. 27, 2023.</mixed-citation></ref>
<ref id="B5"><label>[5]</label><mixed-citation publication-type="webpage"><string-name><given-names>A.</given-names> <surname>Kraft</surname></string-name>, <article-title>&#8220;The FAIR Data Principles for Research Data,&#8221;</article-title> <source>TIB-Blog</source>, <month>Sep.</month> <year>2017</year>. Accessed: Mar. 15, 2023. [Online]. Available: <uri>https://blogs.tib.eu/wp/tib/2017/09/12/the-fair-data-principles-for-research-data/</uri>.</mixed-citation></ref>
<ref id="B6"><label>[6]</label><mixed-citation publication-type="journal"><string-name><given-names>N.</given-names> <surname>Hartl</surname></string-name>, <string-name><given-names>E.</given-names> <surname>W&#246;ssner</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Sure-Vetter</surname></string-name>, <article-title>&#8220;Nationale Forschungsdateninfrastruktur (NFDI),&#8221;</article-title> <source>Informatik Spektrum</source>, vol. <volume>44</volume>, no. <issue>5</issue>, pp. <fpage>370</fpage>&#8211;<lpage>373</lpage>, <month>Oct.</month> <year>2021</year>, ISSN: 1432-122X. DOI: <pub-id pub-id-type="doi">10.1007/s00287-021-01392-6</pub-id>. Accessed: Sep. 21, 2023.</mixed-citation></ref>
<ref id="B7"><label>[7]</label><mixed-citation publication-type="journal"><string-name><given-names>C.</given-names> <surname>Steinbeck</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany,&#8221;</article-title> <source>Research Ideas and Outcomes</source>, vol. <volume>6</volume>, <elocation-id>e55852</elocation-id>, <month>Jun.</month> <year>2020</year>, ISSN: 2367-7163. DOI: <pub-id pub-id-type="doi">10.3897/rio.6.e55852</pub-id>. Accessed: Apr. 22, 2022.</mixed-citation></ref>
<ref id="B8"><label>[8]</label><mixed-citation publication-type="journal"><string-name><given-names>N. A.</given-names> <surname>Parks</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kr&#246;ckert</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Cla&#223;en</surname></string-name>, <string-name><given-names>M.</given-names> <surname>M&#252;ller</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Herres-Pawlis</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Richtering</surname></string-name>, <source>Dataset belonging to the publication &#8220;Data-Producing Methods in CRC 985: Recommendations for Research Data Management in Large Interdisciplinary Projects&#8221;</source>, <year>2023</year>. DOI: <pub-id pub-id-type="doi">10.22000/1793</pub-id>.</mixed-citation></ref>
<ref id="B9"><label>[9]</label><mixed-citation publication-type="webpage"><source>FAIRsharing &#8212; Home</source>. Accessed: Apr. 4, 2023. [Online]. Available: <uri>https://fairsharing.org/</uri>.</mixed-citation></ref>
<ref id="B10"><label>[10]</label><mixed-citation publication-type="webpage"><source>Re3data - Registry of Research Data Repositories</source>. DOI: <pub-id pub-id-type="doi">10.17616/R3D</pub-id>. Accessed: Dec. 21, 2022. [Online]. Available: <uri>https://www.re3data.org/</uri>.</mixed-citation></ref>
<ref id="B11"><label>[11]</label><mixed-citation publication-type="webpage"><source>NFDI4Chem Knowledge Base &#8212; NFDI4Chem Knowledge Base</source>. Accessed: Apr. 4, 2023. [Online]. Available: <uri>https://knowledgebase.nfdi4chem.de</uri>.</mixed-citation></ref>
<ref id="B12"><label>[12]</label><mixed-citation publication-type="journal"><collab>RDM Team RWTH Aachen University</collab>, <source>Research Data Life Cycle</source>, image, <year>2022</year>. Accessed: Apr. 27, 2022.</mixed-citation></ref>
<ref id="B13"><label>[13]</label><mixed-citation publication-type="journal"><string-name><given-names>P.</given-names> <surname>Tremouilhac</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;Chemotion ELN: An Open Source electronic lab notebook for chemists in academia,&#8221;</article-title> <source>Journal of Cheminformatics</source>, vol. <volume>9</volume>, no. <issue>1</issue>, p. <fpage>54</fpage>, <month>Sep.</month> <year>2017</year>, ISSN: 1758-2946. DOI: <pub-id pub-id-type="doi">10.1186/s13321-017-0240-0</pub-id>. Accessed: Apr. 25, 2022.</mixed-citation></ref>
<ref id="B14"><label>[14]</label><mixed-citation publication-type="webpage"><collab>PiTrem</collab>, <source>ComPlat/chemotion_ELN: Chemotion ELN 0.9.1</source>, <month>Jun.</month> <year>2021</year>. DOI: <pub-id pub-id-type="doi">10.5281/zenodo.4899080</pub-id>. Accessed: Jan. 26, 2023. [Online]. Available: <uri>https://zenodo.org/record/4899080</uri>.</mixed-citation></ref>
<ref id="B15"><label>[15]</label><mixed-citation publication-type="webpage"><source>Chemotion</source>. Accessed: Apr. 25, 2022. [Online]. Available: <uri>https://eln.chemotion.net/home</uri>.</mixed-citation></ref>
<ref id="B16"><label>[16]</label><mixed-citation publication-type="journal"><string-name><given-names>N.</given-names> <surname>Carpi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Minges</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Piel</surname></string-name>, <article-title>&#8220;eLabFTW: An open source laboratory notebook for research labs,&#8221;</article-title> <source>The Journal of Open Source Software</source>, vol. <volume>2</volume>, no. <issue>12</issue>, p. <fpage>146</fpage>, <month>Apr.</month> <year>2017</year>, ISSN: 2475-9066. DOI: <pub-id pub-id-type="doi">10.21105/joss.00146</pub-id>. Accessed: Apr. 5, 2023.</mixed-citation></ref>
<ref id="B17"><label>[17]</label><mixed-citation publication-type="webpage"><source>eLabFTW - Open Source Laboratory Notebook</source>. Accessed: Apr. 5, 2023. [Online]. Available: <uri>https://www.elabftw.net</uri>.</mixed-citation></ref>
<ref id="B18"><label>[18]</label><mixed-citation publication-type="webpage"><source>The LabIMotion Extension &#8212; Chemotion</source>, <month>Mar.</month> <year>2023</year>. Accessed: Apr. 17, 2023. [Online]. Available: <uri>https://chemotion.net/docs/labimotion</uri>.</mixed-citation></ref>
<ref id="B19"><label>[19]</label><mixed-citation publication-type="journal"><string-name><given-names>J.</given-names> <surname>Potthoff</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Tremouilhac</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Hodapp</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Neumair</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Br&#228;se</surname></string-name>, and <string-name><given-names>N.</given-names> <surname>Jung</surname></string-name>, <article-title>&#8220;Procedures for systematic capture and management of analytical data in academia,&#8221;</article-title> <source>Analytica Chimica Acta: X</source>, vol. <volume>1</volume>, p. <fpage>100007</fpage>, <month>Mar.</month> <year>2019</year>, ISSN: 2590-1346. DOI: <pub-id pub-id-type="doi">10.1016/j.acax.2019.100007</pub-id>. Accessed: May 2, 2024.</mixed-citation></ref>
<ref id="B20"><label>[20]</label><mixed-citation publication-type="journal"><string-name><given-names>Y.-C.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Tremouilhac</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nguyen</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Jung</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Br&#228;se</surname></string-name>, <article-title>&#8220;ChemSpectra: A web-based spectra editor for analytical data,&#8221;</article-title> <source>Journal of Cheminformatics</source>, vol. <volume>13</volume>, no. <issue>1</issue>, p. <fpage>8</fpage>, <month>Dec.</month> <year>2021</year>, ISSN: 1758-2946. DOI: <pub-id pub-id-type="doi">10.1186/s13321-020-00481-0</pub-id>. Accessed: Aug. 16, 2024.</mixed-citation></ref>
<ref id="B21"><label>[21]</label><mixed-citation publication-type="journal"><string-name><given-names>M.</given-names> <surname>Politze</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Claus</surname></string-name>, <string-name><given-names>B. D.</given-names> <surname>Brenger</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Yazdi</surname></string-name>, <string-name><given-names>B. P. A.</given-names> <surname>Heinrichs</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Schwarz</surname></string-name>, <article-title>&#8220;How to Manage IT Resources in Research Projects? Towards a Collaborative Scientific Integration Environment,&#8221;</article-title> <source>European journal of higher education IT</source>, <year>2020</year>. DOI: <pub-id pub-id-type="doi">10.18154/RWTH-2020-11948</pub-id>. Accessed: Apr. 17, 2023.</mixed-citation></ref>
<ref id="B22"><label>[22]</label><mixed-citation publication-type="webpage"><source>Coscine &#8212; The research data management platform</source>. Accessed: Mar. 17, 2023. [Online]. Available: <uri>https://coscine.de/</uri>.</mixed-citation></ref>
<ref id="B23"><label>[23]</label><mixed-citation publication-type="webpage"><source>Persistent Identifiers for eResearch</source>. Accessed: Mar. 16, 2023. [Online]. Available: <uri>https://www.pidconsortium.net/</uri>.</mixed-citation></ref>
<ref id="B24"><label>[24]</label><mixed-citation publication-type="journal"><string-name><given-names>D.</given-names> <surname>Rauh</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;Data format standards in analytical chemistry,&#8221;</article-title> <source>Pure and Applied Chemistry</source>, vol. <volume>94</volume>, no. <issue>6</issue>, pp. <fpage>725</fpage>&#8211;<lpage>736</lpage>, <month>Jun.</month> <year>2022</year>, ISSN: 1365-3075. DOI: <pub-id pub-id-type="doi">10.1515/pac-2021-3101</pub-id>. Accessed: Mar. 16, 2023.</mixed-citation></ref>
<ref id="B25"><label>[25]</label><mixed-citation publication-type="webpage"><source>For Data Files &#8212; Chemotion</source>, <month>Oct.</month> <year>2023</year>. Accessed: Jan. 16, 2024. [Online]. Available: <uri>https://chemotion.net/docs/repo/details_standards/files</uri>.</mixed-citation></ref>
<ref id="B26"><label>[26]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Analytical Data Interchange Protocol for Chromatographic Data</source>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.D7795C</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B27"><label>[27]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: CHARMM Card File Format</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.7HP91K</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B28"><label>[28]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Joint Committee on Atomic and Molecular Physical data - working group on Data eXchange</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.V8NVE2</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B29"><label>[29]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Analytical Information Markup Language</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.6CS4BF</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B30"><label>[30]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Mz Markup Language</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.26DMBA</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B31"><label>[31]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: CWA 17552:2020 Engineering materials - Electronic data interchange - Instrumented indentation test data</source>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.5C379F</pub-id>. Accessed: Mar. 28, 2023.</mixed-citation></ref>
<ref id="B32"><label>[32]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Open Microscopy Environment - Tagged Image File Format</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.CQ8TG2</pub-id>. Accessed: Mar. 28, 2023.</mixed-citation></ref>
<ref id="B33"><label>[33]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: NMR Self-defining Text Archive and Retrieval format</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.2CHXXC</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B34"><label>[34]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Collaborative Computing Project for NMR</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.AVW5Q</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B35"><label>[35]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Nuclear Magnetic Resonance Markup Language</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.ES03FK</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B36"><label>[36]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Nuclear Magnetic Resonance Extracted Data Format</source>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.8AE3D0</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B37"><label>[37]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Crystallographic Information Framework</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.ZR52G5</pub-id>. Accessed: Mar. 29, 2023.</mixed-citation></ref>
<ref id="B38"><label>[38]</label><mixed-citation publication-type="journal"><string-name><given-names>P.</given-names> <surname>Tremouilhac</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;Chemotion Repository, a Curated Repository for Reaction Information and Analytical Data,&#8221;</article-title> <source>Chemistry&#8211;Methods</source>, vol. <volume>1</volume>, no. <issue>1</issue>, pp. <fpage>8</fpage>&#8211;<lpage>11</lpage>, <year>2021</year>, ISSN: 2628-9725. DOI: <pub-id pub-id-type="doi">10.1002/cmtd.202000034</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B39"><label>[39]</label><mixed-citation publication-type="journal"><string-name><given-names>C.</given-names> <surname>Allan</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;OMERO: Flexible, model-driven data management for experimental biology,&#8221;</article-title> <source>Nature Methods</source>, vol. <volume>9</volume>, no. <issue>3</issue>, pp. <fpage>245</fpage>&#8211;<lpage>253</lpage>, <month>Mar.</month> <year>2012</year>, ISSN: 1548-7091, 1548-7105. DOI: <pub-id pub-id-type="doi">10.1038/nmeth.1896</pub-id>. Accessed: Aug. 20, 2024.</mixed-citation></ref>
<ref id="B40"><label>[40]</label><mixed-citation publication-type="webpage"><source>The ELN Consortium</source>. Accessed: Apr. 6, 2023. [Online]. Available: <uri>https://github.com/TheELNConsortium</uri>.</mixed-citation></ref>
<ref id="B41"><label>[41]</label><mixed-citation publication-type="journal"><collab>Re3data.Org</collab>, <article-title>&#8220;J&#252;lich DATA,&#8221;</article-title> 23 dataverses, 29 datasets, 1170 files, <year>2021</year>. DOI: <pub-id pub-id-type="doi">10.17616/R31NJMYC</pub-id>. Accessed: Apr. 6, 2023.</mixed-citation></ref>
<ref id="B42"><label>[42]</label><mixed-citation publication-type="webpage"><source>Dataset + File Management &#8212; J&#252;lich DATA documentation</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://apps.fz-juelich.de/fdm/staging/mode-of-access/user/dataset-management.html#file-upload</uri>.</mixed-citation></ref>
<ref id="B43"><label>[43]</label><mixed-citation publication-type="journal"><collab>Re3data.Org</collab>, <article-title>&#8220;RWTH Publications Research Data,&#8221;</article-title> 319 research datasets, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.17616/R33N6J</pub-id>. Accessed: Apr. 6, 2023.</mixed-citation></ref>
<ref id="B44"><label>[44]</label><mixed-citation publication-type="webpage"><source>GigaMove - RWTH AACHEN UNIVERSITY IT Center - English</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://www.itc.rwth-aachen.de/cms/it-center/Services/Kollaboration/smiti/GigaMove/?lidx=1</uri>.</mixed-citation></ref>
<ref id="B45"><label>[45]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Chemotion repository</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.IAGXCR</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B46"><label>[46]</label><mixed-citation publication-type="webpage"><source>Frequently Asked Questions (FAQ) &#8212; Chemotion</source>, Oct. 2023. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://chemotion.net/docs/repo/faq</uri>.</mixed-citation></ref>
<ref id="B47"><label>[47]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: The Cambridge Structural Database</source>, <year>2015</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.VS7865</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B48"><label>[48]</label><mixed-citation publication-type="webpage"><source>Deposit - The Cambridge Crystallographic Data Centre (CCDC)</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://www.ccdc.cam.ac.uk/deposit/upload</uri>.</mixed-citation></ref>
<ref id="B49"><label>[49]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Inorganic Crystal Structure Database</source>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.A95199</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B50"><label>[50]</label><mixed-citation publication-type="webpage"><string-name><given-names>D. A.</given-names> <surname>Steudel</surname></string-name>, <string-name><given-names>D. S.</given-names> <surname>R&#252;hl</surname></string-name>, <string-name><given-names>D. R.</given-names> <surname>Hinek</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Rehme</surname></string-name>, <source>Scientific Manual ICSD Database</source>, <year>2021</year>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://www.fiz-karlsruhe.de/sites/default/files/ICSD/documents/brochures/scientific-manual-2021-en.pdf</uri>.</mixed-citation></ref>
<ref id="B51"><label>[51]</label><mixed-citation publication-type="journal"><string-name><given-names>C.</given-names> <surname>Bo</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Alvarez</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Lopez</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Maseras</surname></string-name>, <string-name><given-names>J. M.</given-names> <surname>Poblet</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>De Graaf</surname></string-name>, <source>ioChem-BD Find central service</source>, <month>Nov.</month> <year>2017</year>. DOI: <pub-id pub-id-type="doi">10.19061/iochem-bd-find</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B52"><label>[52]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: ioChem-BD</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.LWW6A1</pub-id>. Accessed: Sep. 21, 2023.</mixed-citation></ref>
<ref id="B53"><label>[53]</label><mixed-citation publication-type="webpage"><source>Set upload limits &#8212; ioChem-BD documentation</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://docs.iochem-bd.org/en/latest/faqs/admin/setup-upload-limits.html</uri>.</mixed-citation></ref>
<ref id="B54"><label>[54]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: NoMaD Repository</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.AQ20QN</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B55"><label>[55]</label><mixed-citation publication-type="webpage"><source>How to upload data &#8212; NOMAD Repository and Archive documentation</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://nomad-lab.eu/prod/rae/docs/upload.html</uri>.</mixed-citation></ref>
<ref id="B56"><label>[56]</label><mixed-citation publication-type="webpage"><source>RADAR4Chem &#8212; RADAR</source>. Accessed: Apr. 18, 2023. [Online]. Available: <uri>https://radar.products.fiz-karlsruhe.de/de/radarabout/radar4chem</uri>.</mixed-citation></ref>
<ref id="B57"><label>[57]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: RADAR</source>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.601A27</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B58"><label>[58]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: SupraBank</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.VJWUT7</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B59"><label>[59]</label><mixed-citation publication-type="webpage"><source>SupraBank</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://suprabank.org/terms_of_service</uri>.</mixed-citation></ref>
<ref id="B60"><label>[60]</label><mixed-citation publication-type="journal"><collab>FAIRsharing Team</collab>, <source>FAIRsharing record for: Zenodo</source>, <year>2018</year>. DOI: <pub-id pub-id-type="doi">10.25504/FAIRSHARING.WY4EGF</pub-id>. Accessed: Apr. 18, 2023.</mixed-citation></ref>
<ref id="B61"><label>[61]</label><mixed-citation publication-type="webpage"><source>Zenodo - Research. Shared</source>. Accessed: Oct. 13, 2023. [Online]. Available: <uri>https://help.zenodo.org/faq/</uri>.</mixed-citation></ref>
<ref id="B62"><label>[62]</label><mixed-citation publication-type="journal"><string-name><given-names>N. A.</given-names> <surname>Parks</surname></string-name> <italic>et al.</italic>, <article-title>&#8220;The current landscape of author guidelines in chemistry through the lens of research data sharing,&#8221;</article-title> <source>Pure and Applied Chemistry</source>, <month>Feb.</month> <year>2023</year>, ISSN: 1365-3075. DOI: <pub-id pub-id-type="doi">10.1515/pac-2022-1001</pub-id>. Accessed: Apr. 6, 2023.</mixed-citation></ref>
</ref-list>
</back>
</article>