<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="discussion" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2941-1300</journal-id>
<journal-title-group>
<journal-title>ing.grid</journal-title>
</journal-title-group>
<issn pub-type="epub">2941-1300</issn>
<publisher>
<publisher-name>Universit&#228;ts- und Landesbibliothek Darmstadt</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.48694/inggrid.3945</article-id>
<article-categories>
<subj-group>
<subject>Data management letter</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Critically thinking about the reusability of (meta)data</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-7093-224X</contrib-id>
<name>
<surname>Pimenta</surname>
<given-names>Izadora Silva</given-names>
</name>
<email>izadora.pimenta@tu-darmstadt.de</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Chair of Fluid Systems, Technische Universit&#228;t Darmstadt, Darmstadt</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2024-03-18">
<day>18</day>
<month>03</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>2</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>3</lpage>
<history>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2024 The Author(s)</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>The text of this work is released under the Creative Commons license CC BY 4.0 International. You can find the contract text of the license at <uri xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</uri>. The illustrations are excluded from this license, here the copyright lies with the respective rights holder.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://www.inggrid.org/articles/doi.org/10.48694/inggrid.3945/"/>
<abstract>
<p><bold>Biography.</bold> I have a PhD in Digital Linguistics (TU Darmstadt). I hold an MA in Applied Linguistics (University of Campinas) and a bachelor&#8217;s in Journalism (PUC-Campinas). Throughout my research, I have been closely connected to Systemic-Functional Linguistics, Appraisal Theory and Corpus Linguistics. Currently, I work at TU Darmstadt as a Research Associate (Chair of Fluid Systems) and at the Gender Consulting for Research Networks (Gender Equality Office). I am a Managing Editor for ing.grid.</p>
</abstract>
<kwd-group>
<kwd>Inggrid</kwd>
<kwd>Critical Data Studies</kwd>
<kwd>Data Literacy</kwd>
<kwd>Data Ethics</kwd>
<kwd>Data Reusability</kwd>
<kwd>RDM</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<p>How much consideration are we giving to the (meta)data we produce? Naturally, the FAIR principles lead us through several steps in which we can elevate data management to a scientific effort in its own right, a view that ing.grid shares and advocates. However, reflecting on what the (meta)data does and does not encompass is also a commendable endeavour. As the researcher Joy Buolamwini puts it in the documentary &#8220;Coded Bias&#8221;, data is destiny [<xref ref-type="bibr" rid="B1">1</xref>]. Data is a relationship we can make and put to use [<xref ref-type="bibr" rid="B2">2</xref>]. Taking responsibility for this (meta)data and striving to make it as transparent as possible is also a crucial step towards ensuring its reusability.</p>
<p>Engaging in critical thinking and taking ownership of the (meta)data we generate and disseminate not only enhances its worth but also steers us towards innovative pathways. Several approaches, such as the CARE Principles [<xref ref-type="bibr" rid="B3">3</xref>][<xref ref-type="bibr" rid="B4">4</xref>] and the Feminist Data Manifesto [<xref ref-type="bibr" rid="B2">2</xref>], make us think of data as a resource to be cared for and cultivated [<xref ref-type="bibr" rid="B2">2</xref>], going beyond the colonial extraction logic. To achieve this, we must consider the narrative of our (meta)data, the stakeholders involved in its generation, and the societal values embedded within it. Who is generating the data? For whom is it intended? Do we contemplate the ramifications of this data? Is our focus solely on data generation without fully realising its potential?</p>
<p>Sarah Ciston, author of a guide on managing machine learning datasets, acknowledges that datasets that encompass diverse perspectives (meaning, where feasible, datasets that incorporate interdisciplinary and intersectional<xref ref-type="fn" rid="n1">1</xref> communities in &#8220;designing, developing, implementing, and evaluating your work&#8221; [<xref ref-type="bibr" rid="B5">5</xref>]) can offer a more robust approach to working with your data. Furthermore, they recognise that critical practices are becoming standard in many conferences and journals. Understanding that datasets can never be neutral (&#8220;taking no position on a dataset&#8217;s ethical question is still taking a position&#8221;, as Ciston reminds us), it is imperative to bear certain considerations in mind:</p>
<disp-quote>
<p>While it may be impossible to escape classification&#8217;s worldviews entirely, with awareness of the underlying assumptions of classification and its impact on your processes, it becomes easier to make critical decisions that account for these contexts. [<xref ref-type="bibr" rid="B5">5</xref>, paragraphs 1466-1469]</p>
</disp-quote>
<p>(Meta)data is more than just a resource; it is a representation of specific contexts from our world. All processes carry social implications. Do we possess accurate data regarding the safety of seatbelts if we fail to consider all body types during our research [<xref ref-type="bibr" rid="B6">6</xref>]?<xref ref-type="fn" rid="n2">2</xref> Without such considerations, and if we share our (meta)data without critical thinking, we also pave the way for inaccurate reproduction. To optimise the reuse of (meta)data, as demanded by the FAIR principles, it is important that we extend our thinking beyond mere management requirements. Generating data also entails taking responsibility for initiating their life cycle.</p>
<p>So, what is your dataset there for [<xref ref-type="bibr" rid="B5">5</xref>]? I like to ponder, having immersed myself in bell hooks' approach<xref ref-type="fn" rid="n3">3</xref> to considering the care we show towards others, that when we talk about community, we are discussing an undeniable commitment and responsibility. We must nurture this community around us. Building a community around a subject is also tied to that notion. If we are contemplating novel ways of reshaping scholarly publications, we are also contemplating the knowledge we must share and learn from others. Advocating for transparency also involves considering the implications of this (meta)data.</p>
<p>When formulating the author guidelines for ing.grid, we considered some of these aspects. Describing the (meta)data&#8217;s usability for the community and clarifying whether the data is sensitive to certain segments of our society are among the points already encompassed in our guidelines [<xref ref-type="bibr" rid="B7">7</xref>]. However, establishing a community for Research Data Management in Engineering Sciences can extend beyond these aspects. As we endeavour to place (meta)data at the forefront, we, as a community, hold the power to shape this trajectory.</p>
<p>As someone coming from the field of Systemic-Functional Linguistics, I am constantly reminded of J.R. Firth&#8217;s maxim: &#8220;You shall know a word by the company it keeps&#8221; [<xref ref-type="bibr" rid="B8">8</xref>]. When we reflect on our communication processes through the lens of language, and view language as the mechanism for constructing meaning [<xref ref-type="bibr" rid="B9">9</xref>], every act of communication embodies a dynamic of power. (Meta)data represents power. By generating and disseminating it, we bear responsibility for the entire spectrum of communication associated with it. When we share our (meta)data within a community, we undertake the obligation to provide something that our community can either trust or challenge &#8211; thus collectively advancing our understanding.</p>
</body>
<back>
<fn-group>
<fn id="n1"><p>Intersectionality is a term coined by Kimberl&#233; Crenshaw in 1989. It describes how interlocking systems of power affect those who are marginalised in society. To read more: <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.law.columbia.edu/news/archive/kimberle-crenshaw-intersectionality-more-two-decades-later">https://www.law.columbia.edu/news/archive/kimberle-crenshaw-intersectionality-more-two-decades-later</ext-link></p></fn>
<fn id="n2"><p>Authors who discuss this issue further include Caroline Criado Perez, in &#8220;Invisible Women: Data Bias in a World Designed for Men&#8221;, and Rebekka Endler, in &#8220;Das Patriarchat der Dinge&#8221; (in German).</p></fn>
<fn id="n3"><p>bell hooks (1952-2021) was an American author, theorist, educator and social critic whose writing focused mainly on race, feminism, class and education. Her name is always written without capital letters.</p></fn>
</fn-group>
<sec>
<title>Conflict of interest</title>
<p>Izadora Silva Pimenta is a managing editor for ing.grid. This Data Management Letter does not necessarily reflect the opinion of ing.grid.</p>
</sec>
<ref-list>
<ref id="B1"><label>[1]</label><mixed-citation publication-type="journal"><string-name><given-names>S.</given-names> <surname>Kantayya</surname></string-name>, <source>Coded bias</source>, <year>2020</year>.</mixed-citation></ref>
<ref id="B2"><label>[2]</label><mixed-citation publication-type="webpage"><article-title>&#8220;Feminist Data Manifest-No.&#8221;</article-title> (n.d.), [Online]. Available: <uri>https://www.manifestno.com/home</uri> (visited on 07/03/2024).</mixed-citation></ref>
<ref id="B3"><label>[3]</label><mixed-citation publication-type="webpage"><article-title>&#8220;The CARE principles for indigenous data governance.&#8221;</article-title> (n.d.), [Online]. Available: <uri>https://www.gida-global.org/care</uri> (visited on 07/03/2024).</mixed-citation></ref>
<ref id="B4"><label>[4]</label><mixed-citation publication-type="journal"><string-name><given-names>S. R.</given-names> <surname>Carroll</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Garba</surname></string-name>, <string-name><given-names>O. L.</given-names> <surname>Figueroa-Rodr&#237;guez</surname></string-name>, <italic>et al.</italic>, <article-title>&#8220;The CARE principles for indigenous data governance,&#8221;</article-title> <source>Data Science Journal</source>, vol. <volume>19</volume>, pp. <fpage>43</fpage>&#8211;<lpage>43</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="B5"><label>[5]</label><mixed-citation publication-type="webpage"><string-name><given-names>S.</given-names> <surname>Ciston</surname></string-name>, <article-title>&#8220;A critical field guide for working with machine learning datasets,&#8221;</article-title> <source>Knowing Machines project</source>, <string-name><given-names>M.</given-names> <surname>Ananny</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Crawford</surname></string-name>, Eds., <year>2023</year>. [Online]. Available: <uri>https://knowingmachines.org/critical-field-guide</uri> (visited on 07/03/2024).</mixed-citation></ref>
<ref id="B6"><label>[6]</label><mixed-citation publication-type="webpage"><string-name><given-names>S.</given-names> <surname>Samuel</surname></string-name>. <article-title>&#8220;Women suffer needless pain because almost everything is designed for men.&#8221;</article-title> (<year>2019</year>), [Online]. Available: <uri>https://www.vox.com/future-perfect/2019/4/17/18308466/invisible-women-pain-gender-data-gap-caroline-criado-perez</uri> (visited on 07/03/2024).</mixed-citation></ref>
<ref id="B7"><label>[7]</label><mixed-citation publication-type="webpage"><collab>&#8220;ing.grid author guidelines.&#8221;</collab> (n.d.), [Online]. Available: <uri>https://www.inggrid.org/site/authorguidelines/</uri> (visited on 07/03/2024).</mixed-citation></ref>
<ref id="B8"><label>[8]</label><mixed-citation publication-type="book"><string-name><given-names>J.</given-names> <surname>Firth</surname></string-name>, <chapter-title>Studies in Linguistic Analysis</chapter-title> (<source>Special Volume of the Philological Society</source>). <publisher-name>Blackwell</publisher-name>, <year>1957</year>.</mixed-citation></ref>
<ref id="B9"><label>[9]</label><mixed-citation publication-type="journal"><string-name><given-names>M. A. K.</given-names> <surname>Halliday</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Hasan</surname></string-name>, <article-title>&#8220;Language, context, and text: Aspects of language in a social-semiotic perspective,&#8221;</article-title> <year>1989</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>