All papers are data papers: from open principles to digital methods

2020-07-22

My proposal to DH2020, accepted in the long paper category. The in-person conference was cancelled and as of 2021 I have not revisited this proposal yet.
Perret, Arthur and Le Deuff, Olivier (2020). “All papers are data papers: from open principles to digital methods.” DH2020 Book of Abstracts.

How do we bridge the gap between ambitious global schemes, such as Paul Otlet’s “Aims of documentation” (Otlet, Traité de documentation, 1934) or the FAIR data principles (Wilkinson, Dumontier, Aalbersberg, et al., “The FAIR Guiding Principles for scientific data management and stewardship,” March 2016), and existing information practices? We describe the theoretical basis and practical steps for a subject-oriented approach to this problem, examining data-related expectations through the lens of documentarity.

In 1934, Belgian bibliographer Paul Otlet published his Traité de documentation (Treatise on documentation), in which he outlined the “Aims of documentation”:

“Universal as to their purpose; reliable and true; complete; fast; up to date; easy to obtain; collected in advance and ready to be communicated; made available to the greatest number of people.” (Otlet, op. cit., p. 6)

In 2016, the FAIR principles were published along similar lines:

“To be Findable; to be Accessible; to be Interoperable; to be Reusable.” (Wilkinson, Dumontier, Aalbersberg, et al., art. cit., p. 4)

They differ in some ways: Otlet viewed the Aims as a whole, with openness as a critical element, while FAIR is modular and not necessarily synonymous with open data. More importantly, both describe a plan meant to precede and guide implementation. Otlet’s Aims are broken down into goals related to the actual “biblio-technie” or “bibliothéconomie” (Otlet, op. cit., pp. 372–375); similarly, each of the four components of FAIR is itself divided into three or four sub-components which delve into technical matters (e.g. data vs. metadata). These are actionable steps to be applied in the field, which is where trouble begins.

During and after his time, close collaborators and distant peers alike noted the gap between Otlet’s ambitions and what he was able to achieve: Valère Darchambeau commented on “Mr. Otlet’s mental audacities, his utopias some would say” (Mundaneum archives, PP P0 462); Suzanne Briet ironically called him “the magus” of documentation. Indeed, he had a major impact on the institutionalization of documentation: the development of library and information science (LIS) in Europe owes much to section 4 of his Treatise. But his work on the relationship between subject and knowledge was largely neglected. The techno-semiotic mediations of information have been far less studied in LIS than human ones; we can arguably trace this back to Otlet’s incomplete legacy. Conversely, the implementation of FAIR principles quickly raised the issues of user experience, expectations and metrics:

“FAIRness is aspirational, yet the means of reaching it may be defined by increased adherence to measurable indicators . . . metrics that reflect the expectations of particular communities.” (Wilkinson, Sansone, Schultes, et al., “A design framework and exemplar metrics for FAIRness,” December 2017, pp. 1–2)

The interface between person and information seems much thinner for computer-held data than for library books. While this is not actually true (mediations have simply shifted towards human-computer interaction), it means that the feasibility of principles is challenged almost immediately by subjective experience. Data may be FAIR but people may differ: they do not all work on the same data or with the same mindset and therefore have different expectations. This shapes the way we assess data within the framework of documentation and therefore its value to us—its documentarity.

Documentarity is the product of interdisciplinary theoretical work, at the intersection of ontology, documentation and linguistics. The first two influences have been studied: documentarity can be seen as a philosophy of evidence based on documentation (Day, Documentarity, 2019) and also as the quantifiable documentary quality of things, with applications to digital documents and data (Perret and Le Deuff, “Documentarité et données, instrumentation d’un concept,” 2019). Here, we examine the third influence: how linguistics contributes to documentarity as an epistemological proposal which, at its core, focuses on the reception of information. We show that documentarity is linked to several works: Roman Jakobson’s “literaturnost,” which in French (“littérarité”; Jakobson, Huit questions de poétique, 1977, p. 16) is very close to documentarity (“documentarité”); Hans Robert Jauss’ adaptation of horizons of expectation (“Erwartungshorizont”) to literature (Jauss, “Literary History as a Challenge to Literary Theory,” 1970); and the shape of enunciation, with Mary Ann Caws’ “architexture” (Caws, The eye in the text, 1981, p. 10) and Roger Laufer’s “scripturation” (Laufer, “L’énonciation typographique,” 1986, p. 75).

This array of concepts is dense but its purpose is coherent: we draw from the phenomenology of the reading process to make better sense of the way we assess computer-held data. Our methodology is to track the embodiment of thought in technological mediations, especially in writing. The usefulness of such an approach has been described for the study of information as experience (Gorichanaz, “Auto-hermeneutics,” January 2017). We argue that the way we perceive the documentarity of data is shaped by our horizons of expectation, especially by previous experience of genre-based rules, which we must establish if we wish to prevent global principles from falling into abstraction as soon as they enter the field.

From this perspective, digital notebooks form a stimulating case study, highly relevant to the conference’s theme of open data. They relate both to a tradition and to new practices (data science, data papers). We analyze the way data is presented and interacted with in R, Python and JavaScript-based notebooks, and we observe a reflexive impact on our perception of documentarity: it allows us to relate more practically to the intellectual framework behind Otlet’s “Aims of documentation” and the FAIR principles, which could improve their adoption. Through reproducibility and replicability, the practice of the notebook informs us about the relationship between data and truth. It also underlines the status of text as the most basic and universal type of data in science: the way text is handled in notebooks (lightweight markup languages, integration of standards, automation) shifts our perception from ‘text’ to ‘textual data.’ This is independent of the field of study: we suggest that any research built from plain text can be considered a data paper and that extending “FAIRness” to scientific writing in general would be an epistemological breakthrough in scientific communication.
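The shift from ‘text’ to ‘textual data’ can be made concrete with a minimal sketch. The Python snippet below is an illustration only, not the analysis carried out for this proposal: it treats a short manuscript written in a lightweight markup language as a record that can be queried. It assumes Markdown-style `#` headings and Pandoc-style `[@key]` citations; the sample manuscript, the citation keys and the function name `text_to_data` are all hypothetical.

```python
import re

# Hypothetical manuscript in plain text (Markdown headings, Pandoc-style citations).
MANUSCRIPT = """\
# All papers are data papers

## Introduction

Otlet outlined the aims of documentation [@otlet1934, p. 6].
The FAIR principles follow similar lines [@wilkinson2016].

## Method

We draw on documentarity [@day2019; @perret2019].
"""

HEADING = re.compile(r"^(#+)\s+(.*)$")   # one or more '#', then the heading title
CITATION_KEY = re.compile(r"@([\w-]+)")  # '@' followed by a citation key


def text_to_data(text):
    """Extract headings, unique citation keys and a word count from plain text."""
    headings = []
    keys = []
    words = 0
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append({"level": len(match.group(1)), "title": match.group(2)})
            continue
        for key in CITATION_KEY.findall(line):
            if key not in keys:  # keep first occurrence, preserve order
                keys.append(key)
        words += len(line.split())  # body text only, headings excluded
    return {"headings": headings, "citations": keys, "word_count": words}


record = text_to_data(MANUSCRIPT)
print(record["citations"])  # ['otlet1934', 'wilkinson2016', 'day2019', 'perret2019']
```

Once a manuscript is available in this form, questions usually asked of datasets (are the sources identified, is the structure machine-readable) can be asked of the text itself, which is the sense in which a paper built from plain text may be read as a data paper.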

References

Caws, Mary Ann. The eye in the text. Princeton University Press, 1981. ISBN 978-0-691-01377-0. http://archive.org/details/eyeintextessayso0000caws.
Day, Ronald E. Documentarity: Evidence, Ontology, and Inscription. MIT Press, 2019. ISBN 978-0-262-04320-5.
Gorichanaz, Tim. “Auto-hermeneutics: A phenomenological approach to information experience.” Library & Information Science Research. January 2017, Vol. 39, no. 1, p. 1–7. https://doi.org/10.1016/j.lisr.2017.01.001.
Jakobson, Roman. Huit questions de poétique. Éd. du Seuil, 1977. ISBN 978-2-02-004680-0.
Jauss, Hans Robert. “Literary History as a Challenge to Literary Theory.” New Literary History. 1970, Vol. 2, no. 1, p. 7–37. https://doi.org/10.2307/468585. JSTOR.
Laufer, Roger. “L’énonciation typographique.” Communication et langages. 1986, no. 68. https://doi.org/10.3406/colan.1986.1762.
Otlet, Paul. Traité de documentation. Le livre sur le livre. Les Impressions nouvelles (2015), 1934. ISBN 978-2-87449-299-0.
Perret, Arthur and Le Deuff, Olivier. “Documentarité et données, instrumentation d’un concept.” In: 12ème Colloque international d’ISKO-France : Données et mégadonnées ouvertes en SHS : de nouveaux enjeux pour l’état et l’organisation des connaissances ? 2019. https://hal.archives-ouvertes.fr/hal-02307039.
Wilkinson, Mark D., Dumontier, Michel, Aalbersberg, IJsbrand Jan, Appleton, Gabrielle, Axton, Myles, Baak, Arie, … Mons, Barend. “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data. March 2016, Vol. 3, p. 160018. https://doi.org/10.1038/sdata.2016.18.
Wilkinson, Mark D., Sansone, Susanna-Assunta, Schultes, Erik, Doorn, Peter, Santos, Luiz Olavo Bonino da Silva and Dumontier, Michel. “A design framework and exemplar metrics for FAIRness.” bioRxiv. December 2017, p. 225490. https://doi.org/10.1101/225490.