QUEST Blog Post: The wild west of data sharing

Post by Dr. Evgeny Bobrov

It’s 2078. A PhD student is preparing an abstract for a conference on research integrity. She works on the detection of viruses which crawl scientific data and introduce errors, a practice known as crawltoxing. Nobody seems to know where these viruses come from, but some blame governments and companies for destroying open data once they have been harvested. She closes her eyes, focuses and imagines right arrow two zero seven eight black square open black square data white triangle landscape, which gives her the search instantly, in landscape format. However, she was a bit hasty this time. The eegreader-cap detects a one instead of a seven, and she ends up with the open data landscape of 2018. Immediately the fragmentation of the information catches her attention. There seems to have been no such thing as a “landscape” in 2018; concepts and ideas are all scattered. She is used to following rivers and climbing hills, which helps her to intuitively grasp currents of thought, peaks of attention, simply how everything is related. But here she sees barely more than specks of text scattered over a generic lawn. Yet she is also fascinated. Giving up any hope of getting a real overview of the open data landscape, she simply walks around, looking here and there. What she sees makes her frown, or laugh, or both. Some things are plain embarrassing. Apparently even the best data engines of the time did not search the actual data, but some random bits spread over them like powdered sugar. In fact, she reads that there were many such engines, but they were hardly used, as they could not detect much. There was every kind of data sharing imaginable, typically data dumps where data stood on their own in so-called “sets”. Incompatibility detectors became obsolete a few years ago, but she still has one installed – the values she gets are dismal.
She skims through some documentation, finding it cute how most researchers at the time made honest efforts, writing extensive texts no machine could ever understand. People were basically annotating the way they spoke! She wonders why people made the effort at all. She is even more perplexed to discover how much manual work was required to share even the simplest dataset. People thought up their own keywords or looked through long tables. Impressively, there were whole conferences dedicated to settling on common terms. But gosh, people physically went to these conferences on oil-burning planes‽ This was outrageous! But it was probably considered acceptable at the time, she ponders. Like driving oneself, or even eating animals… In the eighteen-hundreds it might have been just fine to shoot someone and then ride away on a horse. Times change. She continues, discovering that it was allowed to share data with friends only, another strange concept. But she can’t really blame them, or those who did not share at all, if apparently everyone was supposed to do so much manual work! Among a myriad of alternatives, finding appropriate file names, appropriate lab books, appropriate plot types, appropriate places for data storage, appropriate data to link to, appropriate…well, anything appropriate really, must have been very time-consuming. For the first time, she fully grasps why historians call this period the “manualistic”. Increasingly exhausted by this pandemonium, she is quite relieved to be told by her machine that she has spent enough time on this. She should get her abstract done, or else she might have elevated stress levels from 3.15 pm onwards.