The article was translated and published in English by the University of Tartu. It was initially published in Estonian in Eesti Ekspress by Greete Lehepuu.

The cover of the September issue of Communications of the ACM, the most prestigious journal in computer science, featured an article led by University of Tartu researchers. The article proposes steps to tackle one of the biggest challenges for the future of computer science and the digital world: how to manage ever-increasing data volumes and data presented as graphs.

Sherif Sakr (1979-2020) at a Data Science Seminar. Photo: Henry Narits

Graphs are invisible multidimensional networks through which data moves and in whose nodes computations occur. They make it possible to run searches and receive answers by following the connections that describe the links between the data. Much of the data currently produced (in increasing volumes) is presented as graphs.

Jaak Vilo, Professor of Bioinformatics at the University of Tartu, gives a simplified example of the so-called ‘communication nodes’: “A usual database can be understood as one or more Excel tables with rows, columns and relationships between tables. You can search for a specific cell: for example, when did Greete speak to Jaak? Graphs, however, allow a much wider search: in addition to seeing that Greete and Jaak talked to each other, we can also see who both of them have spoken to at other times and where their acquaintances overlap. Data networks emerge.” To make it clear: the graphs themselves, of course, do not monitor people’s lives.
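Vilo’s example can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the method described in the article: the call records, every name other than Greete and Jaak, and the helper functions `build_graph` and `common_contacts` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical call records as (person, person) pairs.
# All names except Greete and Jaak are invented for this sketch.
calls = [
    ("Greete", "Jaak"),
    ("Greete", "Mari"),
    ("Jaak", "Mari"),
    ("Jaak", "Peeter"),
    ("Greete", "Liis"),
]

def build_graph(edges):
    """Build an undirected graph: each person maps to the set of people they spoke to."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def common_contacts(graph, a, b):
    """Return the people that both a and b have spoken to, excluding a and b themselves."""
    return (graph[a] & graph[b]) - {a, b}

graph = build_graph(calls)

# Table-style lookup: did Greete speak to Jaak?
print("Jaak" in graph["Greete"])

# Graph-style question: where do their acquaintances overlap?
print(sorted(common_contacts(graph, "Greete", "Jaak")))
```

The first lookup answers the single-cell question a table could also answer. The second follows the connections of both people and intersects them, which is the kind of wider search Vilo describes.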

According to Vilo, we do not yet have a good infrastructure that could handle the ever-expanding graph network while keeping energy use reasonable. An even bigger question is how to make maximum use of graphs. Figuratively speaking, the data no longer fits on individual computers, and it is difficult to find the necessary information quickly.

The answer lies not in a single comprehensive application but in a general communication system between the graphs, which must also withstand growth. This is the question that the representatives of the international community, led by the UT researcher, address in the article. Their answers involve machine learning, automated data analysis, and the creation of an innovative infrastructure for moving data, among other things.

“In the past, it seemed a utopian future that people could have whole libraries of information at their fingertips. But look at us now. The internet brings us all this, as well as phone calls, videos, friends’ holiday pictures and even a bunch of cat photos,” says Vilo.

The future lies in the ‘graph library’, which makes it possible to find unknown links between phenomena or data points. The system would benefit the entire connected world, from the pharmaceutical industry to the organisation of supply chains. Vilo gives a topical example. Pharmaceutical companies have tested thousands of molecules and chemical elements for different diseases. An organised graph system would make it possible to search quickly for an existing compound that could also be used in procedures related to COVID-19 treatment.

The article’s first author is Sherif Sakr, a researcher at the University of Tartu, who unfortunately passed away last spring. Co-authors from the Institute of Computer Science are Riccardo Tommasini, Lecturer of Data Management, and PhD student Mohamed Ragab. The article was published posthumously, and according to Vilo, it is also a homage to Sakr’s research work. His widow, Associate Professor Radwa El Shawi, and Ahmed Awad, the new head of the UT Data Systems Research Group, will continue to research Sherif’s vision of the connected world and big data.

The journal Communications of the ACM is intended for academics and teaching staff as well as research staff of major technology companies, a total of more than 100,000 professionals. Thus, the article by Tartu researchers is also likely to be discussed at Google, Amazon and Microsoft.

Riccardo Tommasini’s research is financed by the European Social Fund.