Estonian researchers created a unique and vital search engine for geneticists

The article was originally published in Ekspress by Greete Lehepuu.

The app works like Google Maps: given the genetic coordinates associated with a disease, it finds which gene is likely to cause the disease and in which tissue.

Imagine that the only way to find your way around an unfamiliar city is to buy a map made by that city. The map is sold only with the city government's permission, which you have to apply for. Each city has its own map symbols for sights, shops, hotels, roadworks, etc. Most people would probably just drive around at random to avoid the boring and burdensome process of obtaining the map, at the risk of missing something important and getting quite the wrong impression of the city. In real life, we are lucky to have all kinds of maps and atlases, as well as Google Maps, which looks the same in every country and is always understood in the same way.

Until recently, it was just as complicated for researchers and doctors to find associations between diseases and genetic variants in genetic data. Several countries have biobanks that hold important and interesting information, but obtaining access to them meant spending weeks or even months applying for permissions. Genetic data are sensitive: with some effort, it would be possible to identify a specific individual, so it is understandable that the data are not shared with everyone on request. Such obstacles, however, also slow down science. The more data there are, the more reliable the research results (and the more studies can be conducted), which is why researchers make such enquiries frequently.

Genetics researcher Kaur Alasoo, Lecturer of Bioinformatics at the University of Tartu, faced this problem a few years ago as a doctoral student. He studied immune cells, which he grew himself in the laboratory. He hoped to get comparative data from other biobanks where similar experiments had been carried out. Unfortunately, that hope did not materialise. All in all, it would have taken three (!) years to get the data; a separate query had to be sent to each database, and the wait for responses took an eternity… By that time, his own data and results would already have been outdated. The dataset for his doctoral thesis remained smaller than he would have liked.

“This ‘failure’ was the motivator that urged us three years ago to start creating our own application,” says Alasoo. They tried to solve two problems at once: reducing bureaucracy (without compromising security) and speeding up research. In cooperation with several researchers, they created a database and a search engine in which you enter the coordinates of a genetic variant and a list of genes regulated by that variant pops up.

Much of the work was done by Nurlan Kerimov, a doctoral student supervised by Alasoo, for whom the app is part of the doctoral thesis. All in all, 20 researchers contributed to creating the app, and the European Bioinformatics Institute was involved as a partner. A total of 29 datasets, i.e. about 30,000 samples from 7,000 people, were collected into the app, including samples from 600–700 gene donors from Estonia.

“Our advantage is probably that we have jumped through the bureaucratic hoops, which no one else has bothered to do. We can automate data processing, and devise smart solutions and convenient workflows for that purpose. However, no one has managed to automate bureaucracy,” Alasoo says with a smile. Now, the gene app also spares others from the wheels of bureaucracy.

Following all the data protection rules proved to be the most complicated task. This is why it is possible to search and find only those associations that exist between genetic variants and genes. The raw data of biobanks is not available for everyone to see or download.

The Open Targets public-private partnership, launched together with pharmaceutical companies, also gave impetus to the creation of the app. Pharmaceutical companies are very much interested in scientific research that studies the effect of a particular gene on a disease. For example, there have certainly been a number of studies on type 1 diabetes, but finding the studies, comparing them, and requesting the underlying data is another tedious task.

About a decade ago, when scientists discovered the relationship between a single gene and a disease, a news article was written about it. Today, such discoveries are no longer newsworthy. It has become clear that everything going on in our bodies is caused by genes in one way or another.

Kaur Alasoo gives two of his favourite examples. “The best example is lactose intolerance or the inability to digest milk sugar. It is caused by an insufficient amount of a specific enzyme – lactase – in the intestines to break down lactose. Lactase is an enzyme, a protein, encoded by a specific gene in the DNA. The gene is also called lactase (LCT). The gene is functional in nearly all people – if it were defective, they would not be able even to drink breast milk. The difference results from the fact that a genetic variant determines the amount of lactase produced by the body. A person with lactose intolerance has the variant which does not produce enough lactase.”

This is a well-known fact, but also a good opportunity to test whether the Alasoo team's app works. “Our database, for example, reveals that the gene variant associated with lactose intolerance affects the expression of the lactase gene and this association can only be seen in intestinal samples. The variation does not seem to have any effect, for example, in muscle tissue or blood,” Alasoo adds.
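
The lactase check can be pictured as a simple lookup. The sketch below is a toy illustration in Python, not the application's actual interface: the `Association` record, the `genes_regulated_by` function, the variant coordinate and the p-values are all hypothetical placeholders, chosen only to mirror the pattern Alasoo describes (an effect visible in intestinal samples but not in muscle or blood).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One variant-gene association measured in one tissue (hypothetical record)."""
    variant: str    # coordinate of the variant, written here as "chromosome:position"
    gene: str       # gene whose expression was tested
    tissue: str     # tissue the samples came from
    p_value: float  # strength of evidence for the association


# Toy records invented for illustration; these are NOT real measurements.
TOY_ASSOCIATIONS = [
    Association("chr2:1000", "LCT", "intestine", 1e-12),
    Association("chr2:1000", "LCT", "muscle", 0.61),
    Association("chr2:1000", "LCT", "blood", 0.48),
]


def genes_regulated_by(variant, associations, threshold=1e-5):
    """Return (gene, tissue) pairs whose association with `variant` passes the threshold."""
    return [
        (a.gene, a.tissue)
        for a in associations
        if a.variant == variant and a.p_value < threshold
    ]


hits = genes_regulated_by("chr2:1000", TOY_ASSOCIATIONS)
print(hits)  # [('LCT', 'intestine')]
```

Only summary-level associations appear in the toy table, which matches the point made earlier that the raw biobank data are not exposed. The significance threshold is an arbitrary assumption for the example.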

Or take the common question of why one person’s skin gets darker in the sun than another’s. This is important to know because those who tan less have a higher risk of sunburn, which in turn increases the risk of developing skin cancer. The answer is in the genes, naturally. To put it very simply, in some people the gene that produces the so-called tanning molecule is more strongly expressed, so more of the molecule is made.

If information about genes and enzymes is available, pharmaceutical companies can test whether a certain molecule could be used to specifically target the proteins that cause a characteristic or a disease. In principle, it would be possible to create a drug to stimulate the enzyme that makes our skin tan in the sun, and then sell it to people whose gene variant does not allow them to produce as much of the enzyme. “This is still science fiction that someone would get a tan from pills, but who wouldn’t like to get rid of greasy sunscreens?” Alasoo says with a smile.

This is the future. The skin-bronzing “medicine” could be the future of the cosmetics industry, while the pharmaceutical industry, for example, is working on cancer medicine. Thanks to genetic data, it is no longer necessary to waste time and money on finding out which gene affects a certain disease. Applying the same principle would also allow faster and more precise disease prevention.

“Now pharmaceutical manufacturers can also check the effect of a genetic variant associated with a disease in a particular cell type, see which gene it affects, and then test new drugs: what happens when the molecule or medicine targeting this particular gene is administered to the cells,” Alasoo explains.

The first version of the application was completed about 18 months ago, and doctors and colleagues have since tested and praised it. The app has also already been cited in new scientific articles. “Many colleagues have said: someone has finally made this app! It is tedious work and many people have thought it is something they should do at some point,” Alasoo said. The fact that Tartu researchers took on this boring task has given them a competitive advantage. It will not make them rich, but it is a solid milestone for their CVs and brings them exciting project proposals.

Alasoo hopes the application will also be helpful in answering the next big question of genetic science and medicine: how the human body works, for instance why one person’s skin gets darker in the sun than another’s, or why some people are at a higher risk of Alzheimer’s disease. As the data are now conveniently available (and more will probably be added), we can hope to get answers more quickly, based on reliable analysis.

A scientific article about the application was recently published in Nature Genetics, the world’s most reputable human genetics journal.

Why are the data of all gene donors not included in the app?

Taking a DNA sample from gene donors and including it in a database is not a complicated task. Only a blood sample is needed, and it is not very difficult to ascertain the genetic variants that the person is carrying (for example, whether the person has a genetic variant that makes it possible to digest milk). However, the DNA does not tell us how active the gene is (how strongly the gene that makes it possible to digest milk is expressed). This can be found out from mRNA, which is highly unstable and degrades quickly at temperatures above −80 °C (remember the complex logistics of mRNA vaccines? The reason is the same). In addition, to measure the effect of the gene, a sample of the tissue in which it is expressed is needed. And if the gene activates something in the brain, you cannot take brain tissue from the gene donor… Some samples are easy to take, others can only be obtained post-mortem. Still others can be grown in the lab, just as Alasoo grew the immune cells for his doctoral thesis. In any case, the work is expensive and time-consuming, which is why the data from the biobanks are particularly valuable. “The beauty of genetics lies in the fact that we can do quite a lot by using samples from healthy people. For many variants that increase the risk of a disease, the effect can also be detected in healthy people,” says Kaur Alasoo.
