Kea Kohv, Master’s Student in Data Science

An interactive tool developed by Data Science Master’s student Kea Kohv helps find highly rated and popular data science books sold on Amazon.

A query for data science books on Amazon returns thousands of results, which is a lot to skim through. Amazon's rating filter also tops out at “4 stars and up”, so the search cannot be narrowed to a higher average rating. Comparing books is not intuitive either, because there is no compact visual representation of how they stack up against each other in popularity and average star rating.

This interactive tool (in Estonian) helps to explore books on a specific subject or in the field of data science in general. It is built around an interactive graph in which books are represented as bubbles. The graph can be filtered by average rating from 0 to 5 (in 0.2 increments), by number of reviews, by publishing year, and by text that the title must contain.

The books are also displayed in a table that supports sorting and searching. Clicking a bubble in the graph or a row in the table takes you to the respective book’s Amazon page.

Try it out for yourself

Every bubble is a book. The bigger the bubble, the more popular the book, as measured by its number of ratings on Amazon. The lighter the bubble's color, the higher the book's average rating.

Hover over the bubble to see more information. Click on the bubble or on the row in the table below to open a link to that book on Amazon.

The dataset

This data visualization project uses the Amazon Data Science Books Dataset from Kaggle. Obtained by web scraping Amazon, it contains metadata on about a thousand data science books sold there, of which around 800 remained after data cleanup. Version 1 of the dataset (September 2022) is used in this project because the second version, released in January 2023, contains far fewer books.

As can be seen from the figure below, there has been a surge in published data science books since 2015.

Visualization methods

The project was written in R using the shiny and shinydashboard packages. The bubble graph uses a one-level “circle packing” layout: it was drawn with the ggplot2 package and made interactive with the ggiraph package. A “circle packing” example can be found here.

A benefit of “circle packing” is that the bubbles do not overlap, as they might in a scatterplot. A scatterplot is more useful for analyzing trends and patterns in the data, but if the plot is interactive and has many overlapping data points, “circle packing” is worth a try. Although “circle packing” is often used to represent hierarchical data structures, it can be one-level when the data has no hierarchy, as is the case in this application.
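A minimal sketch of one-level circle packing with ggplot2 and ggiraph is shown below. It is not the application's code. The packcircles package used to compute the layout is an assumption (the text does not say how the circle positions were calculated), and the `books` data frame with its column names is invented for illustration.

```r
library(ggplot2)
library(ggiraph)
library(packcircles)  # assumption: the article does not name the layout package

# Hypothetical toy data standing in for the cleaned book dataset
books <- data.frame(
  title      = c("Book A", "Book B", "Book C", "Book D"),
  n_ratings  = c(1200, 300, 45, 800),
  avg_rating = c(4.6, 4.2, 3.9, 4.8)
)

# One circle per book; circle area is proportional to the number of ratings
layout <- circleProgressiveLayout(books$n_ratings, sizetype = "area")

# Convert each circle into polygon vertices that ggplot2 can draw;
# the `id` column holds the row index of the circle in `layout`
vertices <- circleLayoutVertices(layout, npoints = 50)
vertices$title      <- books$title[vertices$id]
vertices$avg_rating <- books$avg_rating[vertices$id]

p <- ggplot(vertices) +
  geom_polygon_interactive(
    aes(x, y, group = id, fill = avg_rating,
        tooltip = title, data_id = id),
    colour = "white"
  ) +
  # Lighter fill for higher average ratings
  scale_fill_gradient(low = "#08306b", high = "#deebf7") +
  coord_equal() +
  theme_void()

# Wrap the plot in an interactive htmlwidget (hover shows the tooltip)
widget <- girafe(ggobj = p)
```

Printing `widget` in an interactive R session, or returning it from `renderGirafe()` in a shiny app, displays the packed bubbles. Because the layout step places the circles next to one another rather than at data coordinates, the x and y positions carry no meaning; only size and color do.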

The data table was created using the DT package. Other interactive figures in the application were made using the plotly package.

Opening the Amazon link when a bubble is clicked was achieved with a very convenient feature of the ggiraph package, the onclick aesthetic, which executes custom JavaScript when a bubble is clicked. Opening the link from a table row was achieved with the shiny package’s observeEvent function and the shinyjs package, which helps to execute JavaScript in shiny applications.
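The two click mechanisms described above could be wired up roughly as follows. This is a sketch under assumptions, not the application's code: the `books` data frame, the `open_js` helper, the `books_table` output id, and the placeholder URLs are all hypothetical.

```r
library(shiny)
library(shinyjs)
library(ggplot2)
library(ggiraph)

# Hypothetical data; the URLs are placeholders, not real product pages
books <- data.frame(
  title = c("Book A", "Book B"),
  url   = c("https://example.com/book-a", "https://example.com/book-b")
)

# Hypothetical helper: build the JavaScript that opens a URL in a new tab
open_js <- function(url) sprintf('window.open("%s", "_blank")', url)

# Bubble side: ggiraph runs the string in the `onclick` aesthetic on click
books$onclick <- open_js(books$url)
p <- ggplot(books, aes(x = seq_along(title), y = 1)) +
  geom_point_interactive(aes(tooltip = title, onclick = onclick), size = 10)

ui <- fluidPage(
  useShinyjs(),                 # required for shinyjs::runjs() to work
  girafeOutput("bubbles"),
  DT::DTOutput("books_table")
)

server <- function(input, output, session) {
  output$bubbles <- renderGirafe(girafe(ggobj = p))

  output$books_table <- DT::renderDT(
    DT::datatable(books["title"], selection = "single")
  )

  # Table side: DT reports the selected row index as
  # input$<outputId>_rows_selected; run the same JavaScript from the server
  observeEvent(input$books_table_rows_selected, {
    runjs(open_js(books$url[input$books_table_rows_selected]))
  })
}

# shinyApp(ui, server)  # uncomment to run the app locally
```

Building the JavaScript with `sprintf` assumes the URLs contain no double quotes; URLs from an untrusted source would need escaping first.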

The code is available on GitHub.

The application was created as a final project for the course “Data Visualization and Presentation” (LTAT.02.008). Have suggestions or ideas? Write to the author Kea Kohv at kohv.kea@gmail.com.