It has been a while since I released the initial version of RecommendationAssistant, my Master’s thesis in cooperation with Nextcloud GmbH and Frankfurt University of Applied Sciences. Today, I am happy to announce version 1.0.0.
What has changed?
Most of the work went into how recommendations are calculated. Earlier this year, I dropped one recommendation type because it was not as reliable as expected. The remaining recommendation algorithm compares every file with every other file in an N×N matrix, so I tried to make it more efficient by using better-suited algorithms and data structures.
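To give a feeling for what comparing all files with each other involves, here is a minimal sketch in Python. It is not the app's code (which is written in PHP), and cosine similarity is only an assumed stand-in for the actual similarity measure; the function names are made up for illustration. It shows one common saving: because similarity is symmetric, only the pairs above the diagonal of the matrix need to be computed.

```python
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (user -> rating)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[user] * b[user] for user in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_similarities(ratings):
    """ratings maps item -> {user: rating}.

    Returns {(item_i, item_j): similarity} for every unordered pair,
    i.e. only the upper triangle of the N x N matrix.
    """
    return {
        (i, j): cosine(ratings[i], ratings[j])
        for i, j in combinations(sorted(ratings), 2)
    }


def pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

Even with this saving the work grows quadratically: pair_count(9000) is 40,495,500, so a data set with 9,000 items still needs about 40 million comparisons.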
Problems and Issues
One finding is the realization that the tools and frameworks I used are not well suited to Machine Learning tasks or to this app. First of all, there is PHP, which does not support threading and executes everything on the main thread. The recommendation algorithm runs as a Nextcloud background job, and background jobs are executed one after another in a loop (on the main thread). The recommendation job therefore blocks all other jobs until it has finished, which leads to a bad user experience.
Secondly, and following from the first point, there are no official, well-tested frameworks for the complex mathematical formulas I need for Collaborative Filtering (the theory behind recommendations). I therefore had to write the frameworks (or at least some classes) myself, which means a higher probability of bugs (e.g. I had to test edge cases and base cases myself).
And thirdly, owing to the nature of Nextcloud, there is a lack of test data. Nextcloud is known for “being a safe home for all your data”. There is no Nextcloud SaaS (at least no official one), as there is with Dropbox, where you could simply set up a test instance and run your algorithms.
Furthermore, recommendation algorithms rely on data that is open to everyone and explorable. Take Netflix, for instance: you have a seemingly endless catalogue of movies and series in which you can lose yourself while exploring. Nevertheless, you have access to every film and you can see the ratings of other users.
In a Nextcloud environment, you do not have this opportunity. Users share files with you and you have your own files. But the number of files shared with you is limited and it usually does not make sense to get your own files recommended.
Really, no test data?
Well, not quite. There is the MovieLens data set, which is widely used for recommendation-related work. With the mini data set, which has nearly 100,000 ratings applied to 9,000 movies by 600 users, PHPUnit needs about 13 minutes (do you see problem #1?) to create recommendations.
Nevertheless, I tried to represent the movies as files and the users as Nextcloud users, and stored the corresponding ratings in the database. Using occ and SabreDAV, I created the users and files from the MovieLens data set. Since RecommendationAssistant converts the “last seen date” of a file into a rating, I had to work backwards from each rating to obtain a matching time stamp.
At this point, however, I reached a dead end. Nextcloud needs to grant users access to files (one user shares a file with another); it is not enough that the files exist in the database. This would mean that I would have to handle each of the 9,000 movie files individually and share it with 600 users. RecommendationAssistant cannot reach any file in a user’s root folder (except for the user who owns the files). And thus I was not able to test the algorithm in the Nextcloud environment.
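To illustrate what working backwards from a rating to a time stamp can look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the linear decay, the one-year window, the rating scale and the function names. It does not describe the formula RecommendationAssistant actually uses.

```python
from datetime import datetime, timedelta, timezone

# Assumptions for this sketch only, not values taken from the app.
MAX_AGE_DAYS = 365                   # older files get the minimum rating
MIN_RATING, MAX_RATING = 0.5, 5.0    # assumed rating scale


def rating_from_last_seen(last_seen, now):
    """Map a "last seen" date to a rating: the more recent, the higher."""
    age_days = (now - last_seen).total_seconds() / 86400
    age_days = min(max(age_days, 0.0), MAX_AGE_DAYS)
    freshness = 1.0 - age_days / MAX_AGE_DAYS
    return MIN_RATING + freshness * (MAX_RATING - MIN_RATING)


def last_seen_from_rating(rating, now):
    """Inverse mapping: pick the "last seen" date that yields this rating."""
    clamped = min(max(rating, MIN_RATING), MAX_RATING)
    freshness = (clamped - MIN_RATING) / (MAX_RATING - MIN_RATING)
    return now - timedelta(days=(1.0 - freshness) * MAX_AGE_DAYS)


now = datetime(2019, 1, 1, tzinfo=timezone.utc)
seen = last_seen_from_rating(3.0, now)      # time stamp for a 3.0 rating
rating = rating_from_last_seen(seen, now)   # converts back to about 3.0
```

With an invertible mapping like this, every rating in an imported data set can be turned into a time stamp that converts back to roughly the same rating.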
What is next?
I am grateful that Nextcloud GmbH gave me the chance and the trust to complete my Master’s thesis at their company. I really wanted to create a useful app that serves as a recommendation tool and introduces Machine Learning to Nextcloud. Unfortunately, this was not possible. I am at the point where I have to admit: PHP, Machine Learning and heavy computational load are not compatible with each other.
A good way to integrate Machine Learning into Nextcloud would be an external framework implemented in a programming language that supports Machine Learning well, such as Python or Java. This framework would then communicate with Nextcloud via an API, and Nextcloud apps could request data from it in the same way.
With version 1.0.0, I will end the work on this app. I will not add any further features because of the problems mentioned above. I think it would be better to focus on other open source work and projects (maybe another Nextcloud app?!). However, I will fix bugs and respond to user requests, if necessary.
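As a rough idea of what such an external service could look like, here is a minimal Python sketch that uses only the standard library. The JSON format, the port and all names are hypothetical, and the "recommender" is a trivial co-occurrence count standing in for a real model. A Nextcloud app would POST its usage events as JSON and receive file names back.

```python
import json
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer


def recommend(events, user, limit=3):
    """events is a list of [user, file] pairs (hypothetical format).

    Recommends files opened by users who share at least one file
    with `user`; a placeholder for a real model.
    """
    seen = {f for u, f in events if u == user}
    peers = {u for u, f in events if f in seen and u != user}
    counts = Counter(f for u, f in events if u in peers and f not in seen)
    return [f for f, _ in counts.most_common(limit)]


def handle_request(body):
    """Turn a JSON request body into a JSON response body."""
    payload = json.loads(body)
    files = recommend(payload["events"], payload["user"])
    return json.dumps({"user": payload["user"], "recommendations": files}).encode()


class RecommendationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(host="127.0.0.1", port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer((host, port), RecommendationHandler).serve_forever()
```

Calling serve() starts the service. The heavy computation then runs in its own process and no longer blocks the Nextcloud background job loop.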
This project is dead! I hereby declare it dead. Pull requests are accepted. Use the software at your own risk.