Review-Based-Recommendation-System

OBJECTIVE

To build a web application that recommends products to customers of an online grocery store based on their reviews. The web application is a prototype that mimics a real-time application: given an input string (a review), it renders recommendations with product images.

Demo

Check it out live here

TABLE OF CONTENTS

  • Data
  • Technologies
  • Algorithm
  • Architecture
  • Deployment
  • Results
  • References

Data

The dataset was obtained from the Amazon review dataset released in 2014, provided by UCSD. It contains 287,209 products with 5,074,160 reviews and ratings from 157,386 unique users.
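
The UCSD release ships each category as a gzipped file with one JSON object per review line. A minimal loading sketch, assuming the Grocery category file and the standard asin / reviewText field names (the file name is an assumption; adjust it to the file you downloaded):

  import gzip
  import json

  import pandas as pd

  # One JSON object per line; the file name follows the UCSD naming
  # scheme (an assumption -- adjust to the file you downloaded).
  rows = []
  with gzip.open("reviews_Grocery_and_Gourmet_Food.json.gz", "rt") as f:
      for line in f:
          r = json.loads(line)
          rows.append({"asin": r["asin"], "review": r.get("reviewText", "")})

  df = pd.DataFrame(rows)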

Technologies

  • Python - Scikit-learn, Pipeline
  • Flask
  • Docker
  • PowerShell
  • Heroku

Algorithm

We use the k-means algorithm to cluster all the products based on their reviews. The TF-IDF scores of the review text serve as the features over which the unsupervised clusters form.


How do we do that?

Each product's reviews are collected and concatenated into a single string. Thus, each product's feature set is the TF-IDF scores of its concatenated review string. These TF-IDF feature vectors are then used to compute Euclidean distances between points in the feature space, which is what lets us run the k-means algorithm.

Based on the number of categories in the grocery store, we choose the number of centroids and bucket the products under labels. We train the model over iterations, allowing the clusters to move farther apart in space, and save the model, i.e. the centroid points in that space. We then pickle-dump the dictionary of cluster labels and their corresponding products, as sketched below.
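
A minimal sketch of these steps with scikit-learn, continuing from the df of reviews loaded above; the cluster count and output file name are illustrative assumptions, not the project's actual values:

  import pickle

  from sklearn.cluster import KMeans
  from sklearn.feature_extraction.text import TfidfVectorizer

  # One document per product: the concatenation of all its reviews.
  docs = df.groupby("asin")["review"].apply(" ".join)

  # TF-IDF feature vectors for each concatenated review string.
  vectorizer = TfidfVectorizer(stop_words="english")
  X = vectorizer.fit_transform(docs)

  # The number of clusters mirrors the store's category count
  # (25 is illustrative, not the project's actual k).
  kmeans = KMeans(n_clusters=25, random_state=42)
  kmeans.fit(X)

  # Dictionary of cluster labels to their products, pickled for lookup.
  label_to_asins = {}
  for asin, label in zip(docs.index, kmeans.labels_):
      label_to_asins.setdefault(int(label), []).append(asin)

  with open("cluster_lookup.pkl", "wb") as f:
      pickle.dump(label_to_asins, f)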

The conversion of an input string into TF-IDF scores, followed by finding its nearest centroid point (cluster), is done with a pipeline function and saved to a joblib file, as sketched below.
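
A sketch of such a pipeline, reusing the docs series from the previous sketch; the joblib file name is again an assumption:

  from joblib import dump

  from sklearn.cluster import KMeans
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.pipeline import Pipeline

  # pipeline.predict(["some review text"]) converts the raw string to
  # TF-IDF scores and returns the label of the nearest centroid.
  pipeline = Pipeline([
      ("tfidf", TfidfVectorizer(stop_words="english")),
      ("kmeans", KMeans(n_clusters=25, random_state=42)),
  ])
  pipeline.fit(docs)

  dump(pipeline, "model_pipeline.joblib")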

What is a TF-IDF score?

Given a document (the concatenated review string of a product) in a corpus (the reviews across all products), the TF-IDF score tells how frequently a word occurs in that particular document and how rarely it occurs across the corpus.
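
In the classic formulation, the score of a term t in document d over a corpus of N documents is

  tfidf(t, d) = tf(t, d) * log(N / df(t))

where tf(t, d) counts the occurrences of t in d and df(t) is the number of documents containing t. (scikit-learn's TfidfVectorizer computes a smoothed, normalized variant of this.)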

Example for intuition

Consider comparing reviews of chocolates. Let's assume there are two variants of chocolate available in the market.

Review for Variant 1: This is the best chocolate in the world.

Review for Variant 2: I liked this chocolate.

Given that the similarity of two sentences here is based on Euclidean distance, the two reviews would lie close together due to the shared word "chocolate".
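
A quick sketch of this with scikit-learn (for illustration only, not the project's code):

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import euclidean_distances

  reviews = [
      "This is the best chocolate in the world.",
      "I liked this chocolate.",
  ]
  X = TfidfVectorizer().fit_transform(reviews)

  # The shared word "chocolate" gives the two TF-IDF vectors overlapping
  # non-zero dimensions, pulling their Euclidean distance down.
  print(euclidean_distances(X)[0, 1])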

There would, however, be some noise and misallocations, but the likelihood is low, since grocery reviews usually carry enough context to express the reviewer's intent. Also, we concatenate all the reviews for a product, which further reduces the noise in the per-word TF-IDF scores.

Architecture

This section explains the flow of data for a given input string. The input string is passed to the Flask application, which accesses the pipeline saved in the joblib file. The pipeline returns a cluster label, which the lookup file uses to return ASIN IDs. These ASIN IDs are used to generate product URLs, which in turn are used to post the recommendations.
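
A minimal sketch of such a Flask endpoint, assuming the artifact names from the sketches above; the route, form field, recommendation count, and URL pattern are illustrative assumptions:

  import pickle

  from flask import Flask, request
  from joblib import load

  app = Flask(__name__)
  pipeline = load("model_pipeline.joblib")      # TF-IDF + k-means pipeline
  with open("cluster_lookup.pkl", "rb") as f:
      label_to_asins = pickle.load(f)           # cluster label -> ASIN IDs

  @app.route("/recommend", methods=["POST"])
  def recommend():
      review = request.form["review"]
      label = int(pipeline.predict([review])[0])    # nearest cluster
      asins = label_to_asins[label][:10]            # a few products from it
      # Amazon product pages are addressable by ASIN.
      urls = [f"https://www.amazon.com/dp/{a}" for a in asins]
      return {"recommendations": urls}

  if __name__ == "__main__":
      app.run(host="0.0.0.0", port=5000)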


Deployment

Once the model is trained on the dataset using the algorithm above, we save the model pipeline for deploying the Flask application on a local server or on the cloud.

Given that the Flask app requires a set environment and performs predictions through the saved model pipeline, we build an image using Docker. The Docker container can be hosted on the local system or on the cloud, as required.

We first build a Docker image, which can be run as a container on the local system to test the application. Building an image makes it easy to flexibly spin up containers on any server.
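
A minimal Dockerfile sketch for such an image; the base image, entry point, and port are assumptions rather than the repository's actual contents:

  FROM python:3.8-slim
  WORKDIR /app
  # Install pinned dependencies first so this layer caches between builds.
  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
  COPY . .
  EXPOSE 5000
  CMD ["python", "app.py"]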

However, you could also push the application directly into containers already provided by a cloud environment. We host the web application on a Heroku container directly, without building any image on the local system.

Both methods are illustrated below.

Docker Image

  1. A docker folder is created on your local system with the requirements file. Check the folder here

  2. Once the folder is in place, use the PowerShell command line to build the image. On the CLI, navigate to the folder we just created.

  3. Build the image; the trailing dot is the build context (the current folder).

    docker image build -t recommsys .


  4. Run the application from the image (the port mapping assumes the Flask app listens on port 5000).

    docker run -p 5000:5000 recommsys


  5. Verify that the container is up and running without any errors, then move on to validating the web application.


  6. Validate the application by submitting a sample review and checking the rendered recommendations.


Heroku container (Cloud deployment)

  1. A docker folder is created on your local system with the requirements file. Check the folder here

  2. Once the folder is in place, use the command prompt with the Heroku CLI installed on the system. We use the Heroku container registry to push the Docker folder and create the image on Heroku, which hosts the application.

  3. Check that the Heroku CLI is installed by running the command below; otherwise, see the installation guide here

    heroku


  4. Now log in to the Heroku container registry

    heroku container:login

    Then create the application

    heroku create recommsys


  5. Now push the web process to Heroku directly from the docker folder

    heroku container:push web --app recommsys

    Then release the app to the web,

    heroku container:release web --app recommsys


  6. Now open the app in the browser; the command-line shortcut is

    heroku open --app recommsys


Results

Here you can see that, for a given input, you are presented with relevant recommendations.


References

  1. https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
  2. Data Science in Production: Building Scalable Model Pipelines with Python – Ben G. Weber
  3. https://help.heroku.com/4RNZSHL2/
  4. https://aws.amazon.com/
  5. https://towardsdatascience.com/deploy-machine-learning-pipeline-on-cloud-using-docker-container-bec64458dc01
  6. https://medium.com/analytics-vidhya/deploy-your-machine-learning-model-on-docker-ee2b931e133c
  7. https://medium.com/analytics-vidhya/deploy-machinelearning-model-with-flask-and-heroku-2721823bb653
