YouTube Like Count Predictions using Machine Learning

YouTube Like Count Predictor

This is a tool for predicting the like count of YouTube videos. A Random Forest model was trained on a large dataset of ~350,000 videos. Feature engineering, data cleaning, data selection, and several other techniques were used for this task.


contains a detailed explanation of different steps and techniques that were used for this task.

Tools Used

How to run:

  1. Clone this repo

      $ git clone
      $ cd PS17_Ayush_Singh
  2. Create new virtual environment

      $ sudo pip install virtualenv
      $ virtualenv venv
      $ source venv/bin/activate
      $ pip install -r requirements.txt
  3. Predictions

    There are two ways to get prediction results.

    3.1. Train the model and run prediction

    $ cd model
    $ python

    This will save a

    file in the same folder. Training takes ~18 minutes. Then run
    $ python 

    for ex:

    $ python dOyJqGtP-wU ASO_zypdnsQ wEduiMyl0ko

    3.2. From a pretrained model

    A pretrained model has been uploaded to Dropbox. Download the model (~500 MB) from the link.

    Unzip the

    file in the
    $ cd model
    $ python 
    for ex:
    $ python vid1 vid2 vid3

Note: The list can contain a maximum of 40 video IDs per run.
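
The 40-ID limit can be checked before any prediction work starts. Below is a minimal sketch, assuming the IDs are passed as separate strings as in the first example above. The names `parse_video_ids` and `MAX_VIDEO_IDS` are hypothetical and not taken from the repo.

```python
MAX_VIDEO_IDS = 40  # limit stated in the note above


def parse_video_ids(args):
    """Return de-duplicated video IDs, rejecting empty or oversized lists."""
    # dict.fromkeys keeps first-seen order while dropping duplicates
    video_ids = list(dict.fromkeys(a.strip() for a in args if a.strip()))
    if not video_ids:
        raise ValueError("expected at least one video ID")
    if len(video_ids) > MAX_VIDEO_IDS:
        raise ValueError(
            f"got {len(video_ids)} video IDs, maximum is {MAX_VIDEO_IDS}"
        )
    return video_ids


# Example with the IDs used earlier in this README
print(parse_video_ids(["dOyJqGtP-wU", "ASO_zypdnsQ", "wEduiMyl0ko"]))
```

Failing early with a clear message avoids spending API calls on a request that would be rejected anyway.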

Code Details

Below is a brief description of the code files and folders in the repo.


This folder contains the scripts that were used to fetch data through the YouTube API and to populate the database.

$ cd data

The script uses the YouTube Search API to extract video IDs for the last 7 years (2010-2016). It yields approximately 22,000-24,000 video IDs per category and stores them in pickle files for the different categories.

$ python
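
The paging logic behind such a collection step might look like the sketch below. It assumes `youtube` is a YouTube Data API v3 client (for example one built with `googleapiclient.discovery.build("youtube", "v3", developerKey=...)`) and that results are queried one category and one calendar year at a time. The function name `collect_video_ids` and the `max_pages` parameter are hypothetical; the repo's actual script may be organised differently.

```python
def collect_video_ids(youtube, category_id, year, max_pages=5):
    """Collect video IDs for one category and one calendar year.

    `youtube` is assumed to be a YouTube Data API v3 client object.
    """
    video_ids = []
    page_token = None
    for _ in range(max_pages):
        params = {
            "part": "id",
            "type": "video",
            "videoCategoryId": category_id,
            "publishedAfter": f"{year}-01-01T00:00:00Z",
            "publishedBefore": f"{year + 1}-01-01T00:00:00Z",
            "maxResults": 50,  # largest page size the search endpoint accepts
        }
        if page_token:
            params["pageToken"] = page_token  # continue from the previous page
        response = youtube.search().list(**params).execute()
        video_ids.extend(
            item["id"]["videoId"] for item in response.get("items", [])
        )
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further result pages
    return video_ids
```

The returned list can then be written to a per-category pickle file with `pickle.dump`.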

The script uses the video IDs saved by
and then extracts the video-related attributes using the YouTube API, saving the data dictionary in pickle format.
$ python

The script is used to collect further data for all channels present in the video dataset. It uses the stored video data to extract the channel IDs.

$ python
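
Extracting the channel IDs from the stored video data can be sketched as follows. The layout of the video dictionary (video ID mapped to an attribute dictionary with a `channelId` key) and the sample values are assumptions for illustration; the real pickled structure may differ.

```python
def extract_channel_ids(video_data):
    """Return the sorted unique channel IDs found in the video records."""
    channel_ids = set()
    for attributes in video_data.values():
        channel_id = attributes.get("channelId")
        if channel_id:  # skip records where the field is missing or empty
            channel_ids.add(channel_id)
    return sorted(channel_ids)


# Made-up records, only to show the expected shape of the input
videos = {
    "vid1": {"channelId": "chanA", "likeCount": 10},
    "vid2": {"channelId": "chanB", "likeCount": 3},
    "vid3": {"channelId": "chanA", "likeCount": 7},
    "vid4": {"likeCount": 1},
}
print(extract_channel_ids(videos))  # ['chanA', 'chanB']
```

De-duplicating first means each channel is requested from the API only once, however many of its videos are in the dataset.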

The script is used to scrape social links.

$ python

Note: Because of the large amount of data to be extracted for the different attributes, the extraction was done at different levels. A single data collection script was therefore not viable, as it would have made debugging messy.


This folder contains IPython notebooks that implement the merging of the different extracted data as well as tasks like data cleaning and processing.

$ jupyter notebook


The notebook implements the creation of new derived features.


This notebook implements the data cleaning and encoding steps.

Note: The final data generated after all processing has been uploaded in

has the data which is used for training the model.


This folder contains the scripts used for training and tuning the model and for getting the prediction results.

This script generates the tuned parameters for the estimator using grid search and cross-validation.

$ python
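
Grid search with cross-validation for a random forest can be sketched with scikit-learn as shown below. This is an illustration only: it assumes a regression setup, uses synthetic data in place of the real training set, and the parameter grid and scoring metric are placeholders, since the values actually searched by the repo's script are not stated here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real training data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Small illustrative grid; the grid used in the repo is not known
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [5, None],
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=3,                               # 3-fold cross-validation
    scoring="neg_mean_absolute_error",  # values closer to 0 are better
)
search.fit(X, y)

print(search.best_params_)   # parameter combination with the best CV score
print(-search.best_score_)   # its mean absolute error across the folds
```

Every combination in the grid is trained once per fold, so the grid size directly multiplies the tuning time.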

This script is used for training the model on the training data (

). Because of bootstrap sampling in the random forest, the results might vary after every training run.
$ python
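
The run-to-run variation from bootstrap sampling can be seen, and removed by fixing the seed, in a small scikit-learn experiment. The sketch assumes a `RandomForestRegressor` and synthetic data; whether the repo's training script sets `random_state` is not stated here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real training data (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)


def fit_and_predict(seed):
    """Train a small forest with the given seed and predict on 5 rows."""
    model = RandomForestRegressor(
        n_estimators=20, bootstrap=True, random_state=seed
    )
    model.fit(X, y)
    return model.predict(X[:5])


same_a = fit_and_predict(42)
same_b = fit_and_predict(42)
other = fit_and_predict(7)

print(np.allclose(same_a, same_b))  # same seed: bootstrap samples repeat
print(np.allclose(same_a, other))   # different seed: predictions usually differ
```

Fixing `random_state` makes each tree draw the same bootstrap sample on every run, which is useful when comparing results across training runs.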

This script returns the like count prediction along with the difference and the error rate.

$ cd model
$ python 
for ex:
$ python [vid1,vid2,vid3]
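
How the difference and error rate could be derived from a prediction is sketched below. The definitions are assumptions: the difference is taken as predicted minus actual likes, and the error rate as the absolute difference relative to the actual like count, in percent. The repo's script may define them differently, and `prediction_report` is a hypothetical name.

```python
def prediction_report(predicted, actual):
    """Summarise one prediction against the actual like count."""
    difference = predicted - actual
    if actual:
        # Absolute error relative to the actual count, in percent
        error_rate = 100 * abs(difference) / actual
    else:
        error_rate = float("nan")  # undefined when the video has 0 likes
    return {
        "predicted": predicted,
        "actual": actual,
        "difference": difference,
        "error_rate_pct": error_rate,
    }


# Made-up numbers, only to show the arithmetic
print(prediction_report(1200, 1000))
```

A relative error is easier to compare across videos than the raw difference, because like counts span several orders of magnitude.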


A very common issue arises from the pickling process, which sometimes leads to loss of information and to different results on every run.
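
One way to catch such problems early is to verify the round trip immediately after saving. The sketch below assumes the data is a plain dictionary; opening the file in binary mode and passing an explicit protocol are common precautions, not a confirmed fix for the issue described above. The helper name `save_and_verify` and the sample record are hypothetical.

```python
import os
import pickle
import tempfile


def save_and_verify(obj, path):
    """Pickle obj to path, reload it, and fail loudly if anything changed."""
    with open(path, "wb") as fh:  # binary mode is required for pickle
        pickle.dump(obj, fh, protocol=4)  # explicit protocol (Python 3.4+)
    with open(path, "rb") as fh:
        restored = pickle.load(fh)
    if restored != obj:
        raise ValueError(f"round trip through {path} changed the data")
    return restored


# Made-up record, only to show the call
data = {"dOyJqGtP-wU": {"likeCount": 123, "tags": ["a", "b"]}}
with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_verify(data, os.path.join(tmp, "videos.pkl"))
print(restored == data)  # True
```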

