Free Malware Training Datasets for Machine Learning


Please check it out:

For an updated follow-up, please check it out:

Cite The DataSet
If you find these results useful, please cite them:

@misc{ MR,
   author = "Marco Ramilli",
   title = "Malware Training Sets: a machine learning dataset for everyone",
   year = "2016",
   url = "",
   note = "[Online; December 2016]"
}

UPDATE: Many people asked me about the scripts I used to generate the MIST-modified JSON, so here they are (take a look at the scripts section). You might use
as a reporting module for CuckooSandbox and the script
to generate ARFF files suitable for WEKA.
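
As a rough illustration of the ARFF step, here is a minimal Python sketch that turns a few feature dictionaries into ARFF text of the kind WEKA reads. It is not the script from this repository: the input layout (a list of dicts with a `label` key plus numeric feature keys), the function name `to_arff`, the relation name, and the sample values are all assumptions made for the example.

```python
def to_arff(samples, relation="malware"):
    """Render a list of feature dicts as ARFF text (hypothetical input layout)."""
    # Collect every feature name seen across samples, excluding the class label.
    features = sorted({k for s in samples for k in s if k != "label"})
    labels = sorted({s["label"] for s in samples})

    lines = [f"@relation {relation}", ""]
    for name in features:
        lines.append(f"@attribute {name} numeric")
    # The class is a nominal attribute listing every label that occurs.
    lines.append("@attribute class {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for s in samples:
        # Features missing from a sample default to 0.
        row = [str(s.get(name, 0)) for name in features]
        row.append(s["label"])
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical samples: made-up feature counts for two made-up families.
samples = [
    {"label": "family_a", "CreateFileW": 3, "RegOpenKeyExA": 1},
    {"label": "family_b", "CreateFileW": 7},
]
arff_text = to_arff(samples)
```

Writing `arff_text` to a `.arff` file should give something WEKA can open. Attribute names or labels containing spaces or special characters would additionally need quoting, which this sketch does not handle.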

If you are going to create new datasets by running your local CuckooSandbox using
module and you want to share them, please feel free to open pull requests!

If you want to know more about the workflow, please check this update:
