by marcoramilli

marcoramilli/MalwareTrainingSets

Free Malware Training Datasets for Machine Learning


Please check it out:

For an updated follow-up, please check it out:

Cite the Dataset
If you find these results useful, please cite them:

@misc{MR,
   author = "Marco Ramilli",
   title = "Malware Training Sets: a machine learning dataset for everyone",
   year = "2016",
   url = "",
   note = "[Online; December 2016]"
}

UPDATE: Many people asked me about the scripts I used to generate the MIST-modified JSON, so here they are (take a look at the scripts section). You might use
as a reporting module for CuckooSandbox, and the script
to generate ARFF files suitable for WEKA.
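The MIST-modified JSON schema and the actual conversion script are not reproduced here, so the following is only a hypothetical sketch of a JSON-to-ARFF step. It assumes each sample has already been reduced to a family label plus a list of behavior tokens (for example API-call names), and writes one NUMERIC count attribute per distinct token followed by a nominal class attribute. The function name `to_arff` and the `(label, tokens)` input shape are assumptions, not the repository's format.

```python
import re
from collections import Counter
from typing import List, Tuple


def _quote(name: str) -> str:
    # Hypothetical sanitizer: keep only safe characters, then single-quote
    # so names starting with digits are still valid ARFF identifiers.
    return "'" + re.sub(r"[^A-Za-z0-9_.-]", "_", name) + "'"


def to_arff(samples: List[Tuple[str, List[str]]], relation: str = "malware") -> str:
    """Build an ARFF document from hypothetical (label, tokens) pairs.

    Each distinct token becomes a NUMERIC attribute holding its per-sample
    count; the final attribute is the nominal class (malware family).
    """
    vocab = sorted({tok for _, toks in samples for tok in toks})
    labels = sorted({label for label, _ in samples})

    lines = [f"@RELATION {relation}", ""]
    for tok in vocab:
        lines.append(f"@ATTRIBUTE {_quote(tok)} NUMERIC")
    lines.append("@ATTRIBUTE class {" + ",".join(_quote(l) for l in labels) + "}")
    lines += ["", "@DATA"]

    for label, toks in samples:
        counts = Counter(toks)  # missing tokens count as 0
        row = [str(counts[tok]) for tok in vocab] + [_quote(label)]
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"
```

For example, `to_arff([("APT1", ["CreateFileW", "WriteFile", "WriteFile"]), ("Crypto", ["RegOpenKeyExW"])])` yields the data rows `1,0,2,'APT1'` and `0,1,0,'Crypto'` (labels and tokens here are illustrative only). A bag-of-words count encoding is just one simple choice; WEKA could equally be fed a single string attribute and its own StringToWordVector filter.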

If you create new datasets by running your local CuckooSandbox with the
module and want to share them, please feel free to open pull requests!

If you want to know more about the workflow, please check this update:
