format-corpus

An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.

All items, apart from the source code under 'tools', are CC0 licensed unless otherwise stated. The source code is Apache 2.0 licensed unless otherwise stated.

A recent summary of the contents of the repository can be found here.

How to Contribute

See http://wiki.curatecamp.org/index.php/CollectingformatIDtestfiles for more information.

See metadata-template.ext.md for a simple per-file metadata template.

Pooled Signatures

As well as pooling example files, we also pool format signatures:

  • Tika signatures are staged here: https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/tika-bl-staging
  • Tika signatures are later merged here: https://github.com/openplanets/format-corpus/blob/master/tools/fidget/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml
  • DROID signatures go here: https://github.com/openplanets/format-corpus/tree/master/tools/fidget/src/main/resources/droid

More details here: http://wiki.curatecamp.org/index.php/ImprovingformatID_coverage
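
For readers new to the idea, a format signature is usually a short byte pattern ("magic number") expected at a known offset in a file. The sketch below is only an illustration of that idea in Python. It does not use the Tika or DROID signature file formats, and the `SIGNATURES` table and `identify` function are hypothetical names invented for this example, not part of this repository.

```python
from typing import Optional

# Hypothetical, hand-written signature table for illustration only.
# Each entry is (MIME type, byte offset, magic bytes expected at that offset).
# Real Tika and DROID signatures are richer than this and live in XML files.
SIGNATURES = [
    ("application/pdf", 0, b"%PDF-"),
    ("image/png", 0, b"\x89PNG\r\n\x1a\n"),
    ("image/gif", 0, b"GIF87a"),
    ("image/gif", 0, b"GIF89a"),
    ("application/zip", 0, b"PK\x03\x04"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the MIME type of the first matching signature, or None."""
    for mime, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"plain text"))
```

Small example files like the ones collected here are what make it possible to check that a signature matches the files it should and does not match the ones it should not.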
