About the developer

DavidJacobson
133 Stars 11 Forks MIT License 24 Commits 3 Opened issues

Description

Script to remove homoglyphs and zero-width characters to allow for safe distribution of documents from anonymous sources.

SafeText

A tool that sanitizes text by removing zero-width characters and homoglyphs, so that documents from anonymous sources can be distributed safely.

Individuals who leak an email or other text file risk being identified through fingerprinting. Fingerprinting typically works because the original distributor of the document has embedded some form of canary in each copy. For example, in 2008 Elon Musk responded to leaks by sending an email with slightly different wording to each employee. Employees recognized the tactic, and it failed. An easier tactic, also in use, is to insert nearly invisible changes into the text. SafeText is designed to identify and remove these changes. Specifically, it removes homoglyphs, zero-width characters, and other subtle characters. It also attempts to flag region-specific spellings that could give away an individual's location.
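As a rough illustration of the character-level cleaning described above, here is a minimal sketch. It is not SafeText's actual implementation: the function name `sanitize_line` and the two small lookup tables are assumptions made for this example, and a real tool would need far larger tables.

```python
import unicodedata

# Illustrative subsets only (assumed for this sketch, not taken from SafeText).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
HOMOGLYPHS = {
    "\u041d": "H",  # CYRILLIC CAPITAL LETTER EN
    "\u0392": "B",  # GREEK CAPITAL LETTER BETA
    "\u03f9": "C",  # GREEK CAPITAL LUNATE SIGMA SYMBOL
}


def sanitize_line(line, line_no):
    """Return (cleaned_line, findings) for a single line of text.

    Each finding is a tuple: (line number, kind, Unicode character name).
    """
    cleaned = []
    findings = []
    for ch in line:
        name = unicodedata.name(ch, "UNKNOWN")
        if ch in ZERO_WIDTH:
            # Drop the invisible character entirely.
            findings.append((line_no, "zero-width", name))
        elif ch in HOMOGLYPHS:
            # Swap the look-alike for its plain ASCII counterpart.
            findings.append((line_no, "homoglyph", name))
            cleaned.append(HOMOGLYPHS[ch])
        else:
            cleaned.append(ch)
    return "".join(cleaned), findings


if __name__ == "__main__":
    text, found = sanitize_line("\u041dey, let's hang\u200b out!", 1)
    print(text)
    for finding in found:
        print(finding)
```

Recording the line number alongside each finding lets the person reviewing the document see where a possible canary was placed, rather than having it removed silently.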

Usage

To use SafeText, call:

```shell
python safetext.py inputfile
```

Example output is:

```shell
λ python safetext.py TestFile.txt
[] Cleaning TestFile.txt to TestFile.txt.safe ...
[!] FOUND HOMOGLYPHIC CHARACTER CYRILLIClargeH ON LINE 1
The message said: "(Н)ey, let's hang out!"
[!] FOUND a SPACE ON LINE # 2
LoremIpsumDolorSit
[!] WARNING - Use of spelling (colour) that identifies country on line 3
[!] FOUND HOMOGLYPHIC CHARACTER GREEKB ON LINE 5
[!] FOUND HOMOGLYPHIC CHARACTER GREEKC ON LINE 5
Subject: (Β)udget (Ϲ)uts
[*] Output file closed
```

Note: In the actual output, the relevant characters are underlined rather than enclosed in parentheses.

SafeText writes its output to the input file name with .safe appended (for example, TestFile.txt.safe).
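The spelling warning in the example output could be produced by a check along the following lines. This is only a sketch under stated assumptions: the word list `REGIONAL_SPELLINGS` and the function `regional_spelling_warnings` are hypothetical names invented for illustration, and SafeText's own word list and matching logic may differ.

```python
import re

# Hypothetical, deliberately tiny word list (an assumption for this sketch).
REGIONAL_SPELLINGS = {"colour", "favour", "organise", "centre"}
WORD_RE = re.compile(r"[A-Za-z]+")


def regional_spelling_warnings(lines):
    """Return (line_number, word) pairs for region-specific spellings."""
    warnings = []
    for line_no, line in enumerate(lines, start=1):
        for word in WORD_RE.findall(line):
            if word.lower() in REGIONAL_SPELLINGS:
                warnings.append((line_no, word))
    return warnings


if __name__ == "__main__":
    sample = ["Hello,", "nothing unusual here", "What colour is the logo?"]
    for line_no, word in regional_spelling_warnings(sample):
        print(f"[!] WARNING - spelling ({word}) may identify a country, line {line_no}")
```

A check like this only warns rather than rewriting the text, since changing a spelling automatically could alter the meaning of the document.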
