In-Reply-To: <CABLZTyAkNDQdvHEnZatQa+xYGBWjFvxD6T7uv9iSNZwiEL+ayA@mail.gmail.com>
References: <CABLZTyAkNDQdvHEnZatQa+xYGBWjFvxD6T7uv9iSNZwiEL+ayA@mail.gmail.com>
From: Alden Page <pagea@allegheny.edu>
Date: Sun, 14 Dec 2014 23:14:37 -0500
Message-ID: <CABLZTyD87c46u0Px=LSw9osJf6LY3KBmt0KScjik2q5h-TLLcA@mail.gmail.com>
To: tor-talk@lists.torproject.org
Subject: [tor-talk] Fwd: Developing an open-source, user-friendly tool for
 avoiding stylometry; seeking input from community

It has been shown that a person can be "fingerprinted" by their writing
style (preference for certain words, spelling mistakes, eccentricities
in grammar, etc.), and that this fingerprint can then be used to
determine, with a high degree of statistical confidence, whether that
person authored an anonymous document. The process of analyzing and
fingerprinting a person's writing style is called stylometry.
Stylometry has been performed with a surprising degree of success on
sets of up to 100,000 candidate authors. I hope you will all agree
that this poses a significant threat to the anonymity of Tor users.
Please see the following paper for more information on the threat
stylometry poses to privacy, freedom of speech, and, more
specifically, Tor users:
http://www.cs.berkeley.edu/~dawnsong/papers/2012%20On%20the%20Feasibility%20of%20Internet-Scale%20Author%20Identification.pdf
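
To make the idea of a writing-style "fingerprint" concrete, here is a
toy sketch. It is far simpler than the method in the linked paper and
is only meant as an illustration: it assumes a small, hypothetical list
of function words as the only feature, turns each text into a vector
of relative word frequencies, and attributes an anonymous text to the
known author whose sample has the highest cosine similarity. The names
FUNCTION_WORDS, fingerprint and most_likely_author are made up for
this example.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of English function words.
# Real stylometric systems use far larger and richer feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "but", "however", "which", "very"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(text):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_likely_author(anonymous_text, known_samples):
    """Return the candidate whose known sample is most similar in style."""
    target = fingerprint(anonymous_text)
    return max(known_samples,
               key=lambda name: cosine(target, fingerprint(known_samples[name])))


if __name__ == "__main__":
    samples = {
        "alice": "However, the results were very clear. However, we must be careful.",
        "bob": "It is what it is and that is that, and it goes on and on.",
    }
    print(most_likely_author("However, the outcome was very surprising.", samples))
    # prints: alice
```

Even this crude nearest-neighbor comparison picks up a habit such as
overusing "however"; real systems combine many more features, which is
what makes them effective at scale.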

Several members of the online privacy community have expressed
interest in a tool that helps circumvent stylometry, as seen on the
Tails bug tracker and in a few threads on tor-talk. An existing tool,
Anonymouth, sets out to do this by pointing out stylometric
"giveaways" in input text, but it is quite unstable and is aimed at
researchers rather than everyday end users, which makes it difficult
to use. For this reason, I am attempting to replicate the
functionality of Anonymouth in a stripped-down, easy-to-use Python
application, which I believe may someday be suitable for bundling
with Tails and for inclusion in the Debian repositories.
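
As a rough, hypothetical illustration of what "pointing out giveaways"
could mean in the simplest case (this is not how Anonymouth works
internally), the sketch below flags words that an author uses at least
`ratio` times more often than a reference corpus does, or that never
appear in the reference at all. The function name flag_giveaways and
the ratio and min_count thresholds are assumptions chosen for the
example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def flag_giveaways(text, reference_text, ratio=3.0, min_count=2):
    """Return words the author overuses relative to the reference text.

    Words seen fewer than `min_count` times in `text` are ignored, so a
    single unusual word does not get flagged. Words that never occur in
    the reference are always flagged once they pass `min_count`.
    """
    mine = Counter(tokenize(text))
    ref = Counter(tokenize(reference_text))
    my_total = sum(mine.values()) or 1
    ref_total = sum(ref.values()) or 1
    flagged = []
    for word, count in mine.items():
        if count < min_count:
            continue
        my_rate = count / my_total
        ref_rate = ref[word] / ref_total  # Counter returns 0 for missing words
        if ref_rate == 0 or my_rate / ref_rate >= ratio:
            flagged.append(word)
    return sorted(flagged)


if __name__ == "__main__":
    text = "Honestly, I honestly think the plan is fine. Honestly it is."
    reference = "I think the plan is fine. The plan is good and it is fine."
    print(flag_giveaways(text, reference))
    # prints: ['honestly']
```

A real tool would need a large, representative reference corpus and
more than word counts, but the basic shape (compare the author's
habits against a baseline and report outliers) is the same.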

Development will begin in mid-January 2015 at the latest; the source
code will be released under the MIT license on May 1st, 2015. As much
as I would like to release it earlier, I am developing this software
as part of my senior thesis at my college, and I may not accept
outside code contributions until I have turned in my project for
grading. I hope that I, along with any other interested developers,
will continue to work on this project long after May 1st.

In the spirit of meeting the needs of the privacy community, I am
interested in hearing what potential users have to say about the
design of such a tool. For now, I envision it as a GUI desktop
application that, much like Anonymouth, offers suggestions for
preserving anonymity, but one targeted at Tails/Tor users rather than
researchers. I also hope to at least partially automate the
anonymization process, perhaps by automatically substituting certain
words with synonyms or slightly restructuring sentences to remove
glaring indicators of writing style.
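
The simplest form of the automatic synonym substitution described
above might look like the sketch below. It assumes a small, hand-written
synonym table (SYNONYMS and substitute_synonyms are hypothetical names
for this example); a usable tool would need a real thesaurus and some
way of choosing the right sense of each word. It replaces whole words
only and keeps a leading capital letter so sentence starts stay
readable.

```python
import re

# Hypothetical, hand-written synonym table. A real tool would need a
# proper thesaurus and word-sense disambiguation.
SYNONYMS = {
    "honestly": "frankly",
    "utilize": "use",
    "very": "quite",
}


def substitute_synonyms(text, synonyms=SYNONYMS):
    """Replace each whole word found in `synonyms`, keeping a leading capital."""
    def replace(match):
        word = match.group(0)
        replacement = synonyms.get(word.lower())
        if replacement is None:
            return word  # not in the table: leave the word unchanged
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    # re.sub calls `replace` for every word-like run; punctuation is untouched.
    return re.sub(r"[A-Za-z']+", replace, text)


if __name__ == "__main__":
    print(substitute_synonyms("Honestly, I utilize it very often."))
    # prints: Frankly, I use it quite often.
```

Blind one-to-one substitution like this can easily change meaning, so
presenting the replacements as suggestions for the user to accept or
reject is likely safer than applying them silently.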

Please contact pagea (at) allegheny.edu if you would like to be
notified once the source code is available. For a (very rough) idea of
what I hope to accomplish with this project, please see a draft of my
research proposal here:
https://pdf.yt/d/HsAyoE0VGCYsnVxU

I look forward to reading your comments.

Cheers,
Alden Page
-- 
tor-talk mailing list - tor-talk@lists.torproject.org
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk

