To: tor-talk@lists.torproject.org
References: <56AE1D6B.6060804@infosecurity.ch>
 <20160202125014.GM7734@moria.seul.org>
From: "Fabio Pietrosanti (naif) - lists" <lists@infosecurity.ch>
X-Enigmail-Draft-Status: N1110
Message-ID: <56BA5E90.3070304@infosecurity.ch>
Date: Tue, 9 Feb 2016 22:48:00 +0100
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0)
 Gecko/20100101 Thunderbird/38.5.1
In-Reply-To: <20160202125014.GM7734@moria.seul.org>
Subject: Re: [tor-talk] Exit Traffic classification and discrimination



On 2/2/16 1:50 PM, Roger Dingledine wrote:
> On Sun, Jan 31, 2016 at 03:42:51PM +0100, Fabio Pietrosanti (naif) - lists wrote:
>> But 90% of my resources (given the previous hypothetical assumption)
>> would be happily pumping non-abuse-generating Tor exit traffic.
>>
>> Has anyone ever done some kind of testing or analysis about that kind
>> of approach?
> 
> Well, the first question there is to learn whether your assumption
> about destinations is actually true -- is most Tor traffic going to a
> small number of sites, or are many Tor destinations in the "long tail"?
> 
> I spoke to Tariq Elahi at length about exactly this research question,
> because they want to run some exit relays and try to answer it. They had
> some good plans for how to do it safely -- use Privex to combine views
> from several exits so you can't go back and learn which exit saw which
> destination, write nothing to disk except the final answer, etc.

I think a cheap approach is possible, at least for the AS-aware part.

Let's say we build a set of iptables rules that send the traffic for
well-known high-traffic destinations into a dedicated chain, based on
the netblocks that belong to each destination's Autonomous Systems.

For example, say I want to know from my Tor exit how many connections
are made to Twitter and how much traffic goes to Twitter.

We learn that Twitter owns AS54888, AS35995 and AS13414.

We get their IP address netblocks with:

$ whois -h whois.radb.net '!gAS54888'
A182
199.96.56.0/21 199.96.56.0/24 199.96.57.0/24 199.96.58.0/24
199.96.59.0/24 199.96.60.0/24 199.96.61.0/24 103.252.114.0/23
185.45.6.0/23 104.244.43.0/24 199.96.56.0/23 199.96.60.0/23

$ whois -h whois.radb.net '!gAS35995'
8.25.194.0/23 8.25.196.0/23 192.133.78.0/23 8.25.194.0/24 8.25.195.0/24
8.25.196.0/24 8.25.197.0/24 185.45.4.0/24 103.252.112.0/23 185.45.4.0/23

$ whois -h whois.radb.net '!gAS13414'
199.96.57.0/24 199.16.156.0/22 199.59.148.0/22 192.133.76.0/22
192.133.76.0/23 199.96.59.0/24 199.96.58.0/24 199.96.63.0/24
199.96.56.0/21 103.252.112.0/22 103.252.114.0/23 185.45.4.0/23
199.96.62.0/23 199.96.58.0/23 185.45.6.0/23 192.44.68.0/23
192.48.236.0/23 69.12.56.0/21 104.244.40.0/21 104.244.42.0/24
103.252.112.0/23 104.244.43.0/24 185.45.5.0/24 185.45.4.0/24
199.96.56.0/24 202.160.128.0/22 202.160.128.0/24 202.160.129.0/24
202.160.130.0/24 202.160.131.0/24 188.64.224.0/24 188.64.225.0/24
188.64.226.0/24 188.64.227.0/24 188.64.228.0/24 188.64.229.0/24
188.64.230.0/24 188.64.231.0/24 188.64.224.0/21 199.16.156.0/22
199.96.57.0/24 199.96.63.0/24 192.133.76.0/22 8.25.194.0/23
199.96.61.0/24 192.133.78.0/23 199.96.59.0/24 8.25.196.0/23
199.96.60.0/24 199.96.56.0/24 199.96.58.0/24 199.59.148.0/22
199.96.56.0/21 192.133.76.0/23 8.25.195.0/24 8.25.194.0/24 8.25.197.0/24
8.25.196.0/24 103.252.114.0/23 103.252.112.0/23 103.252.112.0/22
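
The answers above contain many duplicates and, in the first case, a
length line ("A182"). A minimal sketch for cleaning them up, assuming
only IPv4 netblocks are wanted; extract_netblocks is a hypothetical
helper name I made up for illustration, not part of any tool:

```shell
#!/bin/sh
# Sketch: turn raw RADB answers into a deduplicated list of IPv4 netblocks.
# extract_netblocks is a hypothetical helper, not an existing command.
extract_netblocks() {
    # Split on whitespace, keep only tokens shaped like a.b.c.d/len
    # (this drops protocol lines such as "A182"), then remove duplicates.
    tr -s ' \t' '\n' |
        grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}$' |
        LC_ALL=C sort -u
}

# Example use (needs network access):
#   for as in AS54888 AS35995 AS13414; do
#       whois -h whois.radb.net "!g$as"
#   done | extract_netblocks > twitter-netblocks.txt
```

Overlapping prefixes (a /21 and the /24s inside it) are kept as they
are; only exact duplicates are removed.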

Then we create one rule per netblock that sends all traffic going to
that destination into a chain, the same way it's done for iptables
traffic accounting:
http://www.catonmat.net/blog/traffic-accounting-with-iptables/

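We put the netblocks in the file twitter-netblocks.txt and then run:
iptables -N twitter   # the accounting chain must exist before rules can jump to it
for net in $(cat twitter-netblocks.txt); do   # put "echo" before iptables for a dry run
  iptables -I OUTPUT -p tcp -d "$net" -m state --state NEW,ESTABLISHED -j twitter
done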

That way there would be a chain named "twitter" that all the Twitter
traffic passes through in the Linux kernel.

The following command then shows how many bytes were sent to Twitter
(the second column of the verbose listing is the byte counter):
$ iptables -L OUTPUT -n -v -x | awk '/twitter/ { sum += $2 } END { print sum }'
1596

Knowing the total of the Tor exit traffic alone (which today can't be
measured with iptables because of the missing
https://trac.torproject.org/projects/tor/ticket/17975 or
https://trac.torproject.org/projects/tor/ticket/18142) and knowing the
amount of Twitter traffic from the Linux kernel's iptables chain
accounting would make it possible to say that X% of the traffic was
Twitter.
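
The X% computation itself is simple arithmetic. A minimal sketch,
assuming both byte counts are already known; percent_of is a
hypothetical helper name, and the total of 10000 bytes in the example is
a made-up number, since the exit-only total can't be measured this way
yet:

```shell
#!/bin/sh
# Sketch: percent_of PART TOTAL prints PART as a percentage of TOTAL.
# percent_of is a hypothetical helper name used only for illustration.
percent_of() {
    awk -v part="$1" -v total="$2" 'BEGIN {
        # Avoid dividing by zero when no traffic has been counted yet.
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * part / total
    }'
}

# Example with the 1596 bytes counted for Twitter and an assumed total:
#   percent_of 1596 10000
```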

If that process were scripted for all the ASes of:
Google (17 AS)
Facebook (1 AS)
Twitter (3 AS)
Microsoft (28 AS)
Yahoo (59 AS)
Wikipedia (3 AS)
LinkedIn (9 AS)
GitHub (1 AS)
Cloudflare (5 AS)

it would be possible to extract how much traffic is going to those
companies/services as a percentage of the total Tor exit traffic.
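
Scripting this for several services could look roughly like the sketch
below. It assumes one file of netblocks per service, named
<service>-netblocks.txt (my naming, following twitter-netblocks.txt),
and it only prints the iptables commands so they can be reviewed before
anything is loaded; emit_rules is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: print the iptables commands that create one accounting chain
# for a service and one rule per netblock read from standard input.
# emit_rules is a hypothetical helper; nothing is applied by it.
emit_rules() {
    chain=$1
    printf 'iptables -N %s\n' "$chain"
    # Accept netblocks separated by spaces, tabs or newlines.
    tr -s ' \t' '\n' | grep . | while read -r net; do
        printf 'iptables -I OUTPUT -p tcp -d %s -m state --state NEW,ESTABLISHED -j %s\n' \
            "$net" "$chain"
    done
}

# Example use (review the output first, then pipe it to "sh" as root):
#   for svc in twitter github; do
#       emit_rules "$svc" < "$svc-netblocks.txt"
#   done
```

Printing instead of executing is a deliberate choice here: with tens of
thousands of rules it's easier to inspect a file than a live ruleset.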

A smart Linux hacker with some iptables and shell scripting skills could
automate the process, including the extraction of that information,
within a weekend of fun.

What would be recorded in the Linux kernel counters is only the total
number of bytes exchanged for each of the netblocks defined in the ASes
of the "destinations", which then has to be summed up per destination.
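
Summing the per-netblock counters into one figure per destination can
be done from a saved listing. A minimal sketch, assuming the input is
the output of "iptables -L OUTPUT -n -v -x" and that every accounting
rule jumps to a chain named after the service; sum_by_chain is a
hypothetical helper name:

```shell
#!/bin/sh
# Sketch: add up the byte counters of all rules that jump to the same
# chain. Reads "iptables -L OUTPUT -n -v -x" output on standard input.
# sum_by_chain is a hypothetical helper name.
sum_by_chain() {
    # Column 2 is the byte counter and column 3 the target chain; the two
    # header lines are skipped because their second field is not a number.
    awk '$2 ~ /^[0-9]+$/ { bytes[$3] += $2 }
         END { for (t in bytes) printf "%s %.0f\n", t, bytes[t] }' |
        LC_ALL=C sort
}

# Example use (as root):
#   iptables -L OUTPUT -n -v -x | sum_by_chain
```

Rules in OUTPUT that have no jump target would be misread by this
sketch, so it is only meant for a ruleset made of accounting rules.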

No source IP addresses and no timing of when a new session was
established would be logged.

Technically it's likely (but must be tested) that iptables could easily
hold some tens of thousands of entries, representing all the netblocks
of the autonomous systems of all the destinations that someone would
like to measure that way, thanks to the Linux kernel's optimizations.

-naif