Delivery-Date: Wed, 09 Jul 2014 04:42:10 -0400
Return-Path: <tor-talk-bounces@lists.torproject.org>
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on moria.seul.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,DKIM_ADSP_CUSTOM_MED,
	DKIM_SIGNED,FREEMAIL_FROM,RCVD_IN_DNSWL_MED,RP_MATCHES_RCVD,T_DKIM_INVALID
	autolearn=ham version=3.3.1
X-Original-To: archiver@seul.org
Delivered-To: archiver@seul.org
Received: from eugeni.torproject.org (eugeni.torproject.org [38.229.72.13])
	(using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by khazad-dum.seul.org (Postfix) with ESMTPS id 2B7741E0A55
	for <archiver@seul.org>; Wed,  9 Jul 2014 04:42:08 -0400 (EDT)
Received: from eugeni.torproject.org (localhost [127.0.0.1])
	by eugeni.torproject.org (Postfix) with ESMTP id 28C622FD43;
	Wed,  9 Jul 2014 08:42:07 +0000 (UTC)
Received: from localhost (localhost [127.0.0.1])
 by eugeni.torproject.org (Postfix) with ESMTP id 166962F841
 for <tor-talk@lists.torproject.org>; Wed,  9 Jul 2014 08:38:17 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at eugeni.torproject.org
Received: from eugeni.torproject.org ([127.0.0.1])
 by localhost (eugeni.torproject.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id U7H_sIa1Wchc for <tor-talk@lists.torproject.org>;
 Wed,  9 Jul 2014 08:38:17 +0000 (UTC)
Received: from mail-we0-x22d.google.com (mail-we0-x22d.google.com
 [IPv6:2a00:1450:400c:c03::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (not verified))
 by eugeni.torproject.org (Postfix) with ESMTPS id A584E2F7AE
 for <tor-talk@lists.torproject.org>; Wed,  9 Jul 2014 08:38:16 +0000 (UTC)
Received: by mail-we0-f173.google.com with SMTP id t60so7109212wes.32
 for <tor-talk@lists.torproject.org>; Wed, 09 Jul 2014 01:38:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:subject:references
 :in-reply-to:content-type:content-transfer-encoding;
 bh=jIBS/e3+1hD50V6mTkpkWTVB38cFVdzU7LIyKKkKPSY=;
 b=mKOiXebKes5DQ7iB2cneO/dQUArD88NKDFhTSeg0LYt5+mPO0BHIzaNIxSI6jGclbM
 VKvxkhdBLN+pDnJxa4nCj7aE3QZlbzi7lK/ADmlft93sAnAHiDxemgnM+OLouNp9SZpW
 ZAEJbDqOuAx9p08jUj/A8bmWjanC10nh2Ty4/S6r04gSar79VgH/g8afa91ZKy9OX2HB
 BK3MysIIyDNkysI5vcK2Itzokm3/Kon/Aqsg4pXyiHLTwFMB776t4I2ueCsA/0rcAs00
 grfr9OXGjKVH/X0R6x2OHdlJdlqB3mbZ4iIDrrXI764hEVvzsQpQ5kxpFdXKPPWDzuJ0
 2Aig==
X-Received: by 10.180.91.194 with SMTP id cg2mr10135791wib.12.1404895093620;
 Wed, 09 Jul 2014 01:38:13 -0700 (PDT)
Received: from [192.168.1.11] (ANice-652-1-242-75.w86-203.abo.wanadoo.fr.
 [86.203.113.75])
 by mx.google.com with ESMTPSA id gh16sm16813669wic.3.2014.07.09.01.38.12
 for <tor-talk@lists.torproject.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Wed, 09 Jul 2014 01:38:12 -0700 (PDT)
Message-ID: <53BCFF76.8040707@gmail.com>
Date: Wed, 09 Jul 2014 10:38:14 +0200
From: Aymeric Vitte <vitteaymeric@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.3;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: tor-talk@lists.torproject.org
References: <53AB742E.5000400@riseup.net>
 <DUB121-W1602424B2673FF14097129C8180@phx.gbl> <53ABAAFA.1040406@riseup.net>
 <C21E9389-F7C9-47E7-B475-A3D23C8C4F14@hidemeta.com>
 <20140626073045.GA10980@inner.h.apk.li> <6334967458078911169@unknownmsgid>
 <CAP3fC8TUmMGFOnyXNRWzR30ynk_bxFod0_VwcKAtguDESi70Fg@mail.gmail.com>
 <53B5AE90.60405@virtadpt.net>
 <53b5dc42.c457e00a.272b.3a5fSMTPIN_ADDED_BROKEN@mx.google.com>
 <53BC52CE.7060103@virtadpt.net>
In-Reply-To: <53BC52CE.7060103@virtadpt.net>
Subject: Re: [tor-talk] High-latency hidden services
X-BeenThere: tor-talk@lists.torproject.org
X-Mailman-Version: 2.1.15
Precedence: list
Reply-To: tor-talk@lists.torproject.org
List-Id: "all discussion about theory, design,
 and development of Onion Routing" <tor-talk.lists.torproject.org>
List-Unsubscribe: <https://lists.torproject.org/cgi-bin/mailman/options/tor-talk>, 
 <mailto:tor-talk-request@lists.torproject.org?subject=unsubscribe>
List-Archive: <http://lists.torproject.org/pipermail/tor-talk/>
List-Post: <mailto:tor-talk@lists.torproject.org>
List-Help: <mailto:tor-talk-request@lists.torproject.org?subject=help>
List-Subscribe: <https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk>, 
 <mailto:tor-talk-request@lists.torproject.org?subject=subscribe>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Errors-To: tor-talk-bounces@lists.torproject.org
Sender: "tor-talk" <tor-talk-bounces@lists.torproject.org>

According to your description, you intend to reconstitute the page,
possibly removing whatever could be dangerous. This is very difficult
to do (assuming that you want the page to behave like a real one,
rather than like something similar to offline/mypage.html opened from
your disk, and assuming that you want to use browsers as they are,
i.e. without plugins/extensions and without hacking into the code).
I have described how it can be done in [1].

But ultimately, if the interesting information is a set of resources
to be fetched from this page, then [2] applies and is far easier to do.

You can look at [3] to [6], which are projects that fetch and parse a
page on the server side (headless browser, handling JS too) and extract
things from it. The same principles apply on the browser side for what
people want to do here. When the fetching is coupled with [7], it
provides anonymity, whether on the browser or the server side.

[1] https://lists.torproject.org/pipermail/tor-talk/2014-July/033636.html
[2] https://lists.torproject.org/pipermail/tor-talk/2014-July/033697.html
[3] https://github.com/Ayms/node-dom
[4] https://github.com/Ayms/node-bot
[5] https://github.com/Ayms/node-gadgets
[6] https://github.com/Ayms/node-googleSearch
[7] https://github.com/Ayms/node-Tor

On 08/07/2014 22:21, The Doctor wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 07/03/2014 03:16 PM, Seth David Schoen wrote:
>
>> That's great, but in the context of this thread I would want to
>> imagine a future-generation version that does a much better job of
>> hiding who is downloading which pages -- by high-latency mixing,
>> like an anonymous remailer chain.
> I realized that too late; thank you for pointing that out.
>
> I've been thinking a bit about this lately, and I think it might be
> doable.
>
> A while back I chanced across a description of how Richard Stallman
> browses the Net much of the time.  He uses a Perl script which is
> executed by Postfix via an e-mail alias.  If the sender's e-mail
> address matches one hardcoded in the config file, it parses the e-mail
> for URLs to grab and then uses LWP::UserAgent to download the URL and
> e-mail it back to the script's owner.
>
> The Git repo with the implementation:
>
> git://git.gnu.org/womb/hacks.git
>
> So... I've been toying with this idea but haven't had time to sit down
> and implement it yet:
>
> It would be possible to write a relatively simple utility that runs as
> a hidden service; perhaps on the user's (virtual) machine, perhaps on
> a known Tor hidden service node.  Perhaps it doesn't use a hidden
> service for itself but only listens on the loopback interface on a
> high port, and the user connects to http://localhost:9393/ from within
> the TBB.  Perhaps any of those options, dependent upon a command line
> switch or configuration file setting.  The user connects to the
> application and types or pastes a URL into a field.  The utility
> accepts the URL, verifies that it's a well formed URL, and records it
> internally, perhaps in a queue.  Every once in a while on a
> pseudorandom basis (computers, 'true' randomness, we've all seen the
> mailing list threads) the utility wakes up, picks the oldest URL in
> its queue out, and tries to download whatever it points to through the
> Tor network.
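
A minimal sketch of the queue described above, assuming Python and only
the standard library. The names SlowQueue, is_well_formed, min_delay and
max_delay are hypothetical, and the delay bounds are arbitrary
placeholders. The actual download through Tor is deliberately left out.

```python
import random
from collections import deque
from urllib.parse import urlsplit


def is_well_formed(url):
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


class SlowQueue:
    """FIFO of pending URLs, meant to be drained one at a time at random intervals."""

    def __init__(self, min_delay=60.0, max_delay=3600.0, rng=None):
        self.pending = deque()
        self.min_delay = min_delay
        self.max_delay = max_delay
        # SystemRandom draws from the OS entropy source instead of a seeded PRNG.
        self.rng = rng or random.SystemRandom()

    def submit(self, url):
        """Queue the URL if it is well formed; report whether it was accepted."""
        if not is_well_formed(url):
            return False
        self.pending.append(url)
        return True

    def next_delay(self):
        """Seconds to sleep before the next fetch attempt."""
        return self.rng.uniform(self.min_delay, self.max_delay)

    def pop_oldest(self):
        """Return the oldest queued URL, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None
```

A main loop would sleep for next_delay() seconds, call pop_oldest(), and
hand the result to whatever performs the fetch over Tor.
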
>
> If it successfully acquires an HTML page it could then attempt to
> parse it (using something like Beautiful Soup, maybe) to verify that
> it was a fully downloaded and validated HTML page.  It would also pick
> through the parsed tags for things like CSS or images, construct URLs
> to download them using the original URL (if no full URLs to them are
> in the HTML), and add them to the queue of things to get.  It doesn't
> seem unreasonable to rewrite the HTML to make links to those
> additional resources local instead of remote (./css/foo.css instead of
> css/foo.css) so the additional files downloaded would be referenced by
> the browser.  It also doesn't seem unreasonable that a particular
> instance of this utility could be configured to ignore certain kinds
> of resources (no .js files, no images, no CSS files) and snip tags
> that reference them from the HTML entirely.  When the resources for
> the page in question are fully downloaded (none are left in the queue)
> the user is alerted somehow (which suggests a personal application but
> there are other ways of notifying users).
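
The resource-discovery step above can be sketched as follows. This is
only an illustration under stated assumptions: it uses Python's standard
html.parser rather than Beautiful Soup, the names ResourceFinder and
find_resources are hypothetical, and it neither rewrites the HTML nor
honours a <base href> tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect absolute URLs of stylesheets, images and scripts in a page."""

    def __init__(self, base_url, skip=()):
        super().__init__()
        self.base_url = base_url
        self.skip = set(skip)   # kinds to ignore, e.g. {"script"}
        self.resources = []     # (kind, absolute URL) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            kind, ref = "css", attrs.get("href")
        elif tag == "img":
            kind, ref = "image", attrs.get("src")
        elif tag == "script":
            kind, ref = "script", attrs.get("src")
        else:
            return
        if ref and kind not in self.skip:
            # Relative references are resolved against the page's own URL.
            self.resources.append((kind, urljoin(self.base_url, ref)))


def find_resources(html, base_url, skip=()):
    """Return the (kind, URL) pairs that would be added to the download queue."""
    finder = ResourceFinder(base_url, skip)
    finder.feed(html)
    finder.close()
    return finder.resources
```

Passing skip={"script"} corresponds to an instance configured to ignore
.js files; snipping the matching tags from the HTML is not shown.
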
>
> The timeframe in which an entire page could be downloaded could be
> extremely long, from seconds between requests, to requiring a new
> circuit for each request, to even weeks or months to grab an entire page.
>
> I don't know if such a thing could be written as a distributed
> application (lots of instances of this utility spread across a
> percentage of the Tor network keeping each other apprised of bits and
> pieces of web pages to download and send someplace).  I'll admit that
> I've never tried to write such a thing before.  The security profile
> of such a thing would certainly be a concern.
>
> Representing each page and its resources in memory would take a little
> doing but is far from impossible.  Depending on the user's threat
> model it may not be desirable to cache the page+resources on disk
> (holding them in RAM but making them accessible to the web browser,
> say, with a simple HTTP server listening on the loopback on a high
> port (I'm thinking instead of http://localhost:9393/ the user would
> access http://localhost:9393/pages/foo)), or the user may be
> comfortable with creating a subdirectory to hold the resources of a
> single page.  This is the technique that Scrapbook uses and, besides
> being workable, it seems very easy to implement:
>
> ~/.mozilla/firefox/<profile name>/Scrapbook/data/<datestamp>/<web page
> and all resources required to view it stored here in a single directory>
>
> A problem that would probably arise is Tor circuits dropping at odd
> intervals due to the phase of the moon, Oglogoth grinding its teeth,
> sunspots, or whatever and the connection timing out or dropping.  I'm
> not sure how to handle this yet.  Another potential problem is a user
> browsing a slowly downloaded page and clicking a link, which the
> browser would then jump directly to, avoiding the slow download
> entirely.  Warn the user this will happen?  Rewrite or remove the
> links?  I'm not sure yet what the Right Thing To Do(tm) would be.
> There are undoubtedly other gotchas that I haven't thought of or run
> into yet which others will notice immediately.
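
One possible way to cope with dropped circuits, offered only as a
sketch and not as a settled answer: put the failed URL back at the tail
of the queue with an attempt counter, and give up after a fixed number
of tries. The function name fetch_with_requeue, the (url, attempts)
tuple layout and the max_attempts default are all assumptions made for
illustration; fetch stands for whatever callable does the download.

```python
from collections import deque


def fetch_with_requeue(queue, fetch, max_attempts=5):
    """Pop one (url, attempts) entry; on failure put it back at the tail.

    `fetch` is any callable that returns the body or raises OSError
    (timeouts and connection resets are OSError subclasses in Python 3).
    Returns (url, body) on success, or None if nothing was fetched.
    """
    if not queue:
        return None
    url, attempts = queue.popleft()
    try:
        return url, fetch(url)
    except OSError:
        # Requeue unless this URL has already used up its attempts.
        if attempts + 1 < max_attempts:
            queue.append((url, attempts + 1))
        return None
```

Because the entry goes to the tail, the retry waits its turn behind
everything else in the queue and naturally picks up a later time slot.
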
>
> - --
> The Doctor [412/724/301/703] [ZS]
> Developer, Project Byzantium: http://project-byzantium.org/
>
> PGP: 0x807B17C1 / 7960 1CDC 85C9 0B63 8D9F  DD89 3BD8 FF2B 807B 17C1
> WWW: https://drwho.virtadpt.net/
>
> "So many paths lead to tomorrow/what love has lost, can you forgive?"
> - --The Cruxshadows
>
> -----BEGIN PGP SIGNATURE-----
>
> iQIcBAEBCgAGBQJTvFLOAAoJED1np1pUQ8RkBxQP/0dVnZ5roC4S5ErbK30FFKSt
> VfACPXeO8SKbjyLQxU5RYVS0Q/nwS8QgSr/cCc7tqjtmkGypolJFsFIeKLc4m1bG
> 93H7FWoSDO++oiWyqxYBb7+q6CzktAGFesb0YFUtIe5ADKTVIqcynWD++6NByN0v
> rPpd5awppjL2f8r4l+bNBRWpk2d7KpilAG6KAwsQyDyvmLJWw9Pr2yN0o6SrrpHl
> hxK7jti6HAt2pMuFxbl3mI3MMN717XDymE04CLkGopiprhF0YAk7K0tPEak69e7/
> OD2AI2O0nLZzWZUdX9zrYs9OICuVfzVf0XUeTNKogh/30UBw3KdqNNPcVFbajnEe
> YBeAe6iNnDIgE70nv4OiIuFL9XO4rLmNOvCB9F3mRqIJVl/8mq7WTQeVBGt+dmz3
> 7srHnR1nenCmTHnyfaKYtn4+N0TGhdXHLR3e/4+v4RmU+Zueo/wf4ggM0ZUlfUVk
> JRPhfdDe9bx3O+nYIObPd5V0/atAHuXJjh9SJasOaxnQjiye75wZuswRa3vR5pdz
> Sd4vSWi6eQ+9s4xlD+dy5309yxQTDyFbr/O7lshtLPx51PC8ObU2+NMJMQY2HhyU
> pgLL6Gj6W2evwJAsS7pqQZNLzV3XdJwgNka2zO1XVES5/odKlvuItXDCC4g5ftxI
> 75FDhASj5VzcZjOJqcAc
> =LLtg
> -----END PGP SIGNATURE-----

--
Peersm : http://www.peersm.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms

--
tor-talk mailing list - tor-talk@lists.torproject.org
To unsubscribe or change other settings go to
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk

