Delivery-Date: Sat, 03 Oct 2015 01:20:52 -0400
Date: Sat, 3 Oct 2015 01:20:37 -0400
Message-ID: <CAJwFvsU2hdM1jAfL+dth-SUL0JiN7Pw2aYCq+7++tMG4bc362Q@mail.gmail.com>
From: Tyler Hardin <th020394@gmail.com>
To: tor-talk@lists.torproject.org
Subject: Re: [tor-talk] How to write program that uses Tor network

This is a lot of really good advice. Thanks. For some reason, I was
thinking C++ would give a measurable performance increase for the spider,
but now that I've questioned that assumption, it seems really dumb.
Obviously the network will be the bottleneck by far. I think I'll still use
C++ for the back end though, since that's where the performance might
matter. I'm also thinking about using Wt for the front end. It seems like
most of the search engines on Tor can't hold up under load. Do you think
that's caused by computational limits or by upload/download rate limits?

On Oct 1, 2015 8:16 AM, "Apple Apple" <djjdjdjdjdjdjd32@gmail.com> wrote:

> Asio is only a socket library, which means you would need to build all the
> HTTP logic on top of it. That is not much fun, but everything you need to
> know is documented in the RFCs if you really want to go down that route.
>
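To give a sense of what "building the HTTP logic" on a bare socket library
involves, here is a minimal sketch in Python (not Asio) that formats a GET
request by hand and parses the status line of a response. The helper names
build_get_request and parse_status_line are made up for illustration, and
the sketch ignores headers, redirects, chunked encoding, and everything
else the RFCs cover.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> bytes:
    """Format a bare-bones HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close so we can read to EOF
        "",
        "",  # the two empty entries yield the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes) -> Tuple[str, int, str]:
    """Split the first response line into (version, status code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, code, *reason = first_line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""
```

Even this much only covers the first line of the exchange, which is
probably the strongest argument for using an existing HTTP library.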
> The "best/easiest" way would be to use an HTTP library built specifically
> for fetching web pages. curl is a good one. Integrating Tor support is
> simply a matter of setting a SOCKS proxy, the same way you configure
> a web browser to use Tor.
>
> Make sure that your library has an option to proxy DNS as well. If
> fetching bing.com works but an onion site doesn't, then you probably have a
> DNS leak. curl provides an option to fix this, but it is not enabled by
> default.
>
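For what "proxying DNS" means at the protocol level: in SOCKS5 (RFC 1928)
the client can hand the hostname itself to the proxy instead of an IP
address it resolved locally. Below is a minimal Python sketch of the two
client messages involved. The helper names are hypothetical and the code
opens no sockets; it only builds the bytes. With curl, as far as I know,
the matching option is --socks5-hostname (or a socks5h:// proxy URL).

```python
import struct

SOCKS_VERSION = 5
NO_AUTH = 0x00
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03  # address is a hostname, so the proxy does the lookup


def socks5_greeting() -> bytes:
    """Client greeting that offers only the 'no authentication' method."""
    return bytes([SOCKS_VERSION, 1, NO_AUTH])


def socks5_connect_by_name(hostname: str, port: int) -> bytes:
    """CONNECT request carrying the hostname unresolved (hypothetical helper)."""
    name = hostname.encode("ascii")
    if not 1 <= len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes")
    # VER, CMD, RSV, ATYP, then a length-prefixed name and a big-endian port
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)
```

These bytes would go to Tor's SOCKS port (127.0.0.1:9050 by default). An
onion name can only work this way, since no ordinary DNS server can resolve
it, which is why a failing onion fetch hints at local resolution.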
> This is not really related to Tor, but are you sure C++ is the right
> language for this? You will quickly discover that web developers have a
> very easy life. Not a single one of them is capable of writing valid HTML,
> but browsers need to process it anyway (which is why there are so many
> bugs in browsers).
>
> You can get kind of far using regular expressions. You can get kind of
> further with libtidy and an XML parser. If you are serious, though, I would
> recommend an alternative language such as Ruby + Nokogiri or Python +
> Beautiful Soup, at least to do the HTML parsing.
>
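As a rough illustration of pulling links out of sloppy HTML with a tolerant
parser rather than regular expressions, here is a sketch that uses Python's
standard html.parser module. I am substituting it for Beautiful Soup only
to keep the example dependency-free. LinkCollector and extract_links are
names invented for the example.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, whether or not it is ever closed.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Return absolute link targets found in possibly malformed HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Feeding it a fragment with unclosed <a> tags still yields both links, which
is the kind of input a spider will see all the time.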
> Of course, you can always embed a parser written in another language into
> an existing C++ code base (Python is easy; Ruby is harder, but I have done
> it). If you are still at the greenfield stage of the project, you should
> think about this early.
>
> I hope this helps.
> --
> tor-talk mailing list - tor-talk@lists.torproject.org
> To unsubscribe or change other settings go to
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-talk
>

