Re: Defeating CAPTCHA

Subject: Re: Defeating CAPTCHA
Date: Thu, 25 Aug 2005 21:31:53 -0700
Hi Mark, 

Good points you've raised regarding my earlier question. However, I
don't fully agree that a secret component as you've described it is
entirely necessary. While no one should expect a panacea for this
problem with current technology, the fact remains that randomisation
with quality entropy can serve as the effective 'secret' you describe,
which is why I mentioned it twice: once for the answers array, and again
for the placement of the answer-selection identifiers. Given 8-12, or
perhaps even up to 64, 'mini' images with a quality randomiser and a
proper application of the randomness, I would suggest it's considerably
more solid than the near-total negation you suggested. For starters,
let's remember that the answer always lies somewhere between panacea and
utter pessimism. Second, with the randomness of the answer-identification
markers compounded by the number of answer images, it COULD be done with
a higher degree of effectiveness than current deployments. Since it is
not a magic bullet, yes, given enough iterations a machine will
eventually guess the answer. However, if the answers are cognitive
derivatives requiring input of multi-character words, patterns, etc.,
displayed as part of the image no less (not as flat Unicode text below
the image!), then there's no solid way for a machine to even know WHAT
to input. Properly implemented and randomised, and with enough questions
and answers making up the static-data portion of the software (the part
you suggest entirely negates its value), the odds of an accurate machine
guess against a simple scheme (a million variables, choose one at
random) are going to be no better than "one in a million" (as you can
see, I'm no statistician ;).

Anyway, I'm not trying to be disagreeable, and perhaps we'll have to
agree to disagree, but I feel that the scheme as I explained it (which I
THINK accurately reflects how I conceptualize it) is far more viable
than your 'useless via source exposure' verdict suggests, though as I
said, it certainly would not be the final word either. I do think it
would be far preferable to current implementations if it were done
correctly: spend the most time on accurate standards and the main
iterations, but keep the core process stupidly simple (to avoid bugs and
avoid injecting predictability). Carnegie Mellon seems to think the idea
has some merit, which I was glad to see from the link; in fact, the
drop-down word selector in their implementation offered somewhere around
32 words. Couple that with the statefulness required during signup over
HTTP in most if not all cases: a stateful exchange rules out spoofed
source addresses, so source-address banning based on attempt counts can
be applied if someone is playing a guessing game over N amount of time.
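
To make the "ban the source address after N guesses over time" idea concrete, here is a minimal Python sketch. It assumes, as argued above, that the client's source address is trustworthy because the signup exchange is stateful; the class name `FailedGuessLimiter` and its thresholds are hypothetical and are not taken from the Carnegie Mellon implementation or any other system mentioned in this thread.

```python
import time
from collections import defaultdict, deque


class FailedGuessLimiter:
    """Hypothetical helper: block a source address after too many failed
    CAPTCHA answers inside a sliding time window.

    Assumes the source address cannot be spoofed (stateful HTTP signup)."""

    def __init__(self, max_failures=5, window_seconds=600.0):
        self.max_failures = max_failures
        self.window = window_seconds
        # source address -> timestamps of its recent failed answers
        self._failures = defaultdict(deque)

    def _prune(self, source, now):
        # Drop failures that have aged out of the sliding window.
        q = self._failures[source]
        while q and now - q[0] > self.window:
            q.popleft()
        return q

    def is_blocked(self, source, now=None):
        now = time.monotonic() if now is None else now
        return len(self._prune(source, now)) >= self.max_failures

    def record_failure(self, source, now=None):
        now = time.monotonic() if now is None else now
        self._prune(source, now).append(now)
```

With `max_failures=3` and a 60-second window, three wrong answers from one address in quick succession block it, while other addresses are unaffected and the block lapses once those failures are older than the window.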

Sorry to be so long-winded; I just wanted to address the differences in
viability I see in my presentation (and in where Carnegie Mellon is
taking it as well). Randomness is the secret of which you speak. The
rest is safety in numbers prior to source-quashing.

Best Regards, 
Jayson


On Thu, 2005-08-25 at 11:32 -0600, Mark Burnett wrote:
The problem with coming up with effective CAPTCHAs is that the dataset 
should not rely on obscurity or secrecy to work. Anyone can come up with hard 
questions that can consistently trip up a computer, but how effective would 
those questions be if the adversary had access to your question/answer 
database? Many ideas I hear for CAPTCHAs rely solely upon the secrecy of the 
data set, and that is security that relies upon obscurity.

Try coming up with a CAPTCHA where the code is public, the dataset is public, 
and the only secret is the randomness generated for each individual test, and 
you will find that it is quite difficult. The problem, in part, is that we 
want a machine to generate a test that another machine with the same data 
cannot solve, but that a human can. We can bypass that by having humans come 
up with the questions, but that means we also need to store the answers for 
verification, again bringing us back to the problem of relying upon the 
secrecy of the data.

Another mistake that people often make with CAPTCHAs is using questions with 
multiple-choice answers. If you asked a question like "Which of these 
strawberries is most rotten?" you would have to provide enough pictures to 
reduce the significance of a lucky guess. Even if you had 10 possible answers 
to select from, that might not be effective in stopping a spammer from 
setting up massive numbers of free e-mail accounts. It might statistically 
take them 10 times as long, but they can still do it. However, if you provide 
too many answers, the chance that several answers are acceptable increases, 
making it less effective. How many times have you taken a multiple-choice 
test where two answers, in your opinion, would work? Especially in the case 
of a subjective question such as which strawberry is most rotten.
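
The arithmetic behind the "10 times as long" point can be sketched in Python. This assumes a bot that picks uniformly at random among `n_choices` answers, of which `n_acceptable` would be accepted; the per-attempt success chance is then `n_acceptable / n_choices`, and the expected number of attempts until the first success (the mean of a geometric distribution) is its reciprocal. The function names are illustrative only.

```python
def success_probability(n_choices: int, n_acceptable: int = 1) -> float:
    """Chance that one uniformly random pick is accepted."""
    if not 0 < n_acceptable <= n_choices:
        raise ValueError("need 0 < n_acceptable <= n_choices")
    return n_acceptable / n_choices


def p_any_success(n_choices: int, attempts: int, n_acceptable: int = 1) -> float:
    """Chance of at least one accepted pick over independent random attempts."""
    p = success_probability(n_choices, n_acceptable)
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(n_choices: int, n_acceptable: int = 1) -> float:
    """Mean number of tries until the first accepted pick."""
    return 1.0 / success_probability(n_choices, n_acceptable)
```

With 10 answers a random guesser needs 10 attempts on average, and after 1,000 automated attempts it is all but certain to have succeeded at least once. If 4 of 64 images would be accepted, the average drops to 16 attempts, which is the "several good answers" problem described above.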

It is definitely a good challenge and it will be cool seeing someone someday 
solve this problem.


Mark Burnett

On Thu, 25 Aug 2005 08:40:40 -0700, Jayson Anderson wrote:
That was an interesting article; I definitely got caught up clicking
through it for a while. One has to wonder why a more effective system
hasn't been put into production, let alone conceptualized and widely
accepted as a solid approach for the future. More specifically, there's
the claim that CAPTCHA as it stands now is not a Turing test. I'm not
sure that's entirely true, as symbols pre-date their interpretation by
machine.

Regardless, as one gentleman mentioned in an article, a much clearer
method to differentiate man vs. machine would be to ask abstract
questions. Barring the cultural, linguistic and socioeconomic
implications, why not ask things like "Which one is a pachyderm?", or
"Which texture most resembles stipple?", or "Which of these strawberries
is most rotten?", or "Which person is taller?" with two figures drawn
the same size, but one as tall as the car she stands next to and the
other only half as tall as hers, etc. Ya know? Sure, it would take a
significant multi-faceted approach drawing on an amazingly heterogeneous
set of contributors, but that's where open source comes in. Pool a huge
bank of acceptable abstracts based on image size, obscurity and all the
other standards (which do NOT need to be complex at all), then refine
that, seed the array and answer presentations with some decent entropy,
and use yet more entropy to randomize the units by which answers are
delineated ("a,b,c,d", "circle[~],eye[=],carrot[%],money[E]"), different
each time, plus all the hundreds of other variables I've not thought of.
It seems workable to me. Keep the project always living, so that
submissions and refined objects are continually added to an updatable
system. SOMETHING superior to "crazytext" is going to have to be done,
as ultimately crazytext will be rendered nothing more than a speed bump.
I think CAPTCHA still qualifies as a Turing test, just not an effective
one in its environment. It seems that machine-proofing should use
anything BUT that which is found in almost every machine that would be
used to circumvent it :)
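
A rough Python sketch of the per-challenge randomisation described above: pick a question from a pooled bank, shuffle its images, and draw fresh answer labels for every challenge from an OS-backed entropy source via the standard `secrets` module. The `Question` type, the label words and the image file names are hypothetical; the server is assumed to keep the expected label (for example in the signup session) and compare it with what the user submits.

```python
import secrets
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Question:
    prompt: str
    images: Tuple[str, ...]  # hypothetical image identifiers
    correct: str             # identifier of the image that answers the prompt


# Hypothetical marker words; a fresh random subset is drawn per challenge,
# so the label attached to the right answer is never predictable.
LABELS = ("circle", "eye", "carrot", "money", "star", "key", "moon", "leaf")


def make_challenge(pool: Sequence[Question]) -> Tuple[str, List[Tuple[str, str]], str]:
    """Return (prompt, [(label, image), ...], expected_label)."""
    rng = secrets.SystemRandom()  # randomness comes from the OS, not a seed
    question = rng.choice(pool)
    if len(question.images) > len(LABELS):
        raise ValueError("not enough distinct labels for this question")
    images = list(question.images)
    rng.shuffle(images)                       # randomise image order
    labels = rng.sample(LABELS, len(images))  # randomise answer markers
    shown = list(zip(labels, images))
    expected_label = labels[images.index(question.correct)]
    return question.prompt, shown, expected_label


def check_answer(expected_label: str, submitted: str) -> bool:
    # Constant-time comparison of the submitted label with the stored one.
    return secrets.compare_digest(expected_label.encode(), submitted.encode())


if __name__ == "__main__":
    pool = [Question("Which one is a pachyderm?",
                     ("elephant.png", "cat.png", "parrot.png", "trout.png"),
                     "elephant.png")]
    prompt, shown, expected = make_challenge(pool)
    print(prompt)
    for label, image in shown:
        print(f"  [{label}] {image}")
```

In this sketch both the code and the question bank are public; only the per-challenge choice, ordering and labelling are secret.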

Sorry for the chatter, but I've ALWAYS felt that crazytext(tm) was an
amazingly poor way to differentiate machine from man, and these articles
just prove what I, and I'm sure many others, had always felt.

Jayson

-
On Wed, 2005-08-24 at 14:29 -0400, robert@webappsec.org wrote:
This was linked from Slashdot 
(http://it.slashdot.org/article.pl?sid=05/08/24/1629213&tid=172&tid=95)
and explains some of the ways people are breaking CAPTCHA-based systems
(http://en.wikipedia.org/wiki/Captcha).

http://sam.zoy.org/pwntcha/

- Robert
robert_at_webappsec.org
http://www.cgisecurity.com