Re: Combatting automated download of dynamic websites?

Subject: Re: Combatting automated download of dynamic websites?
Date: Tue, 30 Aug 2005 13:24:53 +0200
Matthijs R. Koot wrote:

Thanks for your reply zeno! But actually, referer-based anti leeching
won't do it for me and mod_throttle isn't suitable for Apache 2. I'm in
need of a throttling function based on something more advanced like a
'request history stack' to check the order in which pages were
requested, probably within a certain time period, et cetera. Maybe it'd
be better to move such security measures into the actual web application
itself, but I'm still hoping someone knows of a service-based solution
(i.e. like the beforementioned Apache module).

Several web-oriented proxy firewalls implement a "request history stack" like the one you mention, to prevent an IP address from going directly to a given resource without following the "flow" established by the webapp programmer.


You could implement this through session handling: tie session identifiers to the client (by IP address or User-Agent) and then check, in the application, whether the session is behaving as you would normally expect. Don't use referer information; instead, tie the session to some kind of finite state machine that tells you whether the user went through your defined procedure. In your Amazon example: first look at the book, then at the book details, and only then allow the user to browse the contents.
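A minimal sketch of that finite state machine idea, in Python. Everything here is an assumption made for illustration: the page names ("book", "details", "contents"), the `ALLOWED_NEXT` table and the `SessionFlow` class are hypothetical, and a real application would hook this into its own session store.

```python
# Hypothetical flow table: for each current state, the pages that may be
# requested next. None is the state of a brand-new session.
ALLOWED_NEXT = {
    None: {"book"},
    "book": {"book", "details"},
    "details": {"book", "details", "contents"},
    "contents": {"book", "details", "contents"},
}


class SessionFlow:
    """Tracks one session and rejects requests that skip the expected flow."""

    def __init__(self, client_ip, user_agent):
        # Tie the session to the client that created it. Both values can be
        # spoofed or shared, so this only raises the bar.
        self.fingerprint = (client_ip, user_agent)
        self.state = None

    def request(self, page, client_ip, user_agent):
        """Return True and advance the state if the request is allowed."""
        if (client_ip, user_agent) != self.fingerprint:
            return False  # session ID reused from a different client
        if page not in ALLOWED_NEXT.get(self.state, set()):
            return False  # page requested out of order
        self.state = page
        return True
```

With this table, a fresh session asking for "contents" is refused, while book, then details, then contents is accepted. The same session ID presented from another IP address or User-Agent is refused regardless of state.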

Of course, a user can try to reuse his session ID and spoof the identifiers (User-Agent) in an alternative download tool to retrieve the content in the end. But it might raise the bar somewhat. I'm not aware of the capabilities of Teleport Pro or other software, but I would defeat those checks by implementing a targeted web crawler with Perl's LWP::UserAgent.

If you want to stop even a determined (malicious?) user from retrieving the content, then you will want to impose resource limits as suggested in the thread. The problem is that you can only tie those limits to the IP address (all other browser-presented information is spoofable), and then you have the issue that some IP addresses (dynamic ranges from ISPs) have only one "client" behind them, while others (ISPs' transparent proxies, company proxies) might have more than one. So either you monitor that, investigate deviations and tailor the limits for those IP addresses that are legitimately more resource intensive, or you risk blocking legitimate users in the second situation (i.e. proxies being used by a large number of users).
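As a rough illustration of per-IP resource limits with tailored exceptions, here is a sliding-window counter in Python. It is only a sketch under my own assumptions: the `IpRateLimiter` class, its parameters and the `overrides` table for known shared proxies are hypothetical names, and the limits are arbitrary.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Allows at most `limit` requests per IP within a sliding time window."""

    def __init__(self, default_limit, window_seconds, overrides=None):
        self.default_limit = default_limit
        self.window = window_seconds
        # Hypothetical per-IP limits, e.g. higher values for addresses you
        # have investigated and found to be shared proxies.
        self.overrides = overrides or {}
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now):
        """Record a request at time `now` (seconds); return False if over limit."""
        limit = self.overrides.get(ip, self.default_limit)
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= limit:
            return False
        recent.append(now)
        return True
```

A caller would pass something like `time.time()` as `now`. Raising the entry for a proxy address in `overrides` is the "tailor it" step; without it, everyone behind that proxy shares the default budget.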

My 2c.

Javier
