Network Security Focus-IDS

Re: Applying data mining to Intrusion Detection System

Date: Sun, 17 Jul 2005 18:27:48 +0200
trantichphuoc@yahoo.com wrote:
Hi all, I am a newbie in Network Security.

Then perhaps you have picked a problem that is far too difficult for an introductory exercise. That aside:


I would like to try the dataset
and use some data mining tools to mine this.

You may wish to look through the HUGE VOLUME of scientific papers on this subject before asking questions ;) The following are excellent starting points:
http://portal.acm.org
http://www.ieeexplore.ieee.org
http://scholar.google.com

kddcup.data.gz The full data set (18M; 743M Uncompressed) -> I need
the output (classified as normal or an intrusion) so that
supervised learning can be done. This file is too big, so I can't even
open it to see if it contains the output.

What are you opening it with? Hopefully not Notepad under Windows...
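
A file of this size does not need to be opened in an editor at all; it can be streamed line by line. The following is a minimal Python sketch, not part of the original reply. It assumes the file is comma-separated text with the class label in the last field, followed by a trailing period (for example `normal.`); check that assumption against the first few lines. The function names `peek` and `count_labels` are made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def peek(path, n=5):
    """Return the first n lines of a gzip-compressed text file.

    Only the lines that are asked for get decompressed, so this works
    regardless of how large the uncompressed file is.
    """
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def count_labels(path):
    """Stream the file once and count the values of the last field.

    Assumption: records are comma-separated and the last field is the
    class label, possibly followed by a trailing period.
    """
    counts = Counter()
    with gzip.open(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if line:
                label = line.rsplit(",", 1)[-1].rstrip(".")
                counts[label] += 1
    return counts
```

Calling `peek("kddcup.data.gz")` shows whether each record carries a label, and `count_labels("kddcup.data.gz")` then gives the number of records per class without ever holding the whole file in memory.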

-> Is it true that the test data is not extracted from the training data?

It's obvious, otherwise you wouldn't call it test data...

I see so many test sets and have no clue which one to use.

Perhaps you really should begin by reading some of the scientific articles that were written on this topic.


Also, a good book on data clustering such as the one by Han would be helpful to you.

2. What tool would you recommend me to use to mine these data?

One written by yourself, Weka, MATLAB, Mathematica, Statistica...

3. How can I run the scoring script in
http://www-cse.ucsd.edu/users/elkan/awkscript.html

With awk?

The script compares the result file produced by your tool with the file of true labels, and gives you a score based on how much the two differ.
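
The general idea of comparing a result file against the true labels can be sketched in Python as below. This is not a port of the awk script and does not reproduce its scoring rules; it only counts agreements and disagreements. The input format (one label per line, in the same record order in both files) and the function names `read_labels` and `score` are assumptions made for illustration.

```python
from collections import Counter


def read_labels(path):
    """Read one label per non-empty line (an assumed file format)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def score(predicted, actual):
    """Compare predicted labels with the true ones.

    Returns a confusion table keyed by (actual, predicted) pairs and
    the fraction of records that were labelled correctly.
    """
    if len(predicted) != len(actual):
        raise ValueError("both label lists must have the same length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    accuracy = correct / len(actual) if actual else 0.0
    return confusion, accuracy
```

With two such files, `score(read_labels("my_output.txt"), read_labels("true_labels.txt"))` returns the table of (actual, predicted) counts and the accuracy; both file names here are placeholders.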

Do I have to send my model
to the committee in order to have it evaluated,

Please don't. The evaluation took place 6 years ago; those people have moved on since :)


Best,
Stefano Zanero

Dottorando di Ricerca / Ph.D. Student
Politecnico di Milano - Dip. Elettronica e Informazione
Web:    www.elet.polimi.it/upload/zanero
