
I am planning to automatically classify emails as personal, business, etc. and store them in the appropriate folder using the naive Bayes algorithm. Here the features are the keywords in the document and the classes are the folders. But I am stuck after that step. Please help me apply the naive Bayes algorithm to my automatic mail classification application.

Here the features are the keywords in the document and the classes are the folders. But I am stuck after that step.

Could you give a little more information about where you are stuck?

Oleg.

gayathri murugesan
Ranch Hand

Joined: Dec 21, 2009
Posts: 32


I am confused about how to apply this algorithm to our application for automatic classification of mails. Can you please tell me how to calculate the probability of a message belonging to a folder?

Oleg Tikhonov
Ranch Hand

Joined: Aug 02, 2008
Posts: 55


-----------------------------------------------------
Event | Description
-----------------------------------------------------
A     | is a mail belonging to folder F_1
-----------------------------------------------------
B     | is a mail belonging to folder F_2
-----------------------------------------------------
C     | has a mail been classified before
-----------------------------------------------------
P     | will a mail be classified to F_2
-----------------------------------------------------

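Let’s assume that a mail belongs to folder F_1, also belongs to folder F_2, and has been classified before. We want to predict the probability that the mail will be classified to F_2:
Pr{P=T|A=T,B=T,C=T} = Pr{A=T,B=T,C=T|P=T} Pr{P=T} / Pr{A=T,B=T,C=T}
Pr{P=F|A=T,B=T,C=T} = Pr{A=T,B=T,C=T|P=F} Pr{P=F} / Pr{A=T,B=T,C=T}
The "naive" part is the assumption that A, B and C are independent given P, so that Pr{A=T,B=T,C=T|P=T} = Pr{A=T|P=T} Pr{B=T|P=T} Pr{C=T|P=T}, and likewise for P=F.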

One of the easiest ways to estimate an event’s probability is to take its relative frequency.
In our table, for example, suppose the events A, B and C happened 20 times in total: event A happened 5 times, event B 12 times and event C 3 times. Then:
Pr{A}=5/20; Pr{B}=12/20; Pr{C}=3/20.

Pr{A or B} = Pr{A} + Pr{B} - Pr{A and B}
Pr{A and B} = Pr{A}Pr{B|A} = Pr{B}Pr{A|B} (the product rule, which rearranges to Bayes' rule: Pr{A|B} = Pr{B|A}Pr{A}/Pr{B})
The output attribute P can be either T (true) or F (false).
Something like that.
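
To make the frequency-count idea concrete, here is a small Python sketch. The counts 20, 5, 12 and 3 are the ones from the post above; the joint count for "A and B" (2) is purely a made-up number for illustration, since the post does not give one.

```python
from fractions import Fraction

# Counts taken from the post: 20 observations in total.
total = 20
counts = {"A": 5, "B": 12, "C": 3}

# Relative frequency as a probability estimate.
pr = {event: Fraction(n, total) for event, n in counts.items()}

# ASSUMPTION (not in the post): A and B occurred together 2 times.
count_a_and_b = 2
pr_a_and_b = Fraction(count_a_and_b, total)

# Product rule: Pr{A and B} = Pr{A} Pr{B|A}
pr_b_given_a = pr_a_and_b / pr["A"]

# Bayes' rule: Pr{A|B} = Pr{B|A} Pr{A} / Pr{B}
pr_a_given_b = pr_b_given_a * pr["A"] / pr["B"]

# Addition rule: Pr{A or B} = Pr{A} + Pr{B} - Pr{A and B}
pr_a_or_b = pr["A"] + pr["B"] - pr_a_and_b

print(pr["A"], pr["B"], pr["C"])
print(pr_b_given_a, pr_a_given_b, pr_a_or_b)
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to check the numbers by hand.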

It's still not clear to me where you're stuck: implementing the algorithm itself? Determining how to use the results?

gayathri murugesan

I must find the keywords in the incoming mail and determine which folder is suitable for the mail. I am stuck at applying the naive Bayes algorithm to this problem.

For example, suppose I find these keywords in the mail:

microsoft offers windows

and suppose there are two folders, personal and technology. How could I apply the naive Bayes algorithm to classify the mail with the keywords "microsoft offers windows" into the appropriate folder?

Sorry for not explaining my problem in detail before.
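
One possible way to set this up, as a rough sketch only: estimate Pr{folder} from the share of mails already in each folder, estimate Pr{keyword|folder} from keyword counts in that folder, multiply them together for each folder, and pick the folder with the largest product. The training mails below are invented for illustration, and the names `train`, `score` and `classify` are hypothetical helpers, not part of any library.

```python
from collections import Counter

# HYPOTHETICAL training data: mails already filed, reduced to keywords.
training = {
    "technology": [
        ["microsoft", "windows"],
        ["microsoft", "offers", "windows"],
        ["iphone", "offers"],
    ],
    "personal": [
        ["home", "school"],
        ["school", "offers"],
    ],
}

def train(training):
    """Return folder priors and per-folder keyword counts."""
    total_mails = sum(len(mails) for mails in training.values())
    priors, word_counts = {}, {}
    for folder, mails in training.items():
        priors[folder] = len(mails) / total_mails
        word_counts[folder] = Counter(w for mail in mails for w in mail)
    return priors, word_counts

def score(keywords, folder, priors, word_counts):
    """Pr{folder} times the product of Pr{keyword|folder} (naive assumption)."""
    counts = word_counts[folder]
    total = sum(counts.values())
    p = priors[folder]
    for w in keywords:
        p *= counts[w] / total  # Counter gives 0 for unseen keywords
    return p

def classify(keywords, priors, word_counts):
    """Pick the folder with the highest score."""
    return max(priors, key=lambda f: score(keywords, f, priors, word_counts))

priors, word_counts = train(training)
mail = ["microsoft", "offers", "windows"]
for folder in priors:
    print(folder, score(mail, folder, priors, word_counts))
print(classify(mail, priors, word_counts))
```

The scores are not normalized (the common denominator Pr{keywords} is dropped), which does not change which folder wins. Note that a keyword never seen in a folder makes that folder's score exactly zero in this unsmoothed form.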

So you've arrived at a spam probability, right? What's left to do?

gayathri murugesan

I have a question here.

For example, suppose I have already stored the mails with the keywords microsoft, windows, iphone, itunes, ipod in the technology folder, and the mails with the keywords market, home, school, college in the personal folder.

A mail with the keywords "microsoft offers technology" should be classified into the technology folder, but the probability turns out to be zero, so I don't know whether I am on the right path.
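
The zero most likely comes from keywords such as "offers" that were never seen in a folder: their estimated Pr{keyword|folder} is 0, and a single zero factor wipes out the whole product. A common remedy, not mentioned earlier in this thread, is add-one (Laplace) smoothing. The sketch below uses the keyword lists from the post above; the equal folder priors, the `alpha` parameter and the function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Keywords already stored per folder, as listed in the post.
folder_keywords = {
    "technology": ["microsoft", "windows", "iphone", "itunes", "ipod"],
    "personal": ["market", "home", "school", "college"],
}
word_counts = {f: Counter(ws) for f, ws in folder_keywords.items()}
vocabulary = {w for ws in folder_keywords.values() for w in ws}

# ASSUMPTION: both folders are equally likely a priori.
priors = {f: 1 / len(folder_keywords) for f in folder_keywords}

def unsmoothed_probability(keywords, folder):
    """Plain frequency estimate: any unseen keyword makes the result 0."""
    counts = word_counts[folder]
    total = sum(counts.values())
    p = priors[folder]
    for w in keywords:
        p *= counts[w] / total
    return p

def smoothed_log_score(keywords, folder, alpha=1.0):
    """Add-alpha smoothing: (count + alpha) / (total + alpha * |vocabulary|)."""
    counts = word_counts[folder]
    denominator = sum(counts.values()) + alpha * len(vocabulary)
    s = math.log(priors[folder])
    for w in keywords:
        s += math.log((counts[w] + alpha) / denominator)
    return s

def classify(keywords):
    """Pick the folder with the highest smoothed log score."""
    return max(folder_keywords, key=lambda f: smoothed_log_score(keywords, f))

mail = ["microsoft", "offers", "technology"]
print({f: unsmoothed_probability(mail, f) for f in folder_keywords})
print(classify(mail))
```

Summing logarithms instead of multiplying probabilities is a separate design choice: it avoids underflow when a mail has many keywords, and it does not change which folder scores highest.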