How to analyze text in Java
I have a site with a form. Users fill in that form and send me requests, so I naively thought I could automate the review of those requests by analyzing the text. Makes sense, no? Well, apparently I opened a Pandora's box called NLP. It seems that text analysis is a vast subject with many different algorithms. To bring some order to the chaos, I want to split the subject into several sub-subjects:

Sentence isolation - breaking a paragraph into sentences [not everyone ends a sentence with a period]
Naming - identifying names, places, dates, currency etc.
POS tagging - finding the type of each word in the sentence (noun, verb etc.)
Parsing - identifying sentence parts such as subject, direct object etc.

There are many more parts and sub-parts, but the above are the ones I decided to focus on.

Please note that in order to be accurate, the tools need a big "dictionary" of the parsed language, thus these tools might be very heavy on