|Norvig I 865
Text Classification/text categorization/AI Research/Norvig/Russell: (…) given a text of some kind, decide which of a predefined set of classes it belongs to. Language identification and genre classification are examples of text classification, as are sentiment analysis (classifying a movie or product review as positive or negative) and spam detection (classifying an email message as spam or not-spam). >Spam/AI Research.
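As a concrete illustration of the classification task (not from the source), a minimal multinomial naive Bayes spam detector with add-one smoothing might look as follows; the toy corpus, labels, and function names are invented for this sketch:

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns class priors,
    per-class word counts, and the shared vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    total = sum(class_docs.values())
    priors = {c: n / total for c, n in class_docs.items()}
    return priors, word_counts, vocab

def classify(model, text):
    """Pick the class maximizing log P(class) + sum of log P(word | class)."""
    priors, word_counts, vocab = model
    scores = {}
    for c in priors:
        total_c = sum(word_counts[c].values())
        score = math.log(priors[c])
        for w in text.lower().split():
            # Laplace (add-one) smoothing over the shared vocabulary
            score += math.log((word_counts[c][w] + 1) / (total_c + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Invented toy training set, for illustration only
train_set = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
model = train(train_set)
print(classify(model, "win a cheap prize"))       # prints: spam
print(classify(model, "agenda for the meeting"))  # prints: ham
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied, and the smoothing keeps unseen words from zeroing out a class.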
Norvig I 884
Manning and Schütze (1999)(1) and Sebastiani (2002)(2) survey text-classification techniques. Joachims (2001)(3) uses statistical learning theory and support vector machines to give a theoretical analysis of when classification will be successful. Apté et al. (1994)(4) report an accuracy of 96% in classifying Reuters news articles into the “Earnings” category. Koller and Sahami (1997)(5) report accuracy up to 95% with a naive Bayes classifier, and up to 98.6% with a Bayes classifier that accounts for some dependencies among features. Lewis (1998)(6) surveys forty years of application of naive Bayes techniques to text classification and retrieval. Schapire and Singer (2000)(7) show that simple linear classifiers can often achieve accuracy almost as good as more complex models and are more efficient to evaluate. Nigam et al. (2000)(8) show how to use the EM algorithm to label unlabeled documents, thus learning a better classification model. Witten et al. (1999)(9) describe compression algorithms for classification, and show the deep connection between the LZW compression algorithm and maximum-entropy language models.
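The EM idea in Nigam et al. can be sketched with a hard-assignment ("self-training") variant: train naive Bayes on the labeled documents, label the unlabeled ones with it, and retrain on the union. This is a simplified sketch with invented toy data, not the authors' soft-assignment implementation:

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """labeled: list of (word_list, label). Returns a naive Bayes model."""
    counts, class_n, vocab = defaultdict(Counter), Counter(), set()
    for words, label in labeled:
        class_n[label] += 1
        counts[label].update(words)
        vocab |= set(words)
    return class_n, counts, vocab

def predict(model, words):
    """Most probable class under naive Bayes with add-one smoothing."""
    class_n, counts, vocab = model
    n = sum(class_n.values())
    best, best_score = None, float("-inf")
    for c in class_n:
        total_c = sum(counts[c].values())
        score = math.log(class_n[c] / n)
        for w in words:
            score += math.log((counts[c][w] + 1) / (total_c + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

# Invented toy data: two labeled documents, two unlabeled ones
labeled = [(["win", "prize"], "spam"), (["team", "meeting"], "ham")]
unlabeled = [["win", "money", "prize"], ["project", "meeting", "notes"]]

model = train_nb(labeled)
for _ in range(3):
    # E-step (hard version): label the unlabeled documents with the current model
    guessed = [(doc, predict(model, doc)) for doc in unlabeled]
    # M-step: retrain on the labeled plus newly labeled documents
    model = train_nb(labeled + guessed)

print(predict(model, ["win", "money"]))  # prints: spam
```

After retraining, words that occur only in the unlabeled documents (e.g. "money") contribute evidence, which is the sense in which the unlabeled data improves the model.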
1. Manning, C. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
2. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34(1), 1–47.
3. Joachims, T. (2001). A statistical learning model of text classification with support vector machines. In SIGIR-01, pp. 128–136.
4. Apté, C., Damerau, F., and Weiss, S. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12, 233–251.
5. Koller, D. and Sahami, M. (1997). Hierarchically classifying documents using very few words. In ICML-97, pp. 170–178.
6. Lewis, D. D. (1998). Naive Bayes at forty: The independence assumption in information retrieval. In ECML-98, pp. 4–15.
7. Schapire, R. E. and Singer, Y. (2000). Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2/3), 135–168.
8. Nigam, K., McCallum, A., Thrun, S., and Mitchell, T. M. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2–3), 103–134.
9. Witten, I. H., Moffat, A., and Bell, T. C. (1999). Managing Gigabytes: Compressing and Indexing Documents and Images (second edition). Morgan Kaufmann.
_____________
Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are indicated on the right-hand side. ((s)…): Comment by the sender of the contribution. The note [Author1]Vs[Author2] or [Author]Vs[term] is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.
Stuart J. Russell & Peter Norvig
Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ 2010