Learning: Learning is acquiring the ability to establish relationships between signs, symptoms or symbols and objects. This also includes, for example, the recognition and recollection of patterns, similarities, sensory perceptions, self-perception, etc. In the ideal case, the ability to apply generalizations to future cases is acquired while learning. See also knowledge, knowledge-how, competence.
Annotation: The above characterizations of concepts are neither definitions nor exhaustive presentations of problems related to them. Instead, they are intended to give a short introduction to the contributions below. – Lexicon of Arguments.
Norvig I 693
Learning/AI Research/Norvig/Russell: Any component of an agent can be improved by learning from data. The improvements, and the techniques used to make them, depend on four major factors:
- Which component is to be improved.
Norvig I 694
- What prior knowledge the agent already has.
- What representation is used for the data and the component.
- What feedback is available to learn from.
Components to be learned:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept sequence.
3. Information about the way the world evolves and about the results of possible actions the agent can take.
4. Utility information indicating the desirability of world states.
5. Action-value information indicating the desirability of actions.
6. Goals that describe classes of states whose achievement maximizes the agent’s utility.
>Representation/Norvig, >Knowledge/AI Research, >Supervised Learning/AI Research, >Environment/AI Research, >Artificial Neural Networks, >Learning Theory/Norvig.
Norvig I 744
Support vector machines/SVM: The support vector machine (SVM) framework is described, as of the 2010 edition cited here, as the most popular approach for “off-the-shelf” supervised learning: if you don’t have any specialized prior knowledge about a domain, then the SVM is an excellent method to try first. Properties of SVMs:
1. SVMs construct a maximum margin separator - a decision boundary with the largest possible distance to example points. This helps them generalize well.
2. SVMs create a linear separating hyperplane, but they have the ability to embed the data into a higher-dimensional space, using the so-called kernel trick.
3. SVMs are a nonparametric method - they retain training examples and potentially need to store them all. On the other hand, in practice they often end up retaining only a small fraction of the number of examples - sometimes as few as a small constant times the number of dimensions. >Artificial Neural Networks/Norvig.
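The kernel trick in point 2 can be illustrated with a small numerical check. The sketch below assumes a degree-2 polynomial kernel on two-dimensional inputs; the function names `poly_kernel`, `feature_map` and `dot` are chosen here for illustration and are not taken from the source. It shows that evaluating the kernel in the original space gives the same value as a dot product in an explicit three-dimensional feature space, which is why a linear separator there can be found without ever building the features.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit 3-D feature map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    """Plain dot product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

# Two arbitrary example points (assumed values, for illustration only).
x, z = (1.0, 2.0), (3.0, -1.0)

k_direct = poly_kernel(x, z)                      # no feature map needed
k_mapped = dot(feature_map(x), feature_map(z))    # same value, higher-dimensional space

print(k_direct, k_mapped)
```

Both values agree up to floating-point rounding, so a learner that only needs dot products can work in the higher-dimensional space at the cost of a kernel evaluation.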
Norvig I 748
Ensemble Learning: The idea of ensemble learning methods is to select a collection, or ensemble, of hypotheses from the hypothesis space and combine their predictions. For example, during cross-validation we might generate twenty different decision trees, and have them vote on the best classification for a new example. The motivation for ensemble learning is simple. Consider an ensemble of K = 5 hypotheses and suppose that we combine their predictions using simple majority voting. For the ensemble to misclassify a new example, at least three of the five hypotheses have to misclassify it. The hope is that this is much less likely than a misclassification by a single hypothesis.
Independence of hypotheses: (…) obviously the assumption of independence is unreasonable, because hypotheses are likely to be misled in the same way by any misleading aspects of the training data. But if the hypotheses are at least a little bit different, thereby reducing the correlation between their errors, then ensemble learning can be very useful.
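The majority-voting argument can be made concrete with a short calculation. The sketch below assumes the idealized case that the text calls unreasonable, namely fully independent errors, and an illustrative single-hypothesis error rate of 0.1; the function name `majority_error` is hypothetical. Correlated errors would make the real ensemble error larger than this figure.

```python
from math import comb

def majority_error(k, p):
    """Probability that more than half of k independent hypotheses,
    each wrong with probability p, misclassify the same example."""
    needed = k // 2 + 1  # smallest number of wrong votes that forms a majority
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(needed, k + 1))

single_error = 0.1                      # assumed error rate of one hypothesis
ensemble_error = majority_error(5, single_error)

print(f"single hypothesis: {single_error:.5f}")
print(f"majority of five:  {ensemble_error:.5f}")
```

Under these assumptions the ensemble error is about 0.00856, compared with 0.1 for a single hypothesis.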
Hypothesis space: Another way to think about the ensemble idea is as a generic way of enlarging the hypothesis space. That is, think of the ensemble itself as a hypothesis and the new hypothesis
Norvig I 749
space as the set of all possible ensembles constructible from hypotheses in the original space. If the original hypothesis space allows for a simple and efficient learning algorithm, then the ensemble method provides a way to learn a much more expressive class of hypotheses without incurring much additional computational or algorithmic complexity.
Boosting: The most widely used ensemble method is called boosting. [It uses] the idea of a weighted training set. In such a training set, each example has an associated weight wj ≥ 0. The higher the weight of an example, the higher the importance attached to it during the learning of a hypothesis. Boosting starts with wj = 1 for all the examples (i.e., a normal training set). From this set, it generates the first hypothesis, h1. This hypothesis will classify some of the training examples correctly and some incorrectly. We would like the next hypothesis to do better on the misclassified examples, so we increase their weights while decreasing the weights of the correctly classified examples. From this new weighted training set, we generate hypothesis h2. The process continues in this way until we have generated K hypotheses, where K is an input to the boosting algorithm. The final ensemble hypothesis is a weighted-majority combination of all the K hypotheses, each weighted according to how well it performed on the training set.
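The reweighting loop described above can be sketched in a few lines. This is a minimal AdaBoost-style illustration, not the book's exact pseudocode: it assumes labels in {+1, -1}, one-dimensional inputs, decision stumps as the weak learner, weights normalized to sum to 1 (equivalent to starting from equal weights), and the standard exponential update. All function names and the toy data are hypothetical.

```python
import math

def stump_predict(stump, x):
    """A stump (threshold, polarity) predicts polarity if x > threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity

def train_stump(xs, ys, weights):
    """Return the stump with the lowest weighted training error, and that error."""
    points = sorted(set(xs))
    thresholds = [points[0] - 1.0] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    best_err, best_stump = None, None
    for threshold in thresholds:
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(stump, x) != y)
            if best_err is None or err < best_err:
                best_err, best_stump = err, stump
    return best_stump, best_err

def boost(xs, ys, k):
    """Generate k weighted hypotheses, reweighting the examples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n  # equal weights: a normal training set
    ensemble = []
    for _ in range(k):
        stump, err = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # better hypotheses get larger votes
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, correctly classified ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted-majority vote of all hypotheses in the ensemble."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy data (assumed): no single stump separates these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = boost(xs, ys, k=3)
predictions = [ensemble_predict(ensemble, x) for x in xs]
print(predictions == ys)
```

On this toy set a single stump misclassifies at least three examples, while the weighted vote of three stumps classifies all ten training examples correctly.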
Norvig I 757
The performance of a learning algorithm is measured by the learning curve, which shows the prediction accuracy on the test set as a function of the training-set size.
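A learning curve can be computed by training on growing prefixes of the data and scoring each model on a fixed test set. The sketch below is only illustrative: the 1-nearest-neighbor learner, the synthetic one-dimensional task with 10% label noise, the training sizes and all names are assumptions made here, not details from the source.

```python
import random

def nearest_neighbor_predict(train, x):
    """Label of the training example closest to x (1-nearest-neighbor)."""
    return min(train, key=lambda example: abs(example[0] - x))[1]

def accuracy(train, test):
    """Fraction of test examples classified correctly using the given training set."""
    correct = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return correct / len(test)

def make_examples(rng, n):
    """Synthetic task (assumed): label is 1 when x > 0.5, flipped with probability 0.1."""
    examples = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        examples.append((x, y))
    return examples

rng = random.Random(0)            # fixed seed so the run is repeatable
test_set = make_examples(rng, 200)
pool = make_examples(rng, 200)

# One (training-set size, test accuracy) point per size.
learning_curve = [(n, accuracy(pool[:n], test_set)) for n in (5, 20, 50, 100, 200)]

for n, acc in learning_curve:
    print(f"training size {n:3d}: test accuracy {acc:.2f}")
```

Plotting the second column against the first gives the curve; with noisy data it need not rise monotonically.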
When there are multiple models to choose from, cross-validation can be used to select a model that will generalize well.
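Model selection by cross-validation can be sketched as follows. The example assumes k-fold cross-validation with five folds and a choice between k-nearest-neighbor classifiers with different neighborhood sizes; the learner, the synthetic data and the names `knn_predict` and `cross_val_accuracy` are illustrative assumptions, not part of the source.

```python
import random

def knn_predict(train, x, k):
    """Majority label among the k training examples closest to x (labels are 0 or 1)."""
    neighbors = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    votes = sum(y for _, y in neighbors)
    return int(2 * votes > len(neighbors))

def cross_val_accuracy(examples, k_neighbors, n_folds=5):
    """Average validation accuracy over n_folds train/validation splits."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th example (offset by fold) is held out for validation.
        validation = examples[fold::n_folds]
        training = [ex for i, ex in enumerate(examples) if i % n_folds != fold]
        correct = sum(knn_predict(training, x, k_neighbors) == y for x, y in validation)
        fold_scores.append(correct / len(validation))
    return sum(fold_scores) / n_folds

# Synthetic data (assumed): label is 1 when x > 0.5, flipped with probability 0.15.
rng = random.Random(1)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.15:
        y = 1 - y
    data.append((x, y))

candidates = (1, 3, 7)  # candidate models: neighborhood sizes to compare
scores = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)

print(scores)
print("selected k:", best_k)
```

The selected model is the one with the highest average validation accuracy; it would then normally be retrained on all the data before use.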
Norvig I 847
Learning a model for an observable environment is a supervised learning problem, because the next percept gives the outcome state. >Reinforcement Learning/AI Research.
Explanation of symbols: Roman numerals indicate the source, Arabic numerals indicate the page number. The corresponding books are indicated on the right-hand side. ((s)…): Comment by the sender of the contribution. The note [Author1]Vs[Author2] or [Author]Vs[term] is an addition from the Dictionary of Arguments. If a German edition is specified, the page numbers refer to this edition.
Norvig I: Stuart J. Russell, Peter Norvig
Artificial Intelligence: A Modern Approach, Upper Saddle River, NJ 2010