Friday 25 April 2014

Machine Learning

What is Machine Learning?
Any study of Machine Learning should begin with a formal definition of what is meant
by Learning. A definition due to Simon (1983) is one of the best:
“Learning denotes changes in the system that are adaptive in the sense that they
enable the system to do the same task (or tasks drawn from a population of
similar tasks) more effectively the next time.”
We can easily extend this definition to our AI systems:
“Machine learning denotes automated changes in an AI system that are adaptive
in the sense that they enable the system to do the same task (or tasks drawn from
a population of similar tasks) more effectively the next time.”
The details of Machine Learning depend on the underlying knowledge representations,
e.g. learning in neural networks will be very different to learning in rule based systems.
The strategies for learning can be classified according to the amount of inference the
system has to perform on its training data. In increasing order we have
1. Rote learning – the new knowledge is implanted directly with no inference at
all, e.g. simple memorisation of past events, or a knowledge engineer’s direct
programming of rules elicited from a human expert into an expert system.
2. Supervised learning – the system is supplied with a set of training examples
consisting of inputs and corresponding outputs, and is required to discover the
relation or mapping between them, e.g. as a series of rules, or a neural network.
3. Unsupervised learning – the system is supplied with a set of training examples
consisting only of inputs and is required to discover for itself what appropriate
outputs should be, e.g. a Kohonen Network or Self Organizing Map.
Early expert systems relied on rote learning, but for modern AI systems we are
generally interested in the supervised learning of various levels of rules.
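At the rote learning end of this spectrum, a sketch is easy to give in Lisp: the system simply caches the results of past problem solving, so that the same task is done more effectively (here, instantly) the next time. The names memoise and *slow-solver* below are hypothetical, for illustration only:

;; Rote learning as memoisation: store each solved case in a hash
;; table, and recall it directly when the same task recurs.
(defun memoise (fn)
  (let ((cache (make-hash-table :test #'equal)))
    (lambda (&rest args)
      (multiple-value-bind (value found) (gethash args cache)
        (if found
            value                               ; learned: recall directly
            (setf (gethash args cache)          ; new: solve it, then store
                  (apply fn args)))))))

(defparameter *slow-solver*
  (memoise (lambda (n) (sleep 1) (* n n))))     ; stands in for expensive inference

;; (funcall *slow-solver* 7) => 49   slow the first time
;; (funcall *slow-solver* 7) => 49   instant: the result was rote-learned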
Intelligent Agents
“An agent is anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators.” (Russell & Norvig, page 32)
It is no accident that the agent–environment interaction diagram is of exactly the same
form as the diagram for the interaction of the human nervous system with its environment.
Agents are comprised of an architecture (e.g. a computer) plus a program that runs on
that architecture. In this module we are primarily interested in designing the programs.
In designing intelligent systems there are four main factors to consider:
P Percepts – the inputs to our system
A Actions – the outputs of our system
G Goals – what the agent is expected to achieve
E Environment – what the agent is interacting with
We shall consider four types of agent system of increasing sophistication:
1. Simple reflex agents
2. Reflex agents with an internal state
3. Goal-based agents
4. Utility-based agents
The Agent Environments
We have talked a lot about intelligent agents, but little about the environments with
which they interact. There are five particularly useful dimensions along which one can
categorize the properties of an agent’s environment:
Accessible vs. Inaccessible – An environment is accessible to an agent if the agent’s
sensory apparatus gives it access to the complete state of the environment. The
agent will not then need to maintain an internal state to keep track of the world.
Deterministic vs. Non-deterministic – The environment is deterministic if its next
state is determined completely by its current state and the actions of its agents. A
deterministic but inaccessible environment may appear non-deterministic to the
agent. Most real environments are so complex that, in practice, they have to be
treated as non-deterministic.
Episodic vs. Non-episodic – In an episodic environment the agent’s experience is
divided into independent episodes, and the quality of an action depends only on the
episode itself, so the agent does not need to think ahead.
Static vs. Dynamic – A static environment does not change while the agent is
deliberating, whereas a dynamic environment can, so the agent must keep checking
the world while it decides on an action.
Discrete vs. Continuous – A discrete environment has a limited/finite number of
distinct, clearly defined percepts and actions.
Importance of Search
The order in which the state space is searched is determined by a search strategy. In this
lecture we shall begin by looking at a number of uninformed (blind) search strategies,
i.e. search strategies that do not use any information about the distance to the goal.
Then we will have an overview of some important informed search strategies.
State Space Representations
The state space is simply the space of all possible states, or configurations, that our
system may be in. Generally, of course, we prefer to work with some convenient
representation of that search space.
There are two components to the representation of state spaces:
State Space Graphs
If the number of possible states of the system is small enough, we can represent all of
them, along with the transitions between them, in a state space graph.
Routes Through State Space
Our general aim is to search for a route, or sequence of transitions, through the state
space graph from our initial state to a goal state.
Sometimes there will be more than one possible goal state. We define a goal test to
determine if a goal state has been achieved.
The solution can be represented as a sequence of link labels (or transitions) on the state
space graph. Note that the labels depend on the direction moved along the link.
Sometimes there may be more than one path to a goal state, and we may want to find
the optimal (best possible) path. We can define link costs and path costs for measuring
the cost of going along a particular path, e.g. the path cost may just equal the number of
links, or could be the sum of individual link costs.
For most realistic problems, the state space graph will be too large for us to hold all of it
explicitly in memory at any one time.
Search Trees
It is helpful to think of the search process as building up a search tree of routes through
the state space graph. The root of the search tree is the search node corresponding to
the initial state. The leaf nodes correspond either to states that have not yet been
expanded, or to states that generated no further nodes when expanded.
At each step, the search algorithm chooses a new unexpanded leaf node to expand. The
different search strategies essentially correspond to the different algorithms one can use
to select which is the next node to be expanded at each stage.




Breadth First Search (BFS)
BFS expands the leaf node with the lowest path cost so far, and keeps going until a goal
node is generated. If the path cost simply equals the number of links, we can implement
this as a simple queue (“first in, first out”).
This is guaranteed to find an optimal path to a goal state. It is memory intensive if the
state space is large. If the typical branching factor is b, and the depth of the shallowest
goal state is d – the space complexity is O(b^d), and the time complexity is O(b^d).

Depth First Search (DFS)
DFS expands the leaf node with the highest path cost so far, and keeps going until a goal
node is generated. If the path cost simply equals the number of links, we can implement
this as a simple stack (“last in, first out”).
This is not guaranteed to find any path to a goal state. It is memory efficient even if the
state space is large. If the typical branching factor is b, and the maximum depth of the
tree is m (possibly ∞) – the space complexity is O(bm), and the time complexity is O(b^m).
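Both strategies can be captured in one sketch, differing only in whether newly generated
paths join the back (a queue, BFS) or the front (a stack, DFS) of the list of unexpanded
paths. The example graph and node names here are illustrative assumptions:

;; A small state space graph as an association list: state -> successors.
(defparameter *graph*
  '((a b c) (b d e) (c f) (d) (e g) (f g) (g)))

(defun successors (state)
  (cdr (assoc state *graph*)))

(defun search-graph (start goal &key (strategy :bfs))
  (let ((paths (list (list start))))            ; each path is newest-state-first
    (loop while paths do
      (let* ((path  (pop paths))
             (state (first path)))
        (when (eql state goal)
          (return (reverse path)))              ; route from start to goal
        (let ((extensions
                (loop for s in (successors state)
                      unless (member s path)    ; avoid simple loops
                        collect (cons s path))))
          (setf paths (if (eq strategy :bfs)
                          (append paths extensions)        ; queue: first in, first out
                          (append extensions paths)))))))) ; stack: last in, first out

;; (search-graph 'a 'g :strategy :bfs) => (A B E G)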
What is Knowledge?
The Chambers 20th Century Dictionary provides as good a definition as any:
knowledge, nol′ij, n. assured belief; that which is known; information; …
In order to solve the complex problems encountered in AI, one generally needs a large
amount of knowledge, and suitable mechanisms for representing and manipulating all
that knowledge.
Knowledge can take many forms. Some simple examples are:
John has an umbrella
It is raining
An umbrella stops you getting wet when it’s raining
An umbrella will only stop you getting wet if it is used properly
Umbrellas are not so useful when it is very windy
So, how should an AI agent store and manipulate knowledge like this?
What is a Knowledge Representation?
The object of a knowledge representation is to express knowledge in a computer
tractable form, so that it can be used to enable our AI agents to perform well.
A knowledge representation language is defined by two aspects:
1. Syntax The syntax of a language defines which configurations of the components
of the language constitute valid sentences.
2. Semantics The semantics defines which facts in the world the sentences refer to,
and hence the statement about the world that each sentence makes.
This is a very general idea, and not restricted to natural language.
Suppose the language is arithmetic, then
‘x’, ‘≥’ and ‘y’ are components (or symbols or words) of the language
the syntax says that ‘x ≥ y’ is a valid sentence in the language, but ‘≥ ≥ x y’ is not
the semantics say that ‘x ≥ y’ is false if y is bigger than x, and true otherwise
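The same split shows up in Lisp itself: the reader defines which expressions are
well formed, and evaluation supplies the semantics. A minimal illustration:

(let ((x 3) (y 2))
  (>= x y))
=> T          ; ‘x ≥ y’ is true here, since y is not bigger than x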
Requirements of a Knowledge Representation
A good knowledge representation system for any particular domain should possess the
following properties:
1. Representational Adequacy – the ability to represent all the different kinds of
knowledge that might be needed in that domain.
2. Inferential Adequacy – the ability to manipulate the representational structures to
derive new structures (corresponding to new knowledge) from existing structures.
3. Inferential Efficiency – the ability to incorporate additional information into the
knowledge structure which can be used to focus the attention of the inference
mechanisms in the most promising directions.
4. Acquisitional Efficiency – the ability to acquire new information easily. Ideally
the agent should be able to control its own knowledge acquisition, but direct
insertion of information by a ‘knowledge engineer’ would be acceptable.
Finding a system that optimises these for all possible domains is not going to be feasible.
First Order Logic as a Knowledge Representation
We can combine sentences by the ‘rules of logic’ to produce new sentences, e.g.
¬man(Chris)
¬man(x) ⇒ woman(x)
woman(Chris)
As a knowledge representation, first order logic has pros and cons:
Advantages
1. It is very expressive.
2. It has unambiguous syntax and semantics.
Disadvantage
1. There is no generally efficient procedure for processing knowledge.
Sources of Uncertainty
There are many potential sources of uncertainty that AI systems (such as expert
systems) must be able to cope with, but most can be attributed to one of:
Imperfect Domain Knowledge
The theory of the domain may be vague or incomplete. Incompleteness
necessitates the use of rules of thumb (or heuristics) which may not always give
optimal or correct results. Even if the domain theory is complete, an expert may
use approximations or heuristics to save time or simplify the problem solving.
Imperfect Case Data
Sensors have only finite resolving power and less than 100% reliability. Human
reports may be ambiguous or inaccurate. Evidence from different sources may be
missing or in conflict. Even if exact data were available, it may be too costly in
time or resources to get it.
Whatever the source of uncertainty, we need our AI systems to be able to deal with it.
Comparing Methods of Inexact Reasoning
We can now compare the main alternative approaches for treating uncertainty:
Bayesian Probability Theory / Bayesian Networks : This is the obvious consistent
approach, but it is often extremely computationally intensive unless we can make good
use of conditional independencies among the variables.
Dempster-Shafer Theory : This allows one to state that certain prior and conditional
probabilities cannot be assessed, and provides the notion of a compatibility relation
between beliefs. It appears to be a consistent generalization of probability theory.
Fuzzy Logic : This is another consistent alternative to standard probability theory.
However, the fuzzification of truth values is inconsistent with the basic idea of
conditional probabilities because of the modified definition of conjunction. It also
involves the non-standard and counter-intuitive property f(A ∨ ¬A) ≠ f(True).
Fuzzy Set Theory
Another useful alternative to probability theory for uncertain reasoning in AI systems is
fuzzy logic. It is based on the idea that many concepts are not sharply defined (e.g. ‘fast’,
‘tall’, ‘hot’) and consequently we cannot use standard set theory and if-then rules when
reasoning with them. Fuzzy logic is built upon the underlying idea of fuzzy set theory.
Classical set theory is based on two-valued logic, in which relations such as X ∈ S are
either true or false. Such classical sets are sometimes called crisp sets. We can define a
crisp set of fast cars as those cars which have a top speed greater than 150mph :
FastCars = { X ∈ Cars : TopSpeed(X) > 150mph }
But the concept of fast car is not really precise like that. It is more reasonable to define it
as a fuzzy set with elements that are members to a certain degree. A fuzzy set is thus
defined as a function from the appropriate domain to the interval [0, 1] such that f(X) = 1
denotes X is definitely a member, f(X) = 0 denotes X is definitely not a member, and other
values denote intermediate degrees of membership.
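A membership function for FastCars might then be sketched as below; the particular
100–180mph ramp is an illustrative assumption. The usual fuzzy connectives (min for
conjunction, max for disjunction, 1−x for negation) also make the earlier point about
f(A ∨ ¬A) ≠ f(True) easy to see:

;; Degree of membership of the fuzzy set FastCars: 0 below 100mph,
;; 1 above 180mph, rising linearly in between.
(defun fast-car-membership (top-speed)
  (cond ((<= top-speed 100) 0.0)
        ((>= top-speed 180) 1.0)
        (t (/ (- top-speed 100) 80.0))))

;; The usual fuzzy logic connectives on membership degrees:
(defun fuzzy-not (a)   (- 1.0 a))
(defun fuzzy-and (a b) (min a b))     ; the modified conjunction
(defun fuzzy-or  (a b) (max a b))

;; (fast-car-membership 140)        => 0.5   fast to degree 0.5
;; (fuzzy-or 0.5 (fuzzy-not 0.5))   => 0.5   not 1.0: f(A ∨ ¬A) ≠ f(True)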
What are Neural Networks?
1. Neural Networks (NNs) are networks of neurons, such as found in real (i.e.
biological) brains.
2. Artificial neurons are crude approximations of the neurons found in brains. They
may be physical devices, or purely mathematical constructs.
3. Artificial Neural Networks (ANNs) are networks of artificial neurons, and hence
constitute crude approximations to parts of real brains. They may be physical
devices, or simulated on conventional computers.
4. From a practical point of view, an ANN is just a parallel computational system
consisting of many simple processing elements connected together in a specific
way in order to perform a particular task.
5. One should never lose sight of how crude the approximations are, and how oversimplified
our ANNs are compared to real brains.
What are Artificial Neural Networks used for?
As with the field of AI in general, there are two basic goals for neural network research:
Brain modelling : The scientific goal of building models of how real brains work.
This can potentially help us understand the nature of human intelligence, formulate
better teaching strategies, or better remedial actions for brain damaged patients.
Artificial System Building : The engineering goal of building efficient systems for
real world applications. This may make machines more powerful, relieve humans
of tedious tasks, and may even improve upon human performance.
Basic Components of Biological Neurons
1. The majority of neurons encode their activations or outputs as a series of brief
electrical pulses (i.e. spikes or action potentials).
2. The neuron’s cell body (soma) processes the incoming activations and converts
them into output activations.
3. The neuron’s nucleus contains the genetic material in the form of DNA. This is
the same as in most types of cells, not just neurons.
4. Dendrites are fibres which emanate from the cell body and provide the receptive
zones that receive activation from other neurons.
5. Axons are fibres acting as transmission lines that send activation to other neurons.
6. The junctions that allow signal transmission between the axons and dendrites are
called synapses. The process of transmission is by diffusion of chemicals called
neurotransmitters across the synaptic cleft.
The basic artificial neuron is the following simplified model of a biological neuron:
1. A set of synapses (i.e. connections) brings in activations from other neurons.
2. The processing unit sums the inputs, and then applies a non-linear activation/
squashing/transfer/threshold function f(x).
3. An output line transmits the result to other neurons.
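This simplified model is small enough to state directly; the weights, bias and step
threshold below are illustrative assumptions:

;; A basic artificial neuron: weighted sum of the inputs plus a bias,
;; passed through a non-linear threshold (step) activation function.
(defun step-function (x)
  (if (>= x 0) 1 0))

(defun neuron (inputs weights bias)
  (step-function (+ bias (reduce #'+ (mapcar #'* inputs weights)))))

;; A two-input neuron computing logical AND:
;; (neuron '(1 1) '(1 1) -1.5) => 1
;; (neuron '(1 0) '(1 1) -1.5) => 0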

Real world applications
Financial modelling – predicting stocks, shares, currency exchange rates
Other time series prediction – climate, weather, airline marketing tactician
Computer games – intelligent agents, backgammon, first person shooters
Control systems – autonomous adaptable robotics, microwave controllers
Pattern recognition – speech recognition, hand-writing recognition, sonar signals
Data analysis – data compression, data mining, PCA, GTM
Noise reduction – function approximation, ECG noise reduction
Bioinformatics – protein secondary structure, DNA sequencing
Inference Rules
We are already familiar with the kind of rules our AI systems may use, e.g.
Deductive Inference Rule
Modus Ponens
Given “A” and “A implies B”, we can
conclude “B”:
A
A ⇒ B
B
Example:
It is raining
If it is raining, the street is wet
The street is wet
Abductive Inference Rule
Abduction
Given “B” and “A implies B”, it might
be reasonable to expect “A”:
B
A ⇒ B
A
Example:
The street is wet
If it is raining, the street is wet
It is raining.

Forward Chaining
Forward chaining or data-driven inference works by repeatedly: starting from the current
state, matching the premises of the rules (the IF parts), and performing the corresponding
actions (the THEN parts) that usually update the knowledge base or working memory.
The process continues until no more rules can be applied, or some cycle limit is met.
The Forward Inference Chain
In this example there are no more rules, so we can draw the inference chain:
This seems simple enough, but in this case we only had a few initial facts, and a few
rules. Generally, things will not be so straightforward.
Disadvantages of Forward Chaining
1. Many rules may be applicable at each stage – so how should we choose which one
to apply next at each stage?
2. The whole process is not directed towards a goal, so how do we know when to
stop applying the rules?
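The cycle itself is compact enough to sketch; the rule format (a list of premises paired
with a conclusion) and the umbrella facts are illustrative assumptions:

;; Each rule pairs a list of premises with a conclusion.  Forward
;; chaining fires every rule whose premises are all in working
;; memory, adding its conclusion, until nothing new can be added.
(defparameter *rules*
  '(((raining have-umbrella) . use-umbrella)
    ((use-umbrella)          . stay-dry)))

(defun forward-chain (facts rules)
  (loop
    (let ((new (loop for (premises . conclusion) in rules
                     when (and (subsetp premises facts)
                               (not (member conclusion facts)))
                       collect conclusion)))
      (if new
          (setf facts (append facts new))
          (return facts)))))

;; (forward-chain '(raining have-umbrella) *rules*)
;; => (RAINING HAVE-UMBRELLA USE-UMBRELLA STAY-DRY)

Note how both disadvantages show up even here: the order in which applicable rules
fire is arbitrary, and the loop only stops when no rule adds anything new.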

Backward Chaining
Backward chaining or goal-driven inference works towards a final state by looking at the
working memory to see if the sub-goal states already exist there. If not, the actions (the THEN
parts) of the rules that will establish the sub-goals are identified, and new sub-goals are set up
for achieving the premises of those rules (the IF parts).

What is an Expert System?
Jackson (1999) provides us with the following definition:
An expert system is a computer program that represents and reasons with knowledge
of some specialist subject with a view to solving problems or giving advice.
To solve expert-level problems, expert systems will need efficient access to a
substantial domain knowledge base, and a reasoning mechanism to apply the
knowledge to the problems they are given. Usually they will also need to be able to
explain, to the users who rely on them, how they have reached their decisions.
They will generally build upon the ideas of knowledge representation, production rules,
search, and so on, that we have already covered.
Often we use an expert system shell which is an existing knowledge independent
framework into which domain knowledge can be inserted to produce a working expert
system. We can thus avoid having to program each new system from scratch.
The Architecture of Expert Systems
The process of building expert systems is often called knowledge engineering. The
knowledge engineer is involved with all components of an expert system.

The Backward Inference Chain
The first part of the chain works back from the goal until only the initial facts are
required, at which point we know how to traverse the chain to achieve the goal state.
Advantage of Backward Chaining

1. The search is goal directed, so we only apply the rules that are necessary to
achieve the goal.
Disadvantage of Backward Chaining
1. A goal has to be known. Fortunately, most AI systems we are interested in can be
formulated in a goal based fashion.
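For comparison with the forward chainer sketched earlier, backward chaining can be
written as a recursive goal test over the same illustrative rule format:

;; A goal holds if it is already a known fact, or if some rule
;; concludes it and all of that rule's premises can themselves be
;; established recursively.
(defun backward-chain (goal facts rules)
  (or (member goal facts)
      (loop for (premises . conclusion) in rules
              thereis (and (eql conclusion goal)
                           (every (lambda (p) (backward-chain p facts rules))
                                  premises)))))

;; (backward-chain 'stay-dry '(raining have-umbrella) *rules*)
;; => true (a non-NIL value); only rules relevant to the goal are tried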


What Exactly is AI?
“Artificial Intelligence (AI) is the part of computer science concerned with designing
intelligent computer systems, that is, systems that exhibit characteristics we associate
with intelligence in human behaviour – understanding language, learning, reasoning,
solving problems, and so on.” (Barr & Feigenbaum, 1981)
Engineering Goal To solve real world problems using AI techniques such as
knowledge representation, learning, rule systems, search, and so on.
Scientific Goal To determine which ideas about knowledge representation, learning,
rule systems, search, and so on, explain various sorts of real intelligence.
AI has roots in a number of older fields: Philosophy, Logic/Computation, Psychology/
Cognitive Science, Biology/Neuroscience, Evolution. Note: Strong AI v. Weak AI.
AI has many sub-fields (such as Neural Networks, Evolutionary Computation, Expert
Systems, Natural Language Processing, Planning, Robotics, Vision), but they employ
common techniques (such as Representation, Learning, Rules, Search).

DEFMACRO

DEFMACRO stands for DEFine MACRO.
The basic skeleton of a DEFMACRO is quite similar to the skeleton of a DEFUN:
(defmacro name (parameter*)
  "Optional documentation string."
  body-form*)
Like a function, a macro consists of a name, a parameter list, an optional documentation
string, and a body of Lisp expressions.
However, the job of a macro isn't to do anything directly – its job is to generate code that
will later do what we want.
(defmacro mac1 (a b) `(+ ,a (* ,b 3)))  
 
(mac1 4 5) =>  19 
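Since the job of mac1 is to generate code, we can look at that generated code directly,
rather than evaluating it, with the standard MACROEXPAND-1 function:

(macroexpand-1 '(mac1 4 5)) => (+ 4 (* 5 3))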
 

DEFUN defines named functions

We can define a named function using the DEFUN form:
> (defun secret-number (the-number)
    (let ((the-secret 37))
      (cond ((= the-number the-secret) 'that-is-the-secret-number)
            ((< the-number the-secret) 'too-low)
            ((> the-number the-secret) 'too-high))))
-> SECRET-NUMBER
The DEFUN form has three arguments:
  1. The name of the function: SECRET-NUMBER,
  2. A list of argument names: (THE-NUMBER), which will be bound to the function's parameters when it is called, and
  3. The body of the function: (LET ...).
car and cdr of a list
car returns the first element of a list.
cdr returns the rest of the list.
E.g.: (car '(1 2 3 4))
-> 1
(cdr '(1 2 3 4))
-> (2 3 4)
Similarly we can have the following:
(car x)    ==  (first x)
(cadr x)   ==  (second x) ==  (car (cdr x))
(caddr x)  ==  (third x)  ==  (car (cdr (cdr x)))
(cadddr x) ==  (fourth x) ==  (car (cdr (cdr (cdr x))))
RPLACA, RPLACD
A list is constructed of CONS cells. Each CONS cell has two parts, a CAR and a CDR. The CAR holds the data for one element of the list, and the CDR holds the CONS that makes up the head of the rest of the list.
By using RPLACA and RPLACD to change the two fields of a CONS cell, we can alter the normal structure of a list. For example, we could splice out the second element of a list like this:
? (defparameter *my-list* (list 1 2 3 4))
*MY-LIST*
? (rplacd *my-list* (cdr (cdr *my-list*)))
(1 3 4)
? *my-list*
(1 3 4)
Not All Comparisons are Equal
Lisp has a core set of comparison functions that work on virtually any kind of object. These are:
  • EQ
  • EQL
  • EQUAL
  • EQUALP
The tests with the shorter names support stricter definitions of equality.
The tests with the longer names implement less restrictive, perhaps more intuitive, definitions of equality.
EQ is true for identical symbols
EQ is true for identical symbols. In fact, it's true for any identical object. In other words,
an object is EQ to itself. Even a composite object, such as a list, is EQ to itself. (But two
lists are not EQ just because they look the same when printed; they must truly be the same
list to be EQ.) Under the covers, EQ just compares the memory addresses of objects.
EQ is not guaranteed to be true for identical characters or numbers.
This is because most Lisp systems don't assign a unique memory address to a particular
number or character; numbers and characters are generally created as needed and stored
temporarily in the hardware registers of the processor.
EQL is also true for identical numbers and characters
EQL retains EQ's notion of equality, and extends it to identical numbers and characters.
Numbers must agree in value and type; thus 0.0 is not EQL to 0. Characters must be truly
identical; EQL is case sensitive.
EQUAL is usually true for things that print the same
EQ and EQL are not generally true for lists that print the same.
Lists that are not EQ but have the same structure will be indistinguishable when printed;
they will also be EQUAL.
Strings are also considered EQUAL if they print the same. Like EQL, the comparison of
characters within strings is case-sensitive.
EQUALP ignores number type and character case
EQUALP is the most permissive of the core comparison functions. Everything that is EQUAL
is also EQUALP. But EQUALP ignores case distinctions between characters, and applies the
(typeless) mathematical concept of equality to numbers; thus 0.0 is EQUALP to 0.
Furthermore, EQUALP is true if corresponding elements are EQUALP in the following composite data types:
  • Arrays
  • Structures
  • Hash Tables
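The differences are easiest to see side by side:

(eq 'foo 'foo)                  => T     identical symbols
(eq (list 1 2) (list 1 2))      => NIL   two distinct lists
(eql 0.0 0)                     => NIL   same value, different types
(eql #\a #\a)                   => T     identical characters
(equal (list 1 2) (list 1 2))   => T     same structure
(equal "Foo" "foo")             => NIL   case sensitive, like EQL
(equalp "Foo" "foo")            => T     EQUALP ignores case
(equalp 0.0 0)                  => T     ... and number type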
Longer tests are slower; know what we're comparing
The generality of the above longer-named tests comes with a price.
They must test the types of their arguments to decide what kind of equality is applicable;
this takes time.
EQ is blind to the type of an object; either the objects are the same object, or they're not. This
kind of test typically compiles into one or two machine instructions and is very fast.
We can avoid unnecessary runtime overhead by using the most restrictive (shortest-
named) test that meets our needs.