Machine decision – The On/Off problem

We describe the problem as follows:
There are two attributes, X and Y, each taking one of two values: less and greater.
If x < 1 then X is less, and if x > 1 then X is greater; the same rule applies to Y.
The result takes one of two values, corresponding to the two classes On and Off.
There are 10 samples as follows:

Order   X        Y        RESULTS
1       less     less     Off
2       greater  less     On
3       greater  greater  On
4       greater  greater  On
5       greater  greater  On
6       greater  greater  On
7       greater  greater  On
8       greater  greater  On
9       greater  greater  On
10      greater  greater  On

[Illustration: the 10 samples plotted in the (x, y) plane.]

Question: if x < 1 and y > 1, is the result On or Off?

At first glance we might think the result is On, since the samples with y > 1 overwhelmingly belong to the On class: 8 of the 10 samples have y > 1, and every one of them is On.

But no! The result is Off.
By the theory behind the ID3 decision tree, splitting the sample set on attribute X produces two subsets whose elements each belong to a single class. Both subsets have zero entropy, so the split extracts all of the information and the information gain (IG) of X is the largest possible. Attribute X is therefore the most important attribute and becomes the root of the tree, so every case with x < 1, regardless of y, results in Off.
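
To make this concrete, here is a short Python sketch (my own illustration; the dictionary representation of the samples is an assumption, not the id3 program's format) that computes the information gain of each attribute on the 10 samples:

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(samples, attr):
    """Information gain from splitting `samples` on `attr`."""
    parent = entropy([s['RESULTS'] for s in samples])
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s['RESULTS'] for s in samples if s[attr] == value]
        remainder += len(subset) / len(samples) * entropy(subset)
    return parent - remainder

# The 10 samples from the table above.
samples = ([{'X': 'less',    'Y': 'less', 'RESULTS': 'Off'},
            {'X': 'greater', 'Y': 'less', 'RESULTS': 'On'}]
           + [{'X': 'greater', 'Y': 'greater', 'RESULTS': 'On'}] * 8)

print(info_gain(samples, 'X'))  # ~0.469: both X-subsets are pure, IG is maximal
print(info_gain(samples, 'Y'))  # ~0.269: the Y = less subset is still mixed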
This can be verified by running the id3 program on a data file with the following content:

-----BEGIN DATA DEFINITION-----
X: less greater
Y: less greater

RESULTS: On Off
-----END DATA DEFINITION-----

ID3 data

Order   X        Y        RESULTS
1       less     less     Off
2       greater  less     On
3       greater  greater  On
4       greater  greater  On
5       greater  greater  On
6       greater  greater  On
7       greater  greater  On
8       greater  greater  On
9       greater  greater  On
10      greater  greater  On
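
As a cross-check on what the id3 program computes, the sketch below (again my own illustration, not the author's program) builds the ID3 tree from the samples, reusing entropy, info_gain, and samples from the snippet above, and classifies the case x < 1, y > 1 directly:

def id3(samples, attrs):
    """Return a leaf label, or a pair (attribute, {value: subtree})."""
    labels = [s['RESULTS'] for s in samples]
    if len(set(labels)) == 1:
        return labels[0]                       # pure subset -> leaf
    best = max(attrs, key=lambda a: info_gain(samples, a))
    branches = {v: id3([s for s in samples if s[best] == v],
                       [a for a in attrs if a != best])
                for v in {s[best] for s in samples}}
    return (best, branches)

def classify(tree, case):
    """Follow the branches of `tree` that match `case` down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[case[attr]]
    return tree

tree = id3(samples, ['X', 'Y'])
print(tree)  # ('X', {'less': 'Off', 'greater': 'On'}): X is the root, Y is never consulted
print(classify(tree, {'X': 'less', 'Y': 'greater'}))  # Off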

However, if we add a large number of identical samples with x > 1 and y > 1 and result On, the split on attribute Y starts to extract the information relatively well, and Y gradually becomes important: its information gain is no longer far behind that of X. For example, if we add 410 such samples, for a total of 420, the gains work out as follows.
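
Reusing info_gain and samples from the sketch above on the enlarged set:

# The 420-sample set: the 10 originals plus 410 copies of (greater, greater, On).
big = samples + [{'X': 'greater', 'Y': 'greater', 'RESULTS': 'On'}] * 410

print(info_gain(big, 'X'))  # ~0.0242
print(info_gain(big, 'Y'))  # ~0.0194: about 80% of IG(X), up from about 57% before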

But the ID3 decision tree does not show this, because it does not reflect the frequency information in the samples: X still has the larger information gain, so the tree, and hence the prediction Off, stays the same even though the sample set has changed substantially.

This can be overcome using Football Predictions 2.0, which gets it right: the result is On.
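
The internals of Football Predictions 2.0 are not shown here, so purely as an illustration of a frequency-sensitive alternative, here is a Laplace-smoothed naive Bayes classifier (my assumption, not necessarily what the program actually does) run on the 420-sample set `big` from above; weighing class frequencies, it also answers On:

def nb_score(samples, query, label):
    """Laplace-smoothed naive Bayes score: P(label) * prod P(value | label)."""
    in_class = [s for s in samples if s['RESULTS'] == label]
    score = len(in_class) / len(samples)              # class prior
    for attr, value in query.items():
        matches = sum(1 for s in in_class if s[attr] == value)
        score *= (matches + 1) / (len(in_class) + 2)  # +2: two values per attribute
    return score

query = {'X': 'less', 'Y': 'greater'}
print(nb_score(big, query, 'On'))   # ~0.00236
print(nb_score(big, query, 'Off'))  # ~0.00053 -> On wins

The +1 smoothing only serves to avoid a zero probability for the combination x < 1, y > 1, which never occurs in the On class.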
