Viewing posts for the category Omarine User's Manual
We already know that although id3 is a legacy algorithm with limited capabilities, it is still the No. 1 candidate for identifying important features, and it has a strong theoretical basis. So why not use it to remove irrelevant features?
REGULARIZATION
Regularization is the act of transforming a problem into a basic form that can be solved by known methods; it can be applied when certain assumptions are satisfied. An example is the linearization of data in linear regression.
In machine learning (deep learning) there is no specific way to do that. What is desirable is to remove less relevant or irrelevant features in order to simplify the problem, and that is where the method gets its name. A typical example is Google's "L1 Regularization" method, which is detailed in the article Neural network: Using genetic algorithms to train and deploy neural networks: Need the test set?, so I won't repeat it here. One thing is immediately apparent: the "L1 Regularization" method eliminates the less relevant features only after they have been put into the network. What do you think about this? Put noise into the network and then find a way to remove it! How many features are less relevant? Which ones?
If there are indeed less relevant features, then id3 is a great way to remove them. This is done in the data mining step, i.e. before the data is put into the network for training. You can identify these less relevant features by programming or by using the tool fpp. Removed features will not be present in id3's output.
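To make the "by programming" route concrete, here is a minimal sketch of the criterion id3 relies on, information gain, used to drop features that carry almost no information about the class. It is a simplification and not the fpp tool or a full id3 tree: it only scores each feature once against the whole data set. The function names, the threshold value, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_features(rows, labels, threshold=0.01):
    """Keep only the features whose information gain exceeds the threshold.

    The threshold is an assumed cut-off, not a value from the article.
    """
    gains = {f: information_gain(rows, labels, f) for f in rows[0]}
    kept = [f for f, g in gains.items() if g > threshold]
    return kept, gains


# Toy data (hypothetical): "outlook" determines the label, "coin" is noise.
rows = [
    {"outlook": "sunny", "coin": "h"},
    {"outlook": "sunny", "coin": "t"},
    {"outlook": "rainy", "coin": "h"},
    {"outlook": "rainy", "coin": "t"},
]
labels = ["yes", "yes", "no", "no"]

kept, gains = select_features(rows, labels)
print(kept)  # ['outlook']
```

Only the features in kept would then be passed on to the network; the noise feature never enters training.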
Many data scientists have trouble distinguishing between a Classification neural network and a Regression neural network. There are several reasons:
Currently the most popular output class encoding method is one-hot: each class corresponds to a vector in which only the bit for that class is turned on (set to 1), while the other bits are 0. The most common application of this approach is TensorFlow with its one_hot() function.
The output is a square matrix of order 4, each row representing one class. Suppose those classes have the following labels:
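As an illustration of the encoding itself, here is a small plain-Python stand-in. It is not TensorFlow's implementation; the function below only mimics the idea of mapping class indices to one-hot rows, and the choice of four classes is an assumption made for the example.

```python
def one_hot(indices, depth):
    """Return one row per index: 1 at that index, 0 everywhere else."""
    return [[1 if col == idx else 0 for col in range(depth)] for idx in indices]


# Four classes, indexed 0..3 (the class count is chosen for illustration).
encoded = one_hot([0, 1, 2, 3], depth=4)
for row in encoded:
    print(row)
# [1, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 1, 0]
# [0, 0, 0, 1]
```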
Local minima are also a controversial issue. A theoretical proof (Hornik 1989), made under strong assumptions and concluding that the neural network has no local minima, is contradicted by experiments on real models with finite sample sets. In this article we will analyze the problem from a different direction and reach a definite conclusion: the neural network may have local minima, but they are not serious.
In the previous article I talked about the limit of mathematics. So what is that limit? It is the limit imposed by its axioms.
PRINCIPLE
"Can't be inside but prove the outside".
Mathematics is limited by its axioms; its scope is only a small special case within the problem space in general. For example, a vector space over ℝ must satisfy its 10 axioms. What would you do if, say, only 9 axioms were satisfied? You cannot use mathematics to prove things outside that are not bound by those axioms. If you put every problem inside, you will be misled. We are entering the era of AI, of cognitive programs that simulate the activity of the human brain. Human awareness is very rich and cannot be calculated. The same goes for AI: the capacity of a neural network is not the same as that of a normal program.
LOOK AT THE PARAMETERS, DO NOT CONSIDER THE WEIGHTS
Instead of proving, we stand outside and observe. We do not question the local minima of the error function in the weight space, because in doing so we would by default have reduced the problem to that of finding the minima of a function. Instead, consider the weights as the network parameters.
For simplicity, we consider a network that has only one input node, one output node, no hidden nodes, and no transfer function (often called an activation function). There are only two weights, a_{1} and a_{0}, and the output of the network is simply a linear function:
y = a_{1}x + a_{0}
We also use only one sample (Xs, Ys).
For a_{0} = 0, we have y = a_{1}x
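The one-sample setup above can be tried out numerically. The sketch below is only an illustration under stated assumptions: the squared error is an assumed error measure (the text has not fixed one at this point), and the sample values for Xs and Ys are hypothetical.

```python
def network(x, a1, a0=0.0):
    """Output of the one-input, one-output linear network y = a1*x + a0."""
    return a1 * x + a0


def squared_error(a1, xs, ys, a0=0.0):
    """Squared error on the single sample (xs, ys); an assumed error measure."""
    return (network(xs, a1, a0) - ys) ** 2


# Hypothetical sample values, chosen only for illustration.
Xs, Ys = 2.0, 6.0

# Treat a1 as a parameter and observe the error for a few settings (a0 = 0).
for a1 in (1.0, 2.0, 3.0, 4.0):
    print(a1, squared_error(a1, Xs, Ys))
```

With a_{0} = 0 and Xs ≠ 0, the error on the sample is zero exactly when a_{1} = Ys / Xs, which is 3 for the values assumed here.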