Viewing posts for the category Omarine User's Manual
Many data scientists have trouble distinguishing between a classification neural network and a regression neural network. There are several reasons:
Currently the most popular output class encoding method is one-hot: each class corresponds to a vector in which only the bit for that class is turned on (set to 1), while the other bits are 0. A commonly used implementation is TensorFlow's one_hot() function.
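The encoding itself can be sketched without any library. The helper below is a hypothetical plain-Python stand-in written for illustration, not TensorFlow's implementation, and the four classes are identified only by assumed indices 0 to 3.

```python
def one_hot(indices, depth):
    """Return one row per index: 1 at the class position, 0 elsewhere."""
    return [[1 if position == index else 0 for position in range(depth)]
            for index in indices]

# Four hypothetical classes, identified only by their indices 0..3.
encoded = one_hot([0, 1, 2, 3], depth=4)
for row in encoded:
    print(row)
```

Each row has exactly one bit turned on, so four classes give four rows of four bits.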
The output is a square matrix of order 4, each row representing one class. Suppose those classes have the following labels:
Local minima are also a controversial issue. A theoretical proof (Hornik 1989), made under strong assumptions, concludes that a neural network has no local minima, but that conclusion is contradicted by experiments on real models with finite sample sets. In this article we analyze the problem from a different direction and reach a definite conclusion: a neural network may have local minima, but they are not a serious problem.
In the previous article I talked about the limit of mathematics. So what is that limit? It is the limit imposed by its axioms.
"You cannot stay inside and prove what is outside".
Mathematics is limited by its axioms; its scope is only a small special case within the space of problems in general. For example, a vector space over ℝ must satisfy its 10 axioms. What would you do if only 9 of them were satisfied? You cannot use mathematics to prove things outside that are not bound by those axioms. If you try to put every problem inside, you will be misled. We are entering the era of AI, of cognitive programs that simulate the activity of the human brain. Human awareness is very rich and cannot be calculated. The same holds for AI: the capacity of a neural network is not the same as that of a normal program.
LOOK AT THE PARAMETERS, DO NOT CONSIDER THE WEIGHTS
Instead of proving, we stand outside and observe. We do not ask about the local minima of the error function in the weight space, because doing so reduces the problem by default to that of finding the minima of a function. Instead, we consider the weights as the parameters of the network.
For simplicity, we consider a network that has only one input node, one output node, no hidden nodes, and no transfer function (which many call an activation function). There are only two weights, a1 and a0, so the output of the network is simply a linear function:
y = a1x + a0
We also use only one sample (Xs, Ys).
For a0 = 0, we have y = a1x
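As a minimal sketch of this network, the snippet below evaluates y = a1x + a0 at the sample for a few values of a1. The sample values and the function name network_output are assumptions made for illustration. The only fact it relies on is that, with a0 = 0, the line passes through the sample exactly when a1 = Ys / Xs (for Xs different from 0).

```python
def network_output(x, a1, a0=0.0):
    """Output of the one-input, one-output linear network: y = a1*x + a0."""
    return a1 * x + a0

# Hypothetical sample (Xs, Ys); the values are chosen only for illustration.
Xs, Ys = 2.0, 3.0

# With a0 = 0 the network is y = a1*x, so it reproduces the sample
# exactly when a1 = Ys / Xs (this requires Xs != 0).
a1_fit = Ys / Xs

for a1 in (0.5, 1.0, a1_fit, 2.0):
    y = network_output(Xs, a1)
    print(f"a1 = {a1:.2f}  ->  y(Xs) = {y:.2f}  (target Ys = {Ys:.2f})")
```

Changing the single parameter a1 rotates the line around the origin, and only one value makes it hit the sample.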
Setting up a neural network without a test set is a legitimate desire of neural network researchers, because the examples taken away to create a test set are examples the network never learns. Earlier approaches to this goal were proposed by John Moody, David MacKay, and Vladimir Vapnik. However, those proposals did not produce a solution that we can use today.
So the question "Is the test set needed?", or "Does a test set have to exist at all?", is left open, and we will answer it here.
The test set problem becomes important when the example set is too small: losing some rare examples to testing is a high price to pay in the learning quality of the network. So is there a way to do without the test set and still meet the requirements of the network? To answer this question, let us first see what the test set is for and what it checks. If we can achieve what the test demands without making the test set a mandatory element, then we do not have to pay its cost.
Overfitting is the phenomenon in which a network fits the examples it has learned well but produces large errors on examples it has not learned. In other words, it is not capable of generalization.
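The definition can be illustrated with a deliberately extreme toy case. This is a sketch under assumed data, not a real network: a hypothetical "memorizer" that stores the learned examples in a table is compared with a model that captured the underlying rule y = 2x.

```python
def mean_squared_error(model, examples):
    """Average squared difference between model(x) and the target y."""
    return sum((model(x) - y) ** 2 for x, y in examples) / len(examples)

# Hypothetical data following y = 2x, split into learned and unseen examples.
learned = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
unseen = [(4.0, 8.0), (5.0, 10.0)]

# A "memorizer" stores the learned examples and answers 0.0 for anything else.
table = dict(learned)

def memorizer(x):
    return table.get(x, 0.0)

# A model that captured the underlying rule instead of the individual examples.
def linear(x):
    return 2.0 * x

for name, model in (("memorizer", memorizer), ("linear", linear)):
    print(name,
          "learned error:", mean_squared_error(model, learned),
          "unseen error:", mean_squared_error(model, unseen))
```

Both models have zero error on the learned examples, but only the memorizer has a large error on the unseen ones, which is the gap a test set is meant to reveal.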
Data, example, sample and pattern
The training data of a neural network consists of examples. These examples can be called samples because the neural network learns from them. However, examples are raw data and are not called patterns, which are concepts that contain knowledge. Examples of the pattern concept are regular expression patterns, wool patterns, and metallic patterns.
The concept of mining here is exactly the same as its literal meaning in [mineral] mining: patterns and knowledge are extracted from raw data consisting of a large number of examples. It differs from data analysis in general, which only describes data. Because data mining is related to knowledge, it shares knowledge with machine learning.
Batch and mini-batch
The batch method puts all samples into every training step. It has the advantages of simplicity and high accuracy. However, it has the potential to lead the entire set of examples into a local minimum. In addition, it is only suitable for small example sets (a few dozen examples), because with a large example set the network must learn all the examples at the same time, which takes a lot of time.
Large example sets need data mining for training. At each training step, a small group of examples of size batch_size is extracted to train the neural network. This method is called mini-batch. The group must contain the knowledge needed to model the sample set. In other words, it must play the role of a pattern. In particular, it must ensure a statistical balance among the classes of the neural network.
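One common way to extract such groups is to shuffle the example set and cut it into consecutive slices. The sketch below assumes that scheme: the helper mini_batches is hypothetical, the integers are stand-ins for real examples, and the figures 3000 and 32 are used only as illustrative sizes.

```python
import random

def mini_batches(examples, batch_size, rng):
    """Shuffle a copy of the examples and yield consecutive groups of batch_size."""
    shuffled = list(examples)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

examples = list(range(3000))      # stand-ins for 3000 training examples
batch_size = 32
batches = list(mini_batches(examples, batch_size, random.Random(0)))

print(len(batches))                       # training steps in one pass over the data
print(len(batches[0]), len(batches[-1]))  # size of the first and the last group
```

Every example is used once per pass, and the last group is smaller whenever batch_size does not divide the number of examples.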
Interestingly, it is the statistical nature of the pattern that matters, not its size. For example, a 1 m fabric pattern and a 10 m fabric pattern are similar. Therefore we use a small batch_size, which greatly reduces computational complexity.
More specifically, the training pattern of a neural network is not a physical pattern. This means that the specific group of examples used as a pattern in a single training step does not necessarily have to be statistically balanced and does not need to contain all classes. But the pattern is changed smartly, and over thousands of steps it meets the requirements.
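The effect can be observed with a small simulation. This is a sketch that assumes each group is drawn uniformly at random, which is only one possible way of changing the pattern and is not necessarily the scheme meant here. The labels, the number of classes, and the number of steps are all hypothetical.

```python
import random
from collections import Counter

rng = random.Random(0)
num_classes = 4
# Hypothetical labels: 3000 examples spread evenly over 4 classes.
labels = [i % num_classes for i in range(3000)]

batch_size = 32
# Class counts in one group: usually not perfectly balanced.
single = Counter(rng.sample(labels, batch_size))
print("one batch:", sorted(single.items()))

# Class counts accumulated over many training steps.
total = Counter()
steps = 2000
for _ in range(steps):
    total.update(rng.sample(labels, batch_size))

shares = {c: total[c] / (steps * batch_size) for c in range(num_classes)}
print("after", steps, "steps:", shares)
```

A single group may favor some classes, while the share of each class accumulated over all steps stays close to one quarter.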
The video below shows a neural network with 3000 examples completing training in a few minutes with a test accuracy of over 99%. The batch_size is 32. If the network had to learn all 3000 examples at the same time, as in the batch method, training would last all day.