Each time I try to see intelligence in machine learning models, a doubt rises in my head: how could intelligence, characterized by a capability for high-level abstraction, be acquired simply by running gradient descent to convergence? I would rather believe that the pursuit of intelligence is a back-and-forth, twisting, chaotic yet conscious process full of loops, one that eventually breaks out of those loops and reaches some point of enlightenment. If so, it cannot be achieved with a single objective formulated as one specific optimization problem.
If you begin to question that, you might find that the root of the problem lies in our motivation for studying AI. Many people think that AI research should target concrete, application-oriented problems, but I would rather believe that intelligence itself should lie at the core of what we study. If we wish to solve almost everything using intelligence, how could we achieve that while bypassing intelligence? Even if we do not know exactly what properties artificial intelligence should have, we can at least find some clues in ourselves, in human intelligence.
From this perspective, I drew inspiration from how children learn and propose a hypothesis: a continuous self-training framework, that is, a continuous process alternating between finding a puzzle and solving a puzzle. As children, we explored the world mainly out of curiosity. This process can be decomposed into three questions:
- Is it easy for the child or the agent to find or construct a puzzle?
- Is the child or the agent likely to solve this puzzle?
- How much sense of achievement does the child or the agent gain from solving it?
Take a specific scenario: children love watching cartoons more than movies. Cartoons are obviously full of unrealistic elements, such as bright, high-contrast colors and talking animals. Notice that this lack of fidelity does not hinder children’s cognition at all. On the contrary, children may be sensitive to the cognitive complexity a scene demands. Bright colors catch children’s eyes because they help them distinguish objects from one another in their field of vision. Talking animals appeal to children because anthropomorphic characters in animal form may be easier for them to understand than human psychology and behavior. Children’s cognitive progress therefore needs a properly built incentive as a driving engine that keeps them focused and learning autonomously, implicitly or explicitly. This is what I call continuous self-training.
The scenario fits the three-question schema of the finding-solving framework. First, is it easy for children to form a puzzle in their heads by watching cartoons? An obvious fact is that children prefer pictures to text, and prefer cartoon pictures to real-world images. A cartoon character with a strange shape or a clumsy movement, such as a banana-like head or a pratfall, tends to create a cognitively simple but surprising event that draws children’s attention, so a puzzle forms more easily. By watching cartoons, children develop a small, rudimentary environment model in their heads, often with a narrative storyline made up from their own point of view, which always comes with a series of WHY questions seeking explanations. I think this is where children begin to acquire basic causal reasoning. Although adults might see these questions as either having obvious answers or being meaningless to ask, for children they constitute the initial puzzles that help develop their common knowledge.
After forming a puzzle, we assess whether it is easy to solve, and we might try several times, harder and harder each time. If we solve it, the amount of new knowledge we acquire contributes to the sense of reward. Little new knowledge might leave us bored, whereas much new knowledge tends to make us feel satisfied and even excited. If we fail to solve it after many attempts, we may give up, exhausted and without any reward. Often, we first consider whether a puzzle is worth solving based on our previous experience, by evaluating how likely we are to solve it and how much reward it can bring us. For children, the sense of reward may be driven mainly by curiosity and by the achievement they might gain. As they grow up, answering such simple questions no longer brings them new knowledge or experience, and so brings little reward. They need to find new puzzles and solve them.
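The "is it worth solving?" check above can be sketched as a small calculation. This is only an illustration under my own assumptions: the `Puzzle` fields, the scoring rule (chance of success times new knowledge, minus effort), and all the numbers are hypothetical and not part of the hypothesis itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Puzzle:
    name: str
    p_solve: float        # estimated chance of solving it, from past experience
    new_knowledge: float  # knowledge gained if solved (arbitrary units)
    effort: float         # cost of trying, paid whether or not we succeed


def expected_worth(p: Puzzle) -> float:
    """Expected reward minus effort. An assumed scoring rule, for illustration."""
    return p.p_solve * p.new_knowledge - p.effort


def pick_puzzle(puzzles: List[Puzzle]) -> Optional[Puzzle]:
    """Return the most worthwhile puzzle, or None if none is worth the effort."""
    best = max(puzzles, key=expected_worth)
    return best if expected_worth(best) > 0 else None


# Hypothetical puzzles: one boring, one within reach, one exhausting.
puzzles = [
    Puzzle("too easy", p_solve=0.95, new_knowledge=0.1, effort=0.2),
    Puzzle("within reach", p_solve=0.60, new_knowledge=2.0, effort=0.5),
    Puzzle("too hard", p_solve=0.05, new_knowledge=5.0, effort=1.0),
]
choice = pick_puzzle(puzzles)
print(choice.name if choice else "nothing worth trying")  # prints "within reach"
```

With these made-up numbers, the easy puzzle teaches too little to justify the effort, and the hard one is too unlikely to be solved, so only the middle puzzle has a positive expected worth.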
The process of finding, evaluating, trying to solve, failing or succeeding, getting rewarded or not, and then re-evaluating happens so quickly that we might not be aware of how we are motivated in everyday life. When we become adults, the reward expectation system becomes so complex that curiosity and knowledge gain are no longer the only incentives that drive us. As our environment models grow stronger, we become able to plan for the long term and to motivate ourselves with some ultimate goal that we call our life purpose.
I know this is a big topic, beyond the reach of our current techniques for pursuing AI, but we can still apply this thinking to some computational models such as GANs. In this adversarial training framework, the two opposing roles can be hosted simultaneously in one brain, one proposing a puzzle and the other solving it. The puzzle is proposed by the discriminator and left for the generator to solve. To find the most rewarding puzzle, the discriminator has to play adversarially, that is, to pick the generator's biggest weakness, which leaves the largest margin for progress. Some difficulties in training GANs can also be explained with this framework. Sometimes the generator fails to solve the puzzles given by the discriminator and gets stuck in a poor finding-solving process.
For future work, we need to figure out more computational frameworks, besides GANs, that reflect the finding-solving process.
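To illustrate the proposer and solver roles, here is a deliberately tiny sketch. It is not a GAN: there are no networks and no gradients. The solver's weaknesses are just a list of numbers, and the names `propose`, `solve`, `play`, and the `reach` and `rate` parameters are hypothetical. It only shows the two behaviors described above: steady progress when the biggest weakness is within reach, and getting stuck when it is not.

```python
from typing import List, Tuple


def propose(weaknesses: List[float]) -> int:
    """Proposer (the discriminator's role): point at the biggest weakness."""
    return max(range(len(weaknesses)), key=lambda i: weaknesses[i])


def solve(weaknesses: List[float], i: int, reach: float, rate: float) -> bool:
    """Solver (the generator's role): shrink weakness i if it is within reach."""
    if weaknesses[i] > reach:
        return False  # the puzzle is too hard, no progress is made
    weaknesses[i] *= 1.0 - rate
    return True


def play(weaknesses: List[float], rounds: int = 20,
         reach: float = 1.0, rate: float = 0.5) -> Tuple[List[float], int]:
    w = list(weaknesses)  # work on a copy so the input is left untouched
    solved = 0
    for _ in range(rounds):
        i = propose(w)
        if solve(w, i, reach, rate):
            solved += 1
    return w, solved


# All weaknesses are within reach: every round makes progress.
print(play([0.8, 0.4, 0.6]))
# The biggest weakness is out of reach: the same puzzle is proposed forever.
print(play([2.0, 0.4, 0.6]))
```

In the second call the proposer keeps returning the same unsolvable puzzle, so the smaller, solvable weaknesses are never worked on. Under these assumptions, that is one way to picture a poor finding-solving process.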