There are two major theories of the brain’s cognitive function: the theory of modularity and the theory of distributive processing. Rather than asking whether the brain’s regions are functionally specialized or interconnected, I tend to think of the two views as complementary.
According to the modularity theory, specialized regions are domain-specific, each serving different cognitive processes. From an evolutionary perspective, the human mind acquired enhanced functionality as specialization made it better adapted to its environment. Examples include the prefrontal cortex, which handles high-level cognitive processes, and visual areas V4 and V5, which are associated with the perception of color and visual motion, respectively.
According to the distributive processing theory, brain areas are highly interconnected and process information in a distributed manner. Thanks to advances in brain imaging techniques such as MRI and PET, neural interactions can be measured through neuroimaging analysis, which provides more evidence for the interaction view of distributive processing. Many regions of the brain are physically interconnected in a nonlinear system, producing behaviors that arise from a variety of system organizations.
The two theories should be combined to jointly characterize how the brain functions. Modularity is a matter of degree rather than a rigid separation, and the nervous system always integrates information produced in different regions.
One of the biggest breakthroughs for artificial neural networks was the realization that distributed representations bring powerful expressivity. An individual unit does not need a specific meaning of its own; instead, the space spanned by a set of peer units may carry a semantically meaningful manifold, a kind of collective product.
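A small sketch may make the point about distributed representations concrete. The three 2-D vectors and their labels below are made up for illustration. Rotating the coordinate system changes what every individual unit reads, yet the relationships between the vectors (here, cosine similarity) stay the same, which suggests the meaning sits in the spanned space rather than in any single unit.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by theta radians (a change of basis for the two units)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical 2-unit codes for three concepts (toy values, not learned).
cat, dog, car = [0.9, 0.1], [0.8, 0.3], [-0.2, 0.9]

theta = 0.7  # arbitrary rotation angle
cat_r, dog_r, car_r = (rotate(v, theta) for v in (cat, dog, car))

# Individual unit values change under the rotation...
print(cat, cat_r)
# ...but the geometry between the codes is preserved.
print(cosine(cat, dog), cosine(cat_r, dog_r))
print(cosine(cat, car), cosine(cat_r, car_r))
```

This only illustrates the invariance for a linear change of basis; real networks have nonlinearities that privilege some directions, so the picture is an approximation.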
However, the specialization property has not been taken into account as well. Consider a fully-connected layer: each unit receives signals from all units in the previous layer and sends signals to all units in the next layer. No unit is specialized, since every unit plays a role similar to that of its peers. The only difference between peer units is the weights of their connections to the two adjacent layers. With so little to distinguish them, one cannot expect specialization or modularity to emerge. Of course, a scalar unit holds too little information to carry subspace-level semantics and develop a specialization, so we should consider multi-dimensional neuron units, or neuron nodes made up of multiple atomic units, as the building blocks of a large neuron-node network. From this point of view, wrapping the units of each layer into a neuron node turns a standard DNN into a chain-structured computation architecture without any branch or bypass. As a result, every forward pass activates and then updates every node, regardless of whether it should be. I suspect the chain architecture may be one of the factors behind catastrophic forgetting, or the interference problem.
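The chain view can be sketched in a few lines. This is a toy under stated assumptions: each layer is wrapped as one "neuron node" (a plain list-of-lists weight matrix with a ReLU), and the names `make_node`, `run_node` and `chain_forward` are invented for the example. What it shows is that the set of activated nodes is the same for every input.

```python
import random

def make_node(in_dim, out_dim, rng):
    """A 'neuron node': one dense layer stored as out_dim rows of in_dim weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def run_node(weights, x):
    """Apply the node: weighted sums followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def chain_forward(nodes, x):
    """Run the nodes one after another; also record which nodes were activated."""
    activated = []
    for index, weights in enumerate(nodes):
        x = run_node(weights, x)
        activated.append(index)
    return x, activated

rng = random.Random(0)
dims = [4, 8, 8, 3]  # arbitrary layer widths for the toy
nodes = [make_node(dims[i], dims[i + 1], rng) for i in range(len(dims) - 1)]

out_a, activated_a = chain_forward(nodes, [1.0, 0.5, -0.5, 2.0])
out_b, activated_b = chain_forward(nodes, [-3.0, 0.0, 0.25, 1.0])

# With no branch or bypass, both inputs touch exactly the same nodes.
print(activated_a, activated_b)
```

In a training setting, the same property would mean that every node receives a gradient update on every example, which is the situation the paragraph above links, speculatively, to interference.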
I tend to think of a multi-way graph-structured architecture instead of a single-way chain-structured one, with branching or merging possibly occurring at any point, which can be seen as the ultimate extension of skip connections, lateral connections and highway connections. We can imagine a giant high-dimensional mesh net with some fractal structure, that is, with local-scale structures that hold unforgettable details as well as global-scale structures that enable regions to interact with one another and exchange information efficiently. Moreover, this giant mesh net will not be operated in full; only a small fraction of nodes will be activated to interact, forming a small subgraph that performs the computation.
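As a rough illustration of such a mesh, the sketch below builds a ring lattice (standing in for local-scale structure) plus a few random long-range edges (standing in for global-scale structure), and then activates only a small connected subgraph. Both the construction and the breadth-first selection rule are placeholder assumptions; the essay does not prescribe either.

```python
import random

def build_mesh(n, rng, shortcuts=8):
    """Ring lattice with links to the next two nodes, plus random long-range edges."""
    edges = {i: set() for i in range(n)}
    for i in range(n):
        for j in ((i + 1) % n, (i + 2) % n):
            edges[i].add(j)
            edges[j].add(i)
    for _ in range(shortcuts):
        a, b = rng.sample(range(n), 2)
        edges[a].add(b)
        edges[b].add(a)
    return edges

def activate_subgraph(edges, start, budget):
    """Grow a connected set of active nodes breadth-first until the budget is used."""
    active = [start]
    seen = {start}
    frontier = [start]
    while frontier and len(active) < budget:
        next_frontier = []
        for u in frontier:
            for v in sorted(edges[u]):
                if v not in seen and len(active) < budget:
                    seen.add(v)
                    active.append(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

rng = random.Random(1)
n = 50
mesh = build_mesh(n, rng)
active = activate_subgraph(mesh, start=0, budget=5)

# Only a small fraction of the mesh takes part in this computation.
print(active, len(active) / n)
```

The remaining 45 nodes are neither run nor updated in this pass, which is the contrast with the chain architecture.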
This computation framework raises a question: how do we pick the activated subgraph out of the giant mesh net? And what does it stand for?
It actually implies a navigation problem that requires choosing nodes and edges at each step, a type of action that takes place in the computation graph. From a low-level computational view, it navigates the computation flow through a subgraph of the big neuron-node network. From a high-level conceptual view, a graph-structured architecture may be a better way to encode the external environment into an internal environment model, the imagination world, so that counterfactual experiments can be conducted in this world by taking virtual actions on an imaginary adventure. Counterfactual reasoning and action intervention are key factors in developing causal cognition in humans.
Finally, I would make the bold speculation that it is attention that drives our flow of consciousness. On the one hand, there is an attention mechanism at the level of the computation framework that draws out a subgraph and navigates the computation flow; on the other hand, attention can be viewed as a mental action, consciously navigating us through our conceptual or imagination world.
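The computational half of this idea, attention drawing out a subgraph one step at a time, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the five-node graph, the 2-D key vectors, the query, and the greedy rule of moving to the neighbor with the highest softmax weight.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def navigate(edges, keys, query, start, steps):
    """Attention-guided walk: at each step, attend over unvisited neighbors."""
    path = [start]
    visited = {start}
    current = start
    for _ in range(steps):
        candidates = [v for v in edges[current] if v not in visited]
        if not candidates:
            break
        weights = softmax([dot(query, keys[v]) for v in candidates])
        best = max(range(len(candidates)), key=lambda i: weights[i])
        current = candidates[best]
        visited.add(current)
        path.append(current)
    return path

# Hypothetical toy graph and node keys.
edges = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
         "d": ["b", "c", "e"], "e": ["d"]}
keys = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0],
        "d": [0.5, 0.5], "e": [1.0, 1.0]}

print(navigate(edges, keys, query=[1.0, 0.2], start="a", steps=3))  # ['a', 'b', 'd', 'e']
print(navigate(edges, keys, query=[0.2, 1.0], start="a", steps=3))  # ['a', 'c', 'd', 'e']
```

Changing the query changes which branch is taken at the first step, so the same graph yields different activated subgraphs depending on what is attended to. A trainable version would more likely sample from the attention weights, or use them as soft mixing coefficients, than take a hard argmax.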
There is an old saying, one of the principles of the Chinese martial art of Tai Chi: “Use Yi (use wish), don’t use Li (apply force)”.