Deep learning and predictive coding architectures commonly assume that inference in neural networks is hierarchical. However, largely neglected in deep learning and predictive coding architectures is the neurobiological evidence that all hierarchical cortical areas, higher or lower, project to and receive signals directly from subcortical areas. Given these neuroanatomical facts, today’s dominance of cortico-centric, hierarchical architectures in deep learning and predictive coding networks is highly questionable; such architectures are likely to be missing essential computational principles the brain uses. In this Perspective, we present the shallow brain hypothesis: hierarchical cortical processing is integrated with a massively parallel process to which subcortical areas substantially contribute. This shallow architecture exploits the computational capacity of cortical microcircuits and thalamo-cortical loops that are not included in typical hierarchical deep learning