Novel view synthesis aims to generate novel views from one or more given source views. Although existing methods have achieved promising performance, they usually require paired views with different poses to learn a pixel transformation. This article proposes an unsupervised network that learns such a pixel transformation from a single source image. In particular, the network consists of a token transformation module, which transforms the features extracted from a source image into an intrinsic representation with respect to a pre-defined reference pose, and a view generation module, which synthesizes an arbitrary view from that representation. The learned transformation allows us to synthesize a novel view from any single source image of an unknown pose. Experiments on widely used view-synthesis datasets demonstrate that the proposed network produces results comparable to state-of-the-art methods, despite the fact that learning is unsupervised and only a single source image is required.
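
To make the two-module design concrete, the sketch below outlines one possible arrangement of such a pipeline in PyTorch. It is a minimal illustration under stated assumptions, not the article's actual implementation: the module names, feature dimensions, the learned reference-pose tokens, and the 6-DoF target-pose conditioning are all hypothetical choices introduced here for illustration.

```python
# Minimal sketch (assumptions): a token-transformation module maps encoder
# features of a source image with unknown pose to an intrinsic representation
# tied to a fixed reference pose; a view-generation module then decodes that
# representation conditioned on an arbitrary target pose.
import torch
import torch.nn as nn


class TokenTransformation(nn.Module):
    """Map source-image tokens to a reference-pose-aligned representation."""

    def __init__(self, dim=256, num_tokens=64, num_layers=4, num_heads=8):
        super().__init__()
        # Learned tokens standing in for the pre-defined reference pose
        # (an assumption; the paper's intrinsic representation may differ).
        self.reference_tokens = nn.Parameter(torch.randn(num_tokens, dim))
        layer = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)

    def forward(self, source_tokens):
        # source_tokens: (B, N, dim) features from any image encoder.
        ref = self.reference_tokens.unsqueeze(0).expand(
            source_tokens.size(0), -1, -1
        )
        # Cross-attend the reference tokens to the source features.
        return self.decoder(ref, source_tokens)


class ViewGeneration(nn.Module):
    """Decode an arbitrary view from the intrinsic representation."""

    def __init__(self, dim=256, pose_dim=6, out_pixels=64 * 64 * 3):
        super().__init__()
        self.pose_embed = nn.Linear(pose_dim, dim)
        self.decoder = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, out_pixels)
        )

    def forward(self, intrinsic_tokens, target_pose):
        # Condition the pooled representation on the requested target pose.
        pooled = intrinsic_tokens.mean(dim=1) + self.pose_embed(target_pose)
        return self.decoder(pooled).view(-1, 3, 64, 64)


if __name__ == "__main__":
    tokens = torch.randn(2, 49, 256)   # e.g. a 7x7 CNN feature map as tokens
    pose = torch.randn(2, 6)           # hypothetical 6-DoF target pose
    intrinsic = TokenTransformation()(tokens)
    novel_view = ViewGeneration()(intrinsic, pose)
    print(novel_view.shape)            # torch.Size([2, 3, 64, 64])
```

Because the intrinsic representation is anchored to a single reference pose rather than to the (unknown) source pose, the same decoding path can, in principle, be queried with any target pose at inference time, which is the property the abstract highlights.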