Abnormal event detection is one of the most actively studied tasks in video analysis; it aims to distinguish abnormal events from normal ones in surveillance videos. Because the boundary between normal and abnormal events is ill-defined, more discriminative methods and richer motion information need to be explored. Techniques for this problem fall into three main classes: unsupervised, supervised and semi-supervised. Recent applications of convolutional neural networks have produced highly reliable results in identifying multiple types of objects in a scene, where the scene is given as an input image and processed by convolutional layers. Their drawback is that they are supervised learning models. We propose an efficient method for detecting abnormalities in videos. It is semi-supervised, building on existing supervised techniques. We develop a spatiotemporal architecture consisting of two components: one that learns spatial information and another that learns temporal information from the evolution of the spatial features. The proposed architecture requires only videos of normal events during training, and it uses reconstruction error to make predictions.
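As an illustration of how reconstruction error can drive predictions, the sketch below turns per-frame errors into scores. It assumes a trained model whose reconstructions are already available, so the model itself is not shown. It also assumes that frames are flattened lists of pixel intensities and that errors are min-max normalized over one video. This normalization is a common convention in reconstruction-based anomaly detection, not something this abstract confirms. The function names and the 0.5 threshold are hypothetical.

```python
import math
from typing import List, Sequence


def reconstruction_errors(frames: Sequence[Sequence[float]],
                          reconstructions: Sequence[Sequence[float]]) -> List[float]:
    """Per-frame Euclidean reconstruction error e(t) = ||x(t) - x_hat(t)||_2."""
    if len(frames) != len(reconstructions):
        raise ValueError("frames and reconstructions must have the same length")
    errors = []
    for x, x_hat in zip(frames, reconstructions):
        if len(x) != len(x_hat):
            raise ValueError("each frame and its reconstruction must have the same size")
        errors.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat))))
    return errors


def abnormality_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize errors over the video to [0, 1] (assumed convention)."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames reconstruct equally well, so none stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


def regularity_scores(errors: Sequence[float]) -> List[float]:
    """Regularity is the complement of abnormality: high means 'looks normal'."""
    return [1.0 - s for s in abnormality_scores(errors)]


def flag_abnormal(errors: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Flag frames whose normalized error exceeds a hypothetical threshold."""
    return [s > threshold for s in abnormality_scores(errors)]


# Toy example: the third frame is reconstructed poorly, as an unseen
# (abnormal) pattern would be by a model trained only on normal videos.
frames = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
recons = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.5], [2.0, 2.0, 0.0]]
errs = reconstruction_errors(frames, recons)
print(errs)                  # [0.0, 0.5, 2.0]
print(flag_abnormal(errs))   # [False, False, True]
```

Normalizing per video makes scores comparable across clips with different overall error levels. The trade-off is that every video, even a fully normal one, gets at least one frame with score 1.0.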