Why does the sigmoid activation function work better than tanh f