In this work, we attempt to solve the time-dependent Schrödinger equation directly by placing a neural-network ansatz on the solution and designing a loss function based on the constraints imposed by the equation. We show that the method works in simple cases, such as in 1D or with the free Hamiltonian, but fails in higher dimensions (3D and above) or when a complex potential is present. We then examine why the neural network fails to converge and identify both the main failure mode, in which the model learns the zero function, and its underlying cause: pathological training dynamics with different scales, which hinder the optimization process. These different dynamical scales arise from the composite nature of the loss. Optimization is made more difficult still because most modern optimizers were designed for a different class of loss functions, stemming from more traditional applications of neural networks, and are therefore not equipped to deal with the pathologies of our use case. We attempt to solve this problem by normalizing the gradients and compare our approach to recent developments, but conclude that neither approach addresses the underlying problem.
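Two technical points above can be illustrated with a minimal sketch: the zero function exactly satisfies the equation's residual, and the composite loss combines terms whose gradients can differ by orders of magnitude. The sketch makes several assumptions. It treats a 1D free particle with ħ = m = 1. It replaces the neural network and its automatic differentiation with an arbitrary callable `psi(x, t)` differentiated by central finite differences. It uses per-term unit-norm rescaling as one possible form of gradient normalization. All helper names are hypothetical, and this is not necessarily the scheme used in this work.

```python
import numpy as np

# Illustrative assumptions: 1D free Hamiltonian H = -(1/2) d^2/dx^2 with
# hbar = m = 1; derivatives by central finite differences rather than by
# automatic differentiation through a network; helper names are hypothetical.

def exact_packet(x, t):
    """Spreading free Gaussian wave packet (unnormalized, unit width) that
    solves i psi_t = -(1/2) psi_xx exactly."""
    a = 1.0 + 1j * t
    return a ** -0.5 * np.exp(-x**2 / (2.0 * a))

def pde_residual_loss(psi, x, t, h=1e-3):
    """Mean squared TDSE residual |i psi_t + (1/2) psi_xx|^2 at points (x, t)."""
    psi_t = (psi(x, t + h) - psi(x, t - h)) / (2 * h)
    psi_xx = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h**2
    residual = 1j * psi_t + 0.5 * psi_xx
    return np.mean(np.abs(residual) ** 2)

def initial_condition_loss(psi, x):
    """Mean squared mismatch between psi(x, 0) and the prescribed initial state."""
    return np.mean(np.abs(psi(x, 0.0) - exact_packet(x, 0.0)) ** 2)

def normalized_gradient_sum(grads, eps=1e-12):
    """Combine per-term gradients after rescaling each to unit L2 norm, so that
    no loss term dominates the update purely because of its scale."""
    return sum(g / (np.linalg.norm(g) + eps) for g in grads)

x = np.linspace(-5.0, 5.0, 201)
t = np.full_like(x, 0.5)
zero = lambda x, t: np.zeros_like(x, dtype=complex)

print(pde_residual_loss(zero, x, t))                 # 0.0
print(pde_residual_loss(exact_packet, x, t) < 1e-8)  # True
print(initial_condition_loss(zero, x) > 0)           # True
```

In this sketch the residual term alone is minimized exactly by ψ ≡ 0, so only the initial-condition term pushes the model away from the zero function. If that term's gradient is orders of magnitude smaller than the residual term's, an unnormalized sum lets the residual term dominate the update, which is one way a scale mismatch between loss terms can favor the trivial solution. `normalized_gradient_sum` equalizes the per-term gradient magnitudes before they are combined.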