Learning Agile Gate Traversal via Analytical Optimal Policy Gradient