Deep Drone Racing Research
Peter Wei
Mr. Kemp
AE Research
2018/9/21
Article Review 1
“Deep Drone Racing: Learning Agile Flight in Dynamic Environments” was published on arXiv.org (Cornell University) on 22 June 2018 and was written by Elia Kaufmann, Antonio Loquercio, René Ranftl, Alexey Dosovitskiy, Vladlen Koltun, and Davide Scaramuzza.
The article presents an approach to autonomous, vision-based drone racing in environments that may contain dynamic obstacles. The method combines a compact convolutional neural network (CNN), which runs on an onboard processor, with a classic flight controller. The system takes only raw images from a front-facing camera as input to predict its trajectory, and it runs fully onboard. Each raw image is first processed by the trained CNN, which outputs a desired waypoint and a desired speed; these high-level commands are then executed by the flight controller, which drives the motors, so that the main processor can focus on trajectory calculation. The authors trained their neural network using pre-generated global reference trajectories (about 5,000 sets of them, according to the article). As a result, the control system combines the robust perceptual awareness of modern machine learning pipelines with the stability and speed of well-known flight control algorithms.
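To make the data flow concrete, here is a minimal conceptual sketch of the closed loop described above. Every name in it (camera, perception_cnn, flight_controller, motors) is a hypothetical placeholder of my own; the authors do not publish their implementation in this form.

```python
# Hypothetical sketch of the onboard loop: raw image in, motor commands out.
def racing_loop(camera, perception_cnn, flight_controller, motors):
    while True:
        image = camera.grab_frame()                # raw RGB from the front-facing camera
        x, v = perception_cnn(image)               # goal direction (image coords) + speed
        setpoint = flight_controller.plan(x, v)    # short local trajectory toward the goal
        motors.apply(flight_controller.track(setpoint))  # low-level motor commands
```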
Significance of the problem
The problem is well defined in this research: the researchers identify reliable state estimation, reacting optimally to dynamically changing environments, and coupling perception and action in real time under severe resource constraints as some of the major problems facing the development of autonomous agile flying robots. By the end of the article, they have presented their own innovative “hybrid” approach to overcoming these problems.
My initial idea was to develop a system that enables a drone to fly autonomously through the courses used in drone racing competitions. This article takes that idea a step further: its approach also works in dynamic environments. In short, it is closely related to my project.
I find this article useful for my research because it presents a credible approach to my research problem, along with a great deal of related information.
Scholarship and objectivity
The paper clearly separates methodology from analysis. The authors address the problem of robust, agile quadrotor flight in a dynamic environment. Before arriving at their own approach, they first consider existing methods and discuss the advantages and disadvantages of each.
Simultaneous Localization and Mapping (SLAM) can provide accurate position estimates against a previously generated, globally consistent map. However, this approach may fail because it adapts poorly to unknown (not previously mapped) environments and tolerates little motion blur, which is caused by high speed and leads to loss of feature tracking.
“State-of-the-art” state estimation pipelines (I still need to research exactly what this refers to) may require expensive sensors, have high computational costs, or be subject to drift due to the use of compressed maps. Moreover, they are designed for a predominantly static world, “where no dynamic changes to the environment, or to the track to follow, occur.”
Traditional handcrafted gate detectors quickly become unreliable under occlusion, partial visibility, and motion blur. The classical remedy is visual servoing, in which the robot is given a set of reference images and recognizes a gate in the real world when its appearance matches one of them. However, this approach works well only when the differences between the reference image and the actual object are small, and, as before, it is not robust to occlusions and motion blur.
End-to-end trainable machine learning systems derive actions directly from images and are independent of any global map or position estimate. However, they are not directly applicable to this problem because of their high computational complexity, their low maximum speed, and the inherent difficulty of generalizing to 3D motion. Furthermore, the optimal output representation for learning-based algorithms that couple perception and control is still an “open question.”
Known output representations range from predicting discrete navigation commands, which enables high robustness but leads to low agility, to direct control, which can be highly agile but suffers from high sample complexity.
They finally arrive at their own analysis of how to solve the problem. Taking the best of the two approaches above, they combine the benefits of agile trajectories with the ability of deep neural networks to learn highly expressive perception models, so that the system can work with high-dimensional, raw sensory input. The supervision for training the neural network comes from global trajectory methods, while the learned policy operates only on raw perception input, i.e., images, without requiring any information about the system’s global state. Moreover, the “learner” acquires abilities that the “teacher” it imitates does not possess, which allows it to cope with dynamic environments.
As for methods, the paper divides the system into two subsystems: perception and control. The goal of the perception system is to analyze the image and provide a flight direction to the control system. They implement the perception system as a convolutional network that takes a 300×200 pixel RGB image as input and outputs a tuple {x, v}, where “x” is a two-dimensional vector that encodes the direction to the new goal in normalized image coordinates and “v” is a normalized desired speed at which to approach it. To allow for onboard computation, they use a specially designed lightweight architecture (which I did not fully understand). With their setup, the network can process about 10 frames per second.
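As a rough illustration of what such a compact perception network might look like, here is a minimal PyTorch sketch. The layer sizes and output activations are my own assumptions, not the authors’ actual architecture:

```python
import torch
import torch.nn as nn

class PerceptionNet(nn.Module):
    """Compact CNN stand-in: 300x200 RGB image in, goal direction x and speed v out.
    Layer sizes are assumed for illustration, not taken from the paper."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # global pooling keeps the head small
        )
        self.head = nn.Linear(64, 3)      # 2 values for x, 1 for v

    def forward(self, img):               # img: (N, 3, 200, 300)
        out = self.head(self.features(img).flatten(1))
        x = torch.tanh(out[:, :2])        # direction in normalized image coordinates
        v = torch.sigmoid(out[:, 2])      # normalized desired speed in [0, 1]
        return x, v
```

The important property is the small parameter count, which is what makes processing roughly 10 frames per second on an onboard processor plausible.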
Given a tuple {x, v}, the control system generates low-level control commands, which are then sent to the motors.
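The paper’s exact control law is not something I can reproduce here, but conceptually the controller must turn the normalized prediction {x, v} into a metric goal point and speed command. Here is a minimal sketch of one plausible way to do this; the pinhole-style projection and the field-of-view, lookahead, and maximum-speed values are all my own assumptions:

```python
import numpy as np

def goal_from_prediction(x_norm, v_norm, fov_deg=90.0, lookahead=5.0, v_max=10.0):
    """Hypothetical conversion of the network output {x, v} into a 3D goal
    point and a speed command in the body frame."""
    half_fov = np.radians(fov_deg) / 2.0
    yaw = x_norm[0] * half_fov            # horizontal image offset -> heading change
    pitch = x_norm[1] * half_fov          # vertical image offset -> climb/descend
    direction = np.array([
        np.cos(pitch) * np.cos(yaw),      # forward
        np.cos(pitch) * np.sin(yaw),      # left/right
        np.sin(pitch),                    # up/down
    ])
    goal = lookahead * direction          # waypoint a few meters ahead on that ray
    speed = v_norm * v_max                # de-normalize the speed command
    return goal, speed
```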
Before the perception system can be used in actual flight, the CNN needs to be trained on a significant amount of images and reference trajectories (imitation learning). To generate these trajectories, they assume that at training time the location of each gate of the race track, “expressed in a common reference frame, is known.”
They use an expert policy to train and supervise their neural network. In essence, a minimum-snap trajectory is generated with the method presented by Mellinger and Kumar (top scholars in the field of drones and autonomous control). They collect a dataset of state estimates and corresponding camera images; using the global reference trajectory, they evaluate the expert policy on each of these samples and use the result as the ground truth for training. Notably, the performance of the learned policy is not limited to what it has been taught or shown.
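Because the expert provides the labels, training reduces to plain supervised regression. Here is a minimal sketch of one training step, assuming a network like the PerceptionNet above; the loss weighting is an assumption of mine, since the paper describes the idea rather than these exact values:

```python
import torch
import torch.nn.functional as F

def train_step(net, optimizer, images, x_expert, v_expert):
    """One imitation-learning update: regress the predicted {x, v} onto the
    labels computed from the expert (minimum-snap) policy. The 0.1 weight
    on the speed loss is an assumption for illustration."""
    x_pred, v_pred = net(images)
    loss = F.mse_loss(x_pred, x_expert) + 0.1 * F.mse_loss(v_pred, v_expert)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```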
After the neural network is trained, they test it in both simulation and real flight, comparing it to other approaches (e.g., the end-to-end learning approach). They also test the performance of the system on both small and large tracks, which differ in the number of gates and total loop length.
Clarity of presentation
In their results, the authors state that they have presented a new approach to autonomous, vision-based drone racing. They demonstrate the capabilities of this integrated approach to perception and control in an extensive set of experiments on real drones and in simulation. The results show that the system robustly navigates complex race tracks, avoids the drift that is inherent in systems relying on global state estimation, and copes with highly dynamic and cluttered environments. It shows clear advantages over more common approaches.
Insight and perspective
The article builds on some pre-existing methods (e.g., convolutional neural networks and Mellinger and Kumar’s reference-trajectory generation method), but it contributes its own hybrid approach, which, for the first time, successfully solves the problem addressed. The authors clearly analyze the advantages and limitations of each existing approach that might solve the problem before arriving at their own. The outcome of this research is remarkable in its field, as it gives a strong answer to the problem of robust, agile, vision-based autonomous flight.
At the end of the paper, the authors also acknowledge that scaling such hybrid approaches to more general environments is an exciting avenue for future work that poses several challenges. “First, while the ability of our system to navigate through moving or partially occluded gates is promising, performance will degrade if the appearance of the environment changes substantially beyond what was observed during training. Second, in order to train the perception system, our current approach requires a significant amount of data in the application environment. This might be acceptable in some scenarios, but not practical when fast adaptation to previously unseen environments is needed. This could be addressed with techniques such as few-shot learning. Third, in the cases where trajectory optimization cannot provide a policy to be imitated, for instance in the presence of extremely tight turns, the learner is also likely to fail. This issue could be alleviated by integrating learning deeper into the control system.”
Conclusion
This research paper is overall well presented and is very helpful to my research. Although there are things I still need to investigate further, I consider the approach presented in this paper to be very valuable for accomplishing my research. I would strongly suggest that others read this paper if they are interested in what I do, because it is closely related to my project in both topic and possible approach.
Reference
Kaufmann, E., Loquercio, A., Ranftl, R., Dosovitskiy, A., Koltun, V., & Scaramuzza, D. (2018, June 22). Deep Drone Racing: Learning Agile Flight in Dynamic Environments. Retrieved from https://arxiv.org/abs/1806.08548