Porting Algorithms from ROS 1 to ROS 2
Note: Apex.AI is creating an automotive-grade version of ROS 2 called Apex.OS. In this blog post we refer to some proprietary Apex.OS constructs, which we will describe in upcoming blog posts.
Converting an application from ROS 1 to ROS 2 can be as easy as switching a library and changing some types. However, to properly convert an application and ensure that the ROS 2 implementation is production-grade, the following tasks must be considered and implemented:
Ensure the algorithm implementation follows architectural and software engineering best practices
Ensure concerns are separated
Follow a safety-critical coding standard
Use warnings and static checkers
Ensure the algorithm implementation is fully tested
Ensure the implementation is well documented
Tune and optimize the implementation for the target platform
To more concretely understand the steps needed to port an algorithm implementation from ROS 1 to ROS 2, the ROS 1 Velodyne driver is examined in detail.
ROS 1 is an excellent framework for developing and quickly prototyping robotic applications, thanks to its vast community support and wide array of tools. However, for more performance-sensitive use cases with stricter requirements, it has become increasingly clear that ROS 1 is not sufficient for developing production-quality and safety-critical robotic applications.
ROS 2, by contrast, is a framework for safe, secure, and robust applications, such as autonomous driving, which falls into the domain of safety-critical applications.
Converting an algorithm or application to use ROS 2 as a framework can be as easy as switching out constructs such as publishers, subscribers, and nodes, and modifying the build process to include the correct headers and link against the correct libraries. Modifying an application in this way results in an application that nominally runs on top of ROS 2, but it cannot be called a proper ROS 2 application, as it is neither real-time capable nor robust. To convert a ROS 1 application into a ROS 2 application, a number of improvements must be implemented.
ROS 1 is often used for rapid prototyping and development of robotic applications. In engineering, the discipline under which software development falls, prototyping is a necessary evil for getting applications and systems working. A prototype affords the engineer a holistic view of a working application, which is not always available during low-level development.
This holistic view makes it possible to understand what the application does at the highest level and to map each of its steps to the most appropriate software engineering and architectural patterns. When rewriting an application to run on ROS 2, the engineer also has the opportunity to fix its architecture so that it best fits the target problem.
For a concrete example, consider the architecture and workflow of the ROS 1 Velodyne driver.
What the driver does
Before implementing architecture changes, it's important to analyze the high-level functionality of the existing solution. Fundamentally, the operation of the ROS 1 Velodyne lidar driver can be broken down into four steps:
Wait for a packet
Deserialize the packet into points
Aggregate points into a point cloud
Publish the message
Applying best practice - multithreading
Architecturally, the existing Velodyne node can be thought of as having two worker threads, plus any threads that ROS 1 creates automatically. The first thing to review is therefore the multi-threading in this implementation.
While there are many rules for implementing multi-threading correctly, such as preferring locks to lock-free programming, the most widely agreed-upon multi-threading best practice is to not use multi-threading at all.
If multi-threading is not needed, it should not be used. Multi-threading has complexity implications (multi-threaded programs are more complex and harder to debug) as well as performance implications, since some performance is lost to context switching at the kernel level. Admittedly, two threads in a sequential configuration can have some latency benefits and can attenuate the effects of bursty input data.
Conceptually, the input to this software component can be thought of as a steady stream of packets. If a packet cannot be handled before the next packet arrives (i.e. the packet cannot be deserialized in the time available), then there is no hope of the application keeping up with the input stream. As such, this application can be rearchitected to run in a single thread.
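The single-threaded design can be sketched as follows. This is a minimal sketch, assuming hypothetical Packet and Point types and callback-style receive, deserialize, and publish functions; none of these names come from the driver or from Apex.OS. A single packet buffer is reused, each packet is decoded into the cloud before the next one is awaited, and the cloud is published once it holds the configured number of points.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the driver's or Apex.OS's API.
struct Packet {
  std::array<std::uint8_t, 1206U> data{};  // assumed maximum payload size
  std::size_t size{0U};
};

struct Point {
  float x;
  float y;
  float z;
};

// One thread runs the whole pipeline: wait for a packet, deserialize it,
// aggregate the points, and publish once a full cloud has been collected.
class SingleThreadedDriver {
 public:
  using ReceiveFn = std::function<bool(Packet &)>;  // false on timeout
  using DeserializeFn = std::function<void(const Packet &, std::vector<Point> &)>;
  using PublishFn = std::function<void(const std::vector<Point> &)>;

  SingleThreadedDriver(std::size_t points_per_cloud, ReceiveFn receive,
                       DeserializeFn deserialize, PublishFn publish)
      : points_per_cloud_{points_per_cloud},
        receive_{std::move(receive)},
        deserialize_{std::move(deserialize)},
        publish_{std::move(publish)} {
    cloud_.reserve(points_per_cloud_);
  }

  // Handles at most one packet per call; returns false if none arrived.
  bool spin_once() {
    if (!receive_(packet_)) {
      return false;  // step 1 timed out
    }
    deserialize_(packet_, cloud_);  // steps 2 and 3: decode and aggregate
    if (cloud_.size() >= points_per_cloud_) {
      publish_(cloud_);  // step 4
      cloud_.clear();    // reuse the buffer for the next cloud
    }
    return true;
  }

 private:
  std::size_t points_per_cloud_;
  ReceiveFn receive_;
  DeserializeFn deserialize_;
  PublishFn publish_;
  Packet packet_{};  // a single packet buffer, reused for every packet
  std::vector<Point> cloud_{};
};
```

In a real node, the receive callback would wrap a socket read with a timeout; the callbacks only keep the sketch independent of any framework.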
Applying best practice - polling
To avoid blocking, the ROS 1 Velodyne driver polls its socket indefinitely.
Polling is generally understood to increase jitter in systems.
To avoid jitter, a special UdpReceiver (Apex.OS IP) is used, which wraps POSIX's recvfrom together with select; the combination allows the developer to wait for data up to a timeout. Using a waiting, or event-driven, pattern reduces CPU load and jitter in a system.
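The UdpReceiver itself is Apex.OS IP, but the underlying POSIX pattern can be sketched. The function below is a hypothetical stand-in, not the UdpReceiver API: it blocks in select for at most timeout_us microseconds and only then calls recvfrom, so the caller gets either data or an explicit timeout instead of spinning.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Hypothetical result type; not the Apex.OS UdpReceiver API.
enum class ReceiveStatus { kData, kTimeout, kError };

// Waits up to timeout_us for a datagram on fd, then reads it with recvfrom.
// bytes_received is only written when kData is returned.
ReceiveStatus receive_with_timeout(int fd, void *buffer, std::size_t capacity,
                                   long timeout_us, std::size_t &bytes_received) {
  fd_set read_set;
  FD_ZERO(&read_set);
  FD_SET(fd, &read_set);  // fd must be below FD_SETSIZE
  timeval timeout{};
  timeout.tv_sec = timeout_us / 1000000L;
  timeout.tv_usec = timeout_us % 1000000L;

  const int ready = select(fd + 1, &read_set, nullptr, nullptr, &timeout);
  if (ready == 0) {
    return ReceiveStatus::kTimeout;  // bounded wait elapsed, no busy polling
  }
  if (ready < 0) {
    return ReceiveStatus::kError;  // e.g. EINTR or EBADF; errno holds details
  }
  const ssize_t n = recvfrom(fd, buffer, capacity, 0, nullptr, nullptr);
  if (n < 0) {
    return ReceiveStatus::kError;
  }
  bytes_received = static_cast<std::size_t>(n);
  return ReceiveStatus::kData;
}
```

In the driver, fd would be a UDP socket bound to the lidar's data port; any datagram socket works with the same calls.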
Separating concerns
A problem endemic to rapidly prototyped code, and common in ROS 1 code, is not obeying the design principle of separation of concerns. Software engineering best practice suggests that concerns should be kept as separate as possible and that a loosely coupled software architecture should be used whenever possible. Recent best practice suggests that in a large robotic system, it is important to keep the following five concerns separate:
Computation
Communication
Configuration
Coordination
Composition
A robotic framework such as ROS 1 or ROS 2 typically has ownership over concerns 2-5. The remaining concern, computation (i.e. algorithms, control logic, etc.), therefore falls in the developer's domain, and its operation should be kept separate from the other concerns.
Because of the rapidly developed nature of many ROS 1 applications, concerns are seldom kept well separated. A port to ROS 2 is an opportunity to properly separate concerns. Returning to the example of the ROS 1 Velodyne driver, it can be seen that concerns are not kept separate.
Interleaved concerns in the ROS 1 Velodyne driver
Fundamentally, the driver does the following:
Wait for a packet
Deserialize the packet into points
Aggregate points into a point cloud
Publish a message
The second and third operations, and to some extent the first, can be thought of as the computational concern of this application.
One fundamental problem in the ROS 1 Velodyne driver is that it depends on ROS 1 for its core computational operation. That is, the driver relies on ROS 1 nodelets, publishing, and subscribing for the second and third operations to work. This means that the basic computational operation of the driver depends on composition (i.e. nodelets) and communication (i.e. publishing and subscribing). In addition, constructing and configuring the packet-receiving portion of the application depends entirely on ROS 1 parameters. As a result, the computational component cannot be used without ROS 1 as its framework.
Interleaving the computational concern of the Velodyne driver with its framework results in two fundamental problems:
Proper testing and development of the computational concern is impossible as it depends wholly on the framework (meaning issues cannot be properly isolated)
Switching to a different framework is impossible without a full rewrite of the driver
Properly separating concerns in the driver
Separating the concerns of a software component can be thought of as an intermediate-level architecture problem. Broadly speaking, a general recipe exists that allows the concerns to be properly separated.
In this recipe, the algorithm, which represents the computational portion of the application, is wholly encapsulated in its own class with its own configuration object. This allows the algorithm to be operated and tested in isolation. The algorithm node is a boilerplate class that uses framework features to fill out the configuration object and construct the algorithm. The framework's communication mechanisms provide the algorithm's input through the node, and the algorithm's output is passed back to the framework, so the two remain properly separated.
Using the example of the Velodyne driver, the following concepts can be seen instantiated in ROS 2:
The deserialization of points is its own algorithm object
Configuration of the algorithm is decoupled from the framework
Interaction between the framework and the computation only happens in a boilerplate class
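The recipe can be illustrated with a small, framework-agnostic sketch. Everything here is hypothetical: CloudAssembler stands in for the computational concern, while ParameterSource and CloudPublisher are placeholder interfaces playing the role of a framework's parameter and publishing APIs (they are not ROS 2 or Apex.OS classes). Only CloudAssemblerNode touches both sides.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---------- Computation: no framework dependencies ----------
struct AssemblerConfig {
  float min_range_m;
  float max_range_m;
  std::size_t points_per_cloud;
};

struct Point {
  float x;
  float y;
  float z;
  float range;
};

// The algorithm: filters points by range and aggregates them into clouds.
class CloudAssembler {
 public:
  explicit CloudAssembler(const AssemblerConfig &config) : config_{config} {
    cloud_.reserve(config_.points_per_cloud);
  }

  // Adds one point; returns true when a full cloud is ready.
  bool add(const Point &p) {
    if (p.range < config_.min_range_m || p.range > config_.max_range_m) {
      return false;  // outside the configured measurement range
    }
    cloud_.push_back(p);
    return cloud_.size() >= config_.points_per_cloud;
  }

  const std::vector<Point> &cloud() const { return cloud_; }
  void reset() { cloud_.clear(); }

 private:
  AssemblerConfig config_;
  std::vector<Point> cloud_;
};

// ---------- Framework side: hypothetical stand-in interfaces ----------
struct ParameterSource {
  virtual ~ParameterSource() = default;
  virtual double get(const std::string &name) const = 0;
};

struct CloudPublisher {
  virtual ~CloudPublisher() = default;
  virtual void publish(const std::vector<Point> &cloud) = 0;
};

// Boilerplate node: fills the config from the framework, owns the
// algorithm, forwards input to it, and hands its output back.
class CloudAssemblerNode {
 public:
  CloudAssemblerNode(const ParameterSource &params, CloudPublisher &publisher)
      : assembler_{make_config(params)}, publisher_{publisher} {}

  void on_point(const Point &p) {
    if (assembler_.add(p)) {
      publisher_.publish(assembler_.cloud());
      assembler_.reset();
    }
  }

 private:
  static AssemblerConfig make_config(const ParameterSource &params) {
    return AssemblerConfig{static_cast<float>(params.get("min_range_m")),
                           static_cast<float>(params.get("max_range_m")),
                           static_cast<std::size_t>(params.get("points_per_cloud"))};
  }

  CloudAssembler assembler_;
  CloudPublisher &publisher_;
};
```

Because CloudAssembler depends only on its configuration struct, it can be constructed and tested directly, and moving to another framework only requires rewriting the thin node class.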
For more details on how ROS 2 nodes should be structured, stay tuned for future blog posts (Apex.OS IP).
When developing applications for a safety-critical context, it is important to use an appropriate coding standard. A number of such standards exist, including MISRA C, MISRA C++, AUTOSAR C++14, and JSF AV C++.
Most coding standards for safety-critical use cases are also relevant for systems that must be deterministic in both data and time (i.e. hard real-time); such determinism can be thought of as a precondition for a safe system. Common themes that these coding standards cover include:
No blocking calls
No dynamic memory allocation (during the steady-state loop)
No undefined behavior (e.g. unsafe conversions)
No implicit conversions
Check return values
Properly catch and handle errors
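Several of these themes meet in something as small as a numeric conversion. A minimal sketch with a hypothetical function name: the conversion is explicit, out-of-range input is reported rather than silently wrapped, and [[nodiscard]] (C++17) makes the compiler warn when a caller ignores the result.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Converts a signed 32-bit value to an unsigned 16-bit one without an
// implicit conversion or silent truncation. Returns false if out of range.
[[nodiscard]] bool try_narrow_to_u16(std::int32_t in, std::uint16_t &out) {
  constexpr auto kMax = static_cast<std::int32_t>(std::numeric_limits<std::uint16_t>::max());
  if (in < 0 || in > kMax) {
    return false;  // out of range: report it instead of wrapping around
  }
  out = static_cast<std::uint16_t>(in);  // explicit, checked conversion
  return true;
}
```

The caller then handles the false case explicitly, for example by discarding the value and reporting an error.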
Adherence to these rules can be enforced through use of static analysis tools.
Avoiding dynamic memory allocation
Of particular interest is the rule forbidding dynamic memory allocation at runtime. This rule matters because it decouples the application from the memory management of the rest of the system at runtime. A memory-static application cannot leak memory during steady state, and it is not affected when another application leaks memory, because it does not request memory from the kernel at runtime.
The benefits of a memory-static application can be observed by running a memory stress test (e.g. with the stress command).
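In the ROS 1 Velodyne driver, there are two notable locations where memory is allocated during the normal operation of the application:
Storage for received packets, which are collected and passed between threads
The output point cloud, whose size is not fixed in advance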
The ROS 2 version of the Velodyne driver avoids runtime memory allocation in the first case due to architectural and algorithmic modifications.
Because unnecessary multi-threading was removed in the redesigned architecture, each packet is received and then deserialized into the output point cloud before the next packet arrives. As a result, only a single packet needs to be statically allocated, either in class or function scope.
The second allocation is avoided by allocating during initialization rather than at runtime. Because configuration is constrained to occur only during initialization, the maximum size of the point cloud is known at that point, so the cloud's full capacity can be preallocated up front.
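A sketch of the preallocation approach, assuming the maximum number of points comes from configuration read at startup; PreallocatedCloud is a hypothetical name. The capacity is reserved once, and a full cloud is reported to the caller instead of being grown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
  float x;
  float y;
  float z;
};

// Hypothetical fixed-capacity point cloud. All memory is reserved in the
// constructor, i.e. during initialization, once the configured maximum is
// known. At runtime add() never grows the buffer, so it never allocates.
class PreallocatedCloud {
 public:
  explicit PreallocatedCloud(std::size_t max_points) : max_points_{max_points} {
    points_.reserve(max_points_);
  }

  // Returns false instead of reallocating when the cloud is already full.
  bool add(const Point &p) {
    if (points_.size() >= max_points_) {
      return false;
    }
    points_.push_back(p);  // within the reserved capacity: no allocation
    return true;
  }

  void clear() { points_.clear(); }  // destroys the points, keeps the capacity
  std::size_t size() const { return points_.size(); }
  const Point *data() const { return points_.data(); }

 private:
  std::size_t max_points_;
  std::vector<Point> points_{};
};
```

Rejecting the extra point is a design choice: the alternative, growing the buffer, would reintroduce a runtime allocation.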
To support the development of memory static applications, there are tools for identifying unexpected memory allocations in a test setting.
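Such tools are commonly built on replacing C++'s global operator new. A minimal sketch of that idea (alloc_counter and count_allocations are hypothetical names, and this is not the Apex.OS tooling): the replacement counts every allocation, and a test asserts that the steady-state code path performs none.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

namespace alloc_counter {
std::atomic<std::size_t> g_allocations{0U};
}  // namespace alloc_counter

// Replacement for the global allocation function. Allocations made through
// the default operator new (including std::vector's default allocator)
// are counted.
void *operator new(std::size_t size) {
  alloc_counter::g_allocations.fetch_add(1U);
  if (size == 0U) {
    size = 1U;  // operator new must return a unique pointer even for 0 bytes
  }
  if (void *p = std::malloc(size)) {
    return p;
  }
  throw std::bad_alloc{};
}

void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// Returns how many allocations happened while running f.
template <typename F>
std::size_t count_allocations(F &&f) {
  const std::size_t before = alloc_counter::g_allocations.load();
  f();
  return alloc_counter::g_allocations.load() - before;
}
```

This only observes allocations made through operator new; memory obtained directly from malloc, for example inside C libraries, would need a different hook.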
Avoiding blocking calls
Dynamic memory allocation is a form of blocking call, because its runtime is nondeterministic and has no upper bound. This is due in part to memory fragmentation.
In general, blocking calls should be avoided at runtime in safety-critical applications, because their presence makes the application nondeterministic in the time sense. An application can be made deterministic in time by upper bounding the number of operations (e.g. using bounded for loops and no unbounded recursion) and ensuring that each operation terminates within an upper-bounded number of CPU cycles.
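Bounded loops are easiest to see in packet deserialization. The sketch below assumes a packet layout modeled on the VLP-16 data packet (12 blocks of 100 bytes, each with a 2-byte flag, a 2-byte little-endian azimuth, and 32 three-byte returns); treat the constants and names as illustrative rather than as a complete decoder. Both loops have compile-time bounds, so the worst case of 12 × 32 = 384 inner iterations is known before the code runs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout constants (modeled on the VLP-16 data packet).
constexpr std::size_t kBlocksPerPacket = 12U;
constexpr std::size_t kReturnsPerBlock = 32U;
constexpr std::size_t kBlockSize = 100U;
constexpr std::size_t kPacketSize = kBlocksPerPacket * kBlockSize + 6U;  // + timestamp, factory bytes

struct RawReturn {
  std::uint16_t azimuth;   // hundredths of a degree
  std::uint16_t distance;  // raw sensor units
  std::uint8_t intensity;
};

using Packet = std::array<std::uint8_t, kPacketSize>;
using Returns = std::array<RawReturn, kBlocksPerPacket * kReturnsPerBlock>;

// Extracts all non-empty returns. Block flag validation is omitted here.
// Every loop bound is a compile-time constant, so runtime is upper bounded.
std::size_t deserialize(const Packet &packet, Returns &out) {
  std::size_t count = 0U;
  for (std::size_t b = 0U; b < kBlocksPerPacket; ++b) {
    const std::size_t base = b * kBlockSize;
    const auto azimuth =
        static_cast<std::uint16_t>(packet[base + 2U] | (packet[base + 3U] << 8U));
    for (std::size_t r = 0U; r < kReturnsPerBlock; ++r) {
      const std::size_t off = base + 4U + r * 3U;
      const auto distance =
          static_cast<std::uint16_t>(packet[off] | (packet[off + 1U] << 8U));
      if (distance == 0U) {
        continue;  // no return on this channel
      }
      out[count] = RawReturn{azimuth, distance, packet[off + 2U]};
      ++count;
    }
  }
  return count;
}
```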
The ROS 1 Velodyne driver polls to avoid blocking, which introduces a different problem: jitter. The ROS 2 version of the Velodyne driver uses the UdpReceiver, which combines recvfrom with select. This allows the call to time out, so its runtime is upper bounded.
Reporting and handling errors
One critical area where the ROS 1 Velodyne driver falls short is in error handling.
Fundamentally, a LiDAR driver's four operations are all potential modes of failure:
The LiDAR device sends data
The application receives data
The application deserializes the data
The application aggregates and publishes the data
Of these steps, the first, third, and fourth are either straightforward mathematical operations, or outside the purview of the application. The remaining operation, receiving data, is an operation within the purview of the application, and potentially indicative of further system or hardware faults.
By default, the ROS 1 Velodyne driver polls for data forever if data does not arrive in time. In contrast, the ROS 2 Velodyne driver reports a timeout and notifies a higher level manager outside of the application.
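Reporting instead of waiting forever can be sketched as a small monitor that counts consecutive receive timeouts and notifies a higher-level manager through a callback when the driver's health changes. The health states, the threshold, and the class name are hypothetical; the Apex.OS error handling mechanism itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical health states reported to a higher-level manager.
enum class DriverHealth { kOk, kDegraded, kFault };

// Tracks consecutive receive timeouts and reports health changes.
class TimeoutMonitor {
 public:
  using ReportFn = std::function<void(DriverHealth)>;

  TimeoutMonitor(std::uint32_t fault_threshold, ReportFn report)
      : fault_threshold_{fault_threshold}, report_{std::move(report)} {}

  void on_packet() {
    consecutive_timeouts_ = 0U;
    set(DriverHealth::kOk);
  }

  void on_timeout() {
    if (consecutive_timeouts_ < fault_threshold_) {
      ++consecutive_timeouts_;  // saturating: cannot overflow
    }
    set(consecutive_timeouts_ >= fault_threshold_ ? DriverHealth::kFault
                                                  : DriverHealth::kDegraded);
  }

  DriverHealth health() const { return health_; }

 private:
  void set(DriverHealth h) {
    if (h != health_) {  // notify the manager only on changes
      health_ = h;
      report_(h);
    }
  }

  std::uint32_t fault_threshold_;
  ReportFn report_;
  std::uint32_t consecutive_timeouts_{0U};
  DriverHealth health_{DriverHealth::kOk};
};
```

The counter saturates at the threshold, so it stays bounded however long the lidar remains silent.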
For more details on error handling guidelines, stay tuned for the upcoming report on software architecture and error handling (Apex.OS IP).
Developing or redeveloping an application is only the first step. For an application to be considered certifiable or production grade, a number of further steps must be taken to assure the highest quality of code.
These steps include testing, various forms of analysis, and documentation.
Static analysis tools can check adherence to a coding standard and detect bugs and undefined behavior. A number of well-known bugs were, and still are, detectable with static analysis tools; one example is Apple's goto fail bug. A good first pass on any software application is to turn on all warnings (e.g. Clang's -Weverything). While this flag cannot detect all bugs or undefined behavior, it does help identify the use of problematic language features, many of which violate safety-critical coding standards, such as implicit conversions between integer types.
Unit testing is necessary to demonstrate the correctness of code. In general, all functions, lines, branches, and conditions of code should be covered, including template instantiations. A high level of test coverage promotes confidence in code, and is also necessary for some forms of certification.
Architecturally, separating the computation of an application into a library also makes it easier to instantiate and unit test in isolation.
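As a small illustration of branch coverage, the sketch below unit tests a hypothetical pure function in isolation. Plain assert is used to keep it self-contained; a real project would typically use a test framework such as GoogleTest.

```cpp
#include <cassert>

enum class RangeClass { kTooClose, kValid, kTooFar };

// Unit under test: a pure function with three branches and no framework
// dependencies, so it can be tested without ROS at all.
RangeClass classify_range(float range_m, float min_m, float max_m) {
  if (range_m < min_m) {
    return RangeClass::kTooClose;
  }
  if (range_m > max_m) {
    return RangeClass::kTooFar;
  }
  return RangeClass::kValid;
}

// One case per branch plus both boundaries, so every line and branch of
// classify_range is exercised.
void test_classify_range() {
  assert(classify_range(0.5F, 1.0F, 100.0F) == RangeClass::kTooClose);
  assert(classify_range(1.0F, 1.0F, 100.0F) == RangeClass::kValid);    // lower bound
  assert(classify_range(50.0F, 1.0F, 100.0F) == RangeClass::kValid);
  assert(classify_range(100.0F, 1.0F, 100.0F) == RangeClass::kValid);  // upper bound
  assert(classify_range(150.0F, 1.0F, 100.0F) == RangeClass::kTooFar);
}
```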
Further levels of testing, including integration, torture, and hardware-in-the-loop (HIL) testing are also valuable for proving the correctness of an application against a use-case and requirements.
For more details on writing tests and measuring coverage, see the article on how to write tests and measure coverage. See this article for more details on Modified Condition/Decision Coverage (MC/DC), a stringent code coverage metric.
Dynamic analysis (a.k.a. runtime analysis) involves running an application inside a testing harness, such as through the sanitizer family of tools, or Valgrind's memcheck tool. This further level of analysis is necessary and can catch a number of insidious bugs, such as those occurring due to uninitialized data.
The Heartbleed vulnerability is an example of a bug that can be caught by dynamic analysis but was missed by the static analysis tools of its time.
Finally, the testing executable also provides a convenient starting point for dynamic analysis before moving up to the full application.
All public facing APIs should be documented to lower the chance of API misuse.
While further documentation is not strictly needed to create quality code, it is important in creating a good and user-friendly product.
Optimization and real-time support
It is only after an application has been fully developed and properly tested that it may be optimized and tuned for a target platform.
The first step of optimization should always be in the form of improving the algorithm. This includes choosing the correct data structures, sorting data, doing dynamic programming and so on. Once this has been done, or when the input data is provably small, more granular forms of optimization can be done.
Fine-grained optimization is best guided by a profiling tool. Examples of such tools include perf, Valgrind's callgrind utility, and Intel's VTune. These tools can identify hot spots in the code, and they should always be run on release builds.
In the ROS 1 Velodyne driver, the sin and cos functions were identified as a bottleneck, so their values are precomputed into a lookup table.
Similar observations are made in the ROS 2 Velodyne driver.
In addition, because the ROS 2 Velodyne driver breaks the computation into functions more liberally, a significant amount of time is spent on calls to the polar_to_xyz function, which is called in the innermost loop. A small optimization is made by inlining this function. Other optimizations are made as well, including making large objects class members rather than stack variables.
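The two optimizations above can be sketched together. This assumes, as in Velodyne packets, that azimuth is reported in hundredths of a degree, so a table with 36000 entries covers every angle. TrigTable and the coordinate convention in polar_to_xyz (x = r cos(elevation) cos(azimuth), y = r cos(elevation) sin(azimuth), z = r sin(elevation)) are assumptions for illustration, not the driver's code. The inline keyword is only a hint; whether the call is actually inlined should be confirmed with a profiler or by inspecting the generated code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kAzimuthSteps = 36000U;  // hundredths of a degree

// Precomputes sin and cos once, at construction (initialization) time.
class TrigTable {
 public:
  TrigTable() {
    constexpr double kPi = 3.14159265358979323846;
    for (std::size_t i = 0U; i < kAzimuthSteps; ++i) {
      const double rad = static_cast<double>(i) * kPi / 18000.0;
      sin_[i] = static_cast<float>(std::sin(rad));
      cos_[i] = static_cast<float>(std::cos(rad));
    }
  }
  // The modulo wraps out-of-range azimuths (e.g. 36000 maps to 0 degrees).
  float sin_of(std::uint16_t azimuth) const { return sin_[azimuth % kAzimuthSteps]; }
  float cos_of(std::uint16_t azimuth) const { return cos_[azimuth % kAzimuthSteps]; }

 private:
  std::array<float, kAzimuthSteps> sin_{};
  std::array<float, kAzimuthSteps> cos_{};
};

struct Point {
  float x;
  float y;
  float z;
};

// Small and defined inline so the compiler can inline it into the loop.
inline Point polar_to_xyz(float range_m, std::uint16_t azimuth, float cos_elev,
                          float sin_elev, const TrigTable &trig) {
  const float xy = range_m * cos_elev;  // projection onto the x-y plane
  return Point{xy * trig.cos_of(azimuth), xy * trig.sin_of(azimuth), range_m * sin_elev};
}
```

The table is large (two arrays of 36000 floats), so it is best held as a class member or static object rather than on the stack, in line with the class-variable optimization above.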
Once an application is reasonably optimized, it can be tested for real-time behavior (Apex.OS IP) and then tuned for a real-time setting.
In Apex.AI's version of ROS 2, we provide the capabilities (Apex.OS IP) to do the following:
Set CPU affinities
Set process priorities
However, properly setting CPU affinities and process priorities must be done in the context of a larger system on the target hardware. This is a final step before deploying to production.
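Apex.OS's facilities for this are proprietary, but on Linux the underlying mechanisms are pthread_setaffinity_np (a GNU extension) and pthread_setschedparam. A minimal sketch with hypothetical helper names that applies them to the calling thread and returns the error code instead of ignoring it. SCHED_FIFO normally requires elevated privileges (for example CAP_SYS_NICE), so EPERM is an expected result on a development machine.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sched.h>

// Pins the calling thread to a single CPU. Returns 0 or an errno value.
int pin_current_thread_to_cpu(int cpu) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Switches the calling thread to SCHED_FIFO with the given priority.
// Returns 0 or an errno value (typically EPERM without privileges).
int set_current_thread_fifo_priority(int priority) {
  sched_param param{};
  param.sched_priority = priority;
  return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}

// Returns the first CPU the calling thread may currently run on, or -1.
int first_allowed_cpu() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) {
    return -1;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &set)) {
      return cpu;
    }
  }
  return -1;
}
```

Which CPU and priority to choose remains the system-level decision described above; the helpers only make the mechanism explicit.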
It is not difficult to use ROS 2 as a generic inter-process communication framework. However, using ROS 2 properly to develop a safety-critical and production-grade application is not trivial, and takes a large amount of disciplined engineering work.
When developing safe applications, it is also important to look ahead to developing safe and deterministic system architectures.
Written by Christopher Ho.