• Christopher Ho

Building Safe Algorithms in the Open, Part 1 - Design

Consider the waterfall. Powerful. Implacable. Always surging forward towards the inevitable bottom. Powered by one of the few fundamental forces in the universe.

Waterfalls are truly awe-inspiring constructs, so it’s no wonder engineers have a bit of an obsession with them. The old DOD-STD-2167A standard recommended the use of a waterfall model, and my out-of-date engineering education suggested the use of a Phase-Gate model, which looks an awful lot like a waterfall to me. On the flip side, those of us who studied computer science at a university probably know that the waterfall model is a bit of an anti-pattern. Our friends in the ivory tower of academia tell us that, no-no, agile is the way to go, and that seems to have rung true in industry.

So between the old-fashioned, antiquated waterfall model, and the new hotness that is agile, what’s a developer to choose? Does the equation change when you’re trying to build algorithms? Or some safety-critical software?

As with most things in life, the answer is somewhere in between.

Hybrids, V’s, and Spirals

Hybrid development is that in between. Where the waterfall model doesn’t permit going back and changing requirements, the hybrid model permits it. And where agile development doesn’t necessarily lend itself well towards up-front planning, hybrid development also gives you space for that. What’s more, hybrid development is touted to reduce the number of defects in a final product, something we probably want when developing algorithms for safety-critical applications.

Sounds good, but how good is it really?

To answer that question, we’re taking a stab at hybrid development as we develop the NDT algorithm for localization in the open. Localization is an integral part of any autonomous driving stack that goes beyond purely reactive control. If you don’t believe me, or you’re not familiar with localization, I gently recommend that you take a look at some of the design documents that came out of this process.

So what is hybrid development in a nutshell? From my layman’s view, I’d argue it’s the idealized V model, or the Spiral Model. You plan, design, implement, and test, and iterate on that whole process based on the lessons you’ve learned and the friends you’ve made along the way.

Putting it into practice

A little bit more concretely, with the NDT working group in Autoware.Auto, we’ve currently finished up our first swing down the left side of the V (aka the first iteration through the design phase) in preparation for the Autoware Hackathon in London (hosted by Parkopedia!). Our first pass through the design phase consisted of the following steps:

  1. Literature review

  2. Review existing implementation

  3. High-level component design; use cases and requirements

  4. Failure analysis

  5. Metrics definition

  6. Architecture and API design

You can have a gander at each of the resulting documents if you’re interested in those kinds of words, but for the remainder of this post, I’ll try to break down some of the why’s and what’s that went into each of these steps.

Literature review and existing implementations

The first step in any worthy undertaking (within which I would categorize the implementation of NDT) is to see what other people have done. Humans, after all, are social creatures, and in all of our achievements, we stand on the shoulders of giants.

Allusions aside, there are two important directions to look at when reviewing “prior art”: academic literature, and functional implementations.

It’s always a good idea to look at what starving graduate students have worked on in the height of their hunger. Best case, you find out there’s a wholly superior algorithm that you can implement instead. Worst case, you understand the space of, and variations in solutions (which can help inform architecture), and you can be made aware of some of the theoretical underpinnings of the algorithm (and thus which invariants you need to protect).

On the flip side, it’s similarly good to look at what other people have done--after all, it’s always easiest to start solving something with a good initial guess. Not only do you get to gratuitously borrow good architectural ideas, but you might also discover some gotchas and dirty tricks you might need to use in order to get the algorithm to work in practice (which you might even be able to holistically integrate into your architecture).

From our literature review of NDT, we gleaned the following useful bits of information:

  • The NDT family of algorithms has a couple of variations: · P2D · D2D · Constrained · Semantic

  • There are a bunch of dirty tricks you can use to make the algorithm work better

  • NDT is generally compared to ICP

  • NDT is a bit faster and a bit more robust

  • NDT works robustly (high rate of success) within a specified local neighborhood

Nothing earth-shattering, but useful points of information to file away for later use, in both design and in implementation.

Similarly, from our review of an existing implementation, not only did we see concretely what the steps are, but we also saw some interesting initialization strategies.

Use cases, requirements, and mechanisms

An integral part of any “design-or-plan-first” development process is to look at the problem you’re trying to solve at a high level. Broadly, from a functional safety perspective (which I am admittedly far from an expert on), “looking at the high-level problem” is organized something like the following process:

  1. What are the use cases you’re trying to solve?

  2. What are the requirements for (or constraints on) a solution in order to satisfy the above-stated use cases?

  3. What mechanisms satisfy the above requirements?

What the above process does is provide a disciplined way of looking at the problem from a high level, and progressively getting more granular.

To get a picture of what this might look like, you can take a look at the high-level localization design document that we put together in preparation for our development of NDT. If you’re not in the mood for some bedtime reading, then read on below.

Use Cases

I like to think about use cases in three different ways (caveat emptor, I am not a functional safety guy):

  1. What is the component supposed to do in general? (remember SOTIF!)

  2. What are the different ways I might get inputs into the component (input or upstream use cases, as I like to term it)?

  3. What are the different ways I might use the output (output or downstream use cases)?

  4. Bonus question: What kind of whole system architectures might this component reside in?

Taking these all together, we came up with the following:

  • Most algorithms (can) use localization, but ultimately they can be divided into algorithms which operate locally or globally