Tackling Cryptic Codebases

Yam Marcovitz
Aug 31, 2022

Say you have a task to do with an existing codebase that you just can’t figure out. You can’t understand the meaning of what needs to be done, let alone how to do it. Worse yet, you have a deadline to meet, your performance is being quantified with KPIs, and that crucial performance review is right around the corner.

Here at Apex.AI, we often deal with complex and idiosyncratic external codebases (ROS 2, intricate communication protocols, and more), since part of our job is to optimize and certify them for use in automotive software. In this post, we’ll share some tips and tricks for approaching such a codebase: a combination of mental attitudes and useful techniques. With the right approach, you won’t necessarily become an expert overnight, but you can more quickly gain the ability to produce valuable, credible work that makes a difference. This also forms a solid foundation for building deeper expertise later on. Let’s get started!

Bridging the Conceptual Gap

Large codebases are hardly ever built just the way you’d have personally done it. Even if you knew everything about the problem domain, you’d still need to adjust to how it’s been viewed by the people who wrote the code. Of course, lacking the conceptual background will make your study a lot more difficult than it needs to be. As such, the first step we recommend is becoming acquainted with what the codebase is trying to accomplish on a conceptual level.

In our experience, most of the interesting problems we face in our day-to-day work have already been looked into by someone who shared their findings online in an organized form. You can often find introductory videos, specification sheets, RFCs, or books. Here are a few tips on making efficient use of each of these resources:

Videos. If you can find some 5-minute clips, start with those, then jump straight into the hour-long presentations. Video is a good way to absorb information, because watching a real person speak communicates more than the written word: it gives you a relatable sense of their mindset and begins to narrow the perceived gap between you and the rocket scientists who, you may imagine, wrote the code. Grab a drink, sit back, and relax, letting your mind soak up the information. When a new or interesting idea comes up in the presentation, pause the video and look it up to get more context, opening some tabs for later reference. These are your first steps in forming an important conceptual web that you can use to reorient yourself whenever you feel lost or lose track of some detail. By the time you’ve watched roughly two to three hours of video, you can begin to ask intelligent questions. This is a crucial milestone.
Specifications and RFCs. Often written much like legal documents, these are perhaps the most boring kind of resource. However, their formality is also their strength. As your research continues, you will have questions, and a spec will generally give you answers you can rely on. If your problem domain has related spec sheets, save them and briefly skim their structure so that you know which section to consult for which kind of question. Granted, in some cases the actual code goes beyond the spec, and in some rogue cases it even violates it. But if you do find a gap in a spec, that’s actually good for you: it gives you a great springboard for getting personal attention from authoritative maintainers. We share some tips on contacting authors later in this post.

Books. If you can find a book on the subject, try to get it. There’s no need to commit to reading the whole thing. The introductory sections usually do a good job of communicating the most important details and context, which makes the problem seem much more approachable than it first appeared. After the introduction, go over the table of contents and see whether any chapter titles spark your curiosity. Briefly skim those chapters to get a sense of how much detail they go into and roughly which keywords they deal with. If you find something interesting, take note of it so you can refer back to it later.

Entering New Lands

As we said, even when you’re well acquainted with the subject matter, you still have to deal with different models, ways of thinking, programming styles, and dialects. These can make something very familiar seem strange. However, there are several things you can do to overcome this obstacle.

Start with documentation and examples. See whether the code comes with built-in documentation and examples. If there’s documentation, explore it. Sometimes you need to run a tool, like Doxygen, to generate HTML files from the source, but don’t sweat that, as it’s usually simple. Keep in mind that documentation is sometimes less up to date than code examples that you can compile and execute. Even so, outdated documentation can still give you a glimpse of the original, streamlined core ideas.

If the codebase comes with an examples directory, look into it as a more authoritative starting point. Examine the structure of these examples and see if you can make sense of them. Authors generally put the high-level work they’re most proud of in this directory, so you can be fairly confident you’re looking at primary code paths and not some rarely used niche feature. Take note of key classes or functions here. These are probably your codebase’s first-class citizens, around whose requirements everything else is built.

If the codebase has automated tests, those are usually a great entry point, as they should clearly show the expected input and output of each component. In addition, you can easily debug and step through the code under test, as well as change it and see what happens. Taking the key components you found in the examples and debugging their tests is a great way to build a vertical understanding of each one, from its public interface down to its internals.

Organize. The next step we recommend is gauging the authors’ personal sense of organization. Look at the directory tree and recall your conceptual understanding of the problem domain. Ask yourself, “Now, where would the code that deals with that part live?” See if you can demarcate some boundaries in the tree. Most authors will have given this some thought, and understanding their preferences will already give you a feel for their architecture. This gives you a basic predictive model of where to find different types of information in the codebase itself.
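If the tree is large, a quick count of source files per directory can show where the bulk of the code lives before you start guessing at boundaries. The following is a minimal Python sketch of that idea, not part of any tool mentioned in this post; the `summarize_tree` name and the set of extensions it counts are our own assumptions, so adjust them to your codebase.

```python
from collections import Counter
from pathlib import Path

# Assumption: these extensions cover the code you care about; adjust as needed.
SOURCE_EXTENSIONS = {".c", ".cc", ".cpp", ".h", ".hpp", ".py"}


def summarize_tree(root: str, depth: int = 2) -> Counter:
    """Count source files per directory, truncating each path to `depth` levels."""
    root_path = Path(root)
    counts: Counter = Counter()
    for path in root_path.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_EXTENSIONS:
            continue
        dir_parts = path.relative_to(root_path).parts[:-1]  # drop the file name
        if any(part.startswith(".") for part in dir_parts):
            continue  # skip hidden directories such as .git
        prefix = "/".join(dir_parts[:depth]) or "."  # files at the root count as "."
        counts[prefix] += 1
    return counts
```

For example, `summarize_tree("path/to/repo", depth=2)` (a placeholder path) returns a `Counter` keyed by prefixes such as `src/net`; sorting and printing its items gives a rough map of the tree. Dense directories are often where the core abstractions live, though that is only a heuristic.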

If the case is so dire that you really can’t find any pattern of order in the structure, you can still use the tips below, such as tracing the commit history or contacting the authors.

Study the dialect. Once you’ve got a rough sketch of where several key bits are located, it’s time to familiarize yourself with the jargon and coding style. Does the code use metaphors or idiosyncratic lingo? Try to put the same ideas into your own words. Does it contain lots of abbreviations? Start working on a personal glossary that you can refer back to. This is where specification sheets shine, as their terms and definitions sections will often help you make sense of abbreviations used in the code. However, you can also form a tentative glossary and improve it as you progress, using your findings.
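To seed that glossary, you can mechanically pull abbreviation-like words out of identifiers. Below is a rough Python sketch, assuming a C-like language whose identifiers use snake_case or camelCase. The `glossary_candidates` helper and its heuristic (all-caps acronyms, or lowercase words with no vowels, such as `msg` or `ptr`) are our own invention and will produce both false positives and misses.

```python
import re
from collections import Counter

# Identifiers in a C-like language (assumption: ASCII names only).
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Words inside an identifier: an acronym run ("HTTP" in "HTTPServer"),
# a capitalized or lowercase word, or a run of digits.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")
VOWELS = set("aeiou")


def looks_abbreviated(word: str) -> bool:
    """Crude heuristic: all-caps acronyms, or lowercase words without vowels."""
    if len(word) < 2:
        return False
    if word.isupper():
        return True
    return word.islower() and not (set(word) & VOWELS)


def glossary_candidates(source: str, min_count: int = 1) -> Counter:
    """Count abbreviation-like words that appear inside identifiers in `source`."""
    counts: Counter = Counter()
    for identifier in IDENTIFIER_RE.findall(source):
        for word in WORD_RE.findall(identifier):
            if looks_abbreviated(word):
                counts[word.lower()] += 1
    return Counter({word: n for word, n in counts.items() if n >= min_count})
```

Running it over the text of a few central files and calling `.most_common(20)` on the result gives a short list of terms to look up in a spec’s definitions section. It also picks up words in comments and string literals, which is usually fine for this purpose.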

Cross-reference. To build your glossary, and more generally to understand entities in the code through their relationships, your number one tool is cross-referencing. Use your IDE to find usages of an entity, or step through the code in a debugger to see what calls what. Place a few strategic breakpoints and examine the call stack to see how execution gets from one point to another; this gives you direct evidence of causal and temporal dependencies. Use it to progressively capture your growing understanding of the code’s structure.
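In a C or C++ codebase you would typically get this from a debugger (in gdb, for example, `bt` prints the current call stack). For a Python codebase, or just to illustrate the idea, the standard library’s `sys.setprofile` hook can record which function called which while some code runs. The sketch below is our own minimal example; `record_calls` and the three toy functions are hypothetical stand-ins for real code.

```python
import sys
from collections import Counter
from typing import Callable


def record_calls(func: Callable, *args, **kwargs) -> Counter:
    """Run func(*args, **kwargs) and count (caller, callee) pairs of Python calls."""
    edges: Counter = Counter()

    def profiler(frame, event, arg):
        # "call" fires when a Python function starts; frame.f_back is its caller.
        if event == "call":
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            edges[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even if func raises
    return edges


# Hypothetical toy functions standing in for the code you are exploring.
def tokenize(text):
    return text.split()


def validate(tokens):
    return len(tokens) > 0


def parse(text):
    tokens = tokenize(text)
    return validate(tokens)
```

Calling `record_calls(parse, "a b")` records the edges `("parse", "tokenize")` and `("parse", "validate")`: the same caller-to-callee evidence a breakpoint and call stack would give you, collected for a whole run. It only tracks Python functions; calls into C built-ins such as `str.split` raise separate `c_call` events, which this sketch ignores.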

Mind the history. All code starts from seeds. Often, as programmers, we first hear of a problem and think, “Oh yeah, that’s simple.” We start by writing straightforward implementations, but over time things become more complicated and entangled. Unfortunately, due to various pressures, the core ideas can easily get obscured by secondary concerns. As an explorer, you want to understand those core ideas first, before concerning yourself with the other layers. We recommend tracing the development of key files, classes, or functions by looking at their commit history. Look at what they were like in the first few commits; that is where the author’s main ideas shine through most clearly. While you’re there, briefly examine the chain of commits from that starting point to the current version to see what interesting challenges came up as things developed.
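With git, `git log --follow -- <file>` lists the commits that touched a file, following it across renames. The Python sketch below wraps that command and returns the oldest commits first. The helper names are hypothetical, and we reverse the list in Python rather than passing `--reverse`, to avoid depending on how that option interacts with `--follow`. The `%ae` field is the author’s email address, which is handy if you later need to contact an author.

```python
import subprocess
from typing import List, NamedTuple

# Fields are separated by the ASCII unit separator (%x1f), which is unlikely to
# appear in names or commit subjects.
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%ad%x1f%s"


class Commit(NamedTuple):
    sha: str
    author: str
    email: str
    date: str
    subject: str


def parse_log(output: str) -> List[Commit]:
    """Parse `git log --format=LOG_FORMAT` output (newest first) into Commit records."""
    commits = []
    for line in output.split("\n"):
        if line.strip():
            commits.append(Commit(*line.split("\x1f", 4)))
    return commits


def earliest_commits(path: str, count: int = 5, repo: str = ".") -> List[Commit]:
    """Return up to `count` of the oldest commits that touched `path`, oldest first."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", f"--format={LOG_FORMAT}", "--", path],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if, e.g., repo is not a git repository
    )
    # git log prints newest first; reverse in Python instead of passing --reverse.
    return list(reversed(parse_log(result.stdout)))[:count]
```

For example, `earliest_commits("src/some_file.cpp")` (a placeholder path) returns up to five `Commit` records, and `git show <sha>` then displays the full change for any of them.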

If there’s an associated issue tracker you can consult, it is also a great resource: issue descriptions are often directed at managers and prioritizers and, as such, may contain more conceptual context and underlying motivation than commit messages.

All this adds a whole new dimension to your understanding when you look at the code. Instead of flat, monotonous lines, you’ll see distinct scenarios and moments in time popping out of them, making it easier to see through the noise.

Contacting the Authors. In reality, some questions are not answered by any resource you can find, and the code really doesn’t help make sense of things conceptually. If the question is important to forming the required understanding, consider finding and contacting the original authors. Go over the commit history or message boards to find the contact info of people who either wrote the code or participated in its design. For example, every git commit records the author’s name and email address. We have had a lot of success reaching out to original authors. Developers, as creative individuals, generally appreciate a polite and informed inquiry into their work, especially when the person asking has already made an effort to consult all other accessible material and still seeks clarification.

Managing Fatigue

Let’s not forget a major difficulty with diving into large codebases: you’re still trying to get into the zone of grasping their core ideas. By definition, you’re not in the zone yet, and things aren’t flowing. It’s natural to tire more quickly and, consequently, to get distracted more easily. Try to work efficiently within your limits. A good analogy here is the gears in a car: first gear is the most powerful and demanding, and it’s good at getting you started, but it’s not meant for high speeds. So when you’re just starting out, go slow but steady. Burning through your fuel quickly may feel more like you’re actually doing something, even when you don’t understand much, but it will stop you from getting the best results. As athletes say: train smart, not hard.

Take breaks. One hour of focused study of someone else’s thoughts, let alone many people’s somewhat uncoordinated ones, is much more taxing than comfortably toying with your own ideas. The intensity of the effort, and the inner resistance you have to deal with when different outlooks are essentially forced on you, make it a very different ball game. Balance this with proper, healthy breaks: go for a walk, do home chores if you’re working from home, grab a drink with someone, or do anything else that works best for you.

Take it easy. Studying new codebases generally gets easier with experience. Because there are only so many kinds of coding styles—good or bad—sooner or later you’ll learn to recognize what style you’re looking at, and you’ll be better at knowing what to expect. Nonetheless, it remains a challenging task! Not having a grip on things, being uncertain of when or whether that’ll happen, or failing to get up to speed quickly enough—especially when we’re expected to deliver—is liable to make any of us start to question our own adequacy. See the task for what it is and try not to take it personally. Stay practical and positive. This skill will never develop if everyone quits. Be someone who doesn’t quit.

Many issues can be avoided if we find a way to keep having fun. View the task as a puzzle, a set of riddles. Gamify it. If it helps, discuss it with others to get some social momentum behind you. You’re likely to find you’re not the only one who is having a hard time with it—even among people who’ve worked with the codebase for months or years. Most importantly, when you can’t figure something out, don’t waste energy by beating yourself up; try something else. When you can’t actively think any longer, try watching some videos on related subjects or anything else that may improve your ability to relate to the material.

Divide and conquer. Verbalize a few big leading questions, then break each of them down into a progressive set of smaller ones. Estimate how difficult each one seems, and plan each day with more easy questions than hard ones. Given the longer breaks you’ll be taking, it’s good to be able to report progress and show that there’s a method to your madness. Having many small items to report on makes both you and your manager feel much more in control of the situation.

Keeping the Goal in Mind

Value. Sure, the learning curve can be steep and demanding, but once you pass a certain threshold, you’ll have a valuable voice in any related discussion. The depth of background you’ll have accumulated will also make it more enjoyable to debug or piece things together to solve curious problems. So don’t give up on the effort. Generally speaking, the more difficult and complex the tasks you can successfully take on, the less competition you’ll face, simply because many people stop short once they get comfortable, so your value will increase significantly. Paradoxically, the job will also become easier and more fun!

Hard-won experience. Suppose we direct ourselves clearly at comprehending the code, taking care not to lose ourselves in complaining or criticizing, and simply follow the steps needed to understand it and to change it efficiently for new purposes. This instills in us clear and intuitive notions of good and bad code. We will no longer judge code merely by a vague standard of perfection, or by how much it employs the latest and greatest design patterns, but by whether its structures accelerated or hampered our honest efforts to understand and improve it. This nurtures an aware, no-BS approach that translates into highly practical and communicable design skills.

As an analogy, there are builders who look at a nail and think, “For this job, I would need a 5 lb hammer with a round head and a 12-inch maple shaft, or else I cannot proceed,” and then there are those who find a suitable rock nearby and pound it in dexterously. The latter builder truly understands the mechanics of building, and it is this undogmatic practicality that we deem an important part of a coder’s skillset.

Finally, all of this is, of course, based on our own necessarily limited experience. It’s up to you to take these ideas further, or simply to use them as inspiration for crafting your own set of tools and methods.

If you are interested in Apex.AI products for your projects, or if you have thoughts about the topic of this blog post, contact us.
