Why you’ve never heard of codemods, and why that should change

Almost no one in my engineering circles uses codemods. Few of them have even heard of them. Codemods are cool, so why is that?

If you’re one of the small number of people who know about codemods, feel free to skip the next section and jump to what I think is the good stuff.

History of Codemods

An engineer at Facebook named Justin Rosenstein used the term codemod to describe his Python utility that did simple find-and-replace across code. His inspiration was simple:

Part of why most code -- and most software -- sucks so much is that making sweeping changes is hard.

...

Let's say that a month ago you wrote a function that you -- or your entire company -- have been using frequently. And now you decide to change its name, or change the order of its parameters, or split it up into two separate functions and then have half the call sites use the old one and half the call sites use the new one, or change its return type from a scalar to a structure with additional information. IDEs and standard *nix tools like sed can help, but you typically have to make a trade-off between introducing errors and introducing tedium. The result, all too often, is that we decide (often unconsciously) that the sweeping change just isn't worth it, and leave the undesirable pattern untouched for future versions of ourselves and others to grumble about, while the pattern grows more and more endemic to the code base.
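To make that errors-versus-tedium trade-off concrete, here’s a small sketch of a sed-style rename written with Java regexes. It’s my own illustration, not code from Rosenstein’s tool, and getUser, getUserName, and NaiveRename are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrates why purely textual renames are risky (hypothetical example). */
final class NaiveRename {

    /** Plain substring replacement, roughly what sed 's/getUser/fetchUser/g' does. */
    static String substringRename(String source, String from, String to) {
        return source.replace(from, to);
    }

    /** Word-boundary regex: skips getUserName, but still can't tell code from string literals. */
    static String wordBoundaryRename(String source, String from, String to) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(from) + "\\b");
        return pattern.matcher(source).replaceAll(Matcher.quoteReplacement(to));
    }

    public static void main(String[] args) {
        String source = """
            User u = repo.getUser(id);
            String n = repo.getUserName(id);
            log.info("called getUser");
            """;
        // Renames getUser, but also getUserName and the text inside the log message.
        System.out.println(substringRename(source, "getUser", "fetchUser"));
        // Leaves getUserName alone, but still rewrites the log message.
        System.out.println(wordBoundaryRename(source, "getUser", "fetchUser"));
    }
}
```

The substring version also renames getUserName, an unrelated method. The word-boundary version avoids that, but it still rewrites text inside a string literal, which may not be what you wanted. Getting it right means understanding Java syntax, which is exactly where text-based tools run out of road.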

This rings true: if a developer handed us a change that improved the code, we’d probably take it. But make the change ourselves? No, I’m not going to do it.

I’m not sure what happened, but the codemod project seems to have died. jscodeshift eventually became the de facto toolkit for JavaScript codemods, and reactjs/react-codemod became its most popular (primary?) “user”, publishing small codemods that help you migrate React code to new versions.

Codemods Now

Fast forward a few years, and nobody seems to be using codemods for anything more interesting than these migrations. (It's worth noting I’m not an investigative reporter, and I didn't try very hard.) I share Justin’s enthusiasm for codemods, so why don’t I see people using them to solve the more interesting and difficult problems we all face? Simple upgrades, sure – but what about performance, quality, and security? Why isn’t anyone turning their requirements into code that enforces them?

Here’s the rub: it turns out that codemods just aren’t expressive enough, either in how you identify the “shapes” of code to change or in what changes they can effect. They’re also kind of hard to write with the libraries out there.

But there are lots of great tools for performing super expressive static analysis, and lots of great tools for mutating code, and yet it seems everyone building codemod tooling has tried to re-create both.

Me? I just want to make it stupidly easy to put those two technologies together.

The Future of Codemods

Meet codemodder. I believe it is the future of codemods.

The chief "innovation" of codemodder is that a codemod library should not have much complicated logic. At its core, codemodder really just provides orchestration magic.

Why re-solve the problems that great “pattern finding” OSS tools like PMD and Sonar, and even very fancy vulnerability-identifying tools like Checkmarx, Semgrep, and Contrast, have already solved? Hundreds of man-years have already been invested there.

Why re-solve the problems that great “code mutation” tools like recast, LibCST, and JavaParser have already solved? To me, it seems obvious that the right answer is to have these tools just talk to each other.
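Here’s a rough mental model of that division of labor in plain Java. Finder, Changer, Location, and Orchestration are hypothetical names for this sketch, not codemodder’s actual API, and working line by line is a big simplification: a real changer operates on a syntax tree, not raw lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "finder + changer + orchestration"; not codemodder's real API. */
final class Orchestration {

    /** A match reported by a finder: which file and which (1-based) line. */
    record Location(String file, int line) {}

    /** Something that finds code to change, e.g. a static analysis tool. */
    interface Finder {
        List<Location> find(Map<String, List<String>> files);
    }

    /** Something that knows how to rewrite a single matched line. */
    interface Changer {
        String change(String line);
    }

    /** The "orchestration": run the finder, then apply the changer at each reported location. */
    static Map<String, List<String>> run(Map<String, List<String>> files, Finder finder, Changer changer) {
        // Copy the input so the caller's files are never mutated.
        Map<String, List<String>> result = new TreeMap<>();
        files.forEach((name, lines) -> result.put(name, new ArrayList<>(lines)));
        for (Location loc : finder.find(files)) {
            List<String> lines = result.get(loc.file());
            int index = loc.line() - 1;
            lines.set(index, changer.change(lines.get(index)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = Map.of(
            "WidgetTest.java", List.of("public class WidgetTest {", "}"));
        // Toy finder: flag any line that declares a public class whose name ends in "Test".
        Finder finder = fs -> {
            List<Location> found = new ArrayList<>();
            fs.forEach((name, lines) -> {
                for (int i = 0; i < lines.size(); i++) {
                    if (lines.get(i).matches("public class \\w*Test \\{")) {
                        found.add(new Location(name, i + 1));
                    }
                }
            });
            return found;
        };
        // Toy changer: drop the leading "public ".
        Changer changer = line -> line.replaceFirst("^public ", "");
        System.out.println(run(files, finder, changer));
    }
}
```

The point is that run() knows nothing about Java or about tests; all the smarts live in the two pluggable pieces.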

An Example: JUnit Tests Shouldn’t Be Public

Most of the codemods I write for my day job at Pixee are for security. You can check some of them out here if you like. But we all have opinions about other things too, so let’s build a code quality example!

I often see JUnit test code that looks like this:

import org.junit.jupiter.api.Test;

public class WidgetTest {

  @Test
  public void doTest() {
    assertThat(thisThing()).isEqualTo(thatThing);
  }
}

JUnit, beginning with version 5, recommends that test classes be declared package-private, as opposed to public:

Test classes, test methods, and lifecycle methods are not required to be public, but they must not be private.

It is generally recommended to omit the public modifier for test classes, test methods, and lifecycle methods unless there is a technical reason for doing so – for example, when a test class is extended by a test class in another package. Another technical reason for making classes and methods public is to simplify testing on the module path when using the Java Module System.

It makes sense. There’s no need to pollute the “visible classpath” with these types when we never use them as components anywhere else. So, let’s make a codemod that makes public tests package-private.
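Why is package-private enough? Test engines like JUnit discover and invoke tests via reflection rather than through ordinary calls from other code. Here’s a toy stand-in for that, using only java.lang.reflect. MiniRunner, WidgetCheck, and the @Check annotation are hypothetical names for this sketch, and real JUnit discovery is far more involved.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Toy runner: instantiate a class and invoke its @Check methods reflectively. */
final class MiniRunner {
    static List<String> run(Class<?> testClass) throws ReflectiveOperationException {
        List<String> passed = new ArrayList<>();
        Constructor<?> ctor = testClass.getDeclaredConstructor();
        // A real engine lives in a different package; setAccessible(true) suppresses the
        // language access checks that would otherwise block it. (Here everything shares
        // one package, so it isn't strictly needed.)
        ctor.setAccessible(true);
        Object instance = ctor.newInstance();
        for (Method m : testClass.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Check.class)) {
                m.setAccessible(true);
                // A failing check surfaces here as an InvocationTargetException.
                m.invoke(instance);
                passed.add(m.getName());
            }
        }
        return passed;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(run(WidgetCheck.class));
    }
}

/** Hypothetical marker annotation standing in for JUnit's @Test. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Check {}

/** A package-private "test class" with a package-private test method. */
class WidgetCheck {
    @Check
    void widgetsAreEqual() {
        if (!"widget".equals("wid" + "get")) {
            throw new AssertionError("widgets differ");
        }
    }
}
```

Neither WidgetCheck nor its method is public, and the runner still finds and runs it.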

Step #1: Pick a tool and find the code you want to change

We’re big fans of Semgrep, so let’s write some Semgrep to find public JUnit 5 test classes. Here’s the rule (you can also play with it on the Semgrep Playground):

rules:
  - id: find-public-junit-class-modifiers
    languages: [java]
    severity: INFO
    message: $CLASS is a JUnit 5 test class that can be package-private
    patterns:
      - pattern: public class $CLASS { ... }
      - metavariable-regex:
          metavariable: $CLASS
          regex: .*Test
      - pattern-inside: |
          ...
          import org.junit.jupiter.api.Test;
          ...

This rule is pretty simple, as far as Semgrep goes. But even this level of expressiveness would be quite difficult to capture with earlier codemod or codemod-adjacent tools.
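For contrast, here’s roughly what the same check looks like with plain regexes, the kind of matching a text-based codemod gives you. This is a deliberately naive approximation I wrote for illustration (RegexTestFinder is a made-up name), not a description of how Semgrep works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A text-only approximation of the Semgrep rule above (illustrative, deliberately naive). */
final class RegexTestFinder {

    // The JUnit 5 @Test import, anchored to the start of a line.
    private static final Pattern JUNIT5_IMPORT =
        Pattern.compile("^import\\s+org\\.junit\\.jupiter\\.api\\.Test\\s*;", Pattern.MULTILINE);

    // "public", then "class", then a name ending in "Test". Comments and strings are not
    // recognized, so matches inside them count too.
    private static final Pattern PUBLIC_TEST_CLASS =
        Pattern.compile("\\bpublic\\s+class\\s+(\\w*Test)\\b");

    /** Returns the names of public classes ending in "Test", if the JUnit 5 import is present. */
    static List<String> find(String source) {
        List<String> names = new ArrayList<>();
        if (!JUNIT5_IMPORT.matcher(source).find()) {
            return names;
        }
        Matcher m = PUBLIC_TEST_CLASS.matcher(source);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = """
            import org.junit.jupiter.api.Test;

            // public class OldTest was removed in favor of WidgetTest
            public class WidgetTest {
                @Test
                void doTest() {}
            }
            """;
        // Prints [OldTest, WidgetTest]: the regex happily matches inside a comment.
        System.out.println(find(source));
    }
}
```

Because it only sees text, the regex version reports a class that exists only in a comment. Semgrep matches against a parsed syntax tree, so it doesn’t fall into that trap.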

Step #2: Write the change you want to perform on that code

Ok, so now we can find the class definitions we need to change. That’s half the battle. Next, let’s write some JavaParser to remove the public modifier:

// typeDefinition is the class declaration that the Semgrep rule matched
ClassOrInterfaceDeclaration typeDefinition = ...;
typeDefinition.getModifiers().removeIf(modifier -> modifier.equals(Modifier.publicModifier()));

Pretty easy! Now we just need to connect these two pieces of code.

Step #3: Putting it all together

We have a really great way of finding things (Semgrep) combined with a really great way of fixing things (JavaParser). Let’s see how we can use codemodder to stitch these things together:

/** A codemod that makes JUnit 5 tests package private. */
@Codemod(
    id = "codemodder:java/make-junit5-tests-package-private")
public final class MakeJUnit5TestsPackagePrivateCodemod
    extends SarifPluginJavaParserChanger<ClassOrInterfaceDeclaration> {

  private static final String DETECTION_RULE =
      """
            rules:
              - id: find-public-junit-class-modifiers
                patterns:
                  - pattern: public class $CLASS { ... }
                  - metavariable-regex:
                      metavariable: $CLASS
                      regex: .*Test
                  - pattern-inside: |
                      ...
                      import org.junit.jupiter.api.Test;
                      ...
            """;

  @Inject
  public MakeJUnit5TestsPackagePrivateCodemod(
      @SemgrepScan(yaml = DETECTION_RULE) final RuleSarif sarif) {
    super(sarif, ClassOrInterfaceDeclaration.class, CodemodReporterStrategy.empty());
  }

  @Override
  public boolean onResultFound(
      final CodemodInvocationContext context,
      final CompilationUnit cu,
      final ClassOrInterfaceDeclaration typeDefinition,
      final Result result) {
    // Drop the "public" keyword so the matched test class becomes package-private
    typeDefinition.getModifiers()
        .removeIf(modifier -> modifier.equals(Modifier.publicModifier()));
    return true;
  }
}

As you can see, most of the code is the expression of what you want to change, and how you want to change it. Here’s how you’d run it:

$ ./gradlew assemble # or mvn package
$ build/distributions/my-acme-codemod /my_code/my_project

Boom! All the public JUnit 5 test classes in my_project are now package-private!

“But wait, couldn’t this break something?” It could (for example, if a test class in another package extends one of these classes). Semgrep doesn’t know enough about the type hierarchy, and its interfile analysis is relatively weak, so the query can’t be 100% dependable. But this doesn’t bother me, for a couple of reasons:

The cost of eking out that last 1-2% of accuracy is extremely high in terms of added complexity and degraded user experience, and it will never reach 100% anyway.
It probably won’t break anything (like, 99% of the time). If it did, the errors would be immediate, loud and obvious – in other words, trivially easy to fix. A small price to pay in order to rapidly create big changes.

Step #4: Operationalizing

Of course, you’ve only run it once. How do we make sure it’s applied continuously? With what the project offers today, you could create a collection of codemods and run them before you check in code. To scale to the rest of your team, you could add a GitHub Action that runs the codemods on check-in, or on the creation of a pull request.

As you might imagine, we have big dreams for operationalizing your codemods at my day job. Stay tuned!

Codemods Will Change Software Development

Codemods are super fun. They feel like problem-solving at scale. Why write another PR comment? Just write a codemod!

If you just want the benefits of the codemods I’m writing to harden your code and fix your bugs, add @pixeebot to your repo today.