Vibing with the Machine 2: Fun ways to look at code

I’ve been on a mission these past few days. I’m building a software architecture tool, and it’s taking most of my free time. I'll show you what I've got first:

You're looking at all the internal imports and function call dependencies of the source code that makes the interactive graphs you're seeing, which you can find here. That's 10,000 lines of TypeScript in 5 late afternoons; try doing that by hand. It also works for other codebases of course, or at least it should. The VS Code plugin is in the "works for me" stage, and I'm finishing the LSP integration so I can include more languages. I think it's nice. It does other things too, which I'll get to in time.

So now that you are maybe mildly intrigued, let's talk about why having a graph view of the codebase might be an interesting idea. First, I think it stands on its own: you usually don't see code like this. I can see what a terrible job I did at isolating some components, and how scarce testing is, so that's how I know it works. I think it becomes even more interesting if we try to give some of this information to LLMs, packaged in a way that they can understand.

For some context, current coding agents are really good, and probably better than any human, at some aspects of software development. They are, however, still limited in other areas, which makes them less effective in large codebases. Long-term planning is one of them, but we’re not solving that today. The limitations I’m interested in are more specific: directing LLMs toward better code maintainability, and providing higher-quality context, particularly for refactoring tasks on large codebases.

You see, current SOTA coding agents mostly navigate by grepping around, or other types of search. That evidently works fine in a lot of cases, but generates a lot of noise for context, and requires reasoning for things that should be evident. It is also less effective when the code has hidden dependencies, shared state, or callbacks. Humans and LLMs share very similar flaws: we don’t know what we don’t know. This tool is an attempt to solve some of these issues. My goal is to provide a top-down view of the codebase, so LLMs can understand what dependencies exist, and what needs to change to make the code more modular and future-proof.

Imagine tasking an LLM with refactoring a large project. You want to split a big pile of spaghetti code into smaller modules with constrained responsibilities and clear inputs and outputs. It’s a hard task for everyone. The real difficulty is that the moment you change one thing, something else breaks. Incremental progress is extremely hard without a solid understanding of the codebase and thoughtful execution, especially when it’s code you didn’t write.

So we need to plan. And to plan effectively, we need information that we can reasonably process ourselves, or, you know, dump into an LLM. This is what I want: dependencies and call graphs for every folder, file, and function in the codebase, along with granular semantic summaries. This still won't solve every situation, but it might make the LLM more efficient at finding the things it needs.
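To make that wish list a bit more concrete, here is a minimal sketch of what such a graph could look like in TypeScript, plus the one query I'd want before touching anything: "what depends on this?". The type names, the node ids, and the `dependentsOf` helper are all made up for illustration; this is not the tool's actual data model.

```typescript
// Hypothetical shapes for illustration only; the real tool may model this differently.
type NodeKind = "folder" | "file" | "function";

interface CodeNode {
  id: string; // e.g. "parser.ts#parse" (made-up id scheme)
  kind: NodeKind;
  summary?: string; // short semantic summary, filled in later
}

interface Edge {
  from: string; // id of the node that depends on `to`
  to: string;
  type: "imports" | "calls";
}

interface CodeGraph {
  nodes: Map<string, CodeNode>;
  edges: Edge[];
}

// Collect every node that directly or transitively depends on `target`,
// i.e. everything that might break if `target` changes.
function dependentsOf(graph: CodeGraph, target: string): Set<string> {
  // Build a reverse adjacency list: node id -> ids of nodes that depend on it.
  const reverse = new Map<string, string[]>();
  for (const edge of graph.edges) {
    const list = reverse.get(edge.to) ?? [];
    list.push(edge.from);
    reverse.set(edge.to, list);
  }

  // Breadth-first walk over the reversed edges; `seen` also guards against cycles.
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of reverse.get(current) ?? []) {
      if (dependent !== target && !seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen;
}

// A tiny made-up graph: two entry points that both go through build(), which calls parse().
const demo: CodeGraph = {
  nodes: new Map<string, CodeNode>([
    ["cli.ts#main", { id: "cli.ts#main", kind: "function" }],
    ["view.ts#render", { id: "view.ts#render", kind: "function" }],
    ["graph.ts#build", { id: "graph.ts#build", kind: "function" }],
    ["parser.ts#parse", { id: "parser.ts#parse", kind: "function" }],
  ]),
  edges: [
    { from: "cli.ts#main", to: "graph.ts#build", type: "calls" },
    { from: "view.ts#render", to: "graph.ts#build", type: "calls" },
    { from: "graph.ts#build", to: "parser.ts#parse", type: "calls" },
  ],
};

const affected = dependentsOf(demo, "parser.ts#parse");
console.log(Array.from(affected));
```

In this toy graph, changing `parse` touches all three other functions, while changing `main` touches nothing. That is the kind of answer I'd rather hand to an LLM directly than have it rediscover by grepping.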

This visualization, by itself, is not that useful for this purpose. Graphs' primary task is to look pretty. Still, I can see which functions would probably need some refactoring, as I have an understanding of what that code does. The LLM will not know how to interpret this graph, at least not directly. If we want it to, we'll need to extract meaning from the graph and transmit it as text. Next time, we will do just that: create summaries for parts of the codebase, expose functions LLMs can call through MCP, and hopefully make refactoring a little easier.
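As a small taste of what "transmit it as text" could mean, here is a rough sketch that turns call edges into a few plain lines an LLM can read, ranking functions by how many distinct callers and callees they have. The `CallEdge` shape, the `summarizeCalls` name, the function names in the demo, and the ranking heuristic are all assumptions of mine for illustration, not what the tool actually does.

```typescript
// Hypothetical edge shape for illustration; not the tool's real format.
interface CallEdge {
  from: string; // caller
  to: string; // callee
}

// Turn call edges into a short text report of the most connected functions.
function summarizeCalls(edges: CallEdge[], top = 3): string {
  const callersOf = new Map<string, Set<string>>();
  const calleesOf = new Map<string, Set<string>>();
  const names = new Set<string>();

  // Record `value` under `key`, creating the set on first use.
  const link = (index: Map<string, Set<string>>, key: string, value: string): void => {
    const set = index.get(key) ?? new Set<string>();
    set.add(value);
    index.set(key, set);
  };

  for (const edge of edges) {
    names.add(edge.from);
    names.add(edge.to);
    link(calleesOf, edge.from, edge.to);
    link(callersOf, edge.to, edge.from);
  }

  const rows = Array.from(names).map((name) => ({
    name,
    callers: callersOf.get(name)?.size ?? 0,
    callees: calleesOf.get(name)?.size ?? 0,
  }));

  // Most connected first; ties broken alphabetically so the output is stable.
  rows.sort(
    (a, b) => b.callers + b.callees - (a.callers + a.callees) || a.name.localeCompare(b.name)
  );

  return rows
    .slice(0, top)
    .map((row) => `${row.name}: called by ${row.callers}, calls ${row.callees}`)
    .join("\n");
}

// Made-up call edges for a toy project.
const demoEdges: CallEdge[] = [
  { from: "main", to: "build" },
  { from: "main", to: "render" },
  { from: "build", to: "parse" },
  { from: "build", to: "resolve" },
  { from: "render", to: "build" },
];

console.log(summarizeCalls(demoEdges));
// build: called by 2, calls 2
// main: called by 0, calls 2
// render: called by 1, calls 1
```

Counting connections is a crude signal, and a function with many callers is not necessarily a problem. But a few lines like these cost almost no context compared to a pile of grep results.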