Writing & Thinking with AI Assistance
Video 5 in a series on Claude Code
Part 5 of a series on AI coding tools for empirical research, accompanying my Markus Academy video series.
The previous posts in this series have been practical: setting up Claude Code, making figures, scraping EDGAR, and working with large datasets. We are going to pivot slightly in this video in two ways:
We’ll focus less on data analysis, and more on writing and analysis
We’ll do almost everything in Claude’s desktop app (e.g. Cowork and Chat)
Writing with LLMs is a contentious, complicated topic. I’m going to focus on it from the perspective of a researcher—in terms of generating research, communicating research, and converting analysis into written output. There’s a wide spectrum of ways that we use writing. But before getting into the practical stuff, I want to address two preliminary points.
Writing as thinking
Writing itself often serves as the process through which we think and digest ideas. As writers, the act of putting words on a page is how we work through our thinking. A reasonable concern is that when we use LLMs to help with writing, we engage in what’s called cognitive offloading—we’re essentially offloading the thinking itself. If you have a couple of vague ideas and hand them off to an LLM, it’s very easy to let the model do the intellectual work for you. I think that concern is well-founded.
It’s useful to work backwards from that recognition. We want to have thought through our ideas thoroughly, because eventually people will ask us about them, and we’ll need to discuss them. So how do we minimize the extent to which we accidentally—whether subconsciously or consciously—offload that cognitive work? I don’t have a perfect answer, but I’ll flag a few strategies for this as we go.[1]
Homogenization of voice
LLMs are what we’d call in sports a “floor raiser.” They can significantly improve relatively poor writing. That said, there’s a legitimate concern that LLMs produce a kind of homogenized default voice. It’s uninteresting and it’s potentially cringeworthy when you see it.[2]
There’s a view that this homogenization could diminish the quality of overall written output. That’s partly a social problem — do we really need forces to create more uniformity — but more personally, having a unique voice is important. Nay, it’s part of being a human academic! You as an academic should be aware of your voice and try to preserve it.
In the context of AI, we’ll talk about writing guides and style guides, but to be honest, I think it’s quite challenging to perfectly preserve your writing voice through these tools. It’s just not the same. Some aspects may come close, but we’re all humans who vary from day to day. The output won’t consistently reflect how you’d want to write a particular thing. The corpus we use to train and calibrate these tools is only as good as the information we provide, and we change all the time. What we want our writing to look like today may be different from what it was before.
Banal but necessary writing
While writing is an important vehicle for thinking, there’s also a large amount of entirely banal and painful writing that we do for our jobs. Think about boilerplate emails, memo summaries, and routine documentation. Think about even the initial write-ups of regression coefficient summaries. Some of it is complicated and needs to be done carefully, but the first pass can be done very quickly, and it helps you get off the ground.
My view is LLMs are quite good at this kind of work, and this type of time-saving is immensely valuable. The real trick is knowing what bright line we put between the banal and the important. The more you can offload the banal, the more time you have to think about the important. But if you offload too much of the important, then you end up with a problem.
What goes into the LLM
As we discussed in the first post, an LLM is fundamentally a text machine. It reads and predicts serialized text in a context window.
Why does this matter for writing? What goes into the LLM is text. The LLM is an enormously powerful text prediction machine operating in a very high-dimensional space, and we’re pivoting within that space as we change the context window. Changing the prompt shifts the model’s focus a bit, preparing it to reason in a particular way—perhaps more reflective of how you might approach something. But that doesn’t mean it necessarily captures the right things.
Building a style guide
This is a pretty straightforward exercise. What we’re going to do is build a style guide—really a skill—based on our own writing. What it’s going to produce is basically a list of rules for how it thinks the LLM’s behavior should be tweaked. It’s not necessarily going to capture your voice perfectly. It’s almost like asking what happens when you take a complicated, multifaceted person and try to project them on the wall. It’s going to capture some aspect of it. The question is whether it does it correctly.
Here’s how to build one:
Collect samples of writing you’re happy with. Papers, memos, emails—anything where you think “yes, this sounds like me.” These don’t have to be perfectly representative—I used my own papers, even though they’re all co-authored—but a broader sample is better.
Ask the LLM to analyze patterns. Give it your writing samples and ask it to identify patterns: sentence length, tone, word choice, structure, things you do consistently, things you never do.
Curate the output into a writing_style.md file. The LLM’s analysis will be a starting point, not a finished product. You’ll need to edit it—remove things that are wrong, add things it missed, sharpen the rules.
Reference it in prompts. When you want Claude to write or edit in your style, point it to the style guide.
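One way to sanity-check the patterns the LLM reports during curation is to compute a few crude statistics on the same samples. The sketch below is only an illustration: the function name `style_stats` and the choice of metrics are assumptions, not something from the video.

```python
import re
from statistics import mean


def style_stats(text: str) -> dict:
    """Compute a few crude style metrics for one writing sample."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        raise ValueError("text contains no sentences")
    words = text.split()
    return {
        "sentences": len(sentences),
        "mean_words_per_sentence": mean(len(s.split()) for s in sentences),
        # "\u2014" is the em-dash character.
        "em_dashes_per_100_words": 100 * text.count("\u2014") / len(words),
    }


sample = "I ran the regression. It worked\u2014mostly. Why?"
stats = style_stats(sample)
print(stats)
```

If the LLM's analysis claims you write long sentences and the numbers say otherwise, that is a rule worth editing or deleting.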
One thing to keep in mind: Claude actually knows a lot about itself. If you’re interested in writing skills for repeatable tasks, you can ask Claude how to do it—say, “I want to be able to do X, Y, or Z many, many times”—and it will know more or less how to set it up. You can even ask it without doing it, and it will tell you what you need.
What building one actually looks like
In the video, I demonstrated this live using Claude Cowork. If you remember from the first video, there are three different things in the Claude app: Chat, Cowork, and Code. Cowork is the application version of doing Claude Code on the command line—and for writing tasks, I think it works great as a first pass. I don’t want people to feel like everything has to always be done at the command line to take advantage of these tools.
I pointed Cowork at my personal website repository—which has both my academic papers and blog posts—and gave it a deliberately imperfect prompt:
> I would like you to take the papers in the papers folder, and the
> blog posts on my blog, and create a writing style guide for Claude
> as a skill. So that I can get Claude to write in my voice when I
> want to make it sound more like me.
As I told Markus, this is a pretty bad prompt—not exactly articulating what I want. Opus spawned a sub-agent to read through the blog posts and papers (part of the reason it only reads a few is that it doesn’t want to blow through its context window), then created a skill file at .claude/skills/paul-voice/skill.md.[3]
This style guide is not magical. If anything, it’s kind of reductive and silly? It’s just a markdown file—a text file with hashes for sections and lists. What’s going to happen when you use this skill is this text just gets dumped into the context. It’s basically like having someone try to synthesize how you write.
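To make the "just text" point concrete, here is a minimal Python sketch. It is a conceptual illustration only: the function name `build_context` and the sample guide are hypothetical, and this is not a description of how the Claude app actually loads skills.

```python
def build_context(skill_text: str, request: str) -> str:
    """Prepend a skill's markdown to a request as plain text (conceptual only)."""
    return f"{skill_text.strip()}\n\n{request.strip()}"


# Hypothetical style guide: hashes for sections, dashes for list items.
STYLE_GUIDE = """
# Writing voice
## Rules
- Start with the practical problem before the regression.
- Translate coefficients into dollar terms.
"""

prompt = build_context(STYLE_GUIDE, "Explain difference-in-differences.")
print(prompt)
```

The model sees one long string. The rules have no special status beyond being text that sits in front of the request.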
What did it produce?
This is a pretty positive (dare I say sycophantic) read on my style. Every person, even the world’s worst researcher, would get a relatively positive take. It’s like how they say feedback from a supervisor should always have two good things around the bad thing—there’s no bad thing that gets stuck in there. You have to push Claude pretty hard to say negative things.
But is it wise to push for negative feedback? It depends. If you have writing tics or things that you think are important reflections of your style, it matters. There’s a great comment by Nabokov writing to the New Yorker about editing:
You could think of that as a bad writing style, but it reflects what he wanted for his writing. Sometimes quirks in writing are reflective of what we want our voice to sound like.
Markus also asked: can you do this in languages other than English—German, French? Yes, absolutely. Claude is quite multilingual. I often write in French, and the output is quite good. I don’t know if it’s true for every language but certainly it would do reasonably well in many languages.
Comparing output with and without a style guide
To show the difference, I asked Claude to explain the difference-in-differences methodology to a PhD-level audience twice—once with no style guide, once with my writing skill applied.
The unstyled version was quite good—what’s amazing about Opus is how well it can write these things up. The styled version is somewhat different. It started with the practical problem and set up the two-by-two example before getting into any regression analysis. This does sound closer to how I would do it. That said, I’d say it’s a little bit of a caricature of how I write, which I think is often how people feel about these things. They make a skill that looks like them, and it’s recognizable but a bit exaggerated.
The point I want to emphasize: when people talk about skills and style guides, it’s really important to know that it’s just a laundry list of rules. That’s all it is—text that gets dumped into the context window.
Markus then suggested we try asking Claude to explain diff-in-diff in the style of Dostoevsky.[4]
Over the top, a complete caricature. But it illustrates the same thing happening with your personal style guide, just more obviously.
What a more developed style guide looks like
The demo version was built from just a few papers. My actual style guide—the one I use day to day—is much more detailed, because I had Claude Code iterate over many more papers and do much more work. It has more specific writing style patterns: use subordinate clauses, use em-dashes often, translate coefficients into dollar terms when discussing economic significance. It has things like decomposition of policy and economic significance—it’s just a much bigger laundry list than what we built in the demo.
I’ll link my full style guide so people can see what a more developed version looks like. It also has examples from my papers of how I might open different kinds of sections. It’s the same idea as the demo—just more detailed. It’s not obvious that it’s perfect, but it is useful for me to have.
A few things I’ve learned about style guides in practice:
They’re more “list of rules” than “voice capture.” Think of the style guide as a set of constraints that push the LLM’s output in a better direction, not as a perfect representation of your voice. The output will be better with a style guide than without one, but it won’t sound exactly like you.
You have to iterate. Your first style guide will miss things. There are ways you can hone this and have it do more. As you use it and notice problems in the output, go back and add rules. It’s a living document.
You might want multiple guides. I have one for writing papers and a different one for social media and blog posts. They capture different registers of how I write. The social media one is, I’ll admit, painfully sycophantic to read back.
Referee reports and the strategic revision skill
This section covers two related things: having Claude write referee reports (on your own work, for self-assessment), and using a skill to process referee reports you receive and plan your revision.
Writing referee reports
I am not encouraging you to use Claude to write referee reports for journals. I use this skill to write a referee report on my own work, or when I want to quickly give someone feedback on a paper for which I am not a formal referee. In cases where I might otherwise not have offered any comments at all, it lets me focus more quickly on the things I would typically worry about. So having Claude summarize and assess the paper is helpful as a first pass.
To build a referee report skill, I pointed Claude at a folder of my own past referee reports:
> Please read through all the referee reports in this folder that
> I wrote. I want you to construct a skill to generate reports that
> look like these reports — focus on the types of issues and concerns
> I flag.
This is the same pattern as the style guide: give it samples of output you’re happy with, ask it to extract the patterns, and save the result as a reusable skill.
Markus asked how this compares to Refine.ink, the tool Ben Golub built. The distinction is useful: Refine.ink is doing much more to make sure things are internally consistent—that the paper is making claims consistent with the results and that the results are true. It doesn’t try to establish as much taste or preference about what a paper should look like. A referee report, by contrast, is some combination of identifying truth versus taste.
The strategic-revision skill
There’s a skill called strategic revision by Jukka Sihvonen at Aalto University that I really like. What it does: you’ve submitted a paper, you’ve gotten a set of referee reports back and an editor’s letter. Given that, how should you create a revision plan?
What I’ve liked about this is that it:
Creates a set of tasks that you need to do based on the referee reports
Identifies the order, conflicts, and priorities across these tasks
I find this very helpful because even though you read the referee reports, it can often be overwhelming, especially if you have four reports and you need to decide what to do. It organizes everything into a master document.
Specifically, the skill says: “Create a rigorous dependency map revision master plan from peer review reports with computational DAG validation using NetworkX.” Here’s what that means:
Parses every referee comment into a discrete task. Each individual point gets extracted and numbered.
Categorizes each task. Some are argumentative—just writing text. Some are empirical robustness things you need to do. Then clarification points and editorial decisions.
Builds a dependency graph. It uses a DAG—a Directed Acyclic Graph—to figure out which tasks depend on which. A DAG is a way of saying that X affects Y, that Y is dependent on X. Because you make it as a directed graph, you can use network theory to check for cycles—cycles would mean you’ve got the wrong dependence structure. It actually has to use Python to run a validator, and if the skill’s package isn’t available, Claude writes the validator itself.
Organizes tasks into execution blocks. Block A can be done in parallel with Block B, but Block C depends on both finishing first. It produces a visual diagram of these blocks.
Identifies conflicts between referees. If you have multiple referees—and Markus, presumably you’ve had this—there will be referees who disagree about what you should do. The skill identifies when those conflicts are present and flags them: “if you do X, then referee Y might be annoyed.”
Identifies collateral risks. What’s relying on what, and what are the knock-on effects of each change.
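The dependency-graph and execution-block steps above can be sketched in a few lines of Python. The skill itself uses NetworkX. This sketch swaps in the standard library's `graphlib` so it runs without extra packages, and the function name and task names are hypothetical, not taken from the skill.

```python
from graphlib import CycleError, TopologicalSorter


def execution_blocks(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into blocks whose members depend only on earlier blocks.

    `dependencies` maps each task to the set of tasks it depends on.
    Raises ValueError if the dependencies contain a cycle (not a DAG).
    """
    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # cycle detection happens here
    except CycleError as err:
        raise ValueError(f"dependency cycle: {err.args[1]}") from err
    blocks = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that can run in parallel
        blocks.append(ready)
        sorter.done(*ready)
    return blocks


# Hypothetical revision tasks extracted from referee comments.
tasks = {
    "T1_rerun_baseline": set(),
    "T2_clarify_sample": set(),
    "T3_robustness_table": {"T1_rerun_baseline"},
    "T4_rewrite_intro": {"T2_clarify_sample", "T3_robustness_table"},
}
blocks = execution_blocks(tasks)
for i, block in enumerate(blocks, start=1):
    print(f"Block {i}: {block}")
```

Tasks in the same block have no dependencies on each other, so co-authors can split them up. A cycle means two tasks each claim to need the other first, which signals that the dependence structure was specified incorrectly.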
To demo this, I took one of my already-published papers (so I wouldn’t stress about it being public), had Claude write a referee report on it, and then fed both the paper and the generated report into the strategic revision skill.
The output was a 26-task revision plan organized into five execution blocks. It identified the critical path, the key bottlenecks, and which tasks could be parallelized. It even proposed how co-authors should divide the work. The visual diagram showing task dependencies organized into batches is what I find most useful. You can basically copy these over and say, “all right, now we need to do these tasks,” and structure them into issues.
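The critical path is the longest chain of dependent tasks, which sets the minimum time a revision can take no matter how much is parallelized. A minimal sketch of that computation, assuming made-up task names and effort estimates (in days) and using the standard library rather than whatever the skill does internally:

```python
from graphlib import TopologicalSorter


def critical_path(dependencies: dict[str, set[str]], effort: dict[str, float]):
    """Return (total_effort, path) for the longest chain of dependent tasks."""
    best = {}  # task -> (cumulative effort, previous task on its longest chain)
    # static_order() yields every task after all of its dependencies.
    for task in TopologicalSorter(dependencies).static_order():
        preds = dependencies.get(task, ())
        prev = max(preds, key=lambda p: best[p][0], default=None)
        start = best[prev][0] if prev is not None else 0
        best[task] = (start + effort[task], prev)
    end = max(best, key=lambda t: best[t][0])
    total = best[end][0]
    path = []
    while end is not None:  # walk back along the longest chain
        path.append(end)
        end = best[end][1]
    return total, path[::-1]


# Hypothetical tasks and effort estimates.
deps = {
    "rerun": set(),
    "clarify": set(),
    "robustness": {"rerun"},
    "intro": {"robustness", "clarify"},
}
days = {"rerun": 3, "clarify": 1, "robustness": 2, "intro": 1}
total, path = critical_path(deps, days)
print(total, path)
```

Tasks off the critical path (here, `clarify`) have slack. Tasks on it are the bottlenecks worth starting first.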
Now, what I’ll tell you is that you still have to have a lot of discretion in how you do these things. Remember, we talked about being concrete and specific as really helpful for LLMs. Having a really specified set of tasks—the same way as working with an RA—makes it a lot easier to then approach the problem and work with LLMs on subsequent steps as well.
Markus asked whether this works as well for theoretical papers as empirical ones. I’ve actually used it for both, and I thought it was good for both. In some ways it was actually better for theoretical work—you know which proofs are relying on what, so the dependency structure is cleaner. Empirical work has so much taste involved that you kind of have to revisit more.
This obviously is not the be-all and end-all. There are ways you could improve on this skill—I’ve only used it on two papers so far. But what I like most about it is that it’s concrete in trying to make the tasks actionable, categorize them, and see what’s dependent on what.
You can install this skill in Claude through the app. Go to manage skills, add a skill, and upload the markdown file. Once it’s installed, it’s available in both Chat and Cowork.
Using the LLM as an editor, not a rewriter
While AI may have certain unusual writing tics, it is a good editor. It doesn’t have to rewrite your writing, but the LLM can give you explanations for why it thinks something is poor reasoning or poor writing. This prevents cognitive offloading while still using the very strong reasoning capacity that comes from LLMs.[5]
Here’s an example. I took one of my blog posts—we used a blog post just to make it smaller and less complicated, but this would work just as well with a research paper—and gave it this prompt:
> Please take a look at this blog post. I would like to get edits in
> the style of a New York Times editor for writing and clarity.
> However, do not directly edit my writing. Instead, please create
> inline comments around places where my argument is poor or unclear
> and where I can improve the writing.
What it produced was the original text, completely unchanged, with HTML comments inserted into the markdown at specific points. The comments wouldn’t show up in the rendered version, and nothing in the text itself has been edited. Instead you get markup, just like an editor would do.
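If you want to confirm that the model really left your prose alone, you can strip the comments back out and compare against the original. A minimal sketch, assuming the comments use standard `<!-- ... -->` syntax; the function names and sample text are hypothetical.

```python
import re

# Matches HTML comments, including ones that span several lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def extract_comments(annotated: str) -> list[str]:
    """Return the text of every editorial comment, in document order."""
    return [body.strip() for body in COMMENT.findall(annotated)]


def text_unchanged(original: str, annotated: str) -> bool:
    """True if removing the comments recovers the original, ignoring whitespace."""
    def normalize(s: str) -> str:
        return " ".join(s.split())
    return normalize(COMMENT.sub("", annotated)) == normalize(original)


original = "LLMs are floor raisers. They help weak writing."
annotated = (
    "LLMs are floor raisers. <!-- EDITOR: define the term --> "
    "They help weak writing."
)
print(extract_comments(annotated))
print(text_unchanged(original, annotated))
```

A `False` result means the model edited the text despite the instruction, which is worth catching before you keep working in the annotated file.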
I find this helpful because if you’re trying to preserve your own voice, this is the kind of editorial style you’d want. If somebody marks up your draft and tells you exactly how to write things, it’s harder to preserve your own voice in the writing (although if you’re a graduate student, sometimes you’re very happy with that). With comments only, you’re forced to engage with the feedback actively. You read each comment and decide what to do with it—rewrite the sentence yourself, cut it, or disagree and leave it.
Markus suggested you could also ask it to propose alternatives alongside the comments. That works too—sometimes it will do this unprompted. But the danger is always that you just accept the rewrite. Sometimes what I’ll do is copy that specific piece and say, “all right, what should I do here?” We don’t always have infinite amounts of energy to think about our writing, so having options is fine. But the comments-only approach keeps you more engaged.
One practical note: this needs to be done in Cowork or Claude Code, not Chat, because it needs to read and edit files on your computer. You could upload the document to Chat and have it do the same thing, but it works much better in Cowork because it uses your local filesystem and any skills you have installed. What Chat does is create a computing environment up there to do the same task, and the more distance you create between your files and the LLM, the more problems you’ll have.
Thinking through ideas with LLMs
It’s very easy if you have a few vague ideas to hand them off to an LLM and try to let the model do the intellectual work for you. What I want to encourage is similar to how we talked about the context window in the first post: you want to do intentional compaction and note-taking to iterate on ideas.
Iterate your ideas before talking to the LLM. It’s better to iterate on the ideas as much as you can before you talk to the LLM, because the specificity is going to be enormously valuable. If I say something like “hey, I’d really like to think about a life cycle model that thinks about savings”—you’re not going to get particularly groundbreaking ideas there. The more specific you can be, the better.
Load more context. One reason it’s very valuable to use Claude Code or Cowork for this—rather than a simple chat window—is that you can add more files and more context into the LLM. You can say: “Look, I’m in this folder. Here’s a bunch of papers that are related to the thing I’m interested in. Here’s some notes that we did. Here’s some slide decks. Here’s some websites that are relevant to this policy. Here’s what we’re thinking about. Work from there.” The more information you can give, the better.
Consider the “margin notes” approach here too. Instead of asking “what should I write?” ask the LLM to comment on what you’ve already written.
This connects back to the writing-as-thinking point from the start. The goal isn’t to avoid using LLMs for intellectual work—it’s to stay engaged in the intellectual work while using them. Always remember this image, famous on the internet, from an IBM training manual from 1977:
Key takeaways
Writing is thinking. Be deliberate about what you offload versus what you do yourself. The test: can you discuss the ideas in depth without the LLM?
Style guides are useful and interesting, but aren’t going to be perfect. You should curate them, iterate, and do more—but they’re never going to be perfect. In the end, a skill is a summary that is a caricature in some ways of what you are, but is useful. It’s a sounding board that reflects you—it’s like a puppet.
Banal writing is the easiest win. Boilerplate, first-pass summaries, routine documentation—this is really helpful personally. The biggest value is less about doing it for you and more about giving you a very good first pass.
The strategic revision skill is quite lovely. Taking a pile of referee reports and producing a structured, dependency-aware revision plan with prioritized execution blocks—there are other skills too that people should explore, but this one makes referee comments actionable rather than overwhelming.
Ask for comments, not rewrites. I’ve found this to be a useful way to prevent myself from writing too much that sounds like an LLM, in cases where I really want to maintain my voice. It’s an editing approach that keeps you engaged.
Think first, then write with AI. Iterate your ideas as much as you can before talking to the LLM. The specificity is enormously valuable. Load as much context as you can.
What’s next
This is a very contentious topic, but I do think there’s a lot of value here, especially when you want to get stuff moving and get stuff off the ground. In the next post, we’ll do a lot more customization—we’ll talk about skills and containers and how to do stuff in what’s called YOLO mode, which is letting Claude run without any confirmation from us.
[1] Generally, a good rule of thumb is if you have AI write for you when you are tired or in a rush, the output will end up being pretty undifferentiated and likely quite bad.
[2] E.g. consider the many line breaks in many LinkedIn posts, which seem to play the same role as commas?
[3] When I do this, I force Opus to read through a lot more papers and iterate many times to get a more detailed style guide. The one in the demo is just a starting point.
[4] And of course I spelled Dostoevsky wrong in the prompt, which doesn’t matter because strong LLMs can figure out what you meant.
[5] In my mind, I put air quotes around “reasoning” here, because it’s not really reasoning in the human sense, but literally the thinking tokens are effectively meant to proxy for this type of human reasoning.









