Jury-rigging session history and campaign management with AI

Just thought I'd share what I'm doing to run my sessions with the help of ChatGPT and Codex.

Firstly, I create session history documents semi-automatically from audio recordings.

Caveats:
This is on a Linux system with a paid ChatGPT subscription, though apparently Codex-CLI can be used with free ChatGPT, just with tighter limits.
It uses an OpenAI model called Whisper running locally on my GPU.
It's with a party of only 3 players, so that probably helps with accuracy.

First I record the session audio with ffmpeg - this is record.sh:

Code:

#!/usr/bin/env bash
# To list devices: pactl list short sources
# with typical filters, zoom mic and bluetooth headphones
ffmpeg \
  -loglevel error -hide_banner \
  -f pulse -i alsa_input.usb-ZOOM_Q2n-4K_Web_Cam_000C24043875-02.analog-stereo \
  -f pulse -i bluez_sink.64_A2_D2_F5_C5_22.a2dp_sink.monitor \
  -map 0:a -ac 1 -ar 16000 -af "highpass=f=80,lowpass=f=8000" -c:a flac mic.flac \
  -map 1:a -ac 1 -ar 16000 -af "lowpass=f=8000" -c:a flac system.flac

This produces a mic.flac for me (the DM) and a system.flac for the rest of the party.
`pavucontrol` can be used to monitor these sources during recording.

Then after the session I run Whisper to convert the audio into text - extract.sh:

Code:

#!/usr/bin/env bash
base="${1%.*}"
whisper "$1" --model medium --device cuda --fp16 True --language en \
  --no_speech_threshold 1.0 --logprob_threshold -1.0 \
  --condition_on_previous_text False
rm -f "$base.json" "$base.srt" "$base.tsv"

Whisper was installed like so:

Code:

pip install openai-whisper
sudo apt install ffmpeg

It will pull in PyTorch and a ton of NVIDIA dependencies - YMMV depending on your GPU (I have an AMD card, which is a bit more hassle).

After the session I run this extract.sh to dump text from the two audio files (and delete some extra text files I don't need):

Code:

./extract.sh system.flac
./extract.sh mic.flac

After this is done (it can take a while to convert), I run combine_vtt.py to combine the vtt files:

Code:

#!/usr/bin/env python3
# Merge mic.vtt and system.vtt into one speaker-tagged merged.vtt.


def parse_vtt(filename, speaker):
    entries = []
    with open(filename) as f:
        lines = f.readlines()
    i = 0
    while i < len(lines):
        if "-->" in lines[i]:
            times = lines[i].strip()
            # The cue text is the line after the timing line
            # (empty if the file ends right after the timing line).
            text = lines[i+1].strip() if i + 1 < len(lines) else ""
            start = times.split(" --> ")[0]
            seconds = to_seconds(start)
            entries.append({
                "time": seconds,
                "times": times,
                "text": f"{speaker}: {text}"
            })
            i += 2
        else:
            i += 1
    return entries


def to_seconds(t):
    # Accepts HH:MM:SS.mmm or MM:SS.mmm
    parts = t.split(":")
    if len(parts) == 3:
        h, m, s = parts
    elif len(parts) == 2:
        h = 0
        m, s = parts
    else:
        raise ValueError(f"Unexpected time format: {t}")
    s, ms = s.split(".")
    return int(h)*3600 + int(m)*60 + int(s) + int(ms)/1000


mic = parse_vtt("mic.vtt", "YOU")
system = parse_vtt("system.vtt", "THEM")  # renamed from "sys" to avoid shadowing the stdlib module name

# Interleave both speakers by cue start time
merged = sorted(mic + system, key=lambda x: x["time"])

with open("merged.vtt", "w") as f:
    f.write("WEBVTT\n\n")
    for i, e in enumerate(merged, 1):
        f.write(f"{i}\n{e['times']}\n{e['text']}\n\n")

This merges the two vtt files and tags speakers like so (merged.vtt):

Code:

72
04:45.560 --> 04:57.080
YOU: I'm going to share an image with you guys, there you go.

73
05:01.380 --> 05:13.820
THEM: that's cute

74
05:06.040 --> 05:11.960
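To see how the interleaving works without touching any files, here is a minimal sketch of the same sort-by-start-time idea on a few in-memory cues. The cue data is partly made up, and the names `start_seconds`, `cues_mic` and `cues_system` are just for illustration; they are not part of combine_vtt.py.

```python
def start_seconds(times):
    """Convert the start of a 'MM:SS.mmm --> MM:SS.mmm' timing line to seconds."""
    start = times.split(" --> ")[0]
    parts = start.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


# Hypothetical cues as (timing line, text) pairs
cues_mic = [
    ("04:45.560 --> 04:57.080", "I'm going to share an image with you guys, there you go."),
]
cues_system = [
    ("05:01.380 --> 05:13.820", "that's cute"),
    ("00:12.000 --> 00:15.000", "can you hear me?"),
]

# Tag each cue with its speaker, then sort everything by start time
tagged = [("YOU", t, x) for t, x in cues_mic] + [("THEM", t, x) for t, x in cues_system]
merged = sorted(tagged, key=lambda cue: start_seconds(cue[1]))
merged_lines = [f"{speaker}: {text}" for speaker, _, text in merged]

for line in merged_lines:
    print(line)
```

Cues are ordered only by when they start, so overlapping speech from the two recordings simply ends up adjacent in the merged file.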

Then I run Codex - this is a CLI for ChatGPT that can read and edit local files. I use this extensively to maintain my campaign text files.
My campaign has a /whisper directory where all the recordings are. I tell Codex something like this:

Code:

Session 9 is complete! Please read session_history_8.md to understand context, then read whisper/merged.vtt and write session_history_9.md. In the vtt, YOU is me (the DM) and THEM is the party. You'll have to guess who is speaking - Onka the Barbarian, Stabitha the Rogue or Eustace the Wizard.
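Since only the session numbers change from one session to the next, the prompt could also be generated rather than retyped. This is just a sketch: `session_prompt` is a hypothetical helper, and the file naming (session_history_N.md, whisper/merged.vtt) is assumed to stay as in the prompt above.

```python
def session_prompt(session, party, transcript="whisper/merged.vtt"):
    """Build the post-session prompt for a given session number (hypothetical helper)."""
    if session < 2:
        raise ValueError("needs a previous session history to read for context")
    if not party:
        raise ValueError("party must contain at least one character")
    # "A, B or C" style list of who might be speaking
    if len(party) > 1:
        names = ", ".join(party[:-1]) + " or " + party[-1]
    else:
        names = party[0]
    return (
        f"Session {session} is complete! "
        f"Please read session_history_{session - 1}.md to understand context, "
        f"then read {transcript} and write session_history_{session}.md. "
        "In the vtt, YOU is me (the DM) and THEM is the party. "
        f"You'll have to guess who is speaking - {names}."
    )


party = ["Onka the Barbarian", "Stabitha the Rogue", "Eustace the Wizard"]
print(session_prompt(9, party))
```

The printed text can then be pasted into Codex as usual.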

Codex then writes the next session history file. Typically it guesses who is speaking very well, based on context clues, and at most I have to tweak a few paragraphs for where the recording was bad or the AI misunderstood what was going on. Also, since it read the prior session history it can maintain a consistent style. In the beginning I had to work with it a bit to produce a narrative style I liked.

Here's an example of the output:

Code:

The road northwest from Greenest was pleasant: frost still crisp in the grass, birds already rejoicing, sky painted in a slow brilliant dawn. Onka had positioned herself at the front of the column and stayed there, occasionally slipping to the back only to slap both the other horses hard on the flanks and surge ahead again. She was winning a race nobody else had entered. Stabitha asked if she knew where she was going. Onka said yes. This did not mean she knew. But she was winning. It was perhaps fifteen minutes before Stabitha saw the figure in the bushes.

And:

Code:

The Marrow Mines looked different. The old chalked arch they'd squeezed through in what felt like another lifetime was bricked up. In its place: a ladder firmly attached to the cliff face, a small landing, and a new wooden door. The door had three knockers at different heights and a plaque that read: *Knock for big. Small pull. Low and fly high.* Eustace examined it. The door had some weak magic in the knockers — nothing approaching the intensity of the egg. The top knocker was out of Onka's reach. The middle sat at roughly her height. The bottom was at a taller person's knee. They deliberated. Onka stopped deliberating. She put her right foot in the bottom rung, grabbed the middle one with her left hand, and pulled the top one. All three, simultaneously. She was very small for this operation and it required total commitment. The doors split open like a book.

That's it. This took a bit to set up but now after sessions I can just run a few scripts and the ongoing story of our campaign is written. I dump this into a Google Docs file for everyone and they can read it between sessions. Since we only play every 2-3 weeks, this is a great reminder for all of us, and Codex can also refer to it when I ask questions about what happened. I can say "read session history and update scenes.md - remove any scenes that are done or no longer possible, update the rest with any new facts" and Codex will do this in seconds.

I also use Codex or Claude Code for most edits to my other campaign files. I use Obsidian to view and edit these markdown files manually - I have my campaign split into prep documents (ch8_prep.md, ch9_prep.md) and items.md, party.md, people_places.md, scenes.md etc.

[see next comment for more]

[continued from above...]

If I need a new NPC, I'll tell Codex something like "add a magic shop to greenest in people_places.md".
Since it already has that file in the format I've created, it improvises accordingly and adds something like this:

Code:

## Greenest — The Tangle (Curios & Recovered Goods)

*A narrow shopfront wedged between two larger buildings, identifiable by the chandelier of mismatched lanterns hanging outside and the handwritten sign that just says THE TANGLE. Inside: floor-to-ceiling shelves, everything labelled in a handwriting too small to read without leaning in, the smell of cedar and old leather, and Fitch.*

| Name | Location | Knows | Hook | Acting Tip | Appearance | Notes |
|------|----------|-------|------|------------|------------|-------|
| Fitch - Proprietor of The Tangle | The Tangle | What things are worth, where they came from, who might want them; has a buyer for almost anything | Will buy the party's loot, cult regalia, and anything unusual — no paperwork, fair prices, absolutely no questions | Picks up every item and examines it through a series of flip-down lenses before saying anything. Gives a number. Does not negotiate. If you don't like the number, put it back. Has a comment about everything but keeps them mostly to himself — they leak out as small noises.<br>**Motivations:** greed, discovery, legacy · **Pitfalls:** higher authority, power | Ancient gnome of unclear gender, no more than three feet tall, enormous brass spectacles with four hinged lens-rings. Wears a long apron with too many pockets. | Survived the raid by hiding under the counter. Has already restocked. Cult regalia is "complicated — I know someone but it takes time." Will pay a premium for anything magical or with a story he hasn't heard. |

It produces things I don't like of course, and I do lots of rewriting and tweaking, but often just producing the skeleton in the right format and inventing names saves me time. The above is edited quite a bit and updated multiple times. The "has already restocked" comes from me telling Codex "read session history x, update the stock list and restock The Tangle with a chain shirt for Onka in people_places.md". And it just does this based on session history and context.

Here are the results:

Code:

### Stock (after Ch8 — sold/purchased)

*Sold to party in Ch7/Ch8: +1 greataxe (Onka, 150gp charmed), +1 shortsword (Stabitha, 150gp), Gloves of Thievery (Stabitha, 125gp), Cloak of Protection (Eustace, 300gp), potion of healing (Stabitha, 80gp). Scroll of Misty Step stolen by Eustace in Ch7.*

*Bought from party in Ch8: berserker handaxe (1000gp — Rare, cursed; he has a buyer), 2 scimitars (40gp total), sardonyx (40gp).*

### Remaining Stock

| Item | For | Price | Notes |
| ---- | --- | ----- | ----- |
| Pearl of Power | Eustace | 350gp | Regain one spell slot (up to 3rd) per short rest |
| Chain shirt *(Calenthar's Rings)* | Onka | 90gp | Sized for a small humanoid; the links are finer than standard chain — closer in appearance to fish scales than mail. Whoever made this put an extraordinary amount of work into it. Non-magical: AC 13 + DEX mod (max +2), so AC 14 for Onka — one better than her Unarmored Defense. Fitch acquired it in a Beregost estate lot and cannot identify the maker's mark: a hammer with a bifurcated handle he's never seen before. The name *Calenthar* is scratched on the interior hem in what might be Elvish script. He doesn't know if that's the maker or the last owner, and will say so. |

Then I go in and tweak. These tables are a bit hard to read here but come out beautifully formatted in Obsidian.

To prep for the next session I ask Codex to read session history and the last prep document and create a new one. Then I go in and rip out anything I don't like and tweak what I do.

This way I run campaigns using text files I can quickly read at the table, audio from our video call (we use Jitsi) and Fantasy Grounds to maintain character sheets and run combats, share maps, handle items, etc.

So far it's working pretty well!

I did not understand 90% of what you said, but the 10% I did understand sounds awesome!

This sounds like what I would like to do with my Chat GPT subscription. Unfortunately, I never got past basic analogue sound recording and running mains and stage sound at a church in the early naughts.

Quote:

Originally Posted by claedawg

I did not understand 90% of what you said, but the 10% I did understand sounds awesome!

This sounds like what I would like to do with my Chat GPT subscription. Unfortunately, I never got past basic analogue sound recording and running mains and stage sound at a church in the early naughts.

Yeah I appreciate it's a bit of hackery to set up. I imagine there are easier ways to record multiple streams on something like Windows - OBS Studio perhaps. I cleaned up the post above - previously the formatting was terrible.

I think the biggest bang for your buck is just Codex-CLI, editing a bunch of text files all at once. This should be easy for anyone to achieve and makes routine search/replace/update operations very convenient.

How polluted is the recording by OOC banter? Because the biggest issue I see here is not the difficulty of speech-to-text conversion, which is consolidated in modern industry standards, but the amount of non-relevant speech in a normal session. Does the AI manage to filter it out?

I would guess that would be at least mostly dependent on the Players using the right selections (Ctrl Alt, etc) as those each generate a different HTML code that an AI (or my Word Macro) can pull and use.

Quote:

Originally Posted by RosenMcStern

How polluted is the recording by OOC banter? Because the biggest issue I see here is not the difficulty of speech-to-text conversion, which is consolidated in modern industry standards, but the amount of non-relevant speech in a normal session. Does the AI manage to filter it out?

Yes, that could be a problem, but we have only 3 people in the party and our sessions are short, typically around 2.5 hours. We have practically zero OOC banter when the session is running; it's all at the beginning or the end. During breaks people usually step away from their mics so there's nothing recorded.
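
For groups where the chatter at the start and end is a problem, one possible extra step would be to cut the merged transcript down to the in-game window before handing it to Codex. This is only a sketch and not part of the scripts above: it assumes merged.vtt has the layout written by combine_vtt.py, that you note the start and end times yourself, and `trim_cues` is a hypothetical name.

```python
def to_seconds(stamp):
    """Convert 'HH:MM:SS.mmm' or 'MM:SS.mmm' to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def trim_cues(vtt_text, start, end):
    """Keep only the cues whose start time falls within [start, end]."""
    lo, hi = to_seconds(start), to_seconds(end)
    kept = []
    # Cues are separated by blank lines; the first block is the WEBVTT header.
    for block in vtt_text.strip().split("\n\n"):
        timing = next((line for line in block.splitlines() if "-->" in line), None)
        if timing is None:
            continue  # header or a block without a timing line
        begin = to_seconds(timing.split(" --> ")[0].strip())
        if lo <= begin <= hi:
            kept.append(block)
    return "WEBVTT\n\n" + "\n\n".join(kept) + "\n"
```

For example, `trim_cues(open("whisper/merged.vtt").read(), "05:00.000", "02:25:00.000")` would keep everything from five minutes in until 2 hours 25 minutes (the times here are made up). It does nothing about banter in the middle of a session, which would still be left to the AI to ignore.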