Build Notes & Feedback — Tanner's Project

01

File format conversion & how scripts read data

Tanner

My sanitize skill (in Claude Cowork) outputs ElevenLabs Studio format. The interview-audio project expects timestamp format. I manually converted between them.

I don't fully understand why these two formats exist, whether the parser could be modified to accept Studio format directly, or how continuation lines get attached to previous segments.

I got it working but couldn't explain to someone else why the format matters.

Jake

The bigger problem isn't really the format — it's that you're working across three different environments and manually moving files between them. That's the part that should go away.

We need to crack open your sanitize skill: if it's hardcoded to a specific output folder, we pull that out — either let it save where you're working, or have it ask.

Better yet: collapse the whole thing into one workflow. Hand Claude Code a raw transcript and have it sanitize, convert format, and generate audio in one step. No file shuffling.

The actual format answer: this project really only knows one format, [hh:mm:ss] Speaker: text. "Studio format" comes from a different ElevenLabs product. The two formats exist because they serve different tools.

The parser uses speaker name + text body. Continuation lines fold into the previous segment. The timestamp itself is ignored — output uses fixed gap rules (250ms / 500ms), not your numbers. So timestamp accuracy doesn't matter.
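To make the parsing behavior concrete, here is a minimal Python sketch of a parser that works the way described above. It is not the project's actual code. The regex, the Segment class, and the function names are hypothetical, and which gap (250 ms or 500 ms) applies in which case is an assumption, since the notes only say the gaps are fixed.

```python
import re
from dataclasses import dataclass

# Assumed line shape: "[hh:mm:ss] Speaker: text". The real parser may be stricter or looser.
LINE_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s*([^:]+):\s*(.*)$")

# Assumption: the notes give the two gap values but not which case gets which.
SAME_SPEAKER_GAP_MS = 250
SPEAKER_CHANGE_GAP_MS = 500


@dataclass
class Segment:
    speaker: str
    text: str


def parse_transcript(raw: str) -> list[Segment]:
    """Turn transcript text into segments, keeping only speaker and text."""
    segments: list[Segment] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            # The timestamp is matched but never captured or used.
            segments.append(Segment(match.group(1).strip(), match.group(2).strip()))
        elif segments:
            # Continuation line: no "[..] Speaker:" prefix, so fold it into the previous segment.
            segments[-1].text = f"{segments[-1].text} {line}".strip()
        # A prefix-less line before the first segment is dropped (an assumption).
    return segments


def gap_before(prev: Segment, current: Segment) -> int:
    """Pick a fixed pause in milliseconds; the timestamps play no part."""
    if prev.speaker == current.speaker:
        return SAME_SPEAKER_GAP_MS
    return SPEAKER_CHANGE_GAP_MS
```

Because the timestamp never leaves the regex, any placeholder value such as [00:00:00] on every line would produce the same segments in this sketch.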

02

Model version & voice settings

Tanner

The project uses eleven_v4 by default. I saw references to eleven_v3 and eleven_v4_hq.

What's the actual difference? The docs say voice settings are "mostly silently ignored" for v4 — does that mean stability/style do nothing? Should I have used the simple string format for voice config?

Should I have experimented with different models for different clips?

Jake

Quick rundown:

v3 — our expressive model. Good emotional range, less natural in long form.
v4 — newest model (not publicly released). Same architecture as v3, retrained to sound more natural. This is why your output sounded so good.
v4_hq — same as v4 but higher quality output. Bigger files. Worth it for finished deliverables, not for iteration.

The samples page runs the same interview through all the models — listen to it. That's the best way to internalize the difference.

Voice settings being silently ignored on v4: yeah, mostly. You can simplify voices.config.json to the short string format. There's also no harm in leaving the settings in.

And to your meta-question: yes, you should have experimented. Always experiment. That's the answer to a lot of these.

String vs object voice config: think of it like ordering coffee. Short form ("Tanner": "voice-id") is "I'll have a coffee" — default everything. Long form ({ voiceId, voiceSettings }) is "medium roast, oat milk, two sugars."

On v4 those settings mostly don't do anything — the model decides. On v3 they matter. For v4, short form is cleaner. And really, you shouldn't be hand-editing this file anyway — tell Claude what voices and model you want, and it picks the right form.
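To show why the two forms are interchangeable, here is a rough Python sketch that reads either one and hands back the same shape. It assumes voices.config.json is a flat speaker-to-entry mapping and that the long form uses the keys voiceId and voiceSettings as written above. The voice IDs, the setting values, and the function names are made up for illustration.

```python
import json

# Hypothetical example config: one short-form entry, one long-form entry.
EXAMPLE_CONFIG = """
{
  "Tanner": "voice-id-1",
  "Jake": {
    "voiceId": "voice-id-2",
    "voiceSettings": {"stability": 0.5, "style": 0.3}
  }
}
"""


def normalize_voice(entry):
    """Return {"voiceId": ..., "voiceSettings": {...}} for either config form."""
    if isinstance(entry, str):
        # Short form: just a voice ID, default everything else.
        return {"voiceId": entry, "voiceSettings": {}}
    if isinstance(entry, dict) and "voiceId" in entry:
        # Long form: keep the explicit settings if there are any.
        return {
            "voiceId": entry["voiceId"],
            "voiceSettings": dict(entry.get("voiceSettings") or {}),
        }
    raise ValueError(f"Unrecognized voice entry: {entry!r}")


def load_voices(config_text):
    """Parse the config text and normalize every speaker's entry."""
    config = json.loads(config_text)
    return {speaker: normalize_voice(entry) for speaker, entry in config.items()}
```

Calling load_voices(EXAMPLE_CONFIG) gives both speakers the same structure, so the rest of a script would not need to care which form was used in the file.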

03

Hard-coded speaker names

Tanner

I hard-coded "Tanner" and "Jake" as speaker names during conversion. This felt brittle.

Every transcript I convert needs those exact names. If I wanted different names, I'd need to re-convert the transcript or create a new voice config.

Is there a better pattern that doesn't require manual find-and-replace every time?

Jake

You shouldn't be doing any of this find-and-replace by hand. That's exactly the kind of thing Claude should swap on the fly.

This belongs in the workflow from #1. When we build (or rebuild) the skill, it can handle names in any of three ways, your choice:

Ask upfront: "What names should the speakers be?"
Auto-detect from the transcript and confirm with you
Read names from a config

If the script has speaker names hardcoded anywhere, that's a bug we just fix. Manual conversion is never the answer when Claude can do it deterministically.
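As an illustration of the auto-detect and swap idea, here is a small Python sketch. It assumes the [hh:mm:ss] Speaker: text line shape from #1. The regex and both function names are hypothetical and not taken from the actual skill or script.

```python
import re

# Assumed label shape at the start of a line: "[hh:mm:ss] Speaker:".
SPEAKER_RE = re.compile(r"^(\[\d{2}:\d{2}:\d{2}\]\s*)([^:]+)(:)")


def detect_speakers(raw):
    """List the distinct speaker labels in order of first appearance."""
    names = []
    for line in raw.splitlines():
        match = SPEAKER_RE.match(line)
        if match:
            name = match.group(2).strip()
            if name not in names:
                names.append(name)
    return names


def rename_speakers(raw, mapping):
    """Swap speaker labels using mapping; unmapped labels are left alone."""

    def swap(match):
        name = match.group(2).strip()
        return f"{match.group(1)}{mapping.get(name, name)}{match.group(3)}"

    # Only the label at the start of a line is touched, never names in the spoken text.
    return "\n".join(SPEAKER_RE.sub(swap, line) for line in raw.splitlines())
```

A skill could call detect_speakers first, confirm the list with you, and then pass your answers to rename_speakers as the mapping, so the transcript itself never needs a manual find-and-replace.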

04

Voice ID selection & variability

Tanner

I picked two voice IDs that felt right based on browsing. But is there a "science" to picking voices that work well together, or is it purely subjective?

Should I have tested multiple voice combinations and compared? How much would different voices change how people absorb the training content?

I made a decision, but it was mostly a guess.

Jake

There isn't really a science. It's mostly:

Some voices are optimized for expressiveness, some aren't — depends on how the voice was created
Some voices pair better with older models, some with newer
The voice library has descriptions worth reading ("works better than X for Y")

Trial and error is the move. Your two voices worked — the contrast between monotone interviewer and enthusiastic interviewee was a smart choice. If you want to level up, A/B a couple of alternates next time and pick the one that lands better.

05

Creating custom UIs & sharing them

Tanner

I asked Claude Code to build the showcase page. I have no idea whether the HTML follows best practices or if it's just "good enough."

How do I properly share this page? Should it be at /docs for GitHub Pages? Will the audio file paths work when deployed?

I basically accepted the output blindly because I don't know enough about web dev to evaluate it.

Jake

The page Claude built you is fine — it got the job done. But "good enough" and "polished" are different bars, and next session I'll show you how to make polished pages pretty easily.

Important reframe: GitHub isn't really a hosting platform. It's a great place to store code and an okay intermediary, but for actually hosting and showing off, we use Vercel. That's the production move.

GitHub Pages works for a basic landing page like our index.html, but it's not where polished projects live. Next session we'll cover Vercel and the accessibility/performance side of things.

06

Where files actually live — the big one

Tanner

I'm still struggling with the fundamental mental model of where information sits across three contexts: Claude Cowork, my local machine, Cursor/Claude Code.

When I copy a file between folders, where does it "live"? When Claude generates an MP3, where does it actually write? How does GitHub Pages serve files if they're gitignored?

I got everything working, but I don't have a clear mental model of file system, local vs remote, what's in git, and how the pieces connect.

Jake

This is the biggest gap, and it's not a "you" problem — it's that your computer doesn't have a workflow or organization system in place yet. Files land wherever they land. There's no clear pattern.

I want to make this the centerpiece of next session:

Consolidate what's on your machine into a sane structure
Set habits for spinning up new projects so this doesn't sprawl again
Deep-dive on infrastructure — local vs GitHub vs hosting
Build the mental model — what's in git, what isn't, what gets deployed

The fact that you can articulate the confusion this clearly tells me you're already 80% there — you just need the framework hung on it.

While we're here, one concrete answer: the out/ folder is gitignored generally, but .gitignore has explicit exceptions for your three training clips so Pages can serve them. New clips would need the same exception (Claude handles that — just say "add this clip to the gitignore exception list"). Everything else stays local.

07

Model selection & input format requirements

Tanner

I don't remember Claude Code asking me which model to use. It defaulted to eleven_v4 and I went with it.

My sanitize skill outputs three files: annotated-reference, elevenlabs-upload, synopsis. I used the upload one but don't know if that was right, or if different models need different inputs.

Jake

The reason it didn't ask is that I hardcoded v4 into the script. Not a Claude thing — a me thing. You can always ask Claude to swap it, or change it yourself.

Important detail buried in your notes: the 1,500 character limit you hit is a dialogue-mode constraint. Dialogue mode sends everything in one API call. Segment mode generates each speaker's lines separately and stitches them — no character limit, sounds better in testing.

You were in dialogue mode by accident — that's why you had to manually split that 2,268-char block. Segment mode would've absorbed it.
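For reference, the manual split you did can be sketched in a few lines of Python. This is only an illustration of the workaround and not part of the project script. The function name is hypothetical, and it assumes the 1,500 character limit applies to each block of text you send, which the notes don't spell out.

```python
import re

# Limit taken from the notes; exactly what it counts is an assumption.
DIALOGUE_CHAR_LIMIT = 1500


def split_block(text, limit=DIALOGUE_CHAR_LIMIT):
    """Split text at sentence boundaries into chunks of at most `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is kept whole, not cut mid-sentence.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

In segment mode none of this would be needed, which is the real point: the split is a workaround for dialogue mode, not a step the workflow should depend on.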

On the three files: using -elevenlabs-upload.txt was correct. The full picture needs us to look at the skill together. Parking-lotted.

Pulling on the framing note above: you don't need to memorize that v4 is hardcoded, or the dialogue character limit, or which sanitize file maps to what. Know what you want — clean two-speaker audio, this long, this voice — and tell Claude. If you hit a limit, Claude swaps modes. If a default is wrong, Claude changes it. Your job is knowing what good looks like. Claude's job is wiring up the path.

08

Speaker name consistency across interviews

Tanner

I used "Tanner" and "Jake" as names for all three clips, but they're from three different candidates. All three sound like the same "Jake" voice in the audio.

Should I have used different voice IDs per candidate? Generic names like "Interviewer"/"Candidate"? Separate voice configs per interview?

It works for now, but I can imagine this creating confusion if the training library grows.

Jake

You're spotting a real future problem. Three candidates sounding identical is fine for three clips, less fine when there are thirty.

Yes — use different voice IDs per candidate. My suggestion:

Pick 2–3 good male voices and 2–3 good female voices you like
Rotate them across candidates so each interview has a distinct sound
Voice differentiation matters more than name differentiation for audio listeners

Practically, Claude will create a different voice config per interview (or one config that maps candidate → voice). That's the fun part — when scale changes, the structure changes with it. You can do basically anything you can imagine here.
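One way the "one config that maps candidate → voice" option could look, as a minimal Python sketch. The function name, the candidate labels, and the voice IDs are all placeholders. It uses a single pool for simplicity, so with separate male and female pools you would call it once per pool.

```python
from itertools import cycle


def assign_voices(candidates, voice_pool):
    """Give each candidate a voice, rotating through the pool in order."""
    if not voice_pool:
        raise ValueError("voice_pool must contain at least one voice ID")
    rotation = cycle(voice_pool)
    assignments = {}
    for candidate in candidates:
        # A candidate who appears twice keeps the voice they were given first.
        if candidate not in assignments:
            assignments[candidate] = next(rotation)
    return assignments


# Placeholder names and IDs, purely for illustration.
example = assign_voices(
    ["Candidate A", "Candidate B", "Candidate C", "Candidate D"],
    ["voice-id-1", "voice-id-2", "voice-id-3"],
)
```

With three voices and four candidates, the fourth candidate wraps around to the first voice, so a larger pool means fewer repeats as the library grows.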

