
update: add podcast-creator skill #4

Open
mvanhorn wants to merge 1 commit into MiniMax-AI:main from mvanhorn:osc/feat-podcast-creator-skill


@mvanhorn

Summary

Adds a podcast-creator skill that converts text scripts into podcast episodes using MiniMax TTS and Music APIs.

What it does

  • Takes a text script (plain text, Markdown, or structured JSON with chapters)
  • Generates narration via MiniMax TTS API (speech-2.8-hd) with configurable voice selection
  • Generates intro/outro music via MiniMax Music API (music-2.5+)
  • Assembles everything with ffmpeg into a final podcast mp3 with crossfading and ID3 tags
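The assembly step above can be sketched as a function that builds an ffmpeg command line. This is only an illustration, not the code in podcast_create.py: the function name `build_assembly_command`, its parameters, the file names, and the choice of the `acrossfade` filter are all assumptions made for the example.

```python
from typing import List, Optional


def build_assembly_command(
    intro: str,
    narration: str,
    outro: str,
    output: str,
    crossfade_seconds: float = 2.0,
    title: Optional[str] = None,
    artist: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an ffmpeg command that crossfades
    intro -> narration -> outro and writes a tagged mp3.

    Hypothetical helper for illustration; the real script may differ.
    """
    # Chain two acrossfade filters: intro into narration, then into outro.
    filter_graph = (
        f"[0:a][1:a]acrossfade=d={crossfade_seconds}[body];"
        f"[body][2:a]acrossfade=d={crossfade_seconds}[out]"
    )
    command = [
        "ffmpeg", "-y",
        "-i", intro,
        "-i", narration,
        "-i", outro,
        "-filter_complex", filter_graph,
        "-map", "[out]",
    ]
    # ffmpeg writes -metadata key=value pairs as ID3 tags for mp3 output.
    if title:
        command += ["-metadata", f"title={title}"]
    if artist:
        command += ["-metadata", f"artist={artist}"]
    command += ["-codec:a", "libmp3lame", "-id3v2_version", "3", output]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed. Building the list separately keeps the command easy to inspect before anything is executed.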

Why

The TTS and Music APIs are currently used only inside frontend-dev, for web asset generation. This skill surfaces them for audio content creation. The existing minimax_tts.py and minimax_music.py scripts serve as the foundation, and the new podcast_create.py orchestrator handles chapter splitting, voice assignment, and ffmpeg assembly.
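As a rough sketch of what chapter splitting and voice assignment could look like for Markdown input: the snippet below treats each heading as a chapter boundary and assigns voices round-robin. The function name `split_chapters`, the default "Introduction" title, and the voice IDs in the usage example are hypothetical; the actual rules are defined by podcast_create.py and script-format.md, which are not shown here.

```python
import re
from itertools import cycle
from typing import Dict, List

HEADING = re.compile(r"^#{1,6}\s+(.*)$")


def split_chapters(markdown_text: str, voices: List[str]) -> List[Dict[str, str]]:
    """Split Markdown into chapters at headings and assign voices in rotation.

    Illustrative only: text before the first heading becomes a chapter
    titled "Introduction", and chapters with no body text are skipped.
    """
    chapters = []
    title, body = "Introduction", []

    def flush() -> None:
        # Keep the chapter only if it has some non-blank text.
        text = "\n".join(body).strip()
        if text:
            chapters.append((title, text))

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    flush()

    voice_cycle = cycle(voices)
    return [
        {"title": t, "text": x, "voice": next(voice_cycle)}
        for t, x in chapters
    ]
```

For example, `split_chapters("# One\nHello\n\n# Two\nWorld", ["voice-a", "voice-b"])` yields two chapters, the first narrated by `voice-a` and the second by `voice-b`.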

Structure

skills/podcast-creator/
  SKILL.md                          # Skill definition with 6-step workflow
  scripts/
    podcast_create.py               # Audio assembler (crossfade + concat + ID3)
    minimax_tts.py                  # TTS script (copied from frontend-dev)
    minimax_music.py                # Music script (copied from frontend-dev)
  references/
    requirements.txt                # Python deps
    script-format.md                # Input format documentation

Follows the same pattern as gif-sticker-maker: SKILL.md with mandatory workflow steps, scripts dir with self-contained Python CLIs, references dir with supplementary docs.

Test plan

  • Verify SKILL.md frontmatter parses correctly
  • Run python3 -m py_compile scripts/podcast_create.py (passes)
  • Test with MINIMAX_API_KEY: generate narration, music, and assemble
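The first test-plan item could be checked with a small script along these lines. It is a minimal sketch that assumes the frontmatter is a `---`-delimited block of flat `key: value` pairs; it is not a full YAML parser, and the field names used in the usage note (`name`, `description`) are assumptions rather than confirmed requirements of SKILL.md.

```python
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` frontmatter delimited by '---' lines.

    Hypothetical checker for illustration: nested YAML, lists, and
    multi-line values are ignored rather than parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' line")

    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        # Only consider top-level, non-comment lines containing a colon.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---' line")
```

A check might then read the file with `open("SKILL.md", encoding="utf-8").read()`, call `parse_frontmatter` on it, and confirm that the expected keys, for instance `name` and `description`, are present.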

This contribution was developed with AI assistance (Claude Code).

Adds a podcast-creator skill that converts text scripts into podcast
episodes using MiniMax TTS and Music APIs. Supports plain text,
Markdown, and structured JSON input formats. Uses ffmpeg for audio
assembly with crossfading between narration and intro/outro music.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

wu335230960 commented Mar 22, 2026 via email

