1 change: 1 addition & 0 deletions README.md
@@ -20,6 +20,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
| `pptx-generator` | Generate, edit, and read PowerPoint presentations. Create from scratch with PptxGenJS (cover, TOC, content, section divider, summary slides), edit existing PPTX via XML workflows, or extract text with markitdown. | Official |
| `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers creating new xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation, validation, and professional financial formatting. | Official |
| `minimax-docx` | Professional DOCX document creation, editing, and formatting using OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill/edit content in existing documents, or apply template formatting with XSD validation gate-check. | Official |
| `minimax-voice` | MiniMax voice synthesis and music generation toolkit. Text-to-speech (sync/async), voice management (query/clone/design), and music generation from lyrics. | Official |

## Installation

1 change: 1 addition & 0 deletions README_zh.md
@@ -20,6 +20,7 @@
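| `pptx-generator` | Generate, edit, and read PowerPoint presentations. Supports creating from scratch with PptxGenJS (cover, table of contents, content, section divider, and summary slides), editing existing PPTX via XML workflows, or extracting text with markitdown. | Official |
| `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Supports creating xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation and validation, and professional financial formatting. | Official |
| `minimax-docx` | Professional DOCX document creation, editing, and typesetting based on OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill in or edit content in existing documents, and apply template formatting with an XSD validation gate check. | Official |
| `minimax-voice` | MiniMax voice synthesis and music generation toolkit. Supports text-to-speech (sync/async), voice management (query/clone/design), and music generation from lyrics. | Official |

## Installation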

92 changes: 92 additions & 0 deletions skills/minimax-voice/SKILL.md
@@ -0,0 +1,92 @@
---
name: minimax-voice
description: MiniMax voice synthesis and music generation API toolkit. Supports text-to-speech (sync/async), voice management (query/clone/design), and music generation. Use this skill when users need voice synthesis, voice cloning, or music generation.
license: MIT
metadata:
  version: "1.0"
  category: api-integration
---

# MiniMax Voice Toolkit

Python client toolkit for MiniMax voice synthesis and music generation APIs.

## Environment Variables

**⚠️ Important: Before each use, check whether the `MINIMAX_API_KEY` environment variable is set. If it is not, set it before calling the scripts.**

```bash
export MINIMAX_API_KEY="your_api_key_here"
```
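
The same check can be done from Python before any request is sent. The sketch below is illustrative only: `build_auth_header` is a hypothetical helper, not part of the toolkit, and the `Bearer` prefix handling is an assumption modeled on the client in `scripts/music_generation.py`.

```python
import os
from typing import Optional


def build_auth_header(raw_key: Optional[str] = None) -> str:
    """Return an Authorization header value, or raise if no key is configured.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    key = (raw_key or os.getenv("MINIMAX_API_KEY") or "").strip()
    if not key:
        raise ValueError(
            "MINIMAX_API_KEY is not set; export it before running the scripts."
        )
    # Add the Bearer prefix only when the caller has not already supplied it.
    return key if key.startswith("Bearer ") else f"Bearer {key}"
```

Failing early with a clear message avoids sending a request that would only come back with an authentication error.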

**Default Output Directory**: All generated audio files are saved to `./assets/audios/`. The directory is created automatically if it does not exist.
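
A minimal sketch of how such a default could be resolved, assuming the directory is taken relative to the current working directory. `resolve_output_path` and its `create` flag are hypothetical names used only for this example.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(
    filename: str,
    output_dir: Optional[str] = None,
    create: bool = True,
) -> Path:
    """Return the path a generated audio file would be written to.

    Hypothetical helper: falls back to ./assets/audios under the current
    working directory when no output_dir is given.
    """
    target_dir = Path(output_dir) if output_dir else Path.cwd() / "assets" / "audios"
    if create:
        # parents=True builds intermediate directories; exist_ok avoids errors on reruns.
        target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename
```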

## Scripts

| Script | Function | API |
|-----|------|-----|
| `scripts/text_to_audio.py` | Synchronous TTS | `/v1/t2a_v2` |
| `scripts/text_to_audio_async.py` | Asynchronous TTS | `/v1/t2a_async_v2` |
| `scripts/voice_manager.py` | Voice Management | `/v1/get_voice`, `/v1/voice_clone`, `/v1/voice_design` |
| `scripts/music_generation.py` | Music Generation | `/v1/music_generation` |

## Character Limits

| Script | Character Limit | Use Case |
|------|---------|---------|
| `text_to_audio.py` (sync) | ≤ 10,000 chars | Short text, real-time synthesis |
| `text_to_audio_async.py` (async) | 10,001 - 50,000 chars | Long text, audiobooks |

**Note**: Text longer than 50,000 characters must be split into multiple requests.
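
The limits above can be turned into a small routing and splitting step. This is a sketch under two assumptions: that Python's `len()` counts characters the same way the API does, and that breaking on a newline is an acceptable split point. `choose_script` and `split_text` are hypothetical helpers, not part of the toolkit.

```python
from typing import List

SYNC_LIMIT = 10_000   # upper bound for text_to_audio.py, per the table above
ASYNC_LIMIT = 50_000  # upper bound for text_to_audio_async.py, per the table above


def choose_script(text: str) -> str:
    """Pick the script whose character limit fits the text."""
    n = len(text)
    if n == 0:
        raise ValueError("Text is empty")
    if n <= SYNC_LIMIT:
        return "scripts/text_to_audio.py"
    if n <= ASYNC_LIMIT:
        return "scripts/text_to_audio_async.py"
    raise ValueError("Text exceeds 50,000 characters; split it first")


def split_text(text: str, limit: int = ASYNC_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to cut just after the last newline inside each window and
    falls back to a hard cut when the window contains no usable newline.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind("\n", start, end)
            if cut > start:
                end = cut + 1
        chunks.append(text[start:end])
        start = end
    return chunks
```

Each chunk returned by `split_text` can then be passed through `choose_script` and submitted as its own request.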

## Usage Examples

```bash
# Synchronous TTS (≤ 10000 chars)
python3 scripts/text_to_audio.py -t "Hello" -v male-qn-qingse -o output.mp3

# Asynchronous TTS (10001-50000 chars)
python3 scripts/text_to_audio_async.py -t "Long text..." -v audiobook_male_1 -w -o output.mp3

# List voices
python3 scripts/voice_manager.py list

# Clone voice
python3 scripts/voice_manager.py clone --file voice.mp3 --voice-id MyVoice001

# Design voice
python3 scripts/voice_manager.py design --prompt "Warm female voice" --preview "Preview text" -o trial.mp3

# Generate music
python3 scripts/music_generation.py -l lyrics.txt -p "Pop music, upbeat" -o song.mp3
```
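
The lyrics passed with `-l` may carry structure tags such as `[Verse]` and `[Chorus]`, with lines separated by `\n`. The sketch below assembles such a string. `build_lyrics` is a hypothetical helper, and the 3,500-character ceiling is the limit enforced by the client in `scripts/music_generation.py`, assumed here to match the API.

```python
from typing import List, Tuple

MAX_LYRICS_CHARS = 3500  # limit checked by the client; assumed to match the API


def build_lyrics(sections: List[Tuple[str, str]]) -> str:
    """Join (tag, body) pairs into tagged lyrics, one tag line per section."""
    parts: List[str] = []
    for tag, body in sections:
        parts.append(f"[{tag}]")
        parts.append(body.strip())
    lyrics = "\n".join(parts)
    if not 1 <= len(lyrics) <= MAX_LYRICS_CHARS:
        raise ValueError("Lyrics length must be between 1 and 3500 characters")
    return lyrics
```

The returned string can be written to a file and passed to `music_generation.py -l`, or passed inline as the `-l` argument.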

## Supported Models

### Text-to-Speech
- `speech-2.8-hd` - Latest HD model, supports interjection tags
- `speech-2.8-turbo` - Latest high-speed model

### Music Generation
- `music-2.5` - Latest music generation model

## Common Voice IDs

- `male-qn-qingse` - Male-Youth-Innocent
- `female-shaonv` - Female-Young
- `tianxin_xiaoling` - Female-Sweet Ling
- `audiobook_male_1` - Audiobook Male
- `Chinese (Mandarin)_News_Anchor` - News Anchor

The full list is available via `voice_manager.list_voices()` or `python3 scripts/voice_manager.py list`.

## Error Codes

- `0` - Success
- `1000` - Unknown error
- `1001` - Timeout
- `1002` - Rate limit triggered
- `1004` - Authentication failed
- `1008` - Insufficient balance
- `2013` - Parameter error
- `2038` - No cloning permission
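
One way to surface these codes in client code is a lookup table plus a single check on the response. This sketch assumes the response carries a `base_resp` object with `status_code` and `status_msg` fields, as the music generation client in this PR reads them. `MiniMaxAPIError` and `check_response` are hypothetical names.

```python
from typing import Any, Dict

# Codes and descriptions copied from the list above.
ERROR_CODES: Dict[int, str] = {
    0: "Success",
    1000: "Unknown error",
    1001: "Timeout",
    1002: "Rate limit triggered",
    1004: "Authentication failed",
    1008: "Insufficient balance",
    2013: "Parameter error",
    2038: "No cloning permission",
}


class MiniMaxAPIError(Exception):
    """Raised when the response reports a non-zero status code."""


def check_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return the result unchanged on success, otherwise raise MiniMaxAPIError."""
    base_resp = result.get("base_resp") or {}
    code = base_resp.get("status_code")
    if code == 0:
        return result
    # Prefer the server's message; fall back to the table for known codes.
    message = base_resp.get("status_msg") or ERROR_CODES.get(code, "Unrecognized status code")
    raise MiniMaxAPIError(f"{message} (code: {code})")
```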
279 changes: 279 additions & 0 deletions skills/minimax-voice/scripts/music_generation.py
@@ -0,0 +1,279 @@
#!/usr/bin/env python3
"""
MiniMax Music Generation API Client
Supports generating music from lyrics and style descriptions
API: POST /v1/music_generation
"""

import os
import json
import base64
import requests
from typing import Optional, Dict, Any
from pathlib import Path


def _get_default_output_dir() -> Path:
    """Get default audio output directory"""
    return Path.cwd() / "assets" / "audios"


class MiniMaxMusicGenerator:
    """MiniMax Music Generation Client"""

    BASE_URL = "https://api.minimaxi.com/v1/music_generation"

    # Supported models
    MODELS = ["music-2.5"]

    # Supported audio formats
    FORMATS = ["mp3", "wav", "pcm"]

    # Supported sample rates
    SAMPLE_RATES = [16000, 24000, 32000, 44100]

    # Supported bitrates
    BITRATES = [32000, 64000, 128000, 256000]

    def __init__(self, api_key: Optional[str] = None, group_id: Optional[str] = None):
        """
        Initialize music generation client

        Args:
            api_key: MiniMax API Key
            group_id: MiniMax Group ID
        """
        raw_key = api_key or os.getenv("MINIMAX_API_KEY")
        self.group_id = group_id or os.getenv("MINIMAX_GROUP_ID")

        if not raw_key:
            raise ValueError(
                "API key is required.\n"
                "Please set MINIMAX_API_KEY environment variable:\n"
                " export MINIMAX_API_KEY='Bearer sk-api-xxxxx'\n"
                "Or pass api_key parameter to MiniMaxMusicGenerator()."
            )

        # Auto-add Bearer prefix if not present
        self.api_key = raw_key if raw_key.startswith("Bearer ") else f"Bearer {raw_key}"

    def _get_headers(self) -> Dict[str, str]:
        """Get request headers"""
        headers = {
            "Content-Type": "application/json",
            "Authorization": self.api_key
        }
        if self.group_id:
            headers["X-Minimax-Group-Id"] = self.group_id
        return headers

    def generate(
        self,
        lyrics: str,
        prompt: Optional[str] = None,
        model: str = "music-2.5",
        stream: bool = False,
        output_format: str = "hex",
        sample_rate: int = 44100,
        bitrate: int = 256000,
        format: str = "mp3",
        aigc_watermark: bool = False,
    ) -> Dict[str, Any]:
        """
        Generate music

        Args:
            lyrics: Lyrics content, use \\n to separate lines, supports [Verse], [Chorus] structure tags
            prompt: Music style description (optional for music-2.5, required for other models)
            model: Model version, default music-2.5
            stream: Stream transmission, default False
            output_format: Output format, hex or url, default hex
            sample_rate: Sample rate, default 44100
            bitrate: Bitrate, default 256000
            format: Audio format, default mp3
            aigc_watermark: Add watermark, default False

        Returns:
            Dictionary containing audio data and metadata
        """
        if model not in self.MODELS:
            raise ValueError(f"Unsupported model: {model}. Choose from {self.MODELS}")

        if len(lyrics) < 1 or len(lyrics) > 3500:
            raise ValueError("Lyrics length must be between 1 and 3500 characters")

        if prompt and len(prompt) > 2000:
            raise ValueError("Prompt length must be <= 2000 characters")

        if stream and output_format != "hex":
            raise ValueError("Streaming mode only supports hex output format")

        payload: Dict[str, Any] = {
            "model": model,
            "lyrics": lyrics,
            "stream": stream,
            "output_format": output_format,
            "audio_setting": {
                "sample_rate": sample_rate,
                "bitrate": bitrate,
                "format": format,
            },
            "aigc_watermark": aigc_watermark,
        }

        if prompt:
            payload["prompt"] = prompt

        response = requests.post(
            self.BASE_URL,
            headers=self._get_headers(),
            json=payload
        )
        response.raise_for_status()

        result = response.json()

        # Read base_resp defensively so a response without it raises APIError, not KeyError
        base_resp = result.get("base_resp") or {}
        status_code = base_resp.get("status_code")
        if status_code != 0:
            raise APIError(
                f"API Error: {base_resp.get('status_msg', 'unknown error')} "
                f"(code: {status_code})"
            )

        return result

    def save_audio(
        self,
        result: Dict[str, Any],
        filename: Optional[str] = None,
        output_dir: Optional[str] = None
    ) -> str:
        """
        Save generated music to file

        Args:
            result: API response dictionary (audio must be hex-encoded)
            filename: Filename (without path), default uses music_{timestamp}.mp3
            output_dir: Output directory, default ./assets/audios

        Returns:
            Full path of saved file
        """
        if "data" not in result or "audio" not in result["data"]:
            raise ValueError("Invalid result: missing audio data")

        # Determine output directory
        if output_dir is None:
            target_dir = _get_default_output_dir()
        else:
            target_dir = Path(output_dir)

        # Ensure directory exists
        target_dir.mkdir(parents=True, exist_ok=True)

        # Determine filename
        if filename is None:
            import time
            ext = result.get("extra_info", {}).get("audio_format", "mp3")
            filename = f"music_{int(time.time())}.{ext}"

        output_path = target_dir / filename

        audio_hex = result["data"]["audio"]
        try:
            audio_bytes = bytes.fromhex(audio_hex)
        except (TypeError, ValueError) as exc:
            # The payload is not hex, e.g. when output_format="url" was requested
            raise ValueError(
                "Audio payload is not hex-encoded; save_audio requires output_format='hex'"
            ) from exc

        with open(output_path, "wb") as f:
            f.write(audio_bytes)

        extra_info = result.get("extra_info", {})
        print(f"Music saved to: {output_path}")
        print(f" Duration: {extra_info.get('music_duration', 'N/A')} ms")
        print(f" Sample rate: {extra_info.get('music_sample_rate', 'N/A')} Hz")
        print(f" Size: {extra_info.get('music_size', 'N/A')} bytes")
        return str(output_path)

    def generate_with_structure(
        self,
        verses: list[str],
        choruses: list[str],
        prompt: str,
        bridge: Optional[str] = None,
        outro: Optional[str] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """
        Generate music using structured lyrics

        Args:
            verses: List of verse lyrics
            choruses: List of chorus lyrics
            prompt: Music style description
            bridge: Bridge lyrics (optional)
            outro: Outro lyrics (optional)
            **kwargs: Other generate parameters

        Returns:
            API response result
        """
        lyrics_parts = []

        # Build structured lyrics
        for i, verse in enumerate(verses):
            lyrics_parts.append(f"[Verse {i+1}]")
            lyrics_parts.append(verse)

        for i, chorus in enumerate(choruses):
            lyrics_parts.append(f"[Chorus {i+1}]")
            lyrics_parts.append(chorus)

        if bridge:
            lyrics_parts.append("[Bridge]")
            lyrics_parts.append(bridge)

        if outro:
            lyrics_parts.append("[Outro]")
            lyrics_parts.append(outro)

        lyrics = "\n".join(lyrics_parts)

        return self.generate(lyrics=lyrics, prompt=prompt, **kwargs)


class APIError(Exception):
    """API Error Exception"""
    pass


def main():
    """Command-line usage example"""
    import argparse

    parser = argparse.ArgumentParser(description="MiniMax Music Generation")
    parser.add_argument("--lyrics", "-l", required=True, help="Lyrics file path or text")
    parser.add_argument("--prompt", "-p", help="Music style prompt")
    parser.add_argument("--model", "-m", default="music-2.5", help="Model name")
    parser.add_argument("--output", "-o", default="music.mp3", help="Output file")

    args = parser.parse_args()

    # Read lyrics
    if os.path.isfile(args.lyrics):
        with open(args.lyrics, "r", encoding="utf-8") as f:
            lyrics = f.read()
    else:
        lyrics = args.lyrics

    generator = MiniMaxMusicGenerator()

    print("Generating music...")
    result = generator.generate(
        lyrics=lyrics,
        prompt=args.prompt,
        model=args.model
    )

    generator.save_audio(result, args.output)
    print("Done!")


if __name__ == "__main__":
    main()