# AudioFly

**Repository Path**: iflytekopensource/AudioFly

## Basic Information

- **Project Name**: AudioFly
- **Description**: AudioFly is an audio generation model. It synthesizes sound effects based on textual descriptions. The model can produce high-quality audio at a sampling rate of 44.1 kHz.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2025-09-28
- **Last Updated**: 2025-10-13

## Categories & Tags

**Categories**: llm

**Tags**: None

## README

# Model Card for AudioFly

## Model Introduction

AudioFly is an audio generation model. It synthesizes sound effects based on textual descriptions. The model can produce high-quality audio at a sampling rate of 44.1 kHz. The generated audio shows strong alignment with the prompt text.

AudioFly adopts the Latent Diffusion Model architecture. The model has 1 billion parameters and is trained on a large and diverse corpus. The training data include open-source datasets, such as AudioSet, AudioCaps, and TUT, as well as proprietary internal data. The model performs well in both single-event and multi-event scenarios. In these cases, the generated audio accurately reflects the described content. AudioFly achieves superior performance compared to previous audio generation models on the AudioCaps dataset.

## Evaluation Results

The experimental results are reported on the AudioCaps dataset. For the baseline models, we reused the evaluation results from [STABLE AUDIO OPEN](https://arxiv.org/pdf/2407.14358). We followed the same evaluation methodology to ensure consistency. The evaluation results are shown as follows.

| Model              | FD ↓     | KL ↓     | CLAP ↑   |
|--------------------|----------|----------|----------|
| AudioLDM2-48kHz    | 101.11   | 2.04     | 0.37     |
| AudioGen-medium    | 186.53   | 1.42     | **0.45** |
| Stable Audio 1.0   | 103.66   | 2.89     | 0.24     |
| Stable Audio 2.0   | 110.62   | 2.70     | 0.23     |
| Stable Audio Open  | 78.24    | 2.14     | 0.29     |
| AudioFly           | **40.1** | **1.35** | **0.45** |

## Usage

**Requirements**

We recommend setting up the runtime environment from the provided `requirements.txt` by running:

```bash
pip install -r requirements.txt
# make sure to set the PYTHONPATH to include the AudioFly project root
export PYTHONPATH=/path/to/AudioFly:$PYTHONPATH
cd /path/to/AudioFly
```

**Quickstart**

```python
import yaml
import torch
from ldm.utils.util import instantiate_from_config

# Build the model from the project configuration.
with open('./config/config.yaml', "r") as f:
    configs = yaml.load(f, Loader=yaml.FullLoader)
model = instantiate_from_config(configs["model"])

# Load the checkpoint on the CPU first, then move the model to the GPU.
checkpoint = torch.load('./models/ldm/model.ckpt', map_location='cpu')
model.load_state_dict(checkpoint, strict=False)
model.eval()
model = model.cuda()

text = 'Fierce winds howl through the valley'
name = 'result'
savedir = './result'

model.generate_sample(
    textlist=[text],
    name=name,
    cfg=3.5,         # Guidance scale (controls how strongly generation follows the text prompt); not recommended to change
    ddim_steps=200,  # Number of denoising steps in the diffusion process; not recommended to change
    outputdir=savedir,
)
```

## License

AudioFly is licensed under Apache 2.0.
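
## Note on the Guidance Scale

The Quickstart describes `cfg` as a guidance scale that controls how strongly generation follows the text prompt. The sketch below illustrates the standard classifier-free guidance rule that such a scale usually refers to. It is an assumption that AudioFly applies this exact formula internally; the function `apply_guidance` and the toy values are hypothetical and are not part of the AudioFly code base.

```python
from typing import List, Sequence


def apply_guidance(
    uncond: Sequence[float], cond: Sequence[float], scale: float
) -> List[float]:
    """Combine unconditional and text-conditioned noise predictions.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    Assumed formulation; AudioFly's internal implementation may differ.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy values standing in for one denoising step's predictions (hypothetical).
uncond_pred = [0.0, 1.0, -2.0]
cond_pred = [1.0, 1.0, 0.0]

# Same scale as the Quickstart's cfg=3.5.
guided = apply_guidance(uncond_pred, cond_pred, scale=3.5)
print(guided)  # [3.5, 1.0, 5.0]
```

Under this rule, a scale of 1.0 reproduces the text-conditioned prediction unchanged and a scale of 0.0 ignores the prompt entirely, so larger values push each denoising step further in the direction the prompt suggests.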