# AudioFly

**Repository Path**: iflytekopensource/AudioFly

## Basic Information

- **Project Name**: AudioFly
- **Description**: AudioFly is an audio generation model. It synthesizes sound effects based on textual descriptions. The model can produce high-quality audio at a sampling rate of 44.1 kHz.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 3
- **Forks**: 0
- **Created**: 2025-09-28
- **Last Updated**: 2025-10-13

## Categories & Tags

**Categories**: llm

**Tags**: None

## README


# Model Card for AudioFly

## Model Introduction
AudioFly is an audio generation model. It synthesizes sound effects based on textual descriptions. The model can produce high-quality audio at a sampling rate of 44.1 kHz. The generated audio shows strong alignment with the prompt text.

AudioFly adopts the Latent Diffusion Model (LDM) architecture. The model has 1 billion parameters and is trained on a large and diverse corpus that includes open-source datasets, such as AudioSet, AudioCaps, and TUT, as well as proprietary internal data. It performs well in both single-event and multi-event scenarios, where the generated audio accurately reflects the described content. On the AudioCaps dataset, AudioFly achieves the lowest FD and KL and ties the highest CLAP score among the compared audio generation models.


## Evaluation Results

The results below are reported on the AudioCaps dataset. For the baseline models, we reused the evaluation results from [Stable Audio Open](https://arxiv.org/pdf/2407.14358) and followed the same evaluation methodology to ensure consistency. Lower is better for FD and KL (↓); higher is better for CLAP (↑).


| Model             | FD ↓     | KL ↓     | CLAP ↑   |
|-------------------|----------|----------|----------|
| AudioLDM2-48kHz   | 101.11   | 2.04     | 0.37     |
| AudioGen-medium   | 186.53   | 1.42     | **0.45** |
| Stable Audio 1.0  | 103.66   | 2.89     | 0.24     |
| Stable Audio 2.0  | 110.62   | 2.70     | 0.23     |
| Stable Audio Open | 78.24    | 2.14     | 0.29     |
| AudioFly          | **40.1** | **1.35** | **0.45** |
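
To make the comparison concrete, the following sketch recomputes which models are best per metric and how far AudioFly's FD sits below the strongest baseline. The scores are copied from the table above; the names `RESULTS`, `best_models`, and `relative_reduction` are illustrative and not part of the AudioFly code base.

```python
# Scores copied from the AudioCaps table above, as (FD, KL, CLAP).
RESULTS = {
    "AudioLDM2-48kHz": (101.11, 2.04, 0.37),
    "AudioGen-medium": (186.53, 1.42, 0.45),
    "Stable Audio 1.0": (103.66, 2.89, 0.24),
    "Stable Audio 2.0": (110.62, 2.70, 0.23),
    "Stable Audio Open": (78.24, 2.14, 0.29),
    "AudioFly": (40.1, 1.35, 0.45),
}


def best_models(results, index, lower_is_better):
    """Return the sorted names of all models tied for the best score in one column."""
    pick = min if lower_is_better else max
    best = pick(scores[index] for scores in results.values())
    return sorted(name for name, scores in results.items() if scores[index] == best)


def relative_reduction(baseline, value):
    """Fractional reduction of `value` relative to `baseline` (0.25 means 25% lower)."""
    return (baseline - value) / baseline


if __name__ == "__main__":
    print("Best FD:  ", best_models(RESULTS, 0, lower_is_better=True))
    print("Best KL:  ", best_models(RESULTS, 1, lower_is_better=True))
    print("Best CLAP:", best_models(RESULTS, 2, lower_is_better=False))
    fd_gain = relative_reduction(RESULTS["Stable Audio Open"][0], RESULTS["AudioFly"][0])
    print(f"FD reduction vs. Stable Audio Open: {fd_gain:.1%}")
```

By this arithmetic, AudioFly's FD is roughly 49% lower than that of Stable Audio Open, the best baseline on that metric, while the CLAP score is a tie with AudioGen-medium.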


## Usage

**Requirements**

We recommend installing the dependencies from the provided `requirements.txt`, run from the project root:
```bash
cd /path/to/AudioFly
pip install -r requirements.txt
# Make sure PYTHONPATH includes the AudioFly project root
export PYTHONPATH=/path/to/AudioFly:$PYTHONPATH
```
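
The Quickstart imports `ldm.utils.util`, which resolves only if the project root is on Python's import search path. A minimal sketch of a sanity check follows; the helper `project_root_on_path` is hypothetical and not shipped with AudioFly, and `/path/to/AudioFly` is the same placeholder used above.

```python
import os
import sys
from pathlib import Path


def project_root_on_path(project_root, search_path=None):
    """Return True if `project_root` is one of the import search-path entries."""
    entries = sys.path if search_path is None else search_path
    root = Path(project_root).resolve()
    # An empty entry means "current working directory" to the import system.
    return any(Path(entry or os.getcwd()).resolve() == root for entry in entries)


if __name__ == "__main__":
    root = "/path/to/AudioFly"  # placeholder: replace with your checkout location
    if project_root_on_path(root):
        print("Project root is importable.")
    else:
        print("Project root not found on sys.path; check PYTHONPATH.")
```

Running this in the same shell session as the `export` command shows whether the `PYTHONPATH` setting took effect before any model code is loaded.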

**Quickstart**


```python
import yaml
import torch
from ldm.utils.util import instantiate_from_config

# Build the model from the project configuration.
with open("./config/config.yaml", "r") as f:
    configs = yaml.load(f, Loader=yaml.FullLoader)
model = instantiate_from_config(configs["model"])

# Load the pretrained weights on the CPU first, then move the model to the GPU.
checkpoint = torch.load("./models/ldm/model.ckpt", map_location="cpu")
model.load_state_dict(checkpoint, strict=False)
model.eval()
model = model.cuda()

text = "Fierce winds howl through the valley"
name = "result"
savedir = "./result"
model.generate_sample(
    textlist=[text],
    name=name,
    cfg=3.5,  # Guidance scale (controls how strongly generation follows the text prompt); not recommended to change
    ddim_steps=200,  # Number of denoising steps in the diffusion process; not recommended to change
    outputdir=savedir,
)
```
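
To generate audio for several prompts, the call above can be wrapped in a loop. The sketch below is an assumption-laden convenience wrapper, not part of AudioFly: `generate_batch` and the `prefix` naming scheme are hypothetical, and it relies only on the `generate_sample` keyword arguments shown in the Quickstart. Whether `textlist` accepts several prompts in one call, and how output files are named, is not documented here, so the sketch passes one prompt per call.

```python
from pathlib import Path


def generate_batch(model, prompts, outputdir, prefix="sample", cfg=3.5, ddim_steps=200):
    """Call model.generate_sample once per prompt and return the names used."""
    # Create the output directory up front so every call has a place to write.
    Path(outputdir).mkdir(parents=True, exist_ok=True)
    names = []
    for index, prompt in enumerate(prompts):
        name = f"{prefix}_{index:03d}"  # hypothetical naming scheme
        model.generate_sample(
            textlist=[prompt],
            name=name,
            cfg=cfg,                # recommended default from the Quickstart
            ddim_steps=ddim_steps,  # recommended default from the Quickstart
            outputdir=str(outputdir),
        )
        names.append(name)
    return names
```

With the `model` object from the Quickstart, a call such as `generate_batch(model, ["Rain on a tin roof", "A dog barks twice"], "./result")` would issue two generation calls with the recommended `cfg` and `ddim_steps` values.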

## License
AudioFly is licensed under Apache 2.0.