# MHCfovea
**Repository Path**: W-seventeen/MHCfovea
## Basic Information
- **Project Name**: MHCfovea
- **Description**: No description available
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-07-28
- **Last Updated**: 2021-07-28
## README
# MHCfovea
MHCfovea integrates a supervised prediction module and an unsupervised summarization module to connect important residues to binding motifs.
## Overview
First, MHCfovea's predictor was trained on 150 observed alleles, and 42 important positions were highlighted from the MHC-I sequence (182 a.a.) using ScoreCAM. Next, we made predictions for the 150 observed and 12,858 unobserved alleles against a peptide dataset (254,742 peptides) and extracted the positive predictions (score > 0.9) to generate the binding motif of each allele. Finally, after clustering the N- and C-terminal sub-motifs, we built hyper-motifs and the corresponding allele signatures based on the 42 important positions to reveal the relation between binding motifs and MHC-I sequences.
The resultant pairs of hyper-motifs and allele signatures can be easily queried through a web interface (https://mhcfovea.ailabs.tw).
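
The motif-generation step described above can be sketched roughly as follows. This is a minimal illustration and not MHCfovea's implementation: the function name `terminal_frequencies` and the sub-motif length `k=4` are assumptions made for the example, and the five peptides are the B*07:02 rows from the example output later in this README. The sketch keeps predictions with a score above 0.9 and tabulates residue frequencies at the first and last `k` positions.

```python
from collections import Counter


def terminal_frequencies(predictions, threshold=0.9, k=4):
    """Return per-position residue frequencies for both peptide ends.

    predictions: iterable of (peptide, score) pairs.
    k: number of terminal positions to tabulate (an assumption of this sketch).
    """
    # Keep only confident positive predictions that are long enough.
    positives = [pep for pep, score in predictions
                 if score > threshold and len(pep) >= k]
    if not positives:
        raise ValueError("no peptide passed the score threshold")

    n_counts = [Counter() for _ in range(k)]
    c_counts = [Counter() for _ in range(k)]
    for pep in positives:
        for i in range(k):
            n_counts[i][pep[i]] += 1                  # counted from the N-terminus
            c_counts[i][pep[len(pep) - k + i]] += 1   # aligned to the C-terminus

    total = len(positives)

    def to_freq(counts):
        return [{aa: n / total for aa, n in c.items()} for c in counts]

    return to_freq(n_counts), to_freq(c_counts)


# (peptide, score) pairs copied from the example output table in this README.
predictions = [
    ("PVPTYGLSV", 0.606),
    ("APGARNTAAVL", 0.987),
    ("SPAPPTCHEL", 0.997),
    ("PGLAVKELK", 0.569),
    ("GPMVAGGLL", 0.966),
]
n_term, c_term = terminal_frequencies(predictions)
print(n_term[1])    # residue frequencies at peptide position 2
print(c_term[-1])   # residue frequencies at the last position
```

Aligning the C-terminal positions from the end of each peptide lets peptides of different lengths contribute to the same sub-motif, which is one plausible reason for treating the two ends separately.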

## Application
MHCfovea takes MHC-I alleles and peptide sequences as inputs and predicts the binding probability. All alleles in the IPD-IMGT/HLA database (version 3.41.0) are available. For each queried allele, MHCfovea provides the cluster information and allele information of the N- and C-terminal clusters, respectively:
- cluster information
  - hyper-motif: the pattern of binding peptides in a specific cluster
  - allele signature: the pattern of MHC-I alleles in a specific cluster
- allele information
  - sub-motif: the binding sub-motif of the queried allele
  - highlighted allele signature: the consensus residues of the allele signature and the queried allele
If you find MHCfovea useful in your research, please cite:
Lee, K.-H., Chang, Y.-C., Chen, T.-F., Juan, H.-F., Tsai, H.-K., Chen, C.-Y.*
Connecting MHC-I-binding motifs with HLA alleles via deep learning. bioRxiv 2021.04.18.440359 (2021) doi:10.1101/2021.04.18.440359.
## Installation
1. Python3 is required
2. Download/Clone MHCfovea
```
git clone https://github.com/kohanlee1995/MHCfovea.git
cd MHCfovea
```
3. Install the required packages
```
pip3 install -r requirements.txt
```
## Usage
```
usage: predictor [-h] [--alleles ALLELES] [--get_metrics] input output_dir

MHCfovea, an MHCI-peptide binding predictor. In this prediction process, GPU is recommended.

Having two modes:
1. specific mode: each peptide has its corresponding MHC-I allele in the input file; column "mhc" or "allele" is required
2. general mode: all peptides are predicted with all alleles in the "alleles" argument

Input file:
only .csv file is acceptable
column "sequence" or "peptide" is required as peptide sequences
column "mhc" or "allele" is optional as MHC-I alleles

Output directory contains:
1. prediction.csv: with new column "score" for specific mode or [allele] for general mode
2. interpretation: a directory contains interpretation figures of each allele
3. metrics.json: all and allele-specific metrics (AUC, AUC0.1, AP, PPV); column "bind" as benchmark is required

positional arguments:
  input              The input file
  output_dir         The output directory

optional arguments:
  -h, --help         show this help message and exit
  --alleles ALLELES  alleles for general mode
  --get_metrics      calculate the metrics between prediction and benchmark
```
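
To give a feel for what two of the metrics in `metrics.json` measure, the sketch below computes AUC and AP from a `bind` column and a `score` column using their standard textbook definitions. It is not taken from MHCfovea's code and may differ from it in details such as tie handling; the helper names `roc_auc` and `average_precision` and the toy labels and scores are invented for illustration. AUC0.1 and PPV are left out because their exact definitions are not given in this README.

```python
def roc_auc(labels, scores):
    """Probability that a random binder outscores a random non-binder."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true binder.

    Tied scores are ranked in input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one binder")
    return precision_sum / hits


# Toy data, invented for this example: 1 = binder, 0 = non-binder.
bind = [1, 1, 0, 1, 0]
score = [0.9, 0.8, 0.7, 0.6, 0.5]
print(round(roc_auc(bind, score), 3))
print(round(average_precision(bind, score), 3))
```

The pairwise AUC loop is quadratic in the number of rows, which is fine for a small check but too slow for a full benchmark; a rank-based formula or a library routine would be the usual choice there.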
## Example
```
python3 mhcfovea/predictor.py example/input.csv example/output
```
#### input file
| sequence | mhc |
|---|---|
| PVPTYGLSV | B*07:02 |
| APGARNTAAVL | B*07:02 |
| SPAPPTCHEL | B*07:02 |
| PGLAVKELK | B*07:02 |
| GPMVAGGLL | B*07:02 |
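
An input file like the table above can be produced with Python's standard `csv` module. The sketch below is only an illustration: `build_input_csv` and `detect_mode` are hypothetical helpers, not part of MHCfovea, and `detect_mode` simply restates the column rules from the usage text (a peptide column is required, and an allele column selects specific mode).

```python
import csv
import io

PEPTIDE_COLUMNS = ("sequence", "peptide")   # one of these is required
ALLELE_COLUMNS = ("mhc", "allele")          # presence implies specific mode


def build_input_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def detect_mode(csv_text):
    """Return 'specific' or 'general' based on the header columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    if not any(col in header for col in PEPTIDE_COLUMNS):
        raise ValueError('column "sequence" or "peptide" is required')
    if any(col in header for col in ALLELE_COLUMNS):
        return "specific"
    return "general"


peptides = ["PVPTYGLSV", "APGARNTAAVL", "SPAPPTCHEL", "PGLAVKELK", "GPMVAGGLL"]
rows = [{"sequence": pep, "mhc": "B*07:02"} for pep in peptides]
csv_text = build_input_csv(rows, fieldnames=["sequence", "mhc"])

print(detect_mode(csv_text))
print(csv_text)
# To save the file for the predictor, write csv_text to a path ending in .csv.
```

Dropping the `mhc` column from `fieldnames` and `rows` yields a general-mode file, in which case the alleles are supplied through the `--alleles` argument instead.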
#### output file
| sequence | mhc | score | %rank |
|---|---|---|---|
| PVPTYGLSV | B*07:02 | 0.606 | 0.616 |
| APGARNTAAVL | B*07:02 | 0.987 | 0.015 |
| SPAPPTCHEL | B*07:02 | 0.997 | 0.004 |
| PGLAVKELK | B*07:02 | 0.569 | 0.692 |
| GPMVAGGLL | B*07:02 | 0.966 | 0.024 |
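
As a rough sketch of how the predictions might be post-processed, the snippet below selects high-scoring peptides and orders them by `%rank`. It assumes that `prediction.csv` uses the column names shown in the table above and that a lower `%rank` indicates a stronger predicted binder, as the example rows suggest. The helper name `strong_binders` is hypothetical, and the 0.9 cutoff is borrowed from the Overview rather than being a recommended threshold.

```python
import csv
import io

# The example output rows from this README, written as CSV text.
EXAMPLE_PREDICTION = """sequence,mhc,score,%rank
PVPTYGLSV,B*07:02,0.606,0.616
APGARNTAAVL,B*07:02,0.987,0.015
SPAPPTCHEL,B*07:02,0.997,0.004
PGLAVKELK,B*07:02,0.569,0.692
GPMVAGGLL,B*07:02,0.966,0.024
"""


def strong_binders(csv_text, min_score=0.9):
    """Return rows with score above min_score, sorted by ascending %rank."""
    rows = csv.DictReader(io.StringIO(csv_text))
    hits = [row for row in rows if float(row["score"]) > min_score]
    return sorted(hits, key=lambda row: float(row["%rank"]))


for row in strong_binders(EXAMPLE_PREDICTION):
    print(row["sequence"], row["mhc"], row["score"], row["%rank"])
```

To run this on real output, read the text of `prediction.csv` from the output directory and pass it to `strong_binders` in place of `EXAMPLE_PREDICTION`.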
#### interpretation figure

## Development
The `development` folder contains all the source code used to develop MHCfovea. The files are described below.
- build_dataset.py: for building training, validation, and benchmark dataset
- util.py: utility functions for data analysis
- trainer.py: for the training process
- model.py: the model architecture
- BA.py: utility functions for training process
- predictor.py: for the prediction process
- cam.py: functions for CAM algorithm
- cam_run.py: for the CAM process
- run_pan_allele.py: for the prediction on all HLA alleles
- CAMInterp.py: utility functions for the interpretation of ScoreCAM results
- MHCInterp.py: utility functions for the summarization
[Tutorial](development/README.md)