黄国华/mifs

BSD-3-Clause

MIFS

Parallelized Mutual Information based Feature Selection module.

Related blog post here

Dependencies

  • scipy(>=0.17.0)
  • numpy(>=1.10.4)
  • scikit-learn(>=0.17.1)
  • bottleneck(>=1.1.0)

How to use

Download, import and do as you would with any other scikit-learn method:

  • fit(X, y)
  • transform(X)
  • fit_transform(X, y)

Description

MIFS stands for Mutual Information based Feature Selection. This class contains routines for selecting features when the target y is either continuous or discrete. Three selection algorithms are implemented: JMI, JMIM and MRMR.
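To make the quantity being estimated concrete, here is a minimal sketch of mutual information between two discrete variables, computed from raw counts (the plug-in estimator). It is not the estimator this module uses (the k parameter below refers to a kNN-based method); the function name mutual_information and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences.

    Illustrative only: this is a simple count-based estimator, not the
    kNN-based estimation described for this module.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), expressed with raw counts
        total += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return total


# Toy data (hypothetical): one feature copies y, the other is independent of it.
y = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
noise = [0, 1, 0, 1]

mi_informative = mutual_information(informative, y)
mi_noise = mutual_information(noise, y)
```

A feature that determines y completely reaches the entropy of y (log 2 nats for a balanced binary label), while an independent feature scores 0.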

This implementation tries to mimic the scikit-learn interface, so use fit, transform or fit_transform to run the feature selection.

See examples/example.py for worked examples and usage.

Docs

Parameters

method : string, default = 'JMI':

> Which mutual information based feature selection method to use:
> * 'JMI' : Joint Mutual Information [1]
> * 'JMIM' : Joint Mutual Information Maximisation [2]
> * 'MRMR' : Max-Relevance Min-Redundancy [3]
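The three criteria differ only in how a candidate feature is scored against the features already selected. The sketch below shows the textbook form of each score inside a greedy forward selection, for discrete data and a count-based MI estimate. It is a simplified illustration under those assumptions, not this module's code; mi, joint, score and greedy_select are hypothetical names.

```python
from collections import Counter
from math import log


def mi(xs, ys):
    """Count-based (plug-in) mutual information in nats, discrete data only."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def joint(a, b):
    """Treat two discrete columns as a single joint variable."""
    return list(zip(a, b))


def score(method, f, selected, columns, y):
    """Score candidate column f against the already selected columns."""
    if method == "JMI":   # sum of joint MI with each selected feature
        return sum(mi(joint(columns[f], columns[s]), y) for s in selected)
    if method == "JMIM":  # worst-case joint MI over the selected features
        return min(mi(joint(columns[f], columns[s]), y) for s in selected)
    if method == "MRMR":  # relevance minus mean redundancy
        redundancy = sum(mi(columns[f], columns[s]) for s in selected) / len(selected)
        return mi(columns[f], y) - redundancy
    raise ValueError("unknown method: %r" % method)


def greedy_select(columns, y, n_select, method="JMI"):
    """Pick features one at a time; the first by marginal MI with y."""
    remaining = list(range(len(columns)))
    first = max(remaining, key=lambda f: mi(columns[f], y))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda f: score(method, f, selected, columns, y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 1 duplicates column 0, column 2 adds new information.
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
y = [0, 1, 2, 3]
results = {m: greedy_select(columns, y, 2, m) for m in ("JMI", "JMIM", "MRMR")}
```

On this toy data all three criteria skip the duplicated column and choose columns 0 and 2; they tend to diverge on real data, where redundancy is partial rather than exact.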

k : int, default = 5:

> Sets the number of nearest neighbours used for the density estimation with the kNN method. Kraskov et al. recommend a small integer between 3 and 10.

n_features : int or string, default = 'auto':

> If int, it sets the number of features to select from X. If 'auto', the number is determined automatically, based on the amount of mutual information the previously selected features share with y.
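One plausible way to pick the number of features automatically is to stop once the per-step MI values level off. The sketch below illustrates that idea only; the function auto_n_features, the window size and the tolerance are assumptions made for this example and do not describe the heuristic this module actually implements.

```python
def auto_n_features(mi_values, window=3, tol=1e-3):
    """Return how many features to keep, given the MI recorded at each step.

    Hypothetical stopping rule: keep features up to the point where the mean
    absolute change over `window` consecutive steps drops below `tol`.
    """
    for end in range(window + 1, len(mi_values) + 1):
        recent = mi_values[end - window - 1:end]
        changes = [abs(b - a) for a, b in zip(recent, recent[1:])]
        if sum(changes) / window < tol:
            # The plateau starts at the first value of this flat stretch.
            return end - window
    # No plateau found: keep everything that was scored.
    return len(mi_values)


# Made-up MI curve that drops quickly and then flattens.
curve = [0.9, 0.5, 0.2, 0.05, 0.0502, 0.0501, 0.0500]
n_keep = auto_n_features(curve)
```

With this made-up curve the rule keeps the first four features, since every later step changes the MI by far less than the tolerance.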

categorical : Boolean, default = True:

> If True, y is assumed to be a categorical class label. If False, y is treated as continuous. Consequently, this parameter determines how the MI between the predictors in X and y is estimated.

verbose : int, default = 0:

> Controls verbosity of output:
> * 0: no output
> * 1: displays selected features
> * 2: displays selected features and mutual information

Attributes

n_features : int:

> The number of selected features.

support : array of length [number of features in X]:

> The mask array of selected features.

ranking : array of shape [n_features]:

> The ranking of the selected features: the first entry is the feature with the largest marginal MI with y, followed by the others in order of decreasing MI.

mi : array of shape [n_features]:

> The JMIM of the selected features. Usually this is a monotonically decreasing array of numbers converging to 0. One can use this to estimate the number of features to select. In fact, this is what n_features='auto' tries to do heuristically.
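The mask and the ranking describe the same selection in two forms: the ranking lists column indices in the order they were chosen, while the mask flags those columns across all features of X. A minimal sketch of that relationship, using plain lists and hypothetical helper names (support_from_ranking, apply_mask) rather than this module's internals:

```python
def support_from_ranking(ranking, n_total):
    """Build a boolean mask of length n_total from selected column indices."""
    mask = [False] * n_total
    for idx in ranking:
        mask[idx] = True
    return mask


def apply_mask(rows, mask):
    """Keep only the masked columns of each row, in their original order."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Hypothetical result: column 3 was selected first, then column 0.
ranking = [3, 0]
mask = support_from_ranking(ranking, 5)

rows = [[10, 11, 12, 13, 14],
        [20, 21, 22, 23, 24]]
filtered = apply_mask(rows, mask)
```

Filtering with a mask keeps the columns in their original order, so the selection order is only recoverable from the ranking.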

Examples

The following example illustrates the use of the package:

import pandas as pd
import mifs

# load X and y
X = pd.read_csv('my_X_table.csv', index_col=0).values
# flatten y to 1-D (assumes the file holds a single target column)
y = pd.read_csv('my_y_vector.csv', index_col=0).values.ravel()

# define MI_FS feature selection method
feat_selector = mifs.MutualInformationFeatureSelector()

# find all relevant features
feat_selector.fit(X, y)

# check selected features
feat_selector._support_mask

# check ranking of features
feat_selector.ranking_

# call transform() on X to filter it down to selected features
X_filtered = feat_selector.transform(X)

References

[1] H. Yang and J. Moody, "Data Visualization and Feature Selection: New Algorithms for Nongaussian Data", NIPS, 1999.
[2] M. Bennasar, Y. Hicks and R. Setchi, "Feature selection using Joint Mutual Information Maximisation", Expert Systems with Applications, Vol. 42, Issue 22, Dec 2015.
[3] H. Peng, F. Long and C. Ding, "Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

Copyright (c) 2016, Daniel Homola
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of mifs nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
