MMLSpark Tags - Gitee.com

v0.9

c1a08f1

2017-10-13 06:19


v0.8.9

87c6a33

2017-08-31 02:22


v0.8

New functionality:

* We are now uploading MMLSpark as an "Azure/mmlspark" Spark package.
  Use `--packages Azure:mmlspark:0.8` with the Spark command-line tools.

* Add a bi-directional LSTM medical entity extractor to the
  `ModelDownloader`, and a new Jupyter notebook for medical entity
  extraction using NLTK, PubMed word embeddings, and the Bi-LSTM.

* Add `ImageSetAugmenter` for easy dataset augmentation within image
  processing pipelines.

Improvements:

* Optimize the performance of `CNTKModel`.  It now broadcasts a loaded
  model to workers and shares model weights between partitions on the
  same worker.  Minibatch padding (an internal workaround for a CNTK
  bug) is no longer used, eliminating excess computation when there is
  a mismatch between the partition size and the minibatch size.

* Bugfix: `CNTKModel` now works with models that have unnamed outputs.
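
The saving from dropping minibatch padding can be illustrated with a small, self-contained sketch in plain Scala.  This is not the `CNTKModel` implementation: `MinibatchSketch` and its methods are hypothetical names, and the sketch assumes only that padding rounded every batch up to the full minibatch size.

```scala
object MinibatchSketch {
  // Split one partition's rows into minibatches; the last batch may be
  // shorter than batchSize because nothing is padded.
  def unpadded[T](rows: Iterator[T], batchSize: Int): Iterator[Vector[T]] =
    rows.grouped(batchSize).map(_.toVector)

  // Number of rows evaluated if every batch were padded up to batchSize
  // (the assumed behaviour of the old workaround).
  def paddedRowCount(rowCount: Int, batchSize: Int): Int =
    ((rowCount + batchSize - 1) / batchSize) * batchSize
}
```

For a partition of 10 rows and a minibatch size of 4, `unpadded` yields batches of 4, 4 and 2 rows, while `paddedRowCount(10, 4)` shows that a padded run would have evaluated 12 rows.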

Docker image improvements:

* Environment variables are now part of the docker image (in addition to
  being set in bash).

* New docker images:
  - `microsoft/mmlspark:latest`: plain image, as always.
  - `microsoft/mmlspark:gpu`: GPU variant based on an `nvidia/cuda` image.
  - `microsoft/mmlspark:plus` and `microsoft/mmlspark:plus-gpu`: these
    images contain additional packages for internal use; in future
    releases they will probably also be based on an older Conda version.

Updates:

* The Conda environment now includes NLTK.

* Updated Java and SBT versions.

b61bf51

2017-09-02 10:30


v0.7.91

48d65f9

2017-08-31 02:22


v0.7.9

8b3f6fe

2017-08-31 02:22


v0.7.1

06dae08

2017-08-31 05:46


v0.7

5ea6488

2017-08-17 15:24


v0.6

New functionality:

* Similar to Spark's `StringIndexer`, we have a `ValueIndexer` that can
  index values of any type instead of only strings.  We also provide
  the reverse mapping via `IndexToValue`, similar to Spark's
  `IndexToString` transform.

* A new "clean missing" data estimator.  Example:

      val cmd = new CleanMissingData()
        .setInputCols(Array("some-column"))
        .setOutputCols(Array("some-column"))
        .setCleaningMode(CleanMissingData.customOpt)
        .setCustomValue(someCustomValue)
      val cmdModel = cmd.fit(dataset)
      val result = cmdModel.transform(dataset)

* New default featurization for Spark's date and timestamp types and
  for our internal image type.  Date columns are converted to double
  features: year, day of week, month, day of month.  Timestamp columns
  get the same features as dates, plus hour of day, minute of hour,
  second of minute.  Image columns use the image data converted to
  doubles, together with width and height info.

* Starting the docker image without an `ACCEPT_EULA` variable setting
  used to throw an error.  Instead, we now start a tiny web server that
  shows the EULA and replaces itself with the Jupyter interface when you
  click the `AGREE` button.
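
The date and timestamp featurization listed above can be sketched in plain Scala with `java.time`.  This is only an illustration of the feature layout, not the MMLSpark code: `DateFeaturizationSketch` and its methods are hypothetical names, and the day-of-week numbering (ISO, Monday = 1) is an assumption that may differ from what the library emits.

```scala
import java.time.{LocalDate, LocalDateTime}

object DateFeaturizationSketch {
  // Date features, in the order given in the notes:
  // year, day of week, month, day of month.
  // Assumption: ISO day-of-week numbering (Monday = 1 ... Sunday = 7).
  def featurizeDate(d: LocalDate): Array[Double] =
    Array(
      d.getYear.toDouble,
      d.getDayOfWeek.getValue.toDouble,
      d.getMonthValue.toDouble,
      d.getDayOfMonth.toDouble
    )

  // Timestamp features: the date features followed by
  // hour of day, minute of hour, second of minute.
  def featurizeTimestamp(t: LocalDateTime): Array[Double] =
    featurizeDate(t.toLocalDate) ++
      Array(t.getHour.toDouble, t.getMinute.toDouble, t.getSecond.toDouble)
}
```

Under these assumptions a date yields a 4-element feature array and a timestamp a 7-element one, for example `DateFeaturizationSketch.featurizeTimestamp(LocalDateTime.of(2017, 7, 11, 1, 46, 0))`.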

Breaking changes:

* Renamed `ImageTransform` to `ImageTransformer`.

Notable bug fixes and other changes:

* Improved sample notebooks, and a new one: "303 - Transfer Learning by
  DNN Featurization - Airplane or Automobile".

* Fix serialization bugs in generated Python `PipelineStage`s.

Acknowledgments:

Thanks to Ali Zaidi for some notebook beautifications.

bb6a495

2017-07-11 01:46


v0.5

70be8dd

2017-06-02 23:57

