Component gallery

This directory contains a wide array of components that can be used in Azure Machine Learning, contributed by Microsoft and the open source community.

Components

A component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, model training, or model scoring. A component is analogous to a function: it has a name and parameters, expects certain inputs, and returns some value. Data scientists and developers can wrap arbitrary code as an Azure Machine Learning component by following the component specification. Find the tutorials to get started.
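To make the function analogy concrete, here is a minimal Python sketch of the kind of step a component might wrap. The function name `split_rows`, its parameters, and its behaviour are hypothetical illustrations; they are not taken from the gallery or from the Azure Machine Learning component specification. The sketch only shows that such a step has a name, parameters, inputs, and outputs.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[dict],        # input: the dataset this step consumes
    test_fraction: float = 0.2,  # parameter: share of rows held out for testing
    seed: int = 0,               # parameter: makes the shuffle reproducible
) -> Tuple[List[dict], List[dict]]:
    """Hypothetical pipeline step: shuffle rows and split them into train and test."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # outputs: two datasets that downstream steps (training, scoring) can consume
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_rows([{"id": i} for i in range(10)], test_fraction=0.3)
print(len(train), len(test))
```

In a real component, the same inputs, parameters, and outputs would be declared in the component spec rather than in a Python signature.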

The following are some of the components available in the gallery.

Scenario | Description

Simple Algorithm for Recommendation (SAR)* | An example of how to train, score, and evaluate a SAR recommender using Azure Machine Learning components. This scenario contains the following components:

  • Stratified Splitter: Splits a dataset into a training dataset and a test dataset.
  • SAR Training: Trains a simple algorithm recommender.
  • SAR Scoring: Uses the test dataset to score the trained recommender.
  • MAP: Mean Average Precision at K metric.
  • nDCG: Normalized Discounted Cumulative Gain (nDCG) at K metric.
  • Precision at K: Precision at K metric.
  • Recall at K: Recall at K metric.

Spectral Residual Anomaly Detection | Anomaly detection aims to discover unexpected events or rare items in data. This component is designed to be accurate, efficient, and general, and uses Spectral Residual (SR) and a Convolutional Neural Network (CNN).

Text classification using CNN | An example of how to train and score a CNN sentiment classifier using a combination of Designer built-in modules and components. This scenario contains the following components:

  • textCNN Train Model
  • textCNN Score Model
  • TextCNN Word to Id

Dimensionality Reduction | This component is based on scikit-learn. Use Dimensionality Reduction to reduce the dimensionality of your data, and Score Dimensionality Reduction to apply the trained transformation to your scoring dataset.

Compute Correlation | Computes the pairwise correlation matrix of the columns in a dataset using the Kendall, Spearman, or Pearson method.

Image classification using AML labeling data | This example uses an AML labeling dataset as the training dataset for image classification. The pipeline contains one custom component, Convert Labeling Data to Image Directory, and several Designer built-in modules.

Natural Language Processing | The NLP scenario includes the following sample components:

  • Detect languages: Detects languages in the text columns of a dataset.
  • Semantic Textual Similarity
  • Sequence Embedding: Models sequence data by extracting short- and long-term sequence features and generating an embedding in a finite-dimensional space.
  • Score Sequence Embeddings: Applies sequence embeddings to score data using the transformation state from the training dataset.
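
The ranking metrics listed for the SAR scenario (Precision at K, Recall at K, MAP, and nDCG) can be illustrated with a small pure-Python sketch. This is not the code of the gallery components. It assumes binary relevance, a ranked list without duplicates, and average precision normalized by min(number of relevant items, K); other normalization conventions exist. All function names are illustrative.

```python
import math
from typing import Iterable, Sequence, Set, Tuple


def _check_k(k: int) -> None:
    if k <= 0:
        raise ValueError("k must be a positive integer")


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    _check_k(k)
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k recommendations."""
    _check_k(k)
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Sum of precision@rank over the relevant ranks, normalized by min(|relevant|, k)."""
    _check_k(k)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)


def map_at_k(users: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Mean of average_precision_at_k over (recommended, relevant) pairs, one per user."""
    scores = [average_precision_at_k(rec, rel, k) for rec, rel in users]
    return sum(scores) / len(scores) if scores else 0.0


def ndcg_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG of the top-k list divided by the DCG of an ideal ordering (binary gains)."""
    _check_k(k)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: one user's ranked recommendations and the items they actually liked.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
for name, fn in [("precision", precision_at_k), ("recall", recall_at_k),
                 ("AP", average_precision_at_k), ("nDCG", ndcg_at_k)]:
    print(f"{name}@3 = {fn(recommended, relevant, 3):.4f}")
```

In the toy example, two of the top three recommendations are relevant, so Precision at 3 and Recall at 3 both equal 2/3; MAP averages the per-user average precision over all users.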

Create a new component in your workspace

In the Azure Machine Learning studio portal, you can create a new component in your workspace and use it in the designer.

  1. Go to the Modules asset page.
  2. Click New Module and select From Yaml file.
  3. Enter the component spec URL and click Next.
  4. Follow the guidance to finish the creation. You can then find your new component under Custom Module in the module list of the designer.

Help & Support

This project uses GitHub Issues to track bugs and feature requests.

Please search the existing issues before filing new issues to avoid duplicates.
For new issues, file your bug or feature request as a new Issue.

The following information is useful for debugging:

  • Pipeline run URL
  • Pipeline graph
  • Detailed error message
  • 70_driver_log of the failed component

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Reference papers

  • Ren, Hansheng, et al. “Time-Series Anomaly Detection Service at Microsoft.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019).