github.com-google-model_search_-_2021-02-22_13-34-03
- Publication date
- 2021-02-22
Model Search
Model search (MS) is a framework that implements AutoML algorithms for model architecture search at scale. It aims to help researchers speed up their exploration process for finding the right model architecture for their classification problems (i.e., DNNs with different types of layers).
The library enables you to:
Run many AutoML algorithms out of the box on your data - including automatically searching for the right model architecture, the right ensemble of models and the best distilled models.
Compare many different models that are found during the search.
Create your own search space to customize the types of layers in your neural networks.
The technical description of the capabilities of this framework is found in the InterSpeech paper.
While this framework can potentially be used for regression problems, the current version supports classification problems only. Let's start by looking at some classic classification problems and see how the framework can automatically find competitive model architectures.
Getting Started
Let us start with the simplest case. You have a csv file where the features are numbers and you would like to let AutoML find the best model architecture for you.
Below is a code snippet for doing so:
```python
import model_search
from model_search import constants
from model_search import single_trainer
from model_search.data import csv_data

# Describe the csv file: column 0 holds the label and there are two classes.
trainer = single_trainer.SingleTrainer(
    data=csv_data.Provider(
        label_index=0,
        logits_dimension=2,
        record_defaults=[0, 0, 0, 0],
        filename="model_search/data/testdata/csv_random_data.csv"),
    spec=constants.DEFAULT_DNN)

# Search over 200 candidate models and write the results under root_dir.
trainer.try_models(
    number_models=200,
    train_steps=1000,
    eval_steps=100,
    root_dir="/tmp/run_example",
    batch_size=32,
    experiment_name="example",
    experiment_owner="model_search_user")
```
The above code will try 200 different models - all binary classification models, as the `logits_dimension` is 2. The root directory will have a subdirectory of all models, all of which will already be evaluated. You can open the directory with tensorboard and see all the models with the evaluation metrics.
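To try the snippet on your own file, the csv needs the same shape as the arguments above describe: four numeric columns (one per entry in `record_defaults`) with the label in column 0 and two label values. The following is a minimal sketch that writes such a file using only the standard library. The helper name `write_random_csv`, the file name, the absence of a header row, and the integer feature range are all assumptions made for illustration, not part of the library.

```python
import csv
import os
import random
import tempfile


def write_random_csv(path, num_rows=100, num_features=3, seed=0):
    """Writes rows of [label, f1, ..., fN] with an integer label in column 0."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(num_rows):
            label = rng.randint(0, 1)  # two classes, matching logits_dimension=2
            features = [rng.randint(0, 9) for _ in range(num_features)]
            writer.writerow([label] + features)


# Hypothetical output location; any writable path works.
csv_path = os.path.join(tempfile.mkdtemp(), "my_random_data.csv")
write_random_csv(csv_path)

# Read the file back to confirm its shape.
with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
print(len(rows), "rows,", len(rows[0]), "columns")  # 100 rows, 4 columns
```

The resulting path could then be passed as `filename` to the csv provider, keeping `label_index=0` and a four-entry `record_defaults`.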
The search will be performed according to the default specification. That can be found in: `model_search/configs/dnn_config.pbtxt`.
For more details about the fields and if you want to create your own specification, you can look at: `model_search/proto/phoenix_spec.proto`.
Now, what if you don't have a csv with the features? The next section shows how to run without a csv.
Non-csv data
To run with non-csv data, you will have to implement a class inherited from the abstract class `model_search.data.Provider`. This enables us to define our own `input_fn` and hence customize the feature columns and the task (i.e., the number of classes in the classification task).
```python
class Provider(object, metaclass=abc.ABCMeta):
  """A data provider interface.

  The Provider abstract class that defines three function for Estimator related
  training that return the following:
    * An input function for training and test input functions that return
      features and label batch tensors. It is responsible for parsing the
      dataset and buffering data.
    * The feature_columns for this dataset.
    * problem statement.
  """

  def get_input_fn(self, hparams, mode, batch_size: int):
    """Returns an `input_fn` for train and evaluation.

    Args:
      hparams: tf.HParams for the experiment.
      mode: Defines whether this is training or evaluation. See
        `estimator.ModeKeys`.
      batch_size: the batch size for training and eval.

    Returns:
      Returns an `input_fn` for train or evaluation.
    """

  def get_serving_input_fn(self, hparams):
    """Returns an `input_fn` for serving in an exported SavedModel.

    Args:
      hparams: tf.HParams for the experiment.

    Returns:
      Returns an `input_fn` that takes no arguments and returns a
        `ServingInputReceiver`.
    """

  @abc.abstractmethod
  def number_of_classes(self) -> int:
    """Returns the number of classes. Logits dim for regression."""

  def get_feature_columns(
      self
  ) -> List[Union[feature_column.FeatureColumn,
                  feature_column_v2.FeatureColumn]]:
    """Returns a `List` of feature columns."""
```
An example of an implementation can be found in `model_search/data/csv_data.py`.
Once you have this class, you can pass it to `model_search.single_trainer.SingleTrainer` and your single trainer can now read your data.
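To make the shape of such a provider concrete, here is a minimal sketch of the pattern in plain Python. It is not the library's class: `ProviderSketch` is a stand-in base written for this example, `InMemoryProvider` is hypothetical, and the `input_fn` yields plain Python lists so the code runs without TensorFlow. A real implementation would subclass the library's `Provider` and return batch tensors instead.

```python
import abc


class ProviderSketch(abc.ABC):
    """Stand-in for the interface above (NOT the library's class)."""

    @abc.abstractmethod
    def get_input_fn(self, hparams, mode, batch_size):
        """Returns a zero-argument callable that produces batches."""

    @abc.abstractmethod
    def number_of_classes(self):
        """Returns the number of classes in the task."""


class InMemoryProvider(ProviderSketch):
    """Hypothetical provider that serves (features, label) pairs from a list."""

    def __init__(self, examples, num_classes):
        self._examples = examples
        self._num_classes = num_classes

    def get_input_fn(self, hparams, mode, batch_size):
        def input_fn():
            # Yield consecutive batches as (features, labels) tuples.
            for start in range(0, len(self._examples), batch_size):
                batch = self._examples[start:start + batch_size]
                features = [x for x, _ in batch]
                labels = [y for _, y in batch]
                yield features, labels
        return input_fn

    def number_of_classes(self):
        return self._num_classes


examples = [([0.1, 0.2], 0), ([0.3, 0.4], 1), ([0.5, 0.6], 2)]
provider = InMemoryProvider(examples, num_classes=3)
input_fn = provider.get_input_fn(hparams=None, mode="train", batch_size=2)
batches = list(input_fn())
print(len(batches), provider.number_of_classes())  # 2 3
```

The point of the pattern is that the trainer only ever talks to the provider's methods, so the data source (csv, in-memory, or anything else) can change without touching the search code.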
Adding your models and architectures to a search space
You can use our platform to test your own existing models.
Our system searches over what we call `blocks`. We have created an abstract API for an object that resembles a layer in a DNN. All that needs to be implemented for this class is two functions:
```python
class Block(object, metaclass=abc.ABCMeta):
  """Block api for creating a new block."""

  @abc.abstractmethod
  def build(self, input_tensors, is_training, lengths=None):
    """Builds a block for phoenix.

    Args:
      input_tensors: A list of input tensors.
      is_training: Whether we are training. Used for regularization.
      lengths: The lengths of the input sequences in the batch.

    Returns:
      output_tensors: A list of the output tensors.
    """

  @abc.abstractproperty
  def is_input_order_important(self):
    """Is the order of the entries in the input tensor important.

    Returns:
      A bool specifying if the order of the entries in the input is important.
      Examples where the order is important: Input for a cnn layer.
      (e.g., pixels an image). Examples when the order is not
      important: Input for a dense layer.
    """
```
Once you have implemented your own blocks (i.e., layers), you need to register them with a decorator. Example:
```python
@register_block(
    lookup_name='AVERAGE_POOL_2X2', init_args={'kernel_size': 2}, enum_id=8)
@register_block(
    lookup_name='AVERAGE_POOL_4X4', init_args={'kernel_size': 4}, enum_id=9)
class AveragePoolBlock(Block):
  """Average Pooling layer."""

  def __init__(self, kernel_size=2):
    self._kernel_size = kernel_size

  def build(self, input_tensors, is_training, lengths=None):
    ...  # Body omitted in this excerpt; see model_search/blocks.py.
```
(All code above can be found in `model_search/blocks.py`). Once registered, you can tell the system to search over these blocks by supplying them in `blocks_to_use` in `PhoenixSpec` in `model_search/proto/phoenix_spec.proto`. Namely, if you look at the default specification for `dnn` found in `model_search/configs/dnn_config.pbtxt`, you can change the repeated field `blocks_to_use` and add your own registered blocks.
Note: Our system stacks blocks one on top of each other to create tower architectures that are then going to be ensembled. You can set the minimal and maximal depth allowed in the config to 1, which will change the system to search over which block performs best for the problem - i.e., your blocks can now be an implementation of full classifiers and the system will choose the best one.
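The effect of the depth limits on the space of towers can be sketched with a small enumeration. This is only an illustration of how the candidate set shrinks when both depths are 1. It is not how the library searches (the actual search does not need to enumerate every tower), and `enumerate_towers` and the block names are hypothetical.

```python
import itertools


def enumerate_towers(blocks, min_depth, max_depth):
    """Yields every ordered stack of blocks with depth in [min_depth, max_depth]."""
    for depth in range(min_depth, max_depth + 1):
        yield from itertools.product(blocks, repeat=depth)


blocks = ["BLOCK_A", "BLOCK_B", "BLOCK_C"]  # hypothetical registered names

# Depth 1 or 2: 3 single-block towers plus 3 * 3 two-block towers.
towers = list(enumerate_towers(blocks, min_depth=1, max_depth=2))
print(len(towers))  # 12

# Depth fixed to 1: each candidate is exactly one block.
single = list(enumerate_towers(blocks, min_depth=1, max_depth=1))
print(single)  # [('BLOCK_A',), ('BLOCK_B',), ('BLOCK_C',)]
```

With both depths set to 1, choosing a tower is the same as choosing a block, which is why each block can then act as a complete classifier.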
Creating a standalone training binary without writing a main
Now, let's assume you have the data class, but you don't want to write a `main` function to run it.
We created a simple way to create a `main` that will just train a dataset and is configurable via flags.
To create it, you need to follow two steps:
You need to register your data provider.
You need to call a helper function to create a build rule.
Example: Suppose you have a provider. You need to register it via a decorator that we define, as follows:
```python
@data.register_provider(lookup_name='csv_data_provider', init_args={})
class Provider(data.Provider):
  """A csv data provider."""

  def __init__(self):
    ...  # Body omitted in this excerpt.
```
The above code can be found in `model_search/data/csv_data_for_binary.py`.
Next, once you have such a library (data provider defined in a .py file and registered), you can supply this library to a helper build function and it will create a binary rule as follows:
model_search_oss_binary(
    name = "csv_data_binary",
    dataset_dep = ":csv_data_for_binary",
)
You can also add a test automatically to test integration of your provider with the system as follows:
model_search_oss_test(
    name = "csv_data_for_binary_test",
    dataset_dep = ":csv_data_for_binary",
    problem_type = "dnn",
    extra_args = [
        "--filename=$${TEST_SRCDIR}/model_search/data/testdata/csv_random_data.csv",
    ],
    test_data = [
        "//model_search/data/testdata:csv_random_data",
    ],
)
The above function will create a runnable binary. The snippets are taken from the following file: `model_search/data/BUILD`. The binary is configurable by the flags in `model_search/oss_trainer_lib.py`.
Distributed Runs
Our system can run a distributed search - i.e., run many search trainers in parallel.
How does it work?
You need to run your binary on multiple machines. Additionally, you need to make one change to configure the bookkeeping of the search.
On a single machine, the bookkeeping is done via a file. For a distributed system however, we need a database.
In order to point our system to the database, you need to set the flags in the file `model_search/metadata/ml_metadata_db.py` to point to your database.
Once you have done so, the binaries created from the previous section will connect to this database and an async search will begin.
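The role of the shared database can be illustrated with a small sketch in which two connections stand in for two trainers. This is an assumption-laden illustration, not the library's bookkeeping: it uses a local SQLite file instead of whatever database your flags point at, and the `trials` table, `open_bookkeeping` and `claim_trial` are hypothetical. It only shows why a shared store lets independent workers obtain distinct trial ids without talking to each other.

```python
import os
import sqlite3
import tempfile


def open_bookkeeping(path):
    """Opens (and if needed creates) the shared trial table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, worker TEXT NOT NULL)")
    conn.commit()
    return conn


def claim_trial(conn, worker):
    """Records a new trial for `worker` and returns its unique id."""
    with conn:  # Commits on success, rolls back on error.
        cursor = conn.execute(
            "INSERT INTO trials (worker) VALUES (?)", (worker,))
    return cursor.lastrowid


db_path = os.path.join(tempfile.mkdtemp(), "bookkeeping.db")

# Two connections stand in for two trainer processes on different machines.
worker_a = open_bookkeeping(db_path)
worker_b = open_bookkeeping(db_path)

claimed = [
    claim_trial(worker_a, "a"),
    claim_trial(worker_b, "b"),
    claim_trial(worker_a, "a"),
]
print(claimed)  # [1, 2, 3]

worker_a.close()
worker_b.close()
```

Because the ids come from the database rather than from any single worker, trainers can start and stop independently, which is what makes an asynchronous search possible.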
Cloud AutoML
Want to try higher performance AutoML without writing code? Try: https://cloud.google.com/automl-tables
To restore the repository download the bundle
wget https://archive.org/download/github.com-google-model_search_-_2021-02-22_13-34-03/google-model_search_-_2021-02-22_13-34-03.bundle
and run: git clone google-model_search_-_2021-02-22_13-34-03.bundle
Source: https://github.com/google/model_search
Uploader: google
Upload date: 2021-02-22
- Addeddate
- 2021-02-23 01:38:15
- Identifier
- github.com-google-model_search_-_2021-02-22_13-34-03
- Originalurl
-
https://github.com/google/model_search
- Pushed_date
- 2021-02-22 13:34:03
- Scanner
- Internet Archive Python library 1.9.9
- Uploaded_with
- iagitup - v1.6.2
- Year
- 2021