tf.contrib.learn.RunConfig.__init__()

tf.contrib.learn.RunConfig.__init__(master=None, task=None, num_ps_replicas=None, num_cores=0, log_device_placement=False, gpu_memory_fraction=1, cluster_spec=None, tf_random_seed=None, save_summary_steps=100, save_checkpoints_secs=600, keep_checkpoint_max=5, keep_checkpoint_every_n_hours=10000, job_name=None, is_chief=None, evaluation_master='')

Constructor.

If set to None, master, task, num_ps_replicas, cluster_spec, job_name, and is_chief are set based on the TF_CONFIG environment variable, if the pertinent information is present; otherwise, the defaults listed in the Args section apply.

The TF_CONFIG environment variable is a JSON object with two relevant attributes: task and cluster. cluster is a JSON-serialized version of the Python dict described in server_lib.py. task has two attributes: type and index, where type can be any of the task types in cluster. When TF_CONFIG contains this information, the following properties are set on this class:

  • job_name is set to [task][type]
  • task is set to [task][index]
  • cluster_spec is parsed from [cluster]
  • master is determined by looking up job_name and task in the cluster_spec.
  • num_ps_replicas is set by counting the number of nodes listed in the ps job of cluster_spec.
  • is_chief is set to true when job_name == "master" and task == 0.

Example:

    import json
    import os

    from tensorflow.contrib.learn import RunConfig
    from tensorflow.python.training import server_lib

    cluster = {'ps': ['host1:2222', 'host2:2222'],
               'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
    # TF_CONFIG must be a single JSON object holding 'cluster' and 'task'.
    os.environ['TF_CONFIG'] = json.dumps(
        {'cluster': cluster,
         'task': {'type': 'worker', 'index': 1}})
    config = RunConfig()
    assert config.master == 'host4:2222'
    assert config.task == 1
    assert config.num_ps_replicas == 2
    assert config.cluster_spec == server_lib.ClusterSpec(cluster)
    assert config.job_name == 'worker'
    assert not config.is_chief
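The derivation rules listed above can also be sketched in plain Python, without TensorFlow installed. This is only an illustration of the documented behavior, not RunConfig's actual code: derive_from_tf_config is a hypothetical helper, and it assumes every job in the cluster maps to a list of addresses, as in the example above.

```python
import json
import os


def derive_from_tf_config(environ=None):
    """Hypothetical helper mirroring the documented TF_CONFIG rules.

    Not part of TensorFlow; it only illustrates how each property is derived.
    """
    environ = os.environ if environ is None else environ
    config = json.loads(environ.get('TF_CONFIG') or '{}')
    cluster = config.get('cluster') or {}
    task_env = config.get('task') or {}

    job_name = task_env.get('type')   # [task][type]
    task = task_env.get('index')      # [task][index]

    # master: the address found by looking up job_name and task in the cluster.
    # Assumption: each job maps to a list of 'host:port' strings.
    master = None
    if job_name in cluster and task is not None:
        master = cluster[job_name][task]

    return {
        'job_name': job_name,
        'task': task,
        'cluster': cluster,
        'master': master,
        # Count the nodes listed under the 'ps' job.
        'num_ps_replicas': len(cluster.get('ps', [])),
        # Chief only when the job is named "master" and the task index is 0.
        'is_chief': job_name == 'master' and task == 0,
    }


cluster = {'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
settings = derive_from_tf_config({
    'TF_CONFIG': json.dumps({'cluster': cluster,
                             'task': {'type': 'worker', 'index': 1}}),
})
```

Passing the environment as a dict keeps the sketch free of side effects on os.environ. When TF_CONFIG is absent, every derived value falls back to None, 0, or False, which is where the defaults in the Args section would apply.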

Args:
  • master: TensorFlow master. Defaults to empty string for local.
  • task: Task id of the replica running the training (default: 0).
  • num_ps_replicas: Number of parameter server tasks to use (default: 0).
  • num_cores: Number of cores to be used. If 0, the system picks an appropriate number (default: 0).
  • log_device_placement: Log the op placement to devices (default: False).
  • gpu_memory_fraction: Fraction of each GPU's memory that the process may use, applied uniformly to all GPUs on the same machine (default: 1).
  • cluster_spec: A tf.train.ClusterSpec object that describes the cluster in the case of distributed computation. If missing, reasonable assumptions are made for the addresses of jobs.
  • tf_random_seed: Random seed for TensorFlow initializers. Setting this value allows consistency between reruns.
  • save_summary_steps: Save summaries every this many steps.
  • save_checkpoints_secs: Save checkpoints every this many seconds.
  • keep_checkpoint_max: The maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent checkpoint files are kept).
  • keep_checkpoint_every_n_hours: Number of hours between each checkpoint to be saved. The default value of 10,000 hours effectively disables the feature.
  • job_name: The type of task, e.g., 'ps', 'worker', etc. The job_name must exist in cluster_spec.jobs.
  • is_chief: Whether or not this task (as identified by the other parameters) should be the chief task.
  • evaluation_master: The master on which to perform evaluation.
Raises:
  • ValueError: if both num_ps_replicas and cluster_spec are set (cluster_spec may come from the TF_CONFIG environment variable).
doc_TensorFlow
2016-10-14 13:06:52