tf.contrib.learn.RunConfig.__init__(master=None, task=None, num_ps_replicas=None, num_cores=0, log_device_placement=False, gpu_memory_fraction=1, cluster_spec=None, tf_random_seed=None, save_summary_steps=100, save_checkpoints_secs=600, keep_checkpoint_max=5, keep_checkpoint_every_n_hours=10000, job_name=None, is_chief=None, evaluation_master='')
Constructor.
Any of master, task, num_ps_replicas, cluster_spec, job_name, and is_chief that is left as None is set from the TF_CONFIG environment variable when the pertinent information is present; otherwise, the defaults listed in the Args section apply.
The TF_CONFIG environment variable is a JSON object with two relevant attributes: task and cluster. cluster is a JSON-serialized version of the Python dict described in server_lib.py (the cluster spec). task has two attributes: type and index, where type can be any of the task types in the cluster spec. When TF_CONFIG contains this information, the following properties are set on this class:
- job_name is set to [task][type]
- task is set to [task][index]
- cluster_spec is parsed from [cluster]
- master is determined by looking up job_name and task in the cluster_spec.
- num_ps_replicas is set by counting the number of nodes listed in the ps job of cluster_spec.
- is_chief: true when job_name == "master" and task == 0.
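Example:

    import json
    import os

    # Import paths are assumed from the names used on this page.
    from tensorflow.contrib.learn import RunConfig
    from tensorflow.python.training import server_lib

    cluster = {'ps': ['host1:2222', 'host2:2222'],
               'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
    # json.dumps takes a single dict; the extra pair of braces was a bug.
    os.environ['TF_CONFIG'] = json.dumps(
        {'cluster': cluster,
         'task': {'type': 'worker', 'index': 1}})
    config = RunConfig()
    assert config.master == 'host4:2222'
    assert config.task == 1
    assert config.num_ps_replicas == 2
    assert config.cluster_spec == server_lib.ClusterSpec(cluster)
    assert config.job_name == 'worker'
    assert not config.is_chief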
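The derivation rules described above can be traced without TensorFlow installed. The following is a minimal sketch in plain Python, not the library's implementation: the helper name resolve_from_tf_config and its dict return value are hypothetical, and it only mirrors the rules as this page states them (including is_chief being true only for job "master", task 0).

```python
import json


def resolve_from_tf_config(tf_config_json):
    """Derive RunConfig-style fields from a TF_CONFIG JSON string.

    Hypothetical helper: it follows the rules documented on this page,
    not the actual RunConfig source code.
    """
    config = json.loads(tf_config_json or '{}')
    cluster = config.get('cluster') or {}
    task_info = config.get('task') or {}

    job_name = task_info.get('type')   # [task][type]
    task = task_info.get('index', 0)   # [task][index], default 0

    # master is the address found by looking up job_name and task
    # in the cluster; an empty string means "local".
    master = ''
    if job_name in cluster and task < len(cluster[job_name]):
        master = cluster[job_name][task]

    return {
        'job_name': job_name,
        'task': task,
        'cluster_spec': cluster,
        'master': master,
        # Count the nodes listed under the 'ps' job.
        'num_ps_replicas': len(cluster.get('ps', [])),
        'is_chief': job_name == 'master' and task == 0,
    }


cluster = {'ps': ['host1:2222', 'host2:2222'],
           'worker': ['host3:2222', 'host4:2222', 'host5:2222']}
fields = resolve_from_tf_config(json.dumps(
    {'cluster': cluster, 'task': {'type': 'worker', 'index': 1}}))
print(fields['master'])
```

Passing an empty string models an unset TF_CONFIG: every field then falls back to the defaults given in the Args section (empty master, task 0, no parameter servers).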
Args:
- master: TensorFlow master. Defaults to empty string for local.
- task: Task id of the replica running the training (default: 0).
- num_ps_replicas: Number of parameter server tasks to use (default: 0).
- num_cores: Number of cores to be used. If 0, the system picks an appropriate number (default: 0).
- log_device_placement: Log the op placement to devices (default: False).
- gpu_memory_fraction: Fraction of GPU memory used by the process on each GPU uniformly on the same machine.
- cluster_spec: a tf.train.ClusterSpec object that describes the cluster in the case of distributed computation. If missing, reasonable assumptions are made for the addresses of jobs.
- tf_random_seed: Random seed for TensorFlow initializers. Setting this value allows consistency between reruns.
- save_summary_steps: Save summaries every this many steps.
- save_checkpoints_secs: Save checkpoints every this many seconds.
- keep_checkpoint_max: The maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, all checkpoint files are kept. Defaults to 5 (that is, the 5 most recent checkpoint files are kept).
- keep_checkpoint_every_n_hours: Number of hours between each checkpoint to be saved. The default value of 10,000 hours effectively disables the feature.
- job_name: the type of task, e.g., 'ps', 'worker', etc. The job_name must exist in the cluster_spec.jobs.
- is_chief: whether or not this task (as identified by the other parameters) should be the chief task.
- evaluation_master: the master on which to perform evaluation.
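The keep_checkpoint_max rule can be illustrated with a short sketch. This is an assumption-laden illustration in plain Python rather than the checkpointing code itself: the helper checkpoints_to_keep and the file names are hypothetical, the input is assumed to be ordered oldest to newest, and the interaction with keep_checkpoint_every_n_hours is ignored.

```python
def checkpoints_to_keep(checkpoint_paths, keep_checkpoint_max=5):
    """Return the checkpoints that survive pruning, oldest first.

    Hypothetical helper: checkpoint_paths is assumed to be ordered
    from oldest to newest.
    """
    paths = list(checkpoint_paths)
    if not keep_checkpoint_max:
        # None or 0 means every checkpoint file is kept.
        return paths
    # Otherwise only the most recent keep_checkpoint_max files remain.
    return paths[-keep_checkpoint_max:]


# Seven hypothetical checkpoints written at steps 100, 200, ..., 700.
paths = ['model.ckpt-%d' % step for step in range(100, 800, 100)]
kept = checkpoints_to_keep(paths)
print(kept)
```

With the default of 5, the two oldest files in this list are dropped; passing None or 0 returns the list unchanged.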
Raises:
- ValueError: if num_ps_replicas and cluster_spec are both set (cluster_spec may come from the TF_CONFIG environment variable).