
python - RLlib tunes PPOTrainer but not A2CTrainer

I am comparing the two kinds of algorithm against the CartPole environment. The imports are:

import ray
from ray import tune
from ray.rllib import agents
ray.init()  # Skip this, or pass ignore_reinit_error=True, if Ray has already been initialized

Running this works perfectly:

experiment = tune.run(
    agents.ppo.PPOTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
              [0, 0.0005],
              [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)

But when I do the same with the A2C Agent:

experiment = tune.run(
    agents.a3c.A2CTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "sgd_minibatch_size": 500,
        "num_sgd_iter": 10,
        "entropy_coeff": 0.01,
        "lr_schedule": [
              [0, 0.0005],
              [10000000, 0.000000000001],
        ],
        "lambda": 0.95,
        "kl_coeff": 0.5,
        "clip_param": 0.1,
        "vf_share_layers": False,
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)

It raises this exception:

---------------------------------------------------------------------------
TuneError                                 Traceback (most recent call last)
<ipython-input-9-6680e67f9343> in <module>()
     23     mode="max",
     24     stop={"training_iteration": 100},
---> 25     checkpoint_at_end=True,
     26 )

/usr/local/lib/python3.6/dist-packages/ray/tune/tune.py in run(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, loggers, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, export_formats, max_failures, fail_fast, restore, server_port, resume, queue_trials, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, ray_auto_init, run_errored_only, global_checkpoint_period, with_server, upload_dir, sync_to_cloud, sync_to_driver, sync_on_checkpoint)
    432     if incomplete_trials:
    433         if raise_on_failed_trial:
--> 434             raise TuneError("Trials did not complete", incomplete_trials)
    435         else:
    436             logger.error("Trials did not complete: %s", incomplete_trials)

TuneError: ('Trials did not complete', [A2C_CartPole-v1_6acda_00000])

Can anybody tell me what's going on? I don't know whether it has something to do with the versions of the libraries I'm using or whether I've coded something wrong. Is this a common issue?
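
One way to surface the underlying error, rather than Tune's generic "Trials did not complete", is to construct the trainer directly with the same config. This is a debugging sketch added for illustration, not part of the original post; it assumes the same ray.rllib.agents API used above:

from ray.rllib import agents

# A trimmed copy of the A2C config from above; building the trainer outside
# of Tune makes RLlib raise the real exception (for example a complaint about
# an unrecognized config key) instead of Tune's generic failure message.
a2c_config = {
    "env": "CartPole-v1",
    "num_gpus": 1,
    "num_workers": 0,
    "sgd_minibatch_size": 500,  # key copied from the failing config above
    "num_sgd_iter": 10,         # key copied from the failing config above
}
trainer = agents.a3c.A2CTrainer(config=a2c_config)
result = trainer.train()  # if construction succeeds, one training step surfaces run-time errors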



1 Answer

Waiting for an expert to reply.
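
One possible cause, offered only as a guess since the trial's own log is not shown in the question: several keys in the A2C config are PPO-specific (sgd_minibatch_size, num_sgd_iter, kl_coeff, clip_param, vf_share_layers), and RLlib rejects config keys that a trainer does not define, which Tune then reports only as "Trials did not complete". A minimal sketch of the same experiment with those keys removed:

experiment = tune.run(
    agents.a3c.A2CTrainer,
    config={
        "env": "CartPole-v1",
        "num_gpus": 1,
        "num_workers": 0,
        "num_envs_per_worker": 50,
        "rollout_fragment_length": 100,
        "train_batch_size": 5000,
        "entropy_coeff": 0.01,
        "lr_schedule": [
            [0, 0.0005],
            [10000000, 0.000000000001],
        ],
        "lambda": 0.95,  # GAE lambda is a valid A2C/A3C key, so it can stay
    },
    metric="episode_reward_mean",
    mode="max",
    stop={"training_iteration": 100},
    checkpoint_at_end=True,
)

If the trial still fails, the full stack trace is usually written to the trial's directory under ~/ray_results, which should say exactly which key or version mismatch is at fault.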
