[RLlib] Attention Net prep PR #3. #12450
Conversation
ericl left a comment
@sven1977 main question is the input dict option. Maybe I'm missing something, but it doesn't seem like a necessary change. Correct me if this is wrong.
rllib/agents/ppo/ppo_tf_policy.py (outdated)
# (abs_pos=-1). It's only used if the trajectory is not finished yet and we
# have to rely on the last value function output as a reward estimation.
return {
    "_value_input_dict": ViewRequirement(is_input_dict=True, abs_pos=-1)
Wondering if there are better names for this
# Can specify either abs_index or shift
abs_index=-1
shift=x
I think the notion of "index" is clearer in Python (-1 index means end). Also, we get to keep shift.
Fair enough, I'll rename these two:
abs_pos -> index
data_rel_pos -> shift
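For illustration, a minimal sketch of what a `ViewRequirement` with the renamed fields might look like (field names and defaults here are assumptions based on this thread, not the final RLlib signature):

```python
# Illustrative stand-in for rllib/policy/view_requirement.py; the real class
# may have additional fields and different defaults.
class ViewRequirement:
    def __init__(self, data_col=None, space=None, index=None, shift=0,
                 used_for_training=True):
        # SampleBatch column to read the data from (defaults to the view col).
        self.data_col = data_col
        # gym.Space of the data; used e.g. to create dummy/padding values.
        self.space = space
        # Absolute position within the trajectory, e.g. -1 == last timestep
        # (previously called `abs_pos`).
        self.index = index
        # Offset relative to the current timestep, e.g. -1 == previous step
        # (previously called `data_rel_pos`).
        self.shift = shift
        # If False, the column is not copied into the final train batch.
        self.used_for_training = used_for_training
```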
On the input_dict:
It's not strictly necessary for this PR, but since this is a preparatory PR (meant to make the attention-net PR smaller), I decided to add it here already. The attention-net PR needs this feature so that we don't have to boilerplate/hardcode the attention logic inside e.g. PPO's postprocessing fn (that function shouldn't have to worry about whether the model is an RNN or an attention net; it shouldn't need to know). A rough sketch of the intent follows.
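A hedged sketch of the idea (the `ViewRequirement` kwargs mirror the pre-rename names in the diff above; the postprocessing side and the `compute_last_value` helper are placeholders):

```python
# Only the ViewRequirement kwargs come from the diff above (pre-rename
# names); everything else is a placeholder for illustration.
from ray.rllib.policy.view_requirement import ViewRequirement


def extra_postprocess_view_requirements():
    # "Hand me a ready-made, single-timestep input dict for the last step
    # of the (possibly unfinished) trajectory."
    return {
        "_value_input_dict": ViewRequirement(is_input_dict=True, abs_pos=-1),
    }


def postprocess_fn(policy, sample_batch):
    # The sample collector has already assembled this input dict according
    # to the model's own view requirements (RNN states, attention memory,
    # ...), so this function never branches on the model type.
    input_dict = sample_batch["_value_input_dict"]
    last_r = policy.compute_last_value(input_dict)  # hypothetical helper
    # ... continue with GAE/advantage computation using last_r ...
    return sample_batch
```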
src/ray/raylet/node_manager.cc (outdated)
object_manager_.FreeObjects(object_ids,
                            /*local_only=*/false);
},
on_objects_spilled),
Revert this change?
rllib/policy/view_requirement.py (outdated)
used_for_training (bool): Whether the data will be used for
    training. If False, the column will not be copied into the
    final train batch.
is_input_dict (bool): Whether the "view" of this requirement is an
This seems odd, what is the reason we need it? Are there any cleaner alternatives?
This option will be necessary for attention nets. We shouldn't have attention net or RNN-specific code in the postprocessing fn (e.g. of PPO). Instead, it's like saying: "I need an input_dict here, provide one given the model's requirements".
return self.model.value_function()[0]
# Input dict is provided to us automatically via the policy-defined
# "view". It's a single-timestep (last one in trajectory)
# input_dict.
What is the reason for this change? Could we get the previous code to work without adding this if branch?
see above: We want the input dict to be determined by what the model needs as inputs.
    use_critic=policy.config["use_critic"])
else:
    batch = sample_batch
sample_batch = postprocess_ppo_gae(policy, sample_batch,
Re-use PPO's function.
We should probably do the same for A3C and PG.
    sample_batch[SampleBatch.ACTIONS][-1],
    sample_batch[SampleBatch.REWARDS][-1],
    *next_state)
# Input dict is provided to us automatically via the Model's
Ask the Model for the input dict (from the given SampleBatch) at index=-1.
- In prep for attention nets, which have special requirements here (different from RNNs and from non-recurrent models).
- Removes boilerplate input-dict-creation code.
A hedged sketch of the resulting postprocessing code is below.
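Roughly, the postprocessing side then reduces to something like this (hedged sketch; `get_input_dict` is the experimental helper added in this PR, and the `_value` helper name is assumed):

```python
# Hedged sketch only; exact signatures may differ from the PR.
from ray.rllib.policy.sample_batch import SampleBatch


def last_value_estimate(policy, sample_batch):
    """Bootstrap value for a possibly unfinished trajectory."""
    if sample_batch[SampleBatch.DONES][-1]:
        return 0.0
    # Let the Model assemble whatever a single-step forward pass at the last
    # timestep needs (prev. actions/rewards, RNN states, attention memory,
    # ...), instead of hand-building the input dict here.
    input_dict = policy.model.get_input_dict(sample_batch, index=-1)
    return policy._value(**input_dict)  # helper name is an assumption
```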
Is the comment inaccurate now?
No, this is still valid.
self.fc2 = nn.Linear(self.rnn_hidden_dim, num_outputs)
self.n_agents = model_config["n_agents"]

self.inference_view_requirements.update({
make sure this stays backward-compatible even w/o specifying this here.
})
self.buffers[SampleBatch.OBS].append(init_obs)
self.buffers[SampleBatch.EPS_ID].append(episode_id)
self.episode_id = episode_id
Don't have to "collect" these. They are always the same for the same agent anyways.
def build(self, view_requirements: Dict[str, ViewRequirement]) -> \
        SampleBatch:
def build(self, view_requirements: Dict[str, ViewRequirement],
Maybe rename this to model_view_requirements for clarity.
Also: Model.inference_view_requirements -> Model.view_requirements.
# Python primitive or dict (e.g. INFOs).
if isinstance(data, (int, float, bool, str, dict)):
    self.buffers[col] = [0 for _ in range(shift)]
    self.buffers[col] = [data for _ in range(shift)]
Important for custom initial state values. Cannot assume always 0 here.
# not view_requirements[view_col].used_for_training:
#     continue
self.buffers[view_col].extend(data)
# 1) If col is not in view_requirements, we must have a direct
I think this solves it:
- If, after postprocessing, some column is not in the view-reqs, we must be dealing with a direct base-Policy child (w/o auto-view-requirement handling) -> leave it as is.
- If we do have it in the view-reqs AND used_for_training is False -> we must have gone through auto-detection, so it's safe to remove it here (this column won't be needed for training).
In code, the rule looks roughly like the sketch below.
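```python
# Illustrative only; names do not match the actual collector code.
def keep_column_in_train_batch(col, view_requirements):
    # 1) Column unknown to the view requirements: we are dealing with a
    #    direct base-Policy child without auto view-requirement handling,
    #    so leave the data alone (keep it).
    if col not in view_requirements:
        return True
    # 2) Column known and marked used_for_training=False: auto-detection
    #    has run, so it is safe to drop the column from the train batch.
    return view_requirements[col].used_for_training
```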
    data_col: view_req.space.sample()
})
data_list.append(buffers[k][data_col][time_indices])
if data_col == SampleBatch.EPS_ID:
same as above: episode_id is always the same within one agent's collector. No need to collect an extra buffer here.
fake_sampler: bool = False,
spaces: Optional[Dict[PolicyID, Tuple[gym.spaces.Space,
                                      gym.spaces.Space]]] = None,
_use_trajectory_view_api: bool = True,
Pass this into RolloutWorker explicitly now (it was derived from policy_config before, which is problematic because that could be a partial config dict).
# inherited from base `Policy` class. At this point here, the Policy
# must have it's Model (if any) defined and ready to output an initial
# state.
for pol in self.policy_map.values():
Do the auto internal state -> view req here after policy has been created. This covers direct child Policies of the base Policy class, which don't have an auto-view-req mechanism.
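A hedged sketch of that step (the method name appears in this PR's diff further below; the surrounding logic here is assumed):

```python
# Hedged sketch; not the actual RolloutWorker code.
def add_state_view_reqs_after_policy_creation(policy_map):
    for pol in policy_map.values():
        # If the Policy (or its Model) returns a non-empty initial state,
        # derive "state_in_0", "state_in_1", ... view requirements from it.
        # This also covers direct base-Policy subclasses that have no auto
        # view-requirement mechanism of their own.
        if pol.get_initial_state():
            pol._update_model_inference_view_requirements_from_init_state()
```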
return self.time_major is True

# TODO: (sven) Experimental method.
def get_input_dict(self, sample_batch,
The Model is able to create an input_dict for a single-step forward pass from an agent's trajectory batch.
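Conceptually, the helper might do something along these lines (hedged sketch; the actual experimental method in this PR may differ):

```python
import numpy as np


def get_input_dict_sketch(model, sample_batch, index=-1):
    """Build a single-timestep input dict from an agent's trajectory batch,
    honoring the model's own view requirements (state cols, shifts, etc.)."""
    input_dict = {}
    for view_col, view_req in model.inference_view_requirements.items():
        data_col = getattr(view_req, "data_col", None) or view_col
        if data_col in sample_batch:
            # Pick the single requested timestep and keep a batch dim of 1.
            input_dict[view_col] = np.expand_dims(
                sample_batch[data_col][index], axis=0)
        else:
            # Fall back to a dummy value sampled from the declared space.
            input_dict[view_col] = np.expand_dims(view_req.space.sample(), 0)
    return input_dict
```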
    action_distribution=action_dist,
    timestep=timestep,
    explore=explore)
if self.config["_use_trajectory_view_api"]:
Not needed.
mo = re.match("state_in_(\d+)", view_col)
if mo is not None:
    input_dict[view_col] = self._state_inputs[int(mo.group(1))]
dummy_batch[view_col] = np.zeros_like(
Better do all these in one call below.
rllib/policy/dynamic_tf_policy.py (outdated)
batch_for_postproc = UsageTrackingDict(sb)
batch_for_postproc.count = sb.count
logger.info("Testing `postprocess_trajectory` w/ dummy batch.")
self.exploration.postprocess_trajectory(
Have to also call the exploration's postprocessing here (it may access fields in the batch that we need to track, e.g. for curiosity).
# Just like torch Policy does.
self._optimizer = optimizers[0] if optimizers else None

self._initialize_loss_from_dummy_batch(
Moved this here for consistency (same behavior as TorchPolicy). Also fixes a problem with curiosity where we do need the optimizer before loss init.
def _update_model_inference_view_requirements_from_init_state(self):
    """Uses this Model's initial state to auto-add necessary ViewReqs.
    """Uses Model's (or this Policy's) init state to add needed ViewReqs.
Make this more robust against Policies that don't have a model, but do return something from get_initial_state().
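The fallback could look roughly like this (hedged sketch; the exact `ViewRequirement` kwargs are assumptions):

```python
# Hedged sketch; ViewRequirement kwargs and naming are assumptions.
from ray.rllib.policy.view_requirement import ViewRequirement


def state_view_reqs_from_init_state(policy):
    # Prefer the Model's initial state, but fall back to the Policy's own
    # get_initial_state() for Policies that have no Model at all.
    model = getattr(policy, "model", None)
    init_state = (model.get_initial_state() if model is not None
                  else policy.get_initial_state())
    reqs = {}
    for i in range(len(init_state or [])):
        # Each "state_in_i" at step t is read from "state_out_i" of step t-1.
        reqs["state_in_{}".format(i)] = ViewRequirement(
            data_col="state_out_{}".format(i), shift=-1)
    return reqs
```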
from ray.rllib.policy.policy import Policy, LEARNER_STATS_KEY
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.torch_policy import TorchPolicy
from ray.rllib.policy.view_requirement import ViewRequirement
Not needed anymore in any Policy templates.
infos={},
new_obs=obs_batch[0])
batch = builder.build_and_reset(episode=None)
env_id = episodes[0].env_id
Changed test case to use new SampleCollector.
    sample_batch[SampleBatch.ACTIONS][-1],
    sample_batch[SampleBatch.REWARDS][-1],
    *next_state)
# Input dict is provided to us automatically via the Model's
Is the comment inaccurate now?
    view-col to data-col in them).
inference_view_requirements (Dict[str, ViewRequirement]: The view
    requirements dict needed to build an input dict for a ModelV2
    forward call.
This argument doesn't seem to be used, can we remove it?
done.
Looks good, but please resolve comments before merging.
👍 Will do.
The current attention-net trajectory view PR (#11729) is too large (>1000 lines added).
Therefore, I'm moving smaller preparatory and cleanup changes into ~2 pre-PRs. This is the third of these; please only review it once the 2nd one (#12449) has been merged.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.