
Conversation

@electriclilies
Contributor

@electriclilies electriclilies commented Jan 7, 2022

I propose that the JSON deserializer defaults fields to null.

The motivation for this change is improved backwards compatibility. I need to add the virtual device field to the VisitAttrs of most Relay nodes for #9826. Unfortunately, this means that JSON which does not contain the virtual device is invalid, and we have to manually add "virtual_device_": "0" to many of the nodes in the JSON to get it to import correctly.

After this change, if a JSON node does not contain the virtual_device_ field, the JSON will still import correctly: the loader defaults the field's value to "0", which represents null. See the test I added in test_json_compact.py for an example.
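The proposed loader behavior can be sketched in a few lines of Python. This is a minimal illustration only; `get_attr_with_default` is a hypothetical helper, not TVM's actual deserializer API, and the node layout is a simplified stand-in for the real serialized format:

```python
import json

def get_attr_with_default(node_attrs, key, default="0"):
    """Return the attribute if present; otherwise fall back to "0",
    the node index that represents null in the serialized node table."""
    return node_attrs.get(key, default)

# An old-format node serialized before virtual_device_ existed:
old_node = json.loads('{"type_key": "relay.Var", "attrs": {"vid": "2"}}')

# Missing field: the loader silently defaults it to "0" (null).
value = get_attr_with_default(old_node["attrs"], "virtual_device_")
```

With this behavior, pre-existing JSON that lacks the field deserializes without manual patching.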

@mbs-octoml @jroesch @tqchen @manupa-arm @areusch PTAL if you are interested! Thanks!

@tqchen
Member

tqchen commented Jan 7, 2022

Thanks Lily. JSON serialization behavior touches the project globally, across the entire object system, so such a change carries more weight and would need more deliberation.

Let us analyze the overall pros and cons of the set of approaches.

The advantage of introducing such implicit behavior is that when a field defaults to null, a loader that defaults missing fields to null will automatically load the stale version correctly.

On the other hand, introducing implicit behavior can also bring other consequences. For example, suppose a community member wants to introduce a new field whose default value is not null (say an incomplete type, or some value derived from other existing fields).
A default loading behavior would then mean the load still "succeeds", but later we get a segmentation fault because null was not a valid value. Similarly, in another real update that happened before, imagine we want to rename a field from "name" to "myname". The implicit behavior would expect an optional "myname", but the serialized field is "name", so the loader simply sets "myname" to null. This again leads to a segfault at some later point.

In both cases, an explicit error message indicating the incompleteness of the serialized field is clearer and prevents possible future failures that are harder to debug. Of course, the price is that we get more explicit errors and need to deal with them.

The particular case of backward compatibility can be resolved by an explicit upgrader, which could be a few more lines of Python code with a clear intention, scoped to the particular object type of interest. This alternative has a smaller impact surface than the implicit behavior and will not cause unintentional extra behaviors like the examples listed above. Because of the explicit customization, the upgrader can also be tailored to handle the above examples that implicit behavior may not handle correctly.
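The explicit upgrader described above could look roughly like this, in the style of tvm/ir/json_compact.py. This is a sketch under assumptions: the function name `add_virtual_device` and the per-node dict layout are illustrative, and the real upgrader hooks into TVM's versioned updater machinery rather than being called directly:

```python
def add_virtual_device(nodes):
    """Rewrite a 0.8-era serialized node list so every Relay expression
    node carries an explicit virtual_device_ attribute ("0" == null).

    Hypothetical upgrader pass; node layout is a simplified stand-in
    for TVM's actual serialized node table.
    """
    for node in nodes:
        attrs = node.setdefault("attrs", {})
        if node.get("type_key", "").startswith("relay.") and "virtual_device_" not in attrs:
            attrs["virtual_device_"] = "0"
    return nodes
```

Because the rewrite is explicit and scoped to Relay nodes, it cannot accidentally null-fill an unrelated field the way a blanket default-to-null loader could.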

@electriclilies
Contributor Author

electriclilies commented Jan 7, 2022

OK, thanks for the thoughts.

What do you think about just letting the virtual_device_ have a default? So, I'd add something like this:

if (key == "virtual_device_") {
  return "0";
} else {
  LOG(FATAL) << "... ";
}

If I don't do that, I can definitely implement the converter, but I just want to note that I'd probably need to rev the TVM version, either to 0.8.1 or 0.9.0 (not sure what our convention for revving the version is), so we can convert the 0.8.0 graphs that we rely on.

@electriclilies
Contributor Author

electriclilies commented Jan 7, 2022

Another possible solution is adding a HasValue method and a list of defaults.

Map<String, String> default_values = {{"virtual_device_", "0"}};

bool HasValue(const char* key) {
  return jnode_->attrs.find(key) != jnode_->attrs.end();
}

Then, we can replace calls to GetValue with:

if (HasValue(key)) {
  return GetValue(key);
} else if (default_values.count(key)) {
  return default_values[key];
} else {
  LOG(FATAL) << "JSONReader: cannot find field " << key;
}

@tqchen
Member

tqchen commented Jan 7, 2022

Patching in specialized handling for a particular object and key is indeed one solution to this particular problem as well.

One would need to think about the implications, though. The system was originally designed to support arbitrary objects, and adding specializations would couple the serializer to specific objects, which are supposed to evolve independently (except for a few key cases like common containers).
The upgrader is designed as a legacy path, separate from the main one, with the goal of keeping the overall architecture clean and free of legacy behavior. As for the upgrader, we are currently in the 0.9.dev period, and the version in the current system is v0.9.dev0, so adding the upgrading logic in a 0.8-to-0.9 upgrader would serve the purpose.

So no version bump is needed, assuming it is an upgrade from the v0.8 version to the v0.9 dev cycle. For code that was updated during the v0.9 dev period, we can also invoke the upgrader when an error happens in v0.9.dev: check whether the field exists and, if not, add "0" (effectively the same algorithm you proposed, implemented in json_compact.py). This covers the upgrade with the same logic without having to bump the version.
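The dispatch described above can be sketched as follows. This is a hedged illustration: `load_with_upgrade` and `upgrade_08_to_09` are hypothetical names, and the real json_compact.py path reads the recorded version from the serialized attrs in its own way:

```python
import json

def load_with_upgrade(json_str, loader, upgrade_08_to_09):
    """Load serialized JSON, running the 0.8 -> 0.9 upgrader only when
    the recorded version is pre-0.9. No global version bump is needed.

    `loader` and `upgrade_08_to_09` are placeholders for the real
    deserializer and upgrader pass, respectively.
    """
    data = json.loads(json_str)
    version = data.get("attrs", {}).get("tvm_version", "0.9.dev0")
    if version.startswith("0.8"):
        data = upgrade_08_to_09(data)
    return loader(data)
```

JSON written during the v0.9 dev cycle passes straight through, while 0.8-era graphs get the explicit field-filling pass.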

@electriclilies
Contributor Author

Alright, I'll implement the upgrader and open a new PR. Closing this now.
