-
Notifications
You must be signed in to change notification settings - Fork 209
Data object encoding #692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data object encoding #692
Conversation
|
FYI: I haven't put any guards in yet to make this work for older versions of ruby (before |
|
I'll update this to pass for earlier versions of ruby. But I'm also looking for feedback on the approach. The more I think about it, the more I'm thinking this should use |
3110d41 to
3a845fd
Compare
|
For what it's worth, I've updated the PR to:
|
2292f0b to
325b073
Compare
|
Sorry. I thought I had it passing before, but I must have made a mistake in my earlier tests with ruby 2.5 (perhaps I had somehow disabled warnings locally)? I've changed the guards from |
|
Any thoughts about this approach? (Is it too close to the 3.4.0 release to hope for getting support for |
|
It would be possible to call JRuby's internal API from Ruby, but since we already have an extension for this library we can just add it there. This feature and the tests provided are guarded to only happen on Ruby 3.2+, so JRuby 9.4 would not be affected (it is 3.1 compatible) and JRuby 10 (3.4 compatible) is unreleased so we have some time to add JRuby support. |
|
@tenderlove Thanks. I did briefly consider abusing @headius I think that I may have screwed up the ruby 3.2 guard, the last time I updated my branch. I need to rebase anyway, so I'll fix it and double check that the tests pass under older ruby versions. |
This sets the ivars _before_ calling initialize, which feels wrong. But Data doesn't give us any mechanism for setting the members other than 1) initialize, or 2) drop down into the C API. Since initialize freezes the object, we need to set the ivars before that. I think this is a reasonable compromise—if users need better handling, they can implement their own `encode_with` and `init_with`. But it will lead to unhappy surprises for some users. Alternatively, we could use the C API, similarly to Marshal. Psych _is_ already using the C API for path2class and build_exception. This would be the least surprising behavior for users, I think.
325b073 to
3573fb3
Compare
|
@tenderlove Okay, I rebased on the latest I also changed the tests to use |
| VALUE args = rb_ary_new2(1); | ||
| rb_ary_push(args, attrs); | ||
| rb_struct_initialize(data, args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very brittle, and probably only works on CRuby due to this bug or similar: https://bugs.ruby-lang.org/issues/18632
i.e. this is passing [{ ... }] to rb_struct_initialize(), i.e. a positional Hash, not keyword arguments.
Normal behavior would be to treat that Hash as the value of the first member, and don't assign other members. But it seems to accidentally work.
It would be a lot safer to pass the values as a simple Array instead, and query the members from the class by calling struct_or_data_class.members.
From truffleruby/truffleruby#4012 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to repro this in new C API specs but I actually can't, they fail on CRuby too:
it "accepts a single Hash in an Array" do
@klass = @s.rb_data_define(nil, "a", "b", "c")
data = @klass.allocate
@s.rb_struct_initialize(data, [{ a: 1, b: 2, c: 3 }]).should == nil
data.a.should == 1
data.b.should == 2
data.c.should == 3
endgives
ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [x86_64-linux]
2)
C-API Data function rb_struct_initialize accepts a single Hash in an Array FAILED
Expected {:a=>1, :b=>2, :c=>3} == 1
to be truthy but was false
From experimenting by tweaking those specs it only works when using keyword_init: true explicitly (like Struct.new(:a, :b, :c, keyword_init: true)) , and just never works for Data.
Maybe it "works" due to init_struct getting kwargs and accidentally propagating that flag to other C calls or so (just a guess).
Looks like it in https://github.com/ruby/ruby/blob/48c7f349f68846e10d60ae77ad299a38ee014479/struct.c#L766
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In marshal.c https://github.com/ruby/ruby/blob/48c7f349f68846e10d60ae77ad299a38ee014479/marshal.c#L2150-L2176
CRuby uses a Hash only if rb_struct_s_keyword_init(), in all other cases it uses an Array of values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since for this use case in Psych we only care about Data, not Struct, we should pass an Array of values (same as what Marshal does for Data, as far as I can tell rb_struct_s_keyword_init() is always false for Data classes).
@nevans Would you have time to fix this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thought is it would really be good to ask for rb_data_initialize() upstream at https://bugs.ruby-lang.org/, because reusing rb_struct_initialize() for Data feels like a pretty huge hack, given it doesn't run any function related to Data, only rb_struct_initialize_m() which is not meant to be used with Data.
Of course that can't be used now, but would enable a cleaner way in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix in #749
…s and not a Hash
* rb_struct_initialize() does not accept a Hash, and it's very brittle
to pass `[{...}]` and to rely on that C function using rb_keyword_given_p().
It basically worked accidentally, by having **members in the caller of the caller.
Such logic when Struct#initialize is defined in Ruby (as in TruffleRuby) is basically impossible to implement,
because it's incorrectly treating positional arguments as keyword arguments.
* rb_struct_initialize() is used in CRuby to set members of Data instances in marshal.c (there is no rb_data_initialize() yet).
There, the code passes an Array of members values for Data (and for Struct which are not `keyword_init: true`):
https://github.com/ruby/ruby/blob/48c7f349f68846e10d60ae77ad299a38ee014479/marshal.c#L2150-L2176
So we should do the same in psych.
* Rename to init_data since it's only used for Data.
* See ruby#692 (comment).
I thought maybe we could just not serialize ivars of Data instances and not try to set them, because the user-defined IOW, yeah I think we need |
This adds basic encoding/decoding for ruby 3.2
Dataobjects, using!ruby/data:ClassName(a mapping of member names to member values) and!ruby/data-with-ivars:ClassNametags (a mapping withmembersandivarsmappings). LikeMarshal, this usesrb_struct_initializeto assign member values to newDataobjects.Originally (in the first commit), I updated
Visitor::ToRubyto simply calldata.send(:initialize, **members). But#initializefreezes the object, so we can't set instance variables afterwards.The second commit sets instance variables before calling
#initialize, which I think will lead to unhappy surprises for some users. Unfortunately,Datadoesn't give us any other mechanism for setting the members... except for the C API.The third commit works around the above issue by copying how Marshal loads
Dataobjects, delegatingToRuby#init_struct(obj, members)torb_struct_initialize. I think this is the least surprising behavior for users. This also bypasses any user-defined#initializemethod (which may be seen as a feature or a bug).