Conversation

mnorfolk03

Updated simdjson to the latest version.

Fields are now identified with JSON-style dotted paths ("x.y") rather than the previous slash-separated form ("x/y"), so the tests have been updated accordingly.
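To make the naming change concrete, here is a minimal sketch of how an old slash-style field name maps to the dotted form. The helper `to_dotted_field` is hypothetical and written only for illustration; it is not part of this PR or of simdjson, and it assumes field names never contain a literal `/` or `.` of their own.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not part of the PR): rewrites a slash-separated
// field name such as "x/y" into the dotted form "x.y".
// Assumes no component of the field name itself contains '/' or '.'.
std::string to_dotted_field(std::string field) {
  std::replace(field.begin(), field.end(), '/', '.');
  return field;
}
```

Existing callers or tests that still hold names in the old form could pass them through a helper like this one before registering fields.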

Additionally, added a GitHub workflow to run the tests.

@mnorfolk03
Author

@microsoft-github-policy-service agree

@dongx-psu
Collaborator

I will do a code review in a week.

Dong

@dongx-psu
Collaborator

Did a quick pass on the code. LGTM.

@mnorfolk03 Can you run some basic performance regression tests on at least one dataset? For example, run an experiment on a sufficiently large GitHub dataset and compare the results with the old version. When it is done, please plot the results and post the figure here for reference.

@badrishc Could you please glance over the code as well? Also, we are switching the CI/CD to GitHub Actions for better integration.

@mnorfolk03
Author

[image: performance comparison plot]

Here is the data requested. This version performs slightly worse when the batch size is 1, but switching to simdjson::ondemand still makes sense because it performs better with larger batch sizes. Note also that simdjson::dom performs worse than simdjson::ondemand.
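For anyone who wants to reproduce a comparison like this, the following is a minimal sketch of a timing loop over several batch sizes. It is not the harness used for the numbers above: `BatchTiming`, `time_batch_sizes`, and the `ingest` callback are hypothetical names, and the callback stands in for whatever parsing or ingestion path is being measured.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type: wall-clock seconds spent for one batch size.
struct BatchTiming {
  std::size_t batch_size;
  double seconds;
};

// Feeds `records` to `ingest` in consecutive batches of each requested size
// and records the elapsed wall-clock time per batch size.
// `ingest` receives a pointer to the first record of the batch and its length.
std::vector<BatchTiming> time_batch_sizes(
    const std::vector<std::string>& records,
    const std::vector<std::size_t>& batch_sizes,
    const std::function<void(const std::string*, std::size_t)>& ingest) {
  std::vector<BatchTiming> results;
  for (std::size_t batch : batch_sizes) {
    if (batch == 0) continue;  // a zero-sized batch would never advance
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < records.size(); i += batch) {
      const std::size_t n = std::min(batch, records.size() - i);
      ingest(records.data() + i, n);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    results.push_back({batch, elapsed.count()});
  }
  return results;
}
```

Running the same record set through the old and new ingestion paths with identical batch sizes, and repeating each run several times, should give numbers that can be plotted side by side.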

@dongx-psu
Collaborator

I think the code looks fine. @badrishc Can you check if it looks good to you? If so, I will merge it back to master.
