-
It would be appreciated if you could open a specific issue whenever you notice underspecification somewhere; we want to clarify everything there, including edge cases (cc @gouttegd). We keep a number of test cases here, which we run (validate, convert) in CI with every change to the schema: https://github.com/mapping-commons/sssom/tree/master/examples/schema We can of course add more test cases, but let's do that independently for every single issue, such as "quotes in TSV".
-
I'd prefer one common repository with examples of:
And I'd like every SSSOM implementation to run against these examples. Some edge cases I am not sure about:
-
The TSV rules of the SSSOM specification don't come out of nowhere. They were chosen because they correspond to behaviours that are very common among CSV/TSV parsers. In particular, they correspond to the default behaviours of both Pandas' parsers (used under the hood by SSSOM-Py) and Jackson's parsers (used under the hood by SSSOM-Java). In fact, I would strongly encourage you to also rely on a well-designed, well-tested CSV/TSV parser library (I don't know what is available in the JavaScript world, but I would hope there are such libraries) instead of trying to come up with your own parser. CSV/TSV is a much trickier format than most people initially assume, and writing a good parser for it is not trivial.
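To illustrate the kind of quoting behaviour meant here, below is a minimal sketch using Python's standard `csv` module as a stand-in (it is neither Pandas' nor Jackson's parser, and the sample data and column names are made up). It only shows the common convention where a field containing a tab is wrapped in double quotes and a literal quote inside it is doubled; it is not meant as a statement of what the SSSOM specification requires.

```python
import csv
import io

# Made-up TSV snippet: the second field of the data row is quoted because
# it contains a tab, and it uses a doubled quote ("") for a literal quote.
raw = 'subject_id\tcomment\nEX:1\t"has a\ttab and a ""quote"""\n'

# newline="" leaves line endings untouched so the csv module can handle them.
reader = csv.reader(
    io.StringIO(raw, newline=""),
    delimiter="\t",
    quotechar='"',
    doublequote=True,
)
rows = list(reader)

header, first = rows
print(header)  # ['subject_id', 'comment']
print(first)   # ['EX:1', 'has a\ttab and a "quote"']
```

A hand-written `split("\t")` would cut that second field in two, which is the sort of detail an established parser library already handles.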
-
sssom-js now includes several test cases in directory |
-
I've started a SSSOM/TSV parser in JavaScript, but the format is quite nasty in some details, such as "Multi-valued slots with a single value" and escaped tab characters. It's also not clear to me whether values that only start or only end with a " are errors or unescaped values, and what to do with columns of unknown name and rows with differing numbers of columns. I've looked into the source code of sssom-py and sssom-java, but these seem to delegate the parsing to an existing CSV library, and I would not be surprised if these libraries don't strictly follow the SSSOM/TSV specification for edge cases!

tl;dr: a set of test cases is required, in particular for edge cases
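To make the request more concrete, here is a rough sketch of what a shared, implementation-independent set of edge-case tests could look like. Everything in it is hypothetical: the toy `parse` function, the `CASES` layout, the choice of `author_id` as the multi-valued slot, and above all the expected results (for example, that a row with too few columns is an error) are my assumptions for illustration, not what the specification says.

```python
import csv
import io

# Assumption for this sketch: only this slot is treated as multi-valued.
MULTI_VALUED = {"author_id"}


def parse(text):
    """Toy parser: list of dicts; raises ValueError on ragged rows."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
    header, records = rows[0], []
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} columns, got {len(row)}"
            )
        record = {}
        for name, value in zip(header, row):
            # Multi-valued slots are split on "|", even with a single value.
            record[name] = value.split("|") if name in MULTI_VALUED else value
        records.append(record)
    return records


# Each case: (name, input text, expected records OR expected exception type).
CASES = [
    ("multi-valued slot, single value",
     "subject_id\tauthor_id\nEX:1\tEX:alice\n",
     [{"subject_id": "EX:1", "author_id": ["EX:alice"]}]),
    ("multi-valued slot, two values",
     "subject_id\tauthor_id\nEX:1\tEX:alice|EX:bob\n",
     [{"subject_id": "EX:1", "author_id": ["EX:alice", "EX:bob"]}]),
    ("row with too few columns",
     "subject_id\tauthor_id\nEX:1\n",
     ValueError),
]


def run_cases(parser, cases):
    """Run any parser against the cases; return names of failing cases."""
    failures = []
    for name, text, expected in cases:
        try:
            actual = parser(text)
        except Exception as exc:
            actual = type(exc)
        if actual != expected:
            failures.append(name)
    return failures


print(run_cases(parse, CASES))  # [] when every case passes
```

Since `run_cases` takes the parser as an argument, the same case list could in principle be exported as plain files and run against sssom-py, sssom-java and a JavaScript implementation alike.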