Feature/optional lowercase #2

MRudolph · 2014-10-13T14:10:01Z

This pull request extends the command line tool for training and evaluation (iitb.Segment.Segment) by making downcasing of tokens optional.

Downcasing seems to be a destructive action, because it is done before the features are generated.
Some languages (e.g. German) depend on capitalisation to distinguish words, so case is valuable information that should not be removed.

To avoid breaking existing setups, there are new methods which handle the optional downcasing.
It is on by default, but can be switched off by adding "lowercase=false" to the configuration.
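
As a rough illustration of the behaviour described above, here is a minimal sketch of an opt-out switch. It assumes the configuration is available as a java.util.Properties object; the class and method names (LowercaseOption, lowercaseEnabled, normalizeToken) are hypothetical and are not the names used in this pull request.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical helper, not part of iitb.Segment: shows a "lowercase" option
// that defaults to true so existing setups keep their behaviour.
final class LowercaseOption {

    // Returns true unless the configuration explicitly sets lowercase=false.
    // Note: Boolean.parseBoolean treats any value other than "true"
    // (case-insensitive) as false.
    static boolean lowercaseEnabled(Properties options) {
        return Boolean.parseBoolean(options.getProperty("lowercase", "true"));
    }

    // Downcases the token only when the option is enabled.
    static String normalizeToken(String token, Properties options) {
        return lowercaseEnabled(options) ? token.toLowerCase(Locale.ROOT) : token;
    }

    public static void main(String[] args) {
        Properties options = new Properties();
        // Default: no "lowercase" key, so the token is downcased.
        System.out.println(normalizeToken("Weg", options));

        // Opt out: capitalisation is preserved for feature generation.
        options.setProperty("lowercase", "false");
        System.out.println(normalizeToken("Weg", options));
    }
}
```

With no "lowercase" key set, the German noun "Weg" (way) is folded into "weg" (away); with lowercase=false the two stay distinct.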

Tests are included and succeed (they are modified copies of the original tests).

Running the applications with the sample dataset also seems to work fine.


witgo · 2014-10-21T14:19:30Z

@MRudolph
Sorry for the late reply.
This is a big change. I need to do more testing and may need more time to review the code.

witgo · 2014-10-21T14:25:04Z

test/iitb/Segment/DataCruncherReadRowVarColTest.java

nit: Four spaces.

MRudolph added 3 commits August 28, 2014 15:13

Merge branch 'feature/separator' into develop

e7ec80d

changed datacruncher (and depended classes) to support processing wit…

e47759f

…hout lowercasing

fixed test

aa05c72

