
aykae/compling

compling

A compilation of computational linguistics projects by A.K. Rai.

SentGen

SentGen is a computational linguistics program that stochastically generates grammatically correct English sentences. The generator takes a specified English grammar in the form of phrase structure rules and builds a tree whose root is a 'Sentence' node and whose leaves are English words (terminal symbols). Optional rules are followed at random to produce variation in sentence structure.
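The generation step described above can be sketched roughly as follows. This is a minimal illustration, not SentGen's actual code: the toy `GRAMMAR`, the tiny `LEXICON`, and the names `generate`, `leaves`, `p_optional`, and `max_depth` are all assumptions made for the example, and the real rules come from the Hayes text rather than from this sketch.

```python
import random

# Hypothetical toy grammar (not the project's real rule set).
# Each nonterminal maps to a sequence of (symbol, is_optional) pairs.
GRAMMAR = {
    "S":  [("NP", False), ("VP", False)],
    "NP": [("Det", False), ("Adj", True), ("N", False), ("PP", True)],
    "VP": [("V", False), ("NP", True), ("PP", True)],
    "PP": [("P", False), ("NP", False)],
}

# Hypothetical lexicon: part-of-speech tag -> candidate words (terminals).
LEXICON = {
    "Det": ["the", "a"],
    "Adj": ["small", "green"],
    "N":   ["dog", "river", "teacher"],
    "V":   ["sees", "follows"],
    "P":   ["near", "with"],
}

def generate(symbol="S", rng=random, depth=0, max_depth=6, p_optional=0.5):
    """Expand `symbol` into a tree of (label, children) or (tag, word)."""
    if symbol in LEXICON:
        # Terminal category: pick a word at random to form a leaf.
        return (symbol, rng.choice(LEXICON[symbol]))
    children = []
    for child, optional in GRAMMAR[symbol]:
        # Optional constituents are included with probability p_optional,
        # and always skipped past max_depth so recursion stays bounded.
        if optional and (depth >= max_depth or rng.random() >= p_optional):
            continue
        children.append(generate(child, rng, depth + 1, max_depth, p_optional))
    return (symbol, children)

def leaves(tree):
    """Collect the words at the leaves of the tree, left to right."""
    _, body = tree
    if isinstance(body, str):
        return [body]
    words = []
    for child in body:
        words.extend(leaves(child))
    return words

if __name__ == "__main__":
    tree = generate("S")
    print(" ".join(leaves(tree)))
```

Treating optionality as a per-constituent coin flip is only one way to follow optional rules at random; the depth cap is an added safeguard for the recursive NP/PP rules in this toy grammar.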

The English word data was generated by unifying the Google 10k most common English words dataset with a 370k-word part-of-speech-tagged English dataset. This program is a computational implementation of the phrase structure theory presented in UCLA Prof. Bruce Hayes' Introductory Linguistics, and the grammar used in the program is taken directly from that text. Proper handling of articles, prepositions, and inflection still needs to be added. Semantics are ignored entirely, so the generated sentences will likely not make sense.
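One plausible reading of how the two word datasets are combined is sketched below: keep only the common words that also carry a part-of-speech tag, grouped by tag. The function name `build_lexicon`, the input shapes (a list of words and a word-to-tags mapping), and the tag labels are assumptions for illustration; the project's actual file formats and merge logic are not described in this README.

```python
from collections import defaultdict

def build_lexicon(common_words, pos_tags):
    """Group common words by POS tag.

    common_words: iterable of frequent words (assumed shape).
    pos_tags: dict mapping a word to a list of POS tags (assumed shape).
    """
    common = {w.lower() for w in common_words}
    lexicon = defaultdict(list)
    for word, tags in pos_tags.items():
        # Keep a tagged word only if it also appears in the common-word list.
        if word.lower() in common:
            for tag in tags:
                lexicon[tag].append(word.lower())
    return dict(lexicon)

if __name__ == "__main__":
    common = ["the", "dog", "runs"]
    tagged = {"dog": ["N"], "runs": ["V", "N"], "zebra": ["N"]}
    print(build_lexicon(common, tagged))
```

Restricting the vocabulary to frequent words keeps generated sentences readable, at the cost of dropping rarer words that the larger tagged dataset would otherwise supply.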

Try it out yourself here.


