This program is a Python XML-RPC server that accepts an English word and
returns a continuous value (from 0 to 1, inclusive) estimating how complex that
word appears to second-language English speakers.
It uses a Decision Tree Regressor (implemented by scikit-learn) to perform its
work. The model uses five different features of a word:
This work is based on a machine learning system submitted to a natural
language processing workshop, the International Workshop on Semantic
Evaluation 2016 (SemEval-2016). More specifically, the system was submitted to
compete in Task 11, Complex Word Identification. It ranked 5th out of 40
systems on the test set according to its G-score, the harmonic mean of
accuracy and recall.
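For reference, the G-score described above can be computed as in the minimal
sketch below. The function name `g_score` is chosen here for illustration and
is not taken from this project's code or from the task's official scorer.

```python
def g_score(accuracy, recall):
    """Harmonic mean of accuracy and recall (both expected in [0, 1])."""
    if accuracy + recall == 0:
        # Both values are zero, so the harmonic mean is defined as zero here.
        return 0.0
    return 2.0 * accuracy * recall / (accuracy + recall)


if __name__ == "__main__":
    # Example values only; these are not results reported by the system.
    print(g_score(0.8, 0.6))
```

Because it is a harmonic mean, the score stays low unless accuracy and recall
are both high.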
The machine learning system comes already trained on the data provided by Task
11, so you don’t have to worry about finding data to train it with.
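Querying such a server from Python might look like the sketch below. It uses
only the Python 3 standard library. The method name `get_complexity`, the
address, and the stand-in scoring function are all assumptions made for this
example; they are not taken from this project, so check the server source for
the real method name and port. The stand-in server exists only so that the
example runs on its own.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def stand_in_complexity(word):
    """Hypothetical placeholder, NOT the trained model: longer words score higher."""
    return min(len(word) / 20.0, 1.0)


def query_complexity(url, word):
    """Ask the XML-RPC server at `url` for the complexity of `word`."""
    with ServerProxy(url) as proxy:
        # `get_complexity` is an assumed method name used for illustration.
        return proxy.get_complexity(word)


def demo():
    """Start a throwaway local server and query it once."""
    # Port 0 lets the operating system pick a free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(stand_in_complexity, "get_complexity")
    host, port = server.server_address
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return query_complexity("http://%s:%d" % (host, port), "serendipity")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Against the real server, only `query_complexity` would be needed, pointed at
the server's actual URL and using its actual method name.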
Final Paper Submitted to SemEval-2016: TBD
SemEval-2016 Task 11 Description: http://alt.qcri.org/semeval2016/task11/
Related Work: https://hmcsimplification.wordpress.com/author/mauryquijada/