licensing problem #44

Closed
wants to merge 26 commits into from
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,12 @@

This log describes all the changes made to ERRANT since its release.

## v2.2.2 (14-08-20)

1. Added a copy of the NLTK Lancaster stemmer to `errant.en.lancaster` and removed the NLTK dependency. It was overkill to require the entire NLTK package just for this stemmer so we now bundle it with ERRANT.

2. Replaced the deprecated `tokens_from_list` function from spaCy v1 with the `Doc` function from spaCy v2 in `Annotator.parse`.

## v2.2.1 (17-05-20)

Fixed key error in the classifier for rare spaCy 2 POS tags: _SP, BES, HVS.
3 changes: 2 additions & 1 deletion LICENSE.md
@@ -1,5 +1,6 @@
# MIT License

Copyright (c) 2020 Omri Abend, Leshem Choshen, Matanel Oren
Copyright (c) 2017 Christopher Bryant, Mariano Felice

Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -18,4 +19,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.
199 changes: 101 additions & 98 deletions README.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions demo/cor.txt
@@ -0,0 +1,4 @@
This is a great sentence .
Can you see the sea from where you live ?
There is no need to say that we are highly motivated .
This sentence contains no errors .
4 changes: 0 additions & 4 deletions demo/cor1.txt

This file was deleted.

4 changes: 0 additions & 4 deletions demo/cor2.txt

This file was deleted.

7 changes: 3 additions & 4 deletions demo/orig.txt
@@ -1,5 +1,4 @@
This are a great sentences .
This are the most great sentences .
Can you seen the sea from where live you .
Let us discuss about all softwares problems you 've been having recently .
This sentence contains no errors .

There is no needing to say that they say we are highly motivation .
This sentence contains no errors .
25 changes: 0 additions & 25 deletions demo/out.m2

This file was deleted.

18 changes: 18 additions & 0 deletions demo/out_combined.m2
@@ -0,0 +1,18 @@
S This are the most great sentences .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 4|||R:DET:MW|||a|||REQUIRED|||-NONE-|||0
A 5 6|||R:NOUN:NUM|||sentence|||REQUIRED|||-NONE-|||0

S Can you seen the sea from where live you .
A 2 3|||R:VERB:WC|||see|||REQUIRED|||-NONE-|||0
A 7 9|||R:WO|||you live|||REQUIRED|||-NONE-|||0
A 9 10|||R:PUNCT:WC|||?|||REQUIRED|||-NONE-|||0

S There is no needing to say that they say we are highly motivation .
A 3 4|||R:VERB->NOUN|||need|||REQUIRED|||-NONE-|||0
A 7 9|||U:VERB||||||REQUIRED|||-NONE-|||0
A 12 13|||R:NOUN->ADJ|||motivated|||REQUIRED|||-NONE-|||0

S This sentence contains no errors .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0

18 changes: 18 additions & 0 deletions demo/out_errant.m2
@@ -0,0 +1,18 @@
S This are the most great sentences .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0
A 2 4|||R:OTHER|||a|||REQUIRED|||-NONE-|||0
A 5 6|||R:NOUN:NUM|||sentence|||REQUIRED|||-NONE-|||0

S Can you seen the sea from where live you .
A 2 3|||R:VERB|||see|||REQUIRED|||-NONE-|||0
A 7 9|||R:WO|||you live|||REQUIRED|||-NONE-|||0
A 9 10|||R:PUNCT|||?|||REQUIRED|||-NONE-|||0

S There is no needing to say that they say we are highly motivation .
A 3 4|||R:MORPH|||need|||REQUIRED|||-NONE-|||0
A 7 9|||U:OTHER||||||REQUIRED|||-NONE-|||0
A 12 13|||R:MORPH|||motivated|||REQUIRED|||-NONE-|||0

S This sentence contains no errors .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0

18 changes: 18 additions & 0 deletions demo/out_sercl.m2
@@ -0,0 +1,18 @@
S This are the most great sentences .
A 1 2|||AUX->AUX|||is|||REQUIRED|||-NONE-|||0
A 2 4|||DET->DET|||a|||REQUIRED|||-NONE-|||0
A 5 6|||NOUN->NOUN|||sentence|||REQUIRED|||-NONE-|||0

S Can you seen the sea from where live you .
A 2 3|||VERB->VERB|||see|||REQUIRED|||-NONE-|||0
A 7 9|||VERB->VERB|||you live|||REQUIRED|||-NONE-|||0
A 9 10|||PUNCT->PUNCT|||?|||REQUIRED|||-NONE-|||0

S There is no needing to say that they say we are highly motivation .
A 3 4|||VERB->NOUN|||need|||REQUIRED|||-NONE-|||0
A 7 9|||VERB->None||||||REQUIRED|||-NONE-|||0
A 12 13|||NOUN->ADJ|||motivated|||REQUIRED|||-NONE-|||0

S This sentence contains no errors .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
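
The three demo outputs above share the M2 layout: an `S` line with the tokenised source sentence, followed by `A` lines whose `|||`-separated fields carry the token span, the error type, and the correction. As a rough illustration only (the `M2Edit`, `parse_m2_edit`, and `apply_m2_edits` names are hypothetical and not part of this PR), a minimal sketch of reading such lines and replaying the edits on the source could look like this:

```python
from typing import List, NamedTuple


class M2Edit(NamedTuple):
    start: int          # index of the first source token covered by the edit
    end: int            # index one past the last source token covered
    error_type: str     # e.g. "R:VERB:SVA", "VERB->NOUN", or "noop"
    correction: str     # replacement text; empty for a deletion
    annotator_id: int


def parse_m2_edit(line: str) -> M2Edit:
    """Parse one 'A ...' line into its fields (hypothetical helper)."""
    if not line.startswith("A "):
        raise ValueError(f"not an M2 edit line: {line!r}")
    fields = line[2:].split("|||")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    start, end = (int(x) for x in fields[0].split())
    return M2Edit(start, end, fields[1], fields[2], int(fields[5]))


def apply_m2_edits(source: str, edit_lines: List[str]) -> str:
    """Apply edits to a tokenised source sentence (without the 'S ' prefix)."""
    tokens = source.split()
    edits = [parse_m2_edit(line) for line in edit_lines]
    # Work right to left so earlier token offsets stay valid.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        if edit.error_type == "noop":
            continue
        tokens[edit.start:edit.end] = edit.correction.split()
    return " ".join(tokens)
```

Replaying the edits of the first demo sentence this way should give back the first line of `demo/cor.txt`, whichever annotator produced the error types, since the three files differ only in the type field.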

22 changes: 18 additions & 4 deletions demo/readme.md
@@ -1,7 +1,21 @@
## ERRANT Demo
## SERRANT Demo

Assuming you have read the main readme and installed ERRANT successfully, you can try running it on the sample text in this directory to make sure it's running properly:
Assuming you have read the main readme and installed SERRANT successfully, you can try running it on the sample text in this directory to make sure it's running properly:

`errant_parallel -orig orig.txt -cor cor1.txt cor2.txt -out test.m2`
#### Annotated by ERRANT:

This should produce a file called `test.m2` which is identical to `out.m2`.
`serrant_parallel -orig orig.txt -cor cor.txt -out test_errant.m2 -annotator errant`

This should produce a file called `test_errant.m2` which is identical to `out_errant.m2`.

#### Annotated by SerCl:

`serrant_parallel -orig orig.txt -cor cor.txt -out test_sercl.m2 -annotator sercl`

This should produce a file called `test_sercl.m2` which is identical to `out_sercl.m2`.

#### Our combination of both:

`serrant_parallel -orig orig.txt -cor cor.txt -out test_combined.m2 -annotator combined`

This should produce a file called `test_combined.m2` which is identical to `out_combined.m2`.
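
The "identical to" checks in the demo readme can also be scripted. The following is only a sketch under the file naming shown above (`test_<annotator>.m2` against `out_<annotator>.m2`); the helper names `demo_command` and `matches_reference` are hypothetical and do not exist in the repository.

```python
import filecmp
import os
from typing import List

ANNOTATORS = ("errant", "sercl", "combined")


def demo_command(annotator: str) -> List[str]:
    """Build the demo command line from the readme for one annotator."""
    if annotator not in ANNOTATORS:
        raise ValueError(f"unknown annotator: {annotator!r}")
    return [
        "serrant_parallel",
        "-orig", "orig.txt",
        "-cor", "cor.txt",
        "-out", f"test_{annotator}.m2",
        "-annotator", annotator,
    ]


def matches_reference(annotator: str, directory: str = ".") -> bool:
    """Compare the produced M2 file with the reference file byte for byte."""
    produced = os.path.join(directory, f"test_{annotator}.m2")
    expected = os.path.join(directory, f"out_{annotator}.m2")
    # shallow=False forces a content comparison rather than a stat comparison.
    return filecmp.cmp(produced, expected, shallow=False)
```

With SERRANT installed, one could pass `demo_command("errant")` to `subprocess.run(..., check=True, cwd="demo")` and then call `matches_reference("errant", "demo")`. A byte-level comparison is strict, so a difference in line endings alone would make it fail.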
27 changes: 0 additions & 27 deletions errant/__init__.py

This file was deleted.

33 changes: 33 additions & 0 deletions serrant/__init__.py
@@ -0,0 +1,33 @@
from importlib import import_module
import spacy
from serrant.annotator import Annotator

# SERRANT version
__version__ = '1.0'
# compatible to ERRANT version 2.2.2

# Load an ERRANT Annotator object for a given language
def load(lang, nlp=None):
    # Make sure the language is supported
    supported = {"en"}
    if lang not in supported:
        raise Exception("%s is an unsupported or unknown language" % lang)

    # Load spacy
    model_per_lang={"en":"en_core_web_sm"}
    nlp = nlp or spacy.load(model_per_lang[lang], disable=["ner"])

    # Load language edit merger
    merger = import_module("serrant.%s.merger" % lang)

    # Load language edit classifier
    classifier = import_module("serrant.%s.classifier" % lang)
    # Load sercl (syntactic classifier)
    syntax_classifier = import_module("serrant.syntactic_classifier")
    # Load combiner
    combiner = import_module("serrant.%s.sercl_errant_combine" % lang)
    # The English classifier needs spacy
    if lang == "en": classifier.nlp = nlp

    # Return a configured ERRANT annotator
    return Annotator(lang, nlp, merger, classifier, syntax_classifier, combiner)
2 changes: 1 addition & 1 deletion errant/alignment.py → serrant/alignment.py
@@ -1,7 +1,7 @@
from itertools import groupby
import Levenshtein
import spacy.parts_of_speech as POS
from errant.edit import Edit
from serrant.edit import Edit

class Alignment:
    # Protected class resource
79 changes: 64 additions & 15 deletions errant/annotator.py → serrant/annotator.py
@@ -1,18 +1,24 @@
from errant.alignment import Alignment
from errant.edit import Edit
from serrant.alignment import Alignment
from serrant.edit import Edit
from copy import copy
from spacy.tokens import Doc


# Main ERRANT Annotator class
class Annotator:
class Annotator:

    # Input 1: A string language id: e.g. "en"
    # Input 2: A spacy processing object for the language
    # Input 3: A merging module for the language
    # Input 4: A classifier module for the language
    def __init__(self, lang, nlp=None, merger=None, classifier=None):
    def __init__(self, lang, nlp=None, merger=None, classifier=None, syntax_classifier=None,
                 classification_combiner=None):
        self.lang = lang
        self.nlp = nlp
        self.merger = merger
        self.classifier = classifier
        self.errant_classifier = classifier
        self.syntax_classifier = syntax_classifier
        self.combiner = classification_combiner

    # Input 1: A text string
    # Input 2: A flag for word tokenisation
@@ -21,7 +27,7 @@ def parse(self, text, tokenise=False):
        if tokenise:
            text = self.nlp(text)
        else:
            text = self.nlp.tokenizer.tokens_from_list(text.split())
            text = Doc(self.nlp.vocab, text.split())
            self.nlp.tagger(text)
            self.nlp.parser(text)
        return text
@@ -56,29 +62,65 @@ def merge(self, alignment, merging="rules"):
        return edits

    # Input: An Edit object
    # Output: The same Edit object with an updated error type
    def classify(self, edit):
        return self.classifier.classify(edit)
    # Output: The same Edit object with an updated error type by errant
    def classify_by_errant(self, edit):
        return self.errant_classifier.classify(edit)

    # Input: An Edit object
    # Output: The same Edit object with an updated error type by sercl
    def classify_syntactically(self, edit):
        return self.syntax_classifier.classify(edit)

    # Input 1: An original text string parsed by spacy
    # Input 2: A corrected text string parsed by spacy
    # Input 3: A flag for standard Levenshtein alignment
    # Input 4: A flag for merging strategy
    # Output: A list of automatically extracted, typed Edit objects
    def annotate(self, orig, cor, lev=False, merging="rules"):
    # Output: A list of automatically extracted, typed Edit objects by errant
    def errant_annotate(self, orig, cor, lev=False, merging="rules"):
        alignment = self.align(orig, cor, lev)
        edits = self.merge(alignment, merging)
        for edit in edits:
            edit = self.classify_by_errant(edit)
        return edits

    # Input 1: An original text string parsed by spacy
    # Input 2: A corrected text string parsed by spacy
    # Input 3: A flag for standard Levenshtein alignment
    # Input 4: A flag for merging strategy
    # Output: A list of automatically extracted, typed Edit objects by sercl
    def syntax_annotate(self, orig, cor, lev=False, merging="rules"):
        alignment = self.align(orig, cor, lev)
        edits = self.merge(alignment, merging)
        for edit in edits:
            edit = self.classify(edit)
            edit = self.classify_syntactically(edit)
        return edits

    # Input 1: An original text string parsed by spacy
    # Input 2: A corrected text string parsed by spacy
    # Input 3: A flag for standard Levenshtein alignment
    # Input 4: A flag for merging strategy
    # Input 5: A flag for annotating strategy
    # Output: A list of automatically extracted, typed Edit objects
    def annotate(self, orig, cor, lev=False, merging="rules", annotator='combined'):
        errant_edits = self.errant_annotate(orig, cor, lev, merging)
        sercl_edits = self.syntax_annotate(orig, cor, lev, merging)

        assert len(errant_edits) == len(sercl_edits)
        if self.combiner is None or annotator == 'errant':
            return errant_edits
        if annotator == 'sercl':
            return sercl_edits
        return [self.combiner.classification_combiner(errant_edit, sercl_edit) for errant_edit, sercl_edit in
                zip(errant_edits, sercl_edits)]

    # Input 1: An original text string parsed by spacy
    # Input 2: A corrected text string parsed by spacy
    # Input 3: A token span edit list; [o_start, o_end, c_start, c_end, (cat)]
    # Input 4: A flag for gold edit minimisation; e.g. [a b -> a c] = [b -> c]
    # Input 5: A flag to preserve the old error category (i.e. turn off classifier)
    # Input 5: A flag for annotating strategy (if old_cat==False)
    # Output: An Edit object
    def import_edit(self, orig, cor, edit, min=True, old_cat=False):
    def import_edit(self, orig, cor, edit, min=True, old_cat=False, annotator='combined'):
        # Undefined error type
        if len(edit) == 4:
            edit = Edit(orig, cor, edit)
@@ -93,6 +135,13 @@ def import_edit(self, orig, cor, edit, min=True, old_cat=False):
        if min:
            edit = edit.minimise()
        # Classify edit
        if not old_cat:
            edit = self.classify(edit)
        if not old_cat:
            errant_edit = self.classify_by_errant(copy(edit))
            sercl_edit = self.classify_syntactically(copy(edit))
            if self.combiner is None or annotator == 'errant':
                edit = errant_edit
            elif annotator == 'sercl':
                edit = sercl_edit
            else:
                edit = self.combiner.classification_combiner(errant_edit, sercl_edit)
        return edit
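
Both `annotate` and `import_edit` in this diff choose between the ERRANT type, the SerCl type, and the combined type with the same three-way branch. The sketch below isolates that branch on plain strings so it can be run without spaCy. `select_type` and `toy_combiner` are hypothetical names, and the toy combiner is purely illustrative: the real rules live in `sercl_errant_combine`, which is not shown in this diff.

```python
from typing import Callable, Optional

Combiner = Callable[[str, str], str]


def select_type(errant_type: str,
                sercl_type: str,
                annotator: str = "combined",
                combiner: Optional[Combiner] = None) -> str:
    """Mirror the branch order used in the diff, on strings instead of Edits."""
    # Without a combiner the ERRANT type is returned regardless of `annotator`.
    if combiner is None or annotator == "errant":
        return errant_type
    if annotator == "sercl":
        return sercl_type
    # Any other value (including the default 'combined') goes to the combiner.
    return combiner(errant_type, sercl_type)


def toy_combiner(errant_type: str, sercl_type: str) -> str:
    """Illustrative stand-in only; NOT the combination rule used by SERRANT."""
    return f"{errant_type}+{sercl_type}"
```

One consequence of the branch order, assuming the diff is read correctly: if the annotator is constructed without a combiner, requesting `annotator='sercl'` still yields the ERRANT classification, because the `self.combiner is None` test comes first.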