Releases · chrisjbryant/errant

ERRANT v3.0.0 (Latest)

Tag v3.0.0 · commit 115dc23 · published by chrisjbryant on 04 Nov 07:15

v3.0.0 (04-11-23)

Finally updated ERRANT to support spaCy 3!
- I specifically tested spaCy 3.2 to 3.7 and found a negligible difference in performance on the BEA19 dev set.
- This update also comes with an unexpected 10-20% speed gain.
Added a .gitignore file. #39
Renamed the master branch to main.


ERRANT v2.3.3

Tag v2.3.3 · commit 1864370 · published by chrisjbryant on 14 Apr 13:43

v2.3.3 (14-04-22)

Fixed one case that was missed when replacing Levenshtein with rapidfuzz.


ERRANT v2.3.2

Tag v2.3.2 · commit ea07b4a · published by chrisjbryant on 14 Apr 09:56

v2.3.2 (14-04-22)

Added more details to verbose ERRANT scoring. #29
Simplified the new rapidfuzz functions. #35


ERRANT v2.3.1

Tag v2.3.1 · commit 7f41822 · published by chrisjbryant on 13 Apr 10:52

v2.3.1 (13-04-22)

Replaced the dependency on python-Levenshtein with rapidfuzz to overcome a licensing conflict. ERRANT and its dependencies now all use the MIT license. This change has a negligible effect on a tiny number of alignments. #34


ERRANT v2.3.0

Tag v2.3.0 · commit 9111c6c · published by chrisjbryant on 15 Jul 16:48

v2.3.0 (15-07-21)

Added some new rules that reclassify many OTHER-type 1:1 edits as more specific error types. Specifically, there are now ~40% fewer 1:1 OTHER edits and ~15% fewer n:n OTHER edits overall (tested on the FCE and W&I training sets combined). The changes are as follows:
- A possessive suffix at the start of a merge sequence is now always split:
  Example: people life -> people 's lives
  Old: life -> 's lives (R:OTHER)
  New: ε -> 's (M:NOUN:POSS), life -> lives (R:NOUN:NUM)
- NUM <-> DET edits are now classified as R:DET; e.g. one (cat) -> a (cat). Thanks to @katkorre!
- Changed the string similarity score in the classifier from the Levenshtein ratio to a normalised Levenshtein similarity, i.e. 1 minus the Levenshtein distance divided by the length of the longer input string. This is because we felt some ratio scores were unintuitive; e.g. smt -> something has a ratio score of 0.5 despite the insertion of 6 characters (the new normalised score is 0.33).
- The non-word spelling error rules were updated slightly to take the new normalised Levenshtein score into account. Additionally, dissimilar strings are now classified based on the POS tag of the correction rather than as OTHER; e.g. amougnht -> number (R:NOUN).
- The new normalised Levenshtein score is also used to classify many of the remaining 1:1 replacement edits that were previously classified as OTHER. Many of these are real-word spelling errors (e.g. their <-> there), but there are also some morphological errors (e.g. health -> healthy) and POS-based errors (e.g. transport -> travel). Note that these rules are somewhat complex and depend on both the similarity score and the lengths of the original and corrected strings. For example, form -> from (R:SPELL) and eventually -> finally (R:ADV) have the same similarity score of 0.5 but are assigned different error types because of their string lengths.
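To make the difference between the old and new scores concrete, here is a minimal pure-Python sketch. It assumes the old ratio behaves like python-Levenshtein's ratio, (len(a) + len(b) - d) / (len(a) + len(b)) where d counts a substitution as two edits, and that the new score is 1 - d / max(len(a), len(b)) with ordinary Levenshtein distance d. The function names are illustrative only, not ERRANT's own; in practice a library such as rapidfuzz would compute these scores.

```python
def levenshtein(a: str, b: str, sub_cost: int = 1) -> int:
    """Edit distance; insertions and deletions cost 1, substitutions cost sub_cost."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                                 # delete ca
                curr[j - 1] + 1,                             # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def old_ratio(a: str, b: str) -> float:
    """Assumed python-Levenshtein-style ratio: substitutions count as 2 edits."""
    total = len(a) + len(b)
    return (total - levenshtein(a, b, sub_cost=2)) / total if total else 1.0


def normalised_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1 - levenshtein(a, b) / longest if longest else 1.0


if __name__ == "__main__":
    for orig, cor in [("smt", "something"), ("form", "from"), ("eventually", "finally")]:
        print(f"{orig} -> {cor}: ratio={old_ratio(orig, cor):.2f}, "
              f"normalised={normalised_similarity(orig, cor):.2f}")
```

Under these assumptions, smt -> something scores 0.50 with the ratio but 0.33 with the normalised similarity, and both form -> from and eventually -> finally score 0.50, matching the figures quoted in the notes.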
Various minor updates:
- out_m2 in parallel_to_m2.py and m2_to_m2.py is now opened and closed properly. #20
- Fixed a bracketing error that deleted a valid edit in rare circumstances. #26 #28
- Updated the English wordlist.
- Minor changes to the readme.
- Tidied up some code comments.


ERRANT v2.2.3

Tag v2.2.3 · commit 6c0d521 · published by chrisjbryant on 12 Feb 14:21

v2.2.3 (12-02-21)

Changed the dependency version requirements in setup.py since ERRANT v2.2.x is not compatible with spaCy 3.


ERRANT v2.2.2

Tag v2.2.2 · commit 2a08f30 · published by chrisjbryant on 14 Aug 21:33

This is the first GitHub release. For all earlier changes, refer to CHANGELOG.md.

v2.2.2 (14-08-20)

Added a copy of the NLTK Lancaster stemmer to errant.en.lancaster and removed the NLTK dependency. It was overkill to require the entire NLTK package just for this stemmer so we now bundle it with ERRANT.
Replaced the deprecated tokens_from_list function from spaCy v1 with the spaCy v2 Doc constructor in Annotator.parse.


ERRANT v2.2.1

Tag v2.2.1 · commit 9992e0a · published by chrisjbryant on 02 Sep 13:16

v2.2.1 (17-05-20)

Fixed a KeyError in the classifier caused by the rare spaCy 2 POS tags _SP, BES and HVS.


ERRANT v2.2.0

Tag v2.2.0 · commit 1a56544 · published by chrisjbryant on 02 Sep 13:16

v2.2.0 (06-05-20)

ERRANT now works with spaCy v2.2. It is 4x slower, but this change was necessary to make it work on Python 3.7.
spaCy 2 uses slightly different POS tags from spaCy 1 (e.g. auxiliary verbs are now tagged AUX rather than VERB), so I updated some of the merging rules to maintain performance.


ERRANT v2.1.0

Tag v2.1.0 · commit e1e6066 · published by chrisjbryant on 02 Sep 13:15

v2.1.0 (09-01-20)

The character-level cost in the sentence alignment function is now computed by the much faster python-Levenshtein library instead of Python's native difflib.SequenceMatcher. This makes ERRANT 3x faster!
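To illustrate what a character-level cost between two tokens can look like, here is a minimal sketch using difflib.SequenceMatcher, the approach these notes say ERRANT used before this release. It assumes the cost is 1 minus a similarity ratio in [0, 1]; the name `char_cost` is hypothetical and this is not ERRANT's actual alignment code. An edit-distance library such as python-Levenshtein computes a similar ratio much faster, though it can give slightly different scores for some pairs.

```python
from difflib import SequenceMatcher


def char_cost(a: str, b: str) -> float:
    """Hypothetical character-level cost between two token strings:
    0.0 for identical strings, 1.0 for strings with no characters in common."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    for orig, cor in [("cat", "cat"), ("cat", "cats"), ("go", "went")]:
        print(f"{orig} -> {cor}: cost = {char_cost(orig, cor):.2f}")
```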
Various minor updates:

- Updated the English wordlist.
- Fixed a broken rule for classifying contraction errors.
- Changed a condition in the calculation of transposition errors to be more intuitive.
- Partially updated the ERRANT POS tag map to match the updated Universal POS tag map. Specifically, EX now maps to PRON rather than ADV, LS maps to X rather than PUNCT, and CONJ has been renamed CCONJ. I did not change the mapping of RP from PART to ADP yet because this breaks several rules involving phrasal verbs.
- Added an errant.__version__ attribute.
- Added a warning about using ERRANT with spaCy 2.
- Tidied some code in the classifier.
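The tag-map changes above can be pictured as entries in a lookup table. The sketch below is a hypothetical excerpt containing only the entries these notes mention; it assumes the map is keyed on Penn Treebank tags (spaCy's fine-grained English tags) and that the CONJ rename affects the entry for CC. The names `TAG_MAP_EXCERPT` and `to_upos`, and the fallback to X for unlisted tags, are illustrative assumptions rather than ERRANT's actual map file.

```python
# Hypothetical excerpt of a Penn Treebank -> Universal POS map.
# Only the entries mentioned in the v2.1.0 notes; not ERRANT's full map.
TAG_MAP_EXCERPT = {
    "EX": "PRON",   # existential "there": now PRON (previously ADV)
    "LS": "X",      # list item marker: now X (previously PUNCT)
    "CC": "CCONJ",  # coordinating conjunction: CONJ renamed to CCONJ
    "RP": "PART",   # particle: kept as PART, not ADP, to preserve phrasal verb rules
}


def to_upos(ptb_tag: str, default: str = "X") -> str:
    """Map a fine-grained tag to a coarse tag; unlisted tags fall back to `default`."""
    return TAG_MAP_EXCERPT.get(ptb_tag, default)


if __name__ == "__main__":
    for tag in ["EX", "LS", "CC", "RP"]:
        print(tag, "->", to_upos(tag))
```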
