Releases: LanguageMachines/foliautils
v0.12
v0.11
- Updated and added some tests
- started moving common code to a separate file and building a library (libfoliautils). Hemp detection is one of them
- FoLiA-stats:
- added possibility to read a list of directories + file-names to process into separate output directories. (could be generalized to other programs)
- better hemp detection
- FoLiA-correct:
- use same hemp detection as FoLiA-stats
- FoLiA-abby:
- support more flavors
- FoLiA-clean:
- avoid removing the last remaining text on nodes
- cleaning of tokenization now works
v0.10
[Ko vd Sloot]
- fixed icu:namespace issues
- added FoLiA-abby, an ABBYY to FoLiA converter
- src/FoLiA-abby.cxx, src/FoLiA-page.cxx, src/FoLiA-pm.cxx:
- Allow 'none' value for --prefix
- src/FoLiA-page.cxx, src/FoLiA-hocr.cxx: fixed Alignment info
- src/FoLiA-correct.cxx:
- fixed a problem with correction of the last word of a trigram.
- fix correction of paragraphs with only deeper text
- The --rank option accepts more flavors of files
- src/FoLiA-stats.cxx:
- added a --detokenize option
- several minor fixes, refactorings etc.
- updated tests
v0.9.2
Bug fix release:
- prepend small prefixes to output filenames, to ALWAYS avoid names starting with a numeric value: 'FPM-' for FoLiA-pm, 'FP-' for FoLiA-page, 'FH-' for FoLiA-hocr. Can be set with --prefix
- FoLiA-stats.cxx:
- added --collect to usage() and 'man' page
- FoLiA-correct:
- added --inputclass and --outputclass parameters (must be different)
- Don't crash on empty text.
v0.9.1
v0.9
[Ko vd Sloot]
- FoLiA-stats.cxx:
- added a --collect option, to create files with all n-grams together
- clearer message in FoLiA-stats when no results were found
- extract text from deeper nodes, if needed
- fixed out-of-bounds problem
- now fails when every input file fails
- FoLiA-txt:
- now fails when every input file fails
- avoid xml:id's starting with a number. Add "id-" in front.
- added more tests
[Maarten van Gompel]
- added codemeta.json
v0.8
- added -R option to FoLiA-collect
- FoLiA-collect now can work in parallel (-t option)
- modernized configuration, with better Mac OSX support (including OpenMP)
- all modules end with an exit code now.
- added more tests to 'make check'
- added output of Type-Token Ratios (also in degrees)
- several bugfixes.
- code cleanup and refactoring, some speedup too
v0.7
[Ko vd Sloot]
- updated and expanded tests
- fixed offset calculations in FoLiA-hocr, FoLiA-page.cxx and FoLiA-alto. We use Unicode code points now. (needed for folia v1.5 and above)
- Changed 'modes' in FoLiA-stats, to be a bit more comprehensible
- fixed problem with metadatatype when 'foreign-data' is present
- enhanced FoLiA-clean. Still not done...
- switched to dynamic OMP scheduling in most programs. (which process files with probably big differences in processing time)
- small bugfixes.
- general cleanup and refactoring
[Maarten van Gompel]
- Added and improved FoLiA-wordtranslate.cxx
v0.6
foliautils 0.6 04-04-2017
This is an intermediate release!!
Work on some tools is developing rapidly. Next releases won't take long.
For now, backward compatibility is still maintained mostly.
[Ko van der Sloot]
- uses libfolia 1.7 now!
- FoLiA-correct now uses another output file naming scheme (breaks backward compatibility)
- FoLiA-langcat now has a --tags parameter to select which nodes are searched
- FoLiA-stats:
- a new --separator option is added
- added a --max-ngram option.
- added a --languages option for multiple languages
- now we have a --aggregate option for multiple language statistics
- fixed a bug in total counts
- added a first version of FoLiA-clean program. Cleans up tests/tags in FoLiA files.
- FoLiA-correct:
- output statistics
- verbosity option improved
- added and improved a lot of tests
v0.5
- based on libfolia 1.5 or higher
- use recent ucto with textcat support
- use ISO 639-3 language names
- lots of code refactoring
- improved tests
- bug fixes in FoLiA-correct unigram correction
- extended and improved FoLiA-pm a lot
- changed default values for '--lang' and '--class' in FoLiA-stats (issue #3)
- FoLiA-alto can now work without a Didl too (issue #2)
- numerous additions...