Releases: LanguageMachines/foliautils

v0.12

29 May 10:36

Released for FoLiA 2.0

v0.11

15 May 13:02
  • Updated and added some tests

  • started moving common code to a separate file and building a library
    (libfoliautils)

    • hemp detection is one of them
  • FoLiA-stats:

    • added possibility to read a list of directories + file-names to process
      into separate output directories. (could be generalized to other programs)
    • better hemp detection
  • FoLiA-correct:

    • use same hemp detection as FoLiA-stats
  • FoLiA-abby:

    • support more flavors
  • FoLiA-clean:

    • avoid removing the last remaining text on nodes
    • cleaning of tokenization now works

v0.10

29 Nov 14:15

[Ko vd Sloot]

  • fixed icu:namespace issues
  • added FoLiA-abby, an ABBY to FoLiA converter
  • src/FoLiA-abby.cxx, src/FoLiA-page.cxx, src/FoLiA-pm.cxx:
    • Allow 'none' value for --prefix
  • src/FoLiA-page.cxx, src/FoLiA-hocr.cxx: fixed Alignment info
  • src/FoLiA-correct.cxx:
    • fixed a problem with correction of the last word of a trigram.
    • fix correction of paragraphs with only deeper text
    • The --rank option accepts more flavors of files
  • src/FoLiA-stats.cxx:
    • added a --detokenize option
  • several minor fixes, refactorings etc.
  • updated tests

v0.9.2

05 Jun 12:12

Bug fix release:

  • prepend small prefixes to output filenames, to ALWAYS avoid names starting with
    a numeric value:
    'FPM-' for FoLiA-pm, 'FP-' for FoLiA-page, 'FH-' for FoLiA-hocr.
    Can be set with --prefix.
  • FoLiA-stats.cxx:
    • added --collect to usage() and 'man' page
  • FoLiA-correct:
    • added --inputclass and --outputclass parameters (must be different)
    • Don't crash on empty text.
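
The filename prefix rule from this release can be sketched as below. This is only an illustration, not the code used in foliautils: the helper name `prefixed_output_name` is hypothetical, and it assumes the prefix goes directly in front of the base name, after any directory part.

```cpp
#include <string>

// Hypothetical helper, not part of foliautils: put a tool prefix such as
// "FPM-", "FP-" or "FH-" in front of the base name of an output path, so
// the file name never starts with a digit.
std::string prefixed_output_name( const std::string& prefix,
                                  const std::string& path ) {
  const std::string::size_type slash = path.rfind( '/' );
  if ( slash == std::string::npos ) {
    // no directory part: the whole path is the file name
    return prefix + path;
  }
  // keep the directory part, prefix only the file name
  return path.substr( 0, slash + 1 ) + prefix + path.substr( slash + 1 );
}
```

For example, `prefixed_output_name( "FP-", "out/1234.xml" )` gives `out/FP-1234.xml`.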

v0.9.1

17 May 09:43

Bug Fix release:

  • the tests directory wasn't included in the release

v0.9

16 May 15:53

[Ko vd Sloot]

  • FoLiA-stats.cxx:
    • added a --collect option, to create files with all n-grams
      together
    • clearer message in FoLiA-stats when no results were found
    • extract text from deeper nodes, if needed
    • fixed out-of-bounds problem
    • now fails when every input file fails
  • FoLiA-txt:
    • now fails when every input file fails
  • avoid xml:id's starting with a number. Add "id-" in front.
  • added more tests
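
The xml:id rule above could look roughly like this. It is a sketch under assumptions, not the actual implementation: the name `safe_xml_id` is hypothetical, and it assumes "id-" is added only when the id starts with a digit.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of foliautils: an xml:id must not start
// with a digit, so put "id-" in front when it does.
// Assumption: ids that do not start with a digit are left unchanged.
std::string safe_xml_id( const std::string& id ) {
  if ( !id.empty()
       && std::isdigit( static_cast<unsigned char>( id[0] ) ) ) {
    return "id-" + id;
  }
  return id;
}
```

For example, `safe_xml_id( "123.text" )` gives `id-123.text`, while `safe_xml_id( "doc1" )` stays `doc1`.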

[Maarten van Gompel]

  • added codemeta.json

v0.8

19 Feb 14:17
  • added -R option to FoLiA-collect
  • FoLiA-collect now can work in parallel (-t option)
  • modernized configuration, with better Mac OSX support (including OpenMP)
  • all modules end with an exit code now.
  • added more tests to 'make check'
  • added output of Type-Token Ratios (also in degrees)
  • several bugfixes.
  • code cleanup and refactoring, some speedup too

v0.7

24 Oct 11:00

[Ko vd Sloot]

  • updated and expanded tests
  • fixed offset calculations in FoLiA-hocr, FoLiA-page.cxx
    and FoLiA-alto. We use Unicode code points now. (needed for folia v1.5 and above)
  • Changed 'modes' in FoLiA-stats, to be a bit more comprehensible
  • fixed problem with metadatatype when 'foreign-data' is present
  • enhanced FoLiA-clean. Still not done...
  • switched to dynamic OMP scheduling in most programs.
    (these process files whose processing times probably differ a lot)
  • small bugfixes.
  • general cleanup and refactoring
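
Dynamic OMP scheduling, as mentioned above, can be sketched as follows. This is a generic OpenMP illustration, not code taken from foliautils: `process_file` and `process_all` are hypothetical names, and the per-file work is a stand-in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-file work of a tool; here it just
// returns the length of the file name.
std::size_t process_file( const std::string& name ) {
  return name.size();
}

// Process all files in a parallel loop. schedule(dynamic) hands out
// iterations to whichever thread is free, which helps when processing
// times per file differ a lot. When OpenMP is not enabled at compile
// time, the pragma is ignored and the loop runs serially.
std::vector<std::size_t> process_all( const std::vector<std::string>& files ) {
  std::vector<std::size_t> results( files.size() );
  const long count = static_cast<long>( files.size() );
#pragma omp parallel for schedule(dynamic)
  for ( long i = 0; i < count; ++i ) {
    // each iteration writes only to its own slot, so no locking is needed
    results[i] = process_file( files[i] );
  }
  return results;
}
```

With static scheduling, one thread that happens to get several large files can hold up the whole run. Dynamic scheduling trades a little bookkeeping overhead for a more even load.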

[Maarten van Gompel]

  • Added and improved FoLiA-wordtranslate.cxx

v0.6

04 Apr 09:54
Pre-release

foliautils 0.6 04-04-2017
This is an intermediate release!!
Work on some tools is developing rapidly. Next releases won't take long.
For now, backward compatibility is still maintained mostly.

[Ko van der Sloot]

  • uses libfolia 1.7 now!
  • FoLiA-correct now uses another output file naming scheme (breaks backward compatibility)
  • FoLiA-langcat now has a --tags parameter to select which nodes are searched
  • FoLiA-stats:
    • a new --separator option is added
    • added a --max-ngram option.
    • added a --languages option for multiple languages
    • now we have a --aggregate option for multiple language statistics
    • fixed a bug in total counts
  • added a first version of FoLiA-clean program. Cleans up tests/tags in FoLiA files.
  • FoLiA-correct:
    • output statistics
    • verbosity option improved
  • added and improved a lot of tests

v0.5

17 Jan 09:28
  • based on libfolia 1.5 or higher
    • use recent ucto with textcat support
    • use ISO 639-3 language names
    • lots of code refactoring
    • improved tests
    • bug fixes in FoLiA-correct unigram correction
    • extended and improved FoLiA-pm a lot
    • changed default values for '--lang' and '--class' in FoLiA-stats (issue #3)
    • FoLiA-alto can now work without a Didl too (issue #2)
    • numerous additions...