Changes for version 0.014 - 2022-07-08

  • isWORDCHAR_utf8_safe() / toLOWER_utf8_safe() are actually available since Perl v5.26 (Stanislaw Pusep)
  • eg/benchmark.pl improvements (Stanislaw Pusep)

Documentation

Compute the cosine similarity between two documents
Compare large text data using MinHash and SpeedyFx
Efficiently count the unique tokens in a file

Modules

Tokenize/hash a large number of strings efficiently

Examples