Changes for version 0.06

  • improved Unicode handling: normalization and Unicode character classes in regular expressions
  • speed optimizations
  • synced the algorithm with the current PHP version
  • changed tests to use an empirically determined threshold
  • data update

Documentation

download newer data for the tokenizer

Modules

tokenizer for the OpenCorpora project
download newer data for the tokenizer
represents a file with vectors