
NAME

DTA::CAB::Analyzer::Morph::SMOR - morphological analysis via Gfsm automata, for SMOR-style transducers (e.g. Zmorge)

SYNOPSIS

 use DTA::CAB::Analyzer::Morph::SMOR;
 
 $morph = DTA::CAB::Analyzer::Morph::SMOR->new(%args);
 $morph->analyze($tok);

DESCRIPTION

DTA::CAB::Analyzer::Morph::SMOR is a subclass of DTA::CAB::Analyzer::Morph::Helsinki::DE suitable for use with SMOR-style transducers, including the Zmorge transducers produced by the SMORLemma grammar.

This module needs a GFSM transducer (zmorge.gfst) and a matching vocabulary (zmorge.lab). To build both from one of the binary SFST-format transducers available from https://pub.cl.uzh.ch/users/sennrich/zmorge/, do something like the following (on Debian, at least):

 sudo apt-get install sfst unzip wget sed gawk
 wget https://pub.cl.uzh.ch/users/sennrich/zmorge/transducers/zmorge-20150315-smor_newlemma.a.zip
 unzip zmorge-20150315-smor_newlemma.a.zip
 fst-print zmorge-20150315-smor_newlemma.a | sed 's/ /_/g;' > zmorge.tfst
 cat zmorge.tfst \
   | awk -F$'\t' '{ if (NF >= 4) { print $3 "\n" $4 } }' \
   | sed 's/^<>$//;' \
   | sort -u \
   | sed 's/^$/<>/;' \
   | awk '{print $1 "\t" NR-1}' \
   > zmorge.lab
 gfsmcompile -z0 -l zmorge.lab zmorge.tfst | gfsminvert -z0 | gfsmarcsort -l -F zmorge.gfst
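
The label-extraction step in the middle of the recipe above can be tried on a toy input first. The following is only a sketch: the file names toy.tfst and toy.lab and the three-line transducer are made up for illustration, and the column layout (symbols in fields 3 and 4) is assumed from the awk filter in the recipe. It shows that the empty symbol <> is temporarily blanked so that it sorts first, presumably so that it receives label ID 0.

```shell
# Hypothetical toy transducer in tab-separated text form:
# source state, target state, lower symbol, upper symbol.
# The last line (a lone final state) has fewer than 4 fields and is skipped.
printf '0\t1\ta\t<>\n1\t2\tb\tb\n2\n' > toy.tfst

# Same pipeline shape as above: collect both symbol columns, blank out "<>"
# so that it sorts first, de-duplicate, restore "<>", then number from 0.
# LC_ALL=C is added here only to make the sort order reproducible.
awk -F'\t' 'NF >= 4 { print $3; print $4 }' toy.tfst \
  | sed 's/^<>$//' \
  | LC_ALL=C sort -u \
  | sed 's/^$/<>/' \
  | awk '{ print $1 "\t" NR-1 }' \
  > toy.lab

cat toy.lab
```

With this toy input, toy.lab contains three tab-separated lines: <> with ID 0, a with ID 1, and b with ID 2.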

You can then test the compiled transducer with this module by calling e.g.:

 dta-cab-analyze.perl -ac=Morph::SMOR -ao=fstFile=zmorge.gfst -ao=labFile=zmorge.lab -fc=text -w Vermittlungsgespräche

which should produce something like the following output:

 Vermittlungsgespräche
        +[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
        +[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
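
For quick inspection, the angle-bracketed tags can be pulled out of a single analysis line with standard text tools. This is a rough sketch, not part of the module: it assumes the text output format shown above and a grep that supports the -o option (GNU and BSD grep both do); the variable names are arbitrary.

```shell
# One analysis line, copied from the sample output above.
line='        +[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>'

# Print every "[<...>]" group on its own line, strip the surrounding
# brackets, and join the remaining tag names with single spaces.
tags=$(printf '%s\n' "$line" \
  | grep -o '\[<[^]>]*>\]' \
  | sed 's/^\[<//; s/>\]$//' \
  | paste -sd' ' -)

echo "$tags"
```

For the line above this prints the segmentation markers followed by the category and inflection tags: ~ # +NN Neut Acc Pl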

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2021 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
