
NAME

Word::Segmenter::Chinese::Lite - Split Chinese into words

SYNOPSIS

  use Word::Segmenter::Chinese::Lite qw(wscl_seg wscl_set_mode);

  my @result = wscl_seg("中华人民共和国成立了oyeah");
  foreach (@result)
  {
    print $_, "\n";
  }
  # got:
  # 中华人民共和国
  # 成立
  # 了
  # oyeah

  wscl_set_mode("obigram");
  @result = wscl_seg("中华人民共和国成立了");
  foreach (@result)
  {
    print $_, "\n";
  }
  # got:
  # 中华
  # 华人
  # 人民
  # 民共
  # 共和
  # 和国
  # 国成
  # 成立
  # 立了
  # 了

  wscl_set_mode("unigram");
  @result = wscl_seg("中华人民共和国");
  foreach (@result)
  {
    print $_, "\n";
  }
  # got:
  # 中
  # 华
  # 人
  # 民
  # 共
  # 和
  # 国

METHODS

wscl_set_mode($mode)

Optional.

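Choose one of the following modes.

"dict" : Default. Dictionary-based segmentation, using the dictionary bundled with this module.

"unigram" : Unigram segmentation (one token per character).

"obigram" : Overlapping bigram segmentation.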
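
To make the two n-gram modes concrete, here is a minimal pure-Perl sketch of unigram and overlapping-bigram splitting. It does not call this module: unigrams() and overlapping_bigrams() are hypothetical helpers written for illustration, and they treat every character alike, which may differ from how the module handles non-Chinese text. The trailing single-character token mirrors the "obigram" output shown in the SYNOPSIS.

```perl
use strict;
use warnings;
use utf8;                                # string literals below are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper (not part of the module): one token per character.
sub unigrams {
    my ($text) = @_;
    return split //, $text;
}

# Hypothetical helper (not part of the module): two-character windows that
# advance by one character. The final window holds a single character.
sub overlapping_bigrams {
    my ($text) = @_;
    return map { substr($text, $_, 2) } 0 .. length($text) - 1;
}

print join(' ', unigrams('中华人民')), "\n";             # 中 华 人 民
print join(' ', overlapping_bigrams('中华人民')), "\n";  # 中华 华人 人民 民
```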

wscl_seg($chinese_article, $max_word_length)

The main function.

Takes the Chinese text to be split.

Returns a list of words.

$chinese_article -- must be UTF-8 encoded

$max_word_length -- Optional
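
This documentation does not say how "dict" mode matches words or exactly how $max_word_length is applied. As an illustration only, the sketch below uses forward maximum matching, a common dictionary technique, and assumes the length argument caps the longest candidate word. The fmm_segment() helper, its default of 7, and the toy dictionary are all invented for this example and are not taken from the module.

```perl
use strict;
use warnings;
use utf8;                                # string literals below are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');

# Toy dictionary made up for this example; not the module's word list.
my %dict = map { $_ => 1 } qw(中华 人民 共和国 中华人民共和国 成立);

# Hypothetical helper: greedy longest-match-first segmentation.
sub fmm_segment {
    my ($text, $max_len) = @_;
    $max_len = 7 unless defined $max_len;   # arbitrary default for the sketch
    $max_len = 1 if $max_len < 1;           # always consume at least one character
    my @words;
    my $pos = 0;
    while ($pos < length $text) {
        my $remaining = length($text) - $pos;
        my $len = $max_len < $remaining ? $max_len : $remaining;
        # Shrink the candidate until it is a dictionary word or one character.
        $len-- while $len > 1 && !$dict{ substr($text, $pos, $len) };
        push @words, substr($text, $pos, $len);
        $pos += $len;
    }
    return @words;
}

print join(' ', fmm_segment('中华人民共和国成立了', 7)), "\n";  # 中华人民共和国 成立 了
print join(' ', fmm_segment('中华人民共和国成立了', 3)), "\n";  # 中华 人民 共和国 成立 了
```

In this sketch a smaller cap forces shorter matches, so the long compound is broken into its dictionary parts.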

EXPORT

No functions are exported by default.

AUTHOR

Chen Gang, <yikuyiku.com@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2014 by Chen Gang

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.16.2 or, at your option, any later version of Perl 5 you may have available.