NAME

Lingua::TFIDF::WordSegmenter::SplitBySpace - Simple word segmenter suitable for most European languages

VERSION

version 0.01

SYNOPSIS

  use Lingua::TFIDF::WordSegmenter::SplitBySpace;
  
  my $segmenter = Lingua::TFIDF::WordSegmenter::SplitBySpace->new(
    lower_case => 1,
    remove_punctuations => 1,
    stop_words => [qw/i you he she it they a the am are is was were/],
  );
  my $iter = $segmenter->segment('Humpty Dumpty sat on wall, ...');
  while (defined(my $word = $iter->())) { print "$word\n" }

DESCRIPTION

This class is a simple word segmenter. Like Text::TFIDF, it segments a sentence into words by splitting it on spaces.
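The following is a minimal sketch of space-based segmentation, not this module's actual source. The helper split_by_space_iter is hypothetical and only illustrates the idea: split the text on runs of whitespace and hand the words out one at a time through a closure, matching the iterator style shown in the synopsis. Note that punctuation stays attached to words ("wall,"), which is what the remove_punctuations option addresses.

  use strict;
  use warnings;

  # Hypothetical helper, not part of this module's API: split a document on
  # runs of whitespace and return a closure that yields one word per call.
  sub split_by_space_iter {
      my ($document) = @_;
      # split /\s+/ yields an empty first field when the text starts with
      # whitespace, so drop empty strings explicitly.
      my @words = grep { length } split /\s+/, $document;
      return sub { shift @words };   # undef once the list is exhausted
  }

  my $iter = split_by_space_iter('  Humpty Dumpty  sat on a wall, ');
  my @got;
  while (defined(my $word = $iter->())) { push @got, $word }
  print join('|', @got), "\n";   # Humpty|Dumpty|sat|on|a|wall,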

METHODS

new([ lower_case => 0 ] [, remove_punctuations => 0 ] [, stop_words => [] ])

Constructor. Takes the following optional parameters:

lower_case

Off by default. Converts all words to lower case.

remove_punctuations

Off by default. Removes punctuation characters (e.g., commas, periods, quotes, question marks and exclamation marks) from the beginning and end of each segmented word. Punctuation inside a word (e.g., the apostrophe in "King's") is left unchanged.
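A rough sketch of the edge-only stripping described above follows. The helper strip_edge_punctuation is hypothetical, and using Perl's \p{Punct} class as the definition of "punctuation" is an assumption; the module's exact character set may differ.

  use strict;
  use warnings;

  # Hypothetical helper approximating remove_punctuations: strip punctuation
  # only at the start and end of a word, keep anything inside the word.
  # Assumes \p{Punct} is an acceptable definition of "punctuation".
  sub strip_edge_punctuation {
      my ($word) = @_;
      $word =~ s/^\p{Punct}+//;   # leading punctuation
      $word =~ s/\p{Punct}+$//;   # trailing punctuation
      return $word;
  }

  my @words = map { strip_edge_punctuation($_) } ('"Hello,', "King's", 'wall...', '(sat)');
  print join('|', @words), "\n";   # Hello|King's|wall|sat

The apostrophe in "King's" survives because neither anchored substitution can reach the middle of the word.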

stop_words

Specifies a list of words to exclude from the segmented output. This is useful for removing function words.

Note that stop word filtering is performed after the lower_case and remove_punctuations options have been applied. So, for example, if you enable lower_case and want to exclude "I" from the result, supply the stop word list as ['i'].
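To make the processing order concrete, here is a hypothetical pipeline sketch; it mirrors the documented order (lower-casing and punctuation removal first, stop word filtering last) but is not the module's implementation, and the \p{Punct} definition of punctuation is an assumption.

  use strict;
  use warnings;

  # Hypothetical pipeline mirroring the documented order of operations.
  my %stop = map { $_ => 1 } qw(i a the);   # stop words given in lower case

  my @result;
  for my $token (split ' ', 'I saw the King, and I left.') {
      my $w = lc $token;                     # lower_case => 1
      $w =~ s/^\p{Punct}+|\p{Punct}+$//g;    # remove_punctuations => 1
      next if $stop{$w};                     # stop_words checked last
      push @result, $w;
  }
  print join(' ', @result), "\n";   # saw king and left

Had the stop list contained 'I' instead of 'i', the lower-cased "i" tokens would not have matched and would remain in the output.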

segment($document | \$document)

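Segments the given $document (either a string or a reference to a string) into words and returns a word iterator. Each call to the iterator returns the next word, or undef when no words remain.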
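The sketch below shows one way a method could accept either calling form and expose the iterator protocol. segment_sketch is a hypothetical stand-in written for illustration, not the module's segment() implementation; it performs only plain space splitting.

  use strict;
  use warnings;

  # Hypothetical stand-in for segment(): accepts a string or a reference to a
  # string and returns a closure yielding the next word, or undef when done.
  sub segment_sketch {
      my ($document) = @_;
      my $text_ref = ref $document eq 'SCALAR' ? $document : \$document;
      my @words = split ' ', $$text_ref;
      return sub { @words ? shift @words : undef };
  }

  my $text     = 'Humpty Dumpty sat on a wall';
  my $by_value = segment_sketch($text);
  my $by_ref   = segment_sketch(\$text);

  my (@a, @b);
  while (defined(my $w = $by_value->())) { push @a, $w }
  while (defined(my $w = $by_ref->()))   { push @b, $w }
  print scalar(@a), ' ', scalar(@b), "\n";   # 6 6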

AUTHOR

Koichi SATOH <sekia@cpan.org>

COPYRIGHT AND LICENSE

This is free software, licensed under:

  The MIT (X11) License

