.\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' . ds C` . ds C' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .\" .\" Avoid warning from groff about undefined register 'F'. .de IX .. .nr rF 0 .if \n(.g .if rF .nr rF 1 .if (\n(rF:(\n(.g==0)) \{ . if \nF \{ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . if !\nF==2 \{ . nr % 0 . nr F 2 . \} . 
\} .\} .rr rF .\" ======================================================================== .\" .IX Title "HTML::FormatText::WithLinks 3" .TH HTML::FormatText::WithLinks 3 "2015-01-04" "perl v5.16.3" "User Contributed Perl Documentation" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" HTML::FormatText::WithLinks \- HTML to text conversion with links as footnotes .SH "SYNOPSIS" .IX Header "SYNOPSIS" .Vb 1 \& use HTML::FormatText::WithLinks; \& \& my $f = HTML::FormatText::WithLinks\->new(); \& \& my $html = qq( \& \&
\& <p>Some html with a <a href="http://example.com/">link</a></p> \&
\& \& \& ); \& \& my $text = $f\->parse($html); \& \& print $text; \& \& # results in something like \& \& Some html with a [1]link \& \& 1. http://example.com/ \& \& my $f2 = HTML::FormatText::WithLinks\->new( \& before_link => \*(Aq\*(Aq, \& after_link => \*(Aq [%l]\*(Aq, \& footnote => \*(Aq\*(Aq \& ); \& \& $text = $f2\->parse($html); \& print $text; \& \& # results in something like \& \& Some html with a link [http://example.com/] \& \& my $f3 = HTML::FormatText::WithLinks\->new( \& link_num_generator => sub { \& return "*" x (shift() + 1); \& }, \& footnote => \*(Aq[%n] %l\*(Aq \& ); \& \& $text = $f3\->parse($html); \& print $text; \& \& # results in something like \& \& Some html with a [*]link \& \& [*] http://example.com/ .Ve .SH "DESCRIPTION" .IX Header "DESCRIPTION" HTML::FormatText::WithLinks takes \s-1HTML\s0 and turns it into plain text, printing all the links in the \s-1HTML\s0 as footnotes. By default, it attempts to mimic the output format of the \-\-dump option of the lynx text-based web browser. .SH "METHODS" .IX Header "METHODS" .SS "new" .IX Subsection "new" .Vb 1 \& my $f = HTML::FormatText::WithLinks\->new( %options ); .Ve .PP Returns a new instance. It accepts all the options of HTML::FormatText plus .IP "base" 4 .IX Item "base" a base option. This should be set to a \s-1URI\s0 which will be used to turn any relative URIs in the \s-1HTML\s0 into absolute ones. .IP "doc_overrides_base" 4 .IX Item "doc_overrides_base" If a base element is found in the document and it has an href attribute then setting doc_overrides_base to true will cause the document's base to be used instead of the base option. This defaults to false. .IP "before_link (default: '[%n]')" 4 .IX Item "before_link (default: '[%n]')" .PD 0 .IP "after_link (default: '')" 4 .IX Item "after_link (default: '')" .ie n .IP "footnote (default: '[%n] %l')" 4 .el .IP "footnote (default: '[%n] \f(CW%l\fR')" 4 .IX Item "footnote (default: '[%n] %l')" .PD a string to print before a link (i.e. 
when the <a> is found), after a link has ended (i.e. when the </a> is found) and when printing out footnotes. .Sp \&\*(L"%n\*(R" will be replaced by the link number, \*(L"%l\*(R" will be replaced by the link itself. .Sp If footnote is set to '', no footnotes will be printed. .IP "link_num_generator (default: sub { return \fIshift()\fR + 1 })" 4 .IX Item "link_num_generator (default: sub { return shift() + 1 })" link_num_generator is a sub that is passed the internal link number and returns the value to be printed for that link. The internal store starts numbering at 0, which is why the default adds 1. .IP "with_emphasis" 4 .IX Item "with_emphasis" If set to 1 then italicised text will be surrounded by \f(CW\*(C`/\*(C'\fR and bolded text by \&\f(CW\*(C`_\*(C'\fR. You can change these markers by using the \f(CW\*(C`italic_marker\*(C'\fR and \&\f(CW\*(C`bold_marker\*(C'\fR options. .IP "unique_links" 4 .IX Item "unique_links" If set to 1 then only one footnote is generated per unique \s-1URI\s0, as opposed to the default behaviour, which is to generate a footnote for every link. .IP "anchor_links" 4 .IX Item "anchor_links" If set to 0 then links pointing to local anchors will be skipped. The default behaviour is to include all links. .IP "skip_linked_urls" 4 .IX Item "skip_linked_urls" If set to 1, then links where the text equals the href value will be skipped. The default behaviour is to include all links. .SS "parse" .IX Subsection "parse" .Vb 1 \& my $text = $f\->parse($html); .Ve .PP Takes some \s-1HTML\s0 and returns it as text. Returns undef on error. .PP Will also return undef if you pass it undef. Returns an empty string if passed an empty string. .SS "parse_file" .IX Subsection "parse_file" .Vb 1 \& my $text = $f\->parse_file($filename); .Ve .PP Takes a filename and returns the contents of the file as plain text. Returns undef on error. .SS "error" .IX Subsection "error" .Vb 1 \& $f\->error(); .Ve .PP Returns the last error that occurred. 
In practice this is likely to be either a warning that parse_file couldn't find the file or that HTML::TreeBuilder failed. .SH "CAVEATS" .IX Header "CAVEATS" When passing \s-1HTML\s0 fragments the results may be a little unpredictable. I've tried to work round the most egregious of the issues but reports of any unexpected results are welcome. .PP Also note that if for some reason there is an <a> tag in the document that does not have an href attribute then it will be quietly ignored. If this is really a problem for anyone then let me know and I'll see if I can think of a sensible thing to do in this case. .SH "AUTHOR" .IX Header "AUTHOR" Struan Donald.
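.SH "EXAMPLES"
.IX Header "EXAMPLES"
The following is a minimal sketch, in plain Perl, of how the "%n" and "%l" placeholders and a link_num_generator sub fit together as described under "new". It does not use or reproduce the module's own code: the helper expand_link_format and the variable names are hypothetical and exist only for illustration. The two generator subs are the documented default and the one from the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::FormatText::WithLinks): expand the
# documented %n and %l placeholders in a format string.
sub expand_link_format {
    my ($format, $number, $url) = @_;
    my %value = ( n => $number, l => $url );
    # Single pass, so a URL that itself contains "%n" is left untouched.
    (my $out = $format) =~ s/%([nl])/$value{$1}/g;
    return $out;
}

# The documented default generator: the internal index starts at 0,
# so the first link is displayed as 1.
my $default_generator = sub { return shift() + 1 };

# The generator from the SYNOPSIS: index 0 becomes "*", index 1 "**", ...
my $star_generator = sub { return "*" x (shift() + 1) };

my @urls = ('http://example.com/', 'http://example.org/');

# Print one footnote per link using the '[%n] %l' format.
for my $index (0 .. $#urls) {
    my $n = $default_generator->($index);
    print expand_link_format('[%n] %l', $n, $urls[$index]), "\n";
}
```

Swapping $star_generator in for $default_generator in the loop turns the first footnote into "[*] http://example.com/", which matches the third result shown in the SYNOPSIS.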