<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[Heck's  Blog]]></title> 
<link>https://www.heckjj.com/index.php</link> 
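<description><![CDATA[A split-second decision can change a great deal. In fact, what makes you succeed is often not knowledge but spirit! If you keep making excuses for yourself, success will just have to wait. Execute, today!]]></description> 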
<language>zh-cn</language> 
<copyright><![CDATA[Heck's  Blog]]></copyright>
<item>
<link>https://www.heckjj.com/java-swing-convert-2-html/</link>
<title><![CDATA[Converting HTML to Plain Text in Java]]></title> 
<author>Heck &lt;@hecks.tk&gt;</author>
<category><![CDATA[Programming Notes]]></category>
<pubDate>Mon, 20 Sep 2010 18:03:10 +0000</pubDate> 
<guid>https://www.heckjj.com/java-swing-convert-2-html/</guid> 
<description>
<![CDATA[ 
	&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-family: 微软雅黑;">You will often need to parse an HTML file or string in a program and extract its text content, and many tools can do this. Today's example uses a Swing class, HTMLEditorKit.ParserCallback, to do it. Here is the sample code:</span><br/><textarea name="code" class="java" rows="15" cols="100">import java.io.*;
import java.net.*;
import javax.swing.text.html.*;
import javax.swing.text.html.parser.*;
/**
 *&nbsp;&nbsp; http://www.hecks.tk
 *
 */
public class HtmlProcessor extends HTMLEditorKit.ParserCallback &#123;
&nbsp;&nbsp;StringBuffer textBuffer;
&nbsp;&nbsp;Reader reader;

&nbsp;&nbsp;public HtmlProcessor() &#123;
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public HtmlProcessor(Reader r) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;reader = r;
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public void parse() throws IOException &#123;
&nbsp;&nbsp;&nbsp;&nbsp;textBuffer = new StringBuffer();
&nbsp;&nbsp;&nbsp;&nbsp;ParserDelegator parserDelegator = new ParserDelegator();
&nbsp;&nbsp;&nbsp;&nbsp;parserDelegator.parse(reader, this, true);
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;// Called by the parser for each run of text; markup itself never reaches this method
&nbsp;&nbsp;public void handleText(char[] text, int pos) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;textBuffer.append(text);
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public StringBuffer getTextBuffer() &#123;
&nbsp;&nbsp;&nbsp;&nbsp;return textBuffer;
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public void setTextBuffer(StringBuffer textBuffer) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;this.textBuffer = textBuffer;
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public static String htmlToPlainText(String html) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;StringReader sr = new StringReader(html);

&nbsp;&nbsp;&nbsp;&nbsp;HtmlProcessor d = new HtmlProcessor(sr);
&nbsp;&nbsp;&nbsp;&nbsp;try &#123;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;d.parse();
&nbsp;&nbsp;&nbsp;&nbsp;&#125; catch (IOException e) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// Not expected for an in-memory StringReader; print the trace if it happens
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();
&nbsp;&nbsp;&nbsp;&nbsp;&#125;

&nbsp;&nbsp;&nbsp;&nbsp;sr.close();
&nbsp;&nbsp;&nbsp;&nbsp;return d.getTextBuffer().toString();
&nbsp;&nbsp;&#125;

&nbsp;&nbsp;public static void main(String[] argv) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;try &#123;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// the HTML to convert
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;URL toRead;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if (argv.length == 1)
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;toRead = new URL(argv[0]);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;else
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;toRead = new URL("http://www.hecks.tk");

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;BufferedReader in = new BufferedReader(new InputStreamReader(toRead
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.openStream()));
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;HtmlProcessor processor = new HtmlProcessor(in);
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;processor.parse();
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in.close();

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println(processor.getTextBuffer());
&nbsp;&nbsp;&nbsp;&nbsp;&#125; catch (Exception e) &#123;
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;e.printStackTrace();
&nbsp;&nbsp;&nbsp;&nbsp;&#125;
&nbsp;&nbsp;&#125;
&#125;</textarea><br/> <span style="font-family: 微软雅黑;">Here, the static method htmlToPlainText converts an HTML string to plain text, and the main function shows how to convert the content of a web page to plain text.</span><br/><br/>Tags - <a href="https://www.heckjj.com/tags/java/" rel="tag">java</a> , <a href="https://www.heckjj.com/tags/%25E8%25BD%25AC%25E5%258C%2596/" rel="tag">conversion</a> , <a href="https://www.heckjj.com/tags/html/" rel="tag">html</a>
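<br/><br/><span style="font-family: 微软雅黑;">Because handleText only appends each text run, text from adjacent block elements can run together in the output. The following is a minimal sketch of one way to keep line breaks, not part of the original sample: the class name HtmlTextSketch, the method toPlainText, and the rule of adding a newline after flow-breaking tags are all assumptions made for illustration. It overrides handleEndTag and handleSimpleTag alongside handleText.</span><br/>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical variant of the callback above that separates block elements with newlines.
public class HtmlTextSketch extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        // Append each run of text exactly as the parser reports it.
        text.append(data);
    }

    @Override
    public void handleEndTag(HTML.Tag tag, int pos) {
        // Assumption: a closing tag that breaks the flow (p, h1, div, ...) ends a line.
        if (tag.breaksFlow()) {
            newline();
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // <br> has no closing tag, so it arrives here instead of handleEndTag.
        if (tag == HTML.Tag.BR) {
            newline();
        }
    }

    // Adds a newline unless the buffer is empty or already ends with one.
    private void newline() {
        int n = text.length();
        if (n > 0 && text.charAt(n - 1) != '\n') {
            text.append('\n');
        }
    }

    public static String toPlainText(String html) {
        HtmlTextSketch callback = new HtmlTextSketch();
        try (StringReader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked so callers still see it.
            throw new UncheckedIOException(e);
        }
        return callback.text.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>First paragraph.</p>"
                + "<p>Second paragraph.</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

<span style="font-family: 微软雅黑;">The newline helper collapses consecutive block endings into a single line break; whether that spacing suits a given page is a judgment call, and a different separator may fit better.</span>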
]]>
</description>
</item><item>
<link>https://www.heckjj.com/java-swing-convert-2-html/#blogcomment</link>
<title><![CDATA[[Comment] Converting HTML to Plain Text in Java]]></title> 
<author> &lt;user@domain.com&gt;</author>
<category><![CDATA[Comments]]></category>
<pubDate>Thu, 01 Jan 1970 00:00:00 +0000</pubDate> 
<guid>https://www.heckjj.com/java-swing-convert-2-html/#blogcomment</guid> 
<description>
<![CDATA[ 
	
]]>
</description>
</item>
</channel>
</rss>