October 25
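This is really a basic web search program. You enter the search criteria on the command line (the starting URL, the maximum number of URLs to process, and the string to search for), and it searches URLs on the Internet one by one in real time, finding and printing the pages that match the criteria. The prototype of this program comes from the book 《java编程艺术》 (The Art of Java). To make it easier to analyze, Heck removed the GUI part and made minor changes so that it works with JDK 1.5. With this program as a base, you can write "crawlers" that search the Internet for things such as images or email addresses, or that download web pages.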
First, here is what a run of the program looks like:
D:\java>javac SearchCrawler.java
D:\java>java SearchCrawler http://127.0.0.1:8080/webhome/index.jsp 20 java
Start searching...
result:
searchString=java
http://127.0.0.1:8080/webhome/index.jsp
http://127.0.0.1:8080/webhome/reply.jsp
http://127.0.0.1:8080/webhome/learn.jsp
http://127.0.0.1:8080/webhome/download.jsp
http://127.0.0.1:8080/webhome/article.jsp
http://127.0.0.1:8080/webhome/HeckGUIOverview.htm
http://127.0.0.1:8080/webhome/myexample/Proxooldoc/index.html
http://127.0.0.1:8080/webhome/view.jsp?id=301
http://127.0.0.1:8080/webhome/view.jsp?id=297
http://127.0.0.1:8080/webhome/view.jsp?id=291
http://127.0.0.1:8080/webhome/view.jsp?id=286
http://127.0.0.1:8080/webhome/view.jsp?id=285
http://127.0.0.1:8080/webhome/view.jsp?id=284
http://127.0.0.1:8080/webhome/view.jsp?id=276
http://127.0.0.1:8080/webhome/view.jsp?id=272
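
The run above suggests a simple loop: take a URL from a queue, fetch the page, check it for the search string, and queue any links not seen before, stopping once the URL limit is reached. The following is only a minimal sketch of that idea, not the SearchCrawler source itself. The class name CrawlSketch, the methods crawl and extractLinks, the regex-based link extraction, and the example.test pages are all assumptions made for illustration. It also uses Java 8 features rather than JDK 1.5, and takes the page fetcher as a parameter so the logic can be tried without network access.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and structure are illustrative assumptions.
class CrawlSketch {
    // Simplified pattern for <a ... href="..."> links; not a full HTML parser.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the absolute http(s) links found in html, resolved against baseUrl.
    static List<String> extractLinks(String baseUrl, String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash); // drop page anchors
            if (href.isEmpty()) continue;
            try {
                String absolute = URI.create(baseUrl).resolve(href).toString();
                if (absolute.startsWith("http://") || absolute.startsWith("https://")) {
                    links.add(absolute);
                }
            } catch (IllegalArgumentException e) {
                // Malformed link: skip it and keep going.
            }
        }
        return links;
    }

    // Breadth-first crawl: processes at most maxUrls pages and returns the URLs
    // whose content contains searchString (case-insensitive).
    static List<String> crawl(String startUrl, int maxUrls, String searchString,
                              Function<String, String> fetchPage) {
        List<String> matches = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> toCrawl = new ArrayDeque<>();
        toCrawl.add(startUrl);
        seen.add(startUrl);
        String needle = searchString.toLowerCase();
        int processed = 0;
        while (!toCrawl.isEmpty() && processed < maxUrls) {
            String url = toCrawl.remove();
            processed++;
            String html = fetchPage.apply(url);
            if (html == null) continue; // page could not be fetched
            if (html.toLowerCase().contains(needle)) matches.add(url);
            for (String link : extractLinks(url, html)) {
                if (seen.add(link)) toCrawl.add(link); // queue only unseen URLs
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // In-memory stand-in for real pages (hypothetical URLs and content).
        Map<String, String> pages = new HashMap<>();
        pages.put("http://example.test/index.html",
                "java home <a href=\"a.html\">A</a> <a href='b.html#top'>B</a>");
        pages.put("http://example.test/a.html",
                "nothing here <a href=\"index.html\">home</a>");
        pages.put("http://example.test/b.html", "Java tips");
        for (String url : crawl("http://example.test/index.html", 20, "java", pages::get)) {
            System.out.println(url);
        }
    }
}
```

To crawl real sites, the in-memory map would be replaced by a function that downloads the page text for a URL. Keeping the fetcher separate from the crawl loop is a design choice for this sketch, so the queue, the visited set, and the URL limit can be checked in isolation.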