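My first open source framework: WebSpider
I recently needed to write a crawler. I originally planned to write it in Python, but I haven't written anything in Python for a long time, and my recent .NET projects all run on Linux now, so I decided to write it in .NET instead. I searched online and tried many .NET crawler frameworks, but each had some problem or other when I ran it and none of them felt comfortable to use, so I wrote my own. It is very simple.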
Source code: https://github.com/BruceProject/WebSpider
This is my first open source project, WebSpider. Contributions and issues are welcome!
How do I use it?
#demo
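// Requires: using System; using System.Text; plus the WebSpider namespace.
static void Main(string[] args)
{
    CrawlerConfig config = new CrawlerConfig();
    // Note: on .NET Core / .NET 5+, GB2312 is only available after calling
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    config.Encoding = Encoding.GetEncoding("GB2312");
    Crawler spider = new Crawler(config, "http://www.baidu.com/s?wd=webspider");
    spider.CanCrawlEvent += Spider_CanCrawlEvent;
    spider.CanCrawLinksEvent += Spider_CanCrawLinksEvent;
    spider.PageCrawlCompletedEvent += Spider_PageCrawlCompletedEvent;
    spider.AllCrawlCompletedEvent += Spider_AllCrawlCompletedEvent;
    spider.Start();
    // Keep the console open while the crawl runs.
    Console.Read();
}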
// Called once all URLs have been crawled.
private static void Spider_AllCrawlCompletedEvent(object sender, AllCrawlCompletedArgs e)
{
    Console.WriteLine("completed");
}
// Called when a single URL has been crawled. CsQuery is supported:
// a library for working with the DOM through jQuery-like selectors.
private static void Spider_PageCrawlCompletedEvent(object sender, PageCrawlCompletedArgs e)
{
    var title = e.CQDocument.Select(".art_h1").Text();
}
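// Judging by its name, decides whether the links found at a URL should be followed;
// here every URL is accepted.
private static bool Spider_CanCrawLinksEvent(string url)
{
    return true;
}
// Judging by its name, decides whether a URL should be crawled at all;
// here every URL is accepted.
private static bool Spider_CanCrawlEvent(string url)
{
    return true;
}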
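The two `CanCrawl...` handlers in the demo accept every URL. In practice you will probably want to filter. Below is a minimal sketch of such a filter using only `System.Uri`. The `UrlFilter` class, the `ShouldCrawl` method and the `AllowedHost` value are hypothetical names made up for this example and are not part of WebSpider. It also assumes that returning `false` from a handler makes the crawler skip that URL, which the demo does not state explicitly.

```csharp
using System;

// Hypothetical helper, not part of WebSpider: decides whether a URL is worth crawling.
public static class UrlFilter
{
    // Host the crawl is restricted to; an assumption made for this example.
    private const string AllowedHost = "www.baidu.com";

    public static bool ShouldCrawl(string url)
    {
        // Reject anything that is not a well-formed absolute URL.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri uri))
            return false;

        // Only accept plain web links (skips mailto:, javascript:, file:, ...).
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;

        // Stay on the allowed host, ignoring case.
        return string.Equals(uri.Host, AllowedHost, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this in place, `Spider_CanCrawlEvent` could simply `return UrlFilter.ShouldCrawl(url);` instead of `return true;`.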
#And more
If you have any questions, feel free to open an issue.