Extracting all URLs from a web page
    using System.Text;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            // HttpWebHelper is the author's own wrapper around HttpWebRequest (not shown here).
            HttpWebHelper httpWebHelper = new HttpWebHelper();
            string webCodeStr = "utf-8";
            string referer = @"http://www.cnblogs.com/";
            Encoding webcode = Encoding.GetEncoding(webCodeStr);
            // Download the page HTML; the same address is passed as the target URL and as the referer.
            string htmlText = httpWebHelper.SimpleDoPostWrapper(referer, "get", webcode, null, referer);
            string[] urls = GetHtmlHrefUrlList(htmlText);
        }

        // Returns the href value of every <a ...>...</a> element found in the HTML text.
        public static string[] GetHtmlHrefUrlList(string sHtmlText)
        {
            // (?is) = ignore case + let '.' match newlines; \1 re-matches the opening quote, if any.
            Regex regHref = new Regex(@"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>", RegexOptions.IgnoreCase);
            MatchCollection matches = regHref.Matches(sHtmlText);
            int i = 0;
            string[] sUrlList = new string[matches.Count];
            foreach (Match match in matches)
                sUrlList[i++] = match.Groups["url"].Value;
            return sUrlList;
        }
    }
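That is all it takes. HttpWebHelper is a class I wrote that wraps HttpWebRequest; once the page content has been fetched, this regular expression returns every URL in it.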
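Since HttpWebHelper is not shown, here is a minimal offline sketch for trying the regular expression on its own. It applies the same pattern to a hard-coded HTML string and, as an extra step that is not part of the original code, resolves relative links against a base address with `Uri.TryCreate`. The class name `HrefDemo`, the method `GetAbsoluteUrls`, and the sample HTML are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Same pattern as in the article; (?is) already covers case-insensitivity.
    private static readonly Regex RegHref = new Regex(
        @"(?is)<a(?:(?!href=).)*href=(['""]?)(?<url>[^""\s>]*)\1[^>]*>(?<text>(?:(?!</?a\b).)*)</a>");

    // Hypothetical helper: extracts every href and resolves it against baseUrl.
    public static List<string> GetAbsoluteUrls(string html, string baseUrl)
    {
        var baseUri = new Uri(baseUrl);
        var result = new List<string>();
        foreach (Match m in RegHref.Matches(html))
        {
            string href = m.Groups["url"].Value;
            // Absolute hrefs are kept as they are; relative ones are combined with baseUri.
            if (Uri.TryCreate(baseUri, href, out var absolute))
                result.Add(absolute.AbsoluteUri);
        }
        return result;
    }

    public static void Main()
    {
        // Sample input covering double-quoted, single-quoted and unquoted href values.
        string html =
            "<p><a href=\"/news/\">News</a> " +
            "<a class='x' href='http://example.com/a?b=1'>A</a> " +
            "<a href=about.html>About</a></p>";

        foreach (string url in GetAbsoluteUrls(html, "http://www.cnblogs.com/"))
            Console.WriteLine(url);
    }
}
```

In a real run the HTML would come from the network, for example via `HttpClient.GetStringAsync`, rather than from a literal string. Note that a regular expression only handles reasonably well-formed markup; for messy pages a dedicated HTML parser is usually the more robust choice.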