When you create a new project in GSA, you are offered several input options; the second one is Add Url+Anchor Text.
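Here is the problem. The idea is to collect a large number of page URLs from sitemap.xml, for example http://www.sample.com/Family-decorate-ximei.php, and then post links in GSA using anchor text taken from each file name, so that Family-decorate-ximei.php gives the anchor text Family decorate ximei. The newly generated links should then look like this: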
- http://www.sample.com/Family_case_baijingongguan.php#Family case baijingongguan
- http://www.sample.com/Family_decorate.php#Family decorate
- http://www.sample.com/Family-decorate-54suo.php#Family decorate 54suo
- http://www.sample.com/Family-decorate-boyuewan.php#Family decorate boyuewan
- http://www.sample.com/Family-decorate-case.php#Family decorate case
- http://www.sample.com/Family-decorate-ximei.php#Family decorate ximei
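The page URLs behind this list come from sitemap.xml. As a minimal sketch, assuming the sitemap uses the standard layout with one loc element per page, they could be pulled out with PHP's DOM extension. The helper name sitemap_urls and the inline sample document are made up for illustration and are not part of the original script:

```php
<?php
// Hypothetical helper: return every <loc> value found in a sitemap.xml string.
function sitemap_urls(string $xml): array {
    $doc = new DOMDocument();
    // An empty or malformed document yields an empty list instead of an error
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return [];
    }
    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }
    return $urls;
}

// Illustrative sitemap content; a real one would be read from the site
$sample = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.sample.com/Family_decorate.php</loc></url>
  <url><loc>http://www.sample.com/Family-decorate-ximei.php</loc></url>
</urlset>
XML;

// Print one URL per line, ready to be saved as the input file
echo implode(PHP_EOL, sitemap_urls($sample)), PHP_EOL;
```

Saving that output to a text file gives the one-URL-per-line input that the steps after this one work on.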
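How can this be done? The obvious approach is to extract the words with a regular expression, join them onto the page URL, and save the result to a new file. It looks simple enough, so I did it with XHE (XWeb Human Emulator). The code is as follows: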
<?php
$xhe_host ="127.0.0.1:7000";
// The following code is required to properly run XWeb Human Emulator
require("../Templates/xweb_human_emulator.php");
// Build the anchor text from a link: strip the base URL and the suffix,
// then join the words that match $regex with single spaces
function get_name_by_url($regex, $url, $str, $suffixStr) {
    $stem = str_replace($suffixStr, "", str_replace($url, "", trim($str)));
    preg_match_all($regex, $stem, $matches, PREG_PATTERN_ORDER);
    return implode(" ", $matches[0]);
}
// $path: file that holds the original links, one per line
// $newFile: file that receives the new "URL#anchor text" lines
$path = "C:\\Users\\userName\\Desktop\\proxy 1.txt";
$newFile = "C:\\Users\\userName\\Desktop\\newFile.txt";
// These settings never change, so define them once outside the loop
$url = "http://www.sample.com/";
// Letters and digits, so that "Family" and "54suo" are matched whole
$regex = "/[A-Za-z0-9]+/";
$suffixStr = ".php";
// Get the total number of links
$num = $textfile->get_lines_count($path,$timeout=COMMAND_TIME);
// Read the file line by line and append each new link to $newFile
for ($i = 0; $i < $num; $i++) {
    $str = $textfile->get_line_from_file($path,$rand=0,$i,$timeout=COMMAND_TIME);
    $name = get_name_by_url($regex, $url, $str, $suffixStr);
    $str = trim($str)."#".$name."\r\n";
    $textfile->add_string_to_file($newFile,$str,$timeout=COMMAND_TIME);
    // Report progress every 100 links
    if ($i % 100 == 0) {
        echo "Adding --->".$str;
        echo "Finish Num: ".$i."<br>";
    }
}
// Quit
$app->quit();
?>
Done. Just import the file saved above, newFile.txt, into GSA.
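The transformation itself does not depend on XHE. As a rough sketch in plain PHP, assuming the anchor text is simply the letter and digit runs of the file name, one URL could be converted like this. The function name url_with_anchor is hypothetical, and parse_url plus pathinfo are swapped in here for the string replacement used in the script above:

```php
<?php
// Hypothetical helper: turn one page URL into a "URL#anchor text" line.
function url_with_anchor(string $url): string {
    $url = trim($url);
    // Path part of the URL, e.g. "/Family-decorate-ximei.php"
    $path = (string) parse_url($url, PHP_URL_PATH);
    // File name without directory or extension, e.g. "Family-decorate-ximei"
    $stem = pathinfo($path, PATHINFO_FILENAME);
    // Runs of letters and digits become the words of the anchor text
    preg_match_all('/[A-Za-z0-9]+/', $stem, $matches);
    return $url . '#' . implode(' ', $matches[0]);
}

// Two of the sample URLs from this post; the second carries a line ending
$urls = [
    "http://www.sample.com/Family_decorate.php",
    "http://www.sample.com/Family-decorate-54suo.php\r\n",
];
foreach ($urls as $u) {
    echo url_with_anchor($u), PHP_EOL;
}
```

Working from the file name instead of a fixed base URL means the same function can handle links from more than one site, at the cost of ignoring any directory names in the path.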