I wanted to collect random URLs from the net, and this post describes my current quick-and-dirty way of doing so.
(1) Visited http://www.adddirectoryeasy.com/, saved the page sources to web-sites.txt, and executed the following command:
(2) grep "a href=" web-sites.txt | grep "target=\"_blink\"" | awk '{ print $2 }' | grep -v src | cut -d'=' -f2 | cut -d'"' -f2 | sort -u > websites.txt
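The same idea can be written as a small shell function that does not depend on the href being the second whitespace-separated field. This is only a rough sketch under a few assumptions: the name extract_links is made up for illustration, each anchor tag sits on a single line, attribute values are double-quoted, and grep supports the -o option (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read HTML on stdin and
# print the unique href values of <a> tags whose target attribute equals $1.
# Assumes one anchor tag per line and double-quoted attribute values.
extract_links() {
    target="$1"
    # 1. isolate each opening <a ...> tag
    # 2. keep only tags with the requested target value
    # 3. isolate the href="..." attribute
    # 4. take the text between the double quotes
    # 5. sort and drop duplicates
    grep -o '<a [^>]*>' |
        grep "target=\"$target\"" |
        grep -o 'href="[^"]*"' |
        cut -d'"' -f2 |
        sort -u
}
```

With the saved pages in web-sites.txt, it could be run as `extract_links _blink < web-sites.txt > websites.txt`, and `wc -l < websites.txt` then gives the number of URLs collected.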