XML sitemap to URL list: a command-line tool in Python

I do a lot of technical SEO, and checking for broken links with Xenu is part of my routine work. So I wanted to export the list of a website's URLs from its sitemap.xml and feed that list to Xenu.

I decided to implement it in my favorite scripting language, Python. PyQuery is a Python DOM parsing library with an API modeled on jQuery, the JavaScript library.

Let’s install the required dependency first. PyQuery is published on PyPI as pyquery, so pip install pyquery will fetch it.

Verify the PyQuery setup
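
One way to check the setup from Python itself is sketched below. It assumes pyquery was installed from PyPI beforehand; the helper name is_installed is made up for this illustration and is not part of PyQuery.

```python
# Assumes pyquery was installed beforehand, e.g.:  pip install pyquery
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Prints True when the pyquery package is importable, False otherwise.
print("pyquery installed:", is_installed("pyquery"))
```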

Now let’s look at the PyQuery basics.

In jQuery, you select a DOM node by passing a CSS selector to the $ function, for example $("p.intro").

In PyQuery, you select a DOM node in much the same way:

The statement above creates a PyQuery object and assigns it to the variable jQuery.

Now let’s code the sitemap grabber and parser.

Sample sitemap format
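
For reference, a small sitemap in the sitemaps.org format is shown below, held in a Python string. The example.com addresses, the date and the other values are placeholders, not data from a real site; the last lines only check with the standard library that the sample is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the example.com URLs and the date are placeholders.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2020-01-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
"""

# Quick well-formedness check: parse the sample and inspect the root node.
root = ET.fromstring(SAMPLE_SITEMAP.encode("utf-8"))
print(root.tag)   # the namespaced urlset tag
print(len(root))  # number of <url> entries
```

Each page lives in a url node, and its address is the text of the loc child; the other children are optional.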

Grabbing and parsing URLs with PyQuery

The statement above parses sitemap.xml and reads the value of each URL node.

Now let’s go through a little Python file handling to save the list of URLs.
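
The saving step can be sketched with plain file handling from the standard library. The function name save_urls, the output file name urls.txt and the two URLs are placeholders for this example.

```python
import os
import tempfile


def save_urls(urls, path):
    """Write one URL per line to path and return how many were written."""
    with open(path, "w", encoding="utf-8") as handle:
        for url in urls:
            handle.write(url + "\n")
    return len(urls)


# Hypothetical URLs standing in for the values read from the sitemap nodes.
urls = ["https://example.com/", "https://example.com/about"]
out_path = os.path.join(tempfile.gettempdir(), "urls.txt")
print(save_urls(urls, out_path), "URLs written to", out_path)
```

One URL per line keeps the file easy to import into a link checker.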

The statement above opens a file and writes the URL from each node to it.

Finally, let’s make it a command-line tool.
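
One possible shape for the finished tool is sketched below, using argparse from the standard library. The option names, the default output file and the function names are assumptions made for illustration, not the original script.

```python
import argparse
from urllib.request import urlopen


def build_parser():
    parser = argparse.ArgumentParser(
        description="Dump the URLs of a sitemap.xml into a text file."
    )
    parser.add_argument("sitemap_url", nargs="?",
                        help="address of the sitemap.xml")
    parser.add_argument("-o", "--output", default="urls.txt",
                        help="file to write, one URL per line")
    return parser


def sitemap_to_file(sitemap_url, output):
    # Imported lazily so that --help works even before pyquery is installed.
    from lxml import etree
    from pyquery import PyQuery

    with urlopen(sitemap_url) as response:
        root = etree.fromstring(response.read())
    # Strip the sitemap namespace so the plain "loc" selector matches.
    jQuery = PyQuery(root).remove_namespaces()
    urls = [node.text.strip() for node in jQuery("loc") if node.text]
    with open(output, "w", encoding="utf-8") as handle:
        handle.writelines(url + "\n" for url in urls)
    return len(urls)


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.sitemap_url is None:
        # No sitemap given: show the usage text instead of failing.
        parser.print_help()
        return 1
    count = sitemap_to_file(args.sitemap_url, args.output)
    print(f"{count} URLs written to {args.output}")
    return 0


if __name__ == "__main__":
    main()
```

Saved as sitemap2urls.py (a made-up file name), it could be run as: python sitemap2urls.py https://example.com/sitemap.xml -o urls.txt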

