wp2md

by dreikanter

A script to convert a WordPress XML dump to Markdown files

License: GNU General Public License v3.0

WordPress to Markdown Exporter

Update: I don't have much time to maintain this project, but I would really appreciate community help. If you are looking for an open source project to contribute to, this is a great opportunity. Pull requests are very much appreciated by me and by migrating WordPress users.

A Python script that converts a WordPress XML dump to a set of plain text/Markdown files. It is intended for migration from WordPress to the public-static website generator, but can also be useful as a general-purpose WordPress content processor.

Installation

The script can be installed with the following command:

pip install git+https://github.com/dreikanter/wp2md

This installs wp2md together with its dependencies.

Usage

Export the WordPress data to an XML file (Tools → Export → All content), then run the following command:

wp2md -d /export/path/ wordpress-dump.xml

Here /export/path/ is the directory where post and page files will be generated, and wordpress-dump.xml is the XML file exported by WordPress.
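Before running the conversion, it can be handy to check what the dump actually contains. The following is a minimal sketch, not part of wp2md: it assumes the export follows the usual RSS layout with item elements that carry title and link children, and the function name and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def list_item_titles(xml_text):
    """Return (title, link) pairs for every <item> element in an RSS-style dump."""
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.iter("item"):
        # findtext() returns the default when the child element is missing.
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        pairs.append((title, link))
    return pairs


# Hypothetical, heavily reduced stand-in for a real WordPress export.
SAMPLE = """<rss version="2.0"><channel>
<item><title>Hello</title><link>http://example.com/hello</link></item>
<item><title>About</title><link>http://example.com/about</link></item>
</channel></rss>"""

for title, link in list_item_titles(SAMPLE):
    print(title, link)
```

For a real export file, reading it with ET.parse("wordpress-dump.xml").getroot() and iterating the same way should work under the same assumption about the layout.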

Use the --help parameter to see the complete list of command line options:

usage: wp2md [options] source

Export WordPress XML dump to markdown files

positional arguments:
  source      source XML dump exported from WordPress

optional arguments:
  -h, --help  show this help message and exit
  -v          verbose logging
  -l FILE     log to file
  -d PATH     destination path for generated files
  -u FMT      date/time parsing format
  -o FMT      and parsing format
  -f FMT      date/time fields format for exported data
  -p FMT      date prefix format for generated files
  -m          preprocess content with Markdown (helpful for MD input)
  -n LEN      post name (slug) length limit for file naming
  -r          generate reference links instead of inline
  -ps PATH    post files path (see docs for variable names)
  -pg PATH    page files path
  -dr PATH    draft files path
  -url        keep absolute URLs in hrefs and image srcs
  -b URL      base URL to subtract from hrefs (default is the root)

The output

The script generates a separate file for each post, page and draft, and groups them in a configurable directory structure. By default, posts are grouped into year-named directories and pages are stored directly in the output folder.

You can specify a different directory structure and file naming pattern using the -ps, -pg and -dr parameters for posts, pages and drafts respectively. For example, -ps {year}/{month}/{day}/{title}.md will produce date-based subfolders for blog posts.
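To illustrate how such a pattern could map a post to a file path, here is a small sketch. It is not wp2md's actual implementation: the function name is hypothetical, and it assumes that month and day are zero-padded and that {title} is filled with the post slug.

```python
from datetime import datetime


def expand_pattern(pattern, title, post_date):
    """Fill {year}, {month}, {day} and {title} placeholders in a path pattern."""
    return pattern.format(
        year=post_date.strftime("%Y"),
        month=post_date.strftime("%m"),  # assumption: zero-padded month
        day=post_date.strftime("%d"),    # assumption: zero-padded day
        title=title,
    )


path = expand_pattern(
    "{year}/{month}/{day}/{title}.md",
    "yandex-subbotnik",
    datetime(2011, 11, 23, 22, 10, 35),
)
print(path)  # 2011/11/23/yandex-subbotnik.md
```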

Each exported file has a straightforward structure intended for further processing with the public-static website generator. It consists of an INI-like header followed by the Markdown-formatted post (or page) contents:

title: Я.Субботник в Санкт-Петербурге, 3 декабря
link: http://paradigm.ru/yandex-subbotni
creator: admin
description: 
post_id: 635
post_date: 2011-11-23 22:10:35
post_date_gmt: 2011-11-23 19:10:35
comment_status: open
post_name: yandex-subbotnik
status: publish
post_type: post

Я.Субботник в Санкт-Петербурге, 3 декабря

Я.Субботник в Санкт-Петербурге пройдет 3 декабря в офисе Яндекса. ...

If the post contains comments, they will be included below.
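For further processing, an exported file can be split back into its header fields and body. The sketch below is only an illustration under stated assumptions: the header is taken to end at the first blank line, each header line is split at its first colon, and any comments appended below the body are left as part of the body. The function name and sample text are hypothetical.

```python
def parse_exported(text):
    """Split an exported file into a header dict and the Markdown body."""
    # Assumption: the first blank line separates the header from the body.
    header_part, _, body = text.partition("\n\n")
    header = {}
    for line in header_part.splitlines():
        # Split at the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header, body.strip()


# Hypothetical sample in the same shape as the exported file shown above.
SAMPLE = (
    "title: Hello\n"
    "link: http://example.com/hello\n"
    "description: \n"
    "post_id: 635\n"
    "\n"
    "# Hello\n\nBody text.\n"
)

header, body = parse_exported(SAMPLE)
print(header["link"])
print(body)
```

All header values are kept as strings; converting fields such as post_id or post_date to numbers or dates is left to the caller.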

Copyright and licensing

Copyright © 2013 by Alex Musayev.
License: GNU (see LICENSE).

Project home: https://github.com/dreikanter/wp2md.
