Discovery of WordPress websites using wp-json

Since version 4.7, WordPress exposes a REST API by default. This API can be exploited to retrieve potentially confidential information.


Wow! It has been a while again since my last post! A big project has kept me busy for the last few months, and I probably won’t be able to publish about it in full… But I’m preparing some posts about specific technical points I discovered during this project.

In the meantime, a remote code execution vulnerability popped up in Drupal (CVE-2018-7600) and got me to analyse some PHP code. The exploit was finally released last week and, at the time of writing, only works against Drupal 8.

I wanted to take a break from these last 2 subjects, and the Drupal case made me wonder what information I could get from the WP JSON API. It is actually a pretty old question I had been meaning to answer.

The WordPress JSON REST API

When WordPress 4.7 was released, it shipped a new feature enabled by default: the WordPress JSON REST API.

In fact, this feature was already available in previous versions of WordPress through the rest-api plugin. WordPress 4.4 had also laid the groundwork by implementing the oEmbed part of the API, which allows posts or pages to be embedded on other sites.

This API is accessible from the /wp-json URL. You can find an example at the following address: https://demo.wp-api.org/wp-json/
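To make the following examples concrete, here is a minimal Python sketch of how a client could query the API. It assumes the site uses pretty permalinks, so that routes live under /wp-json; the helper names api_url and fetch_json are hypothetical and only use the standard library.

```python
import json
import urllib.request


def api_url(site: str, route: str = "/") -> str:
    """Build the URL of a REST route, e.g. api_url("https://example.com", "/wp/v2/posts").

    Assumes pretty permalinks, i.e. the API is served under /wp-json.
    """
    return site.rstrip("/") + "/wp-json" + route


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL without any authentication and decode its JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json(api_url("https://demo.wp-api.org")) should return the index document of the demo site, as long as that site is still online.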

The rest of this article is based on the documentation available here: https://developer.wordpress.org/rest-api/

This API gives access to a lot of information. According to the documentation, we can list users, posts, pages, tags, comments, categories, media and more. With the proper authorization, we can also perform write actions such as creating, editing or deleting content. Some plugins also add their own endpoints to the API.
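As an illustration of the listing capability, the users collection (/wp/v2/users) returns author objects with fields such as id, name and slug. The sketch below, using a made-up sample response and a hypothetical helper summarize_users, keeps only the fields useful for enumeration. By default the slug is derived from the login name, which is why it is worth collecting.

```python
# Illustrative shape of an unauthenticated /wp/v2/users response (values are made up).
SAMPLE_USERS = [
    {"id": 1, "name": "Jane Doe", "slug": "jdoe", "link": "https://example.com/author/jdoe/"},
    {"id": 2, "name": "John Smith", "slug": "jsmith", "link": "https://example.com/author/jsmith/"},
]


def summarize_users(users):
    """Return (id, display name, slug) tuples from a /wp/v2/users response.

    The slug is the user's "nicename", which WordPress derives from the login
    name by default, so it is a useful starting point for enumeration.
    """
    return [(u["id"], u.get("name", ""), u.get("slug", "")) for u in users]


# Print one line per user: numeric ID, slug, display name.
for user_id, name, slug in summarize_users(SAMPLE_USERS):
    print(f"{user_id:>4}  {slug:<12} {name}")
```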

But what got my attention most was the listing capability. Since we can list users, pages or posts, is there any chance we could obtain confidential information? The answer is yes, and I found it on my own blog (yes, right here).

Retrieving information from the API

The base endpoint simply lists information about the blog, the capabilities of the API, the available endpoints and a short documentation of each endpoint.

WP Json Base Endpoint

This seems to be harmless data, and in most cases it is, but it tells us more about the target because it explicitly details each endpoint. This is also true for plugins (like the broker plugin in the screenshot above). Following these specifications, we can push the attack further.
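The index returned by /wp-json/ contains a "namespaces" list and a "routes" object in which each route records its namespace and accepted HTTP methods. A minimal sketch, assuming that structure, of how one could single out plugin namespaces and their routes; the set of core namespaces to ignore is an assumption, since newer WordPress releases register additional core namespaces.

```python
# Core namespaces to ignore. This set is an assumption: it varies across
# WordPress versions (newer releases register more core namespaces).
CORE_NAMESPACES = {"oembed/1.0", "wp/v2"}


def extra_namespaces(index):
    """Namespaces announced by the /wp-json/ index that are not in CORE_NAMESPACES."""
    return sorted(ns for ns in index.get("namespaces", []) if ns not in CORE_NAMESPACES)


def routes_in(index, namespace):
    """Map each route of the given namespace to the HTTP methods it accepts."""
    return {
        route: spec.get("methods", [])
        for route, spec in index.get("routes", {}).items()
        if spec.get("namespace") == namespace
    }
```

Feeding these functions the decoded JSON of the base endpoint gives a quick list of plugin-specific routes to examine, along with which of them accept write methods such as POST.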

I will not detail every piece of information I managed to get, but here are 2 examples that could cause problems when an author or admin doesn’t pay attention or is lulled into a false sense of security.

The first one affected me (before I closed the wp-json interface 😀) and had an impact because of the WordPress tag mechanism.

This tag system allows you to add tags (I see them as keywords for the post) to a post. But when you create tags (they can be bulk-created in the post edit form), they are immediately public on the API, even if the post is not published yet. This does not happen with media, for example: they stay private as long as the post has not been published.
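Tag objects returned by /wp/v2/tags include a "count" field. WordPress normally counts only published posts in a term's count, so a tag with a count of 0 is either unused or attached only to unpublished content. The sketch below applies that heuristic; the function name is hypothetical, and a zero count is a hint worth reading, not proof of a draft.

```python
def tags_without_published_posts(tags):
    """Names of tags whose "count" is 0 in a /wp/v2/tags response.

    Heuristic: WordPress term counts normally include only published posts,
    so these tags are unused or attached only to not-yet-published content.
    """
    return [tag["name"] for tag in tags if tag.get("count", 0) == 0]
```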

Tag disclosed before post publication

In my case, this was impactful because I have some post drafts about the big project I mentioned before, and the tags gave all the information needed to know what I intended to talk about. Obviously, though, knowing the tags did not allow anyone to retrieve the draft posts.

I tested another case: uploading a media file that is not attached to any post or page. In this case, the default behaviour is to make it public. But with no reference to it anywhere in the site, it is hard to guess where the file is, or even whether it exists.

But not with the WP JSON API:

Media not linked to any post or page in WP-JSON
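Each item of /wp/v2/media carries a "post" field holding the ID of the post it is attached to, and a "source_url" pointing at the file itself. A minimal sketch, assuming that unattached uploads report a null (or 0) parent, of how such orphan files could be picked out of a media listing; the helper name is hypothetical.

```python
def unattached_media(media):
    """URLs of attachments from a /wp/v2/media response that have no parent post.

    The "post" field holds the ID of the attached post; it is null (or 0) for
    files uploaded without being attached to any post or page.
    """
    return [item["source_url"] for item in media if not item.get("post")]
```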

So the WP JSON API may not be responsible, by design, for serious data leaks, but any small misuse of WordPress by an author or an administrator can expose confidential data through this API.

Crawling the API for fun and profit

Analyzing the API by hand can be a bit tedious, so I wrote a Python tool to scrape all available information from the WP JSON API. I published the source code on GitHub. Don’t hesitate to give me some feedback on it.

The repo address: https://github.com/MickaelWalter/wp-json-scraper

This tool allows you to enumerate all publicly available information on the WordPress instance. It can also extract data such as all public posts or pages, which makes exhaustive scraping of the site easier.
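Exhaustive scraping mostly comes down to pagination: collection routes accept per_page (at most 100) and page query parameters, and report the number of pages in the X-WP-TotalPages response header. The sketch below is not the code of wp-json-scraper, just a minimal illustration of that loop; crawl and fetch_page are hypothetical names, and fetch is injectable so the loop can be exercised without a network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def fetch_page(url: str) -> tuple[list, int]:
    """GET one page of a collection; return (items, total number of pages)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        total_pages = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total_pages


def crawl(
    collection_url: str,
    fetch: Callable[[str], tuple[list, int]] = fetch_page,
    per_page: int = 100,
) -> Iterator[dict]:
    """Yield every item of a collection such as .../wp-json/wp/v2/posts."""
    page = 1
    while True:
        query = urllib.parse.urlencode({"per_page": per_page, "page": page})
        sep = "&" if "?" in collection_url else "?"
        items, total_pages = fetch(f"{collection_url}{sep}{query}")
        yield from items
        # Stop at the last announced page: requesting past it returns an error.
        if page >= total_pages or not items:
            break
        page += 1
```

For instance, list(crawl("https://example.com/wp-json/wp/v2/posts")) would collect every public post of a hypothetical example.com site in as few requests as the per_page limit allows.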

Conclusion

The WP JSON API gives a huge amount of information about the WordPress installation. Although it can be deactivated, it has been active by default since WordPress 4.7. It doesn’t leak confidential data by itself, but any misuse of the application (even a small one) can lead to information disclosure.

So either it should be deactivated, or you must pay attention to what data is available and take measures to limit the disclosure. If you leave the interface open, you should also query it periodically to check that no data is leaking from it.
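Such a periodic check could look like the following sketch. It requests a few collections without authentication and reports those that return a non-empty list; the route list is an assumption to adapt to your own site and plugins, and closed routes are expected to answer with an HTTP error such as 401, 403 or 404.

```python
import json
import urllib.error
import urllib.request

# Collections to audit. This list is an assumption: extend it with plugin routes.
SENSITIVE_ROUTES = ("/wp/v2/users", "/wp/v2/tags", "/wp/v2/media")


def fetch_json(url):
    """Unauthenticated GET returning decoded JSON; raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def exposed_routes(site, routes=SENSITIVE_ROUTES, fetch=fetch_json):
    """Map each route that returns a non-empty list to the number of items seen."""
    exposed = {}
    for route in routes:
        url = site.rstrip("/") + "/wp-json" + route
        try:
            items = fetch(url)
        except urllib.error.HTTPError:
            continue  # Closed or absent route (e.g. 401, 403, 404).
        if isinstance(items, list) and items:
            exposed[route] = len(items)
    return exposed
```

Run from a scheduled job, an empty result means none of the audited collections answered anonymously with data.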