WP Extractor | Simple WordPress Posts & Pages Extractor in Python - Printable Version

+- Post4VPS Forum | Free VPS Provider (https://post4vps.com)
+-- Forum: Geek World (https://post4vps.com/Forum-Geek-World)
+--- Forum: Scripting & Programming (https://post4vps.com/Forum-Scripting-Programming)
+--- Thread: WP Extractor | Simple WordPress Posts & Pages Extractor in Python (/Thread-WP-Extractor-Simple-WordPress-Posts-Pages-Extractor-in-Python)
WP Extractor | Simple WordPress Posts & Pages Extractor in Python - Manal - 11-08-2020

WPExtractor - WordPress Blog Post Extractor in JSON Format

WPExtractor is a Python-based tool made specifically for building datasets for Artificial Intelligence projects. It collects data from blogs, which can then be used to train bots in many useful ways.

Features
Usage

Code:
python main.py -u https://csrockers.in

By default, it fetches posts from the website. To fetch pages instead, use the following.

Code:
python main.py -u https://fulltimehosting.net --pages

Credits

Manal Shaikh & Somil Gumber.

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - tbelldesignco - 11-09-2020

Ooh, I am curious about this. I am working on creating an app for my business and I could possibly use something like this to feed content in JSON structs to my app. Thank you! I will definitely be looking into this tool further!

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - tiwil - 11-12-2020

Just looking at the script. I never thought that WordPress had that /wp-json/ directory. I tested it with my WordPress website and it shows all of my pages and posts. Nice work, Manal! This one can help us show the posts on a non-WP website. Off topic, but how did you find the URL to that JSON?

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - tbelldesignco - 11-16-2020

I just want to confirm how amazing this tool is. I currently have my app fetching content from my site and populating dynamic content into the app. I will have to get screenshots up when I'm back on my MacBook.

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - LightDestory - 11-26-2020

Nice project. It is sad when open source software becomes abandonware. You did a very nice job picking it up and reviving it. I did a very quick read of the code and you seem to be using the wp-json endpoints. It is a good choice, but why not use the sitemap? Moreover, you could replace the ifs inside the error handling with a switch; it does the same thing but it is more "appropriate" for that situation.

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - fitkoh - 11-26-2020

This looks like an interesting tool. I'm not familiar with Python, but I can see its value.
Unfortunately I wasn't able to get it working (remember, Python newbie). First, I installed Python.

Quote:
~$ sudo apt-get install python

No problems there. Then I tried to run main.py as recommended in your readme.

Quote:
~$ python main.py -u https://url.url

Okay, so something is missing. I did some googling and found that I first need to install pip.

Quote:
~$ sudo apt-get install python3-pip

That one looks good. So let's try to install requests...

Quote:
~$ sudo pip3 install requests

So I have requests installed, but for some reason my Python installation isn't recognizing it. I'm thinking maybe the problem is due to multiple versions of Python being installed, but I'm not sure which one to keep or if there's a safe way to remove one without breaking the other. Perhaps it'd be simpler to reinstall and start from scratch? (I'm working on a dev server, so no worries about losing anything.)

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - Manal - 11-27-2020

Try using

Code:
python3 main.py -u https://url.url

The tool is based on Python 3, and I recommend you install requests using pip3, as you did.

(11-26-2020, 05:18 PM)fitkoh Wrote:
This looks like an interesting tool. I'm not familiar with Python, but I can see its value. Unfortunately I wasn't able to get it working (remember, Python newbie).

(11-26-2020, 12:47 PM)LightDestory Wrote:
Nice project. It is sad when open source software becomes abandonware. You did a very nice job picking it up and reviving it.

Noted for the next update. The reason I didn't use the sitemap is that a sitemap may often miss out on posts that are outside the website. This tool aims at everything in the main content that is not a draft or password-protected. And I just started learning Python. This is just a pet project :3

RE: WP Extractor | Simple WordPress Posts & Pages Extractor in Python - Sn1F3rt - 11-27-2020

Great project @Manal! It works perfectly. Moreover, it's open-source; I took a look at the source code too.
I'm a Python developer myself and glad to see you are starting with Python. You won't regret it.
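
For readers wondering what the /wp-json/ approach discussed in this thread looks like in practice, here is a minimal sketch. It is not the WPExtractor source. It assumes the target site exposes the standard WordPress REST API routes (/wp-json/wp/v2/posts and /wp-json/wp/v2/pages) and reports the page count in the X-WP-TotalPages response header. The function names, the hint texts, and the User-Agent string are made up for illustration, and it uses only the standard library rather than requests. The status lookup table is one common Python substitute for the switch suggested above.

```python
import json
import urllib.parse
import urllib.request


def build_endpoint(site_url, pages=False, page=1, per_page=100):
    """Return the REST URL for one batch of posts (or pages)."""
    kind = "pages" if pages else "posts"
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    return f"{site_url.rstrip('/')}/wp-json/wp/v2/{kind}?{query}"


# A lookup table in place of an if-chain. The hint texts are illustrative
# guesses at likely causes, not messages taken from WPExtractor.
STATUS_HINTS = {
    401: "authentication required",
    403: "access to the REST API is forbidden",
    404: "no REST API found at this URL",
}


def describe_status(code):
    """Map an HTTP status code to a short human-readable hint."""
    return STATUS_HINTS.get(code, f"unexpected HTTP status {code}")


def http_get_json(url):
    """Default fetcher: return (items, total_pages) for one REST URL."""
    req = urllib.request.Request(url, headers={"User-Agent": "wp-json-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        total = int(resp.headers.get("X-WP-TotalPages", "1"))
        return json.load(resp), total


def fetch_all(site_url, pages=False, get=http_get_json):
    """Collect every batch the API returns, following the page count."""
    items, page, total = [], 1, 1
    while page <= total:
        batch, total = get(build_endpoint(site_url, pages, page))
        items.extend(batch)
        page += 1
    return items
```

Calling fetch_all("https://example.com") would return a list of post objects, and fetch_all("https://example.com", pages=True) the pages. A failed request raises urllib.error.HTTPError, whose code attribute can be passed to describe_status. The get parameter exists only so the paging loop can be exercised without a network connection.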
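
The python versus python3 confusion earlier in the thread can be checked directly. The small script below is a sketch, not part of WPExtractor. It reports which interpreter is actually running and whether that interpreter can import requests. The function name describe_environment is hypothetical, and the script avoids Python 3 only syntax so that it also runs under an old python binary.

```python
import sys


def describe_environment(module="requests"):
    """Report the running interpreter and whether it can import `module`."""
    try:
        __import__(module)
        found = True
    except ImportError:
        found = False
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "is_python3": sys.version_info[0] >= 3,
        "module_found": found,
    }


report = describe_environment()
for key in sorted(report):
    # str.format keeps this line valid on older interpreters too
    print("{}: {}".format(key, report[key]))
```

Running it once with python and once with python3 shows whether the two commands point at different installations. If module_found is False only for one of them, installing with python3 -m pip install requests ties the package to the interpreter that will run main.py.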