I am trying to use wikiprep with MediaWiki::DumpFile support to parse a Wikipedia dump. The problem is that it doesn't work. wikiprep -format composite -compress -f enwiki.xml.0001.gz Can't locate object method "namespaces" via package "MediaWi
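If wikiprep can't be coaxed into working, a fallback is to stream the same dump with Python's standard library instead of the Perl toolchain; a minimal sketch of that swapped-in approach, assuming the file is a gzipped MediaWiki XML export (filename taken from the command above):

```python
import gzip
import xml.etree.ElementTree as ET

# Stream the gzipped dump so the whole file never has to fit in memory.
with gzip.open("enwiki.xml.0001.gz", "rb") as f:
    for event, elem in ET.iterparse(f):
        # MediaWiki export XML is namespaced; keep only the local tag name.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            print(elem.text)
        elem.clear()  # free parsed elements as we go
```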
I am making a search results page on my website. I am trying to make it detect when a company is searched for and pull a Wikipedia script for it, but I only want it to show the code if the search is for a company, similar to Google and Bing. Google search for
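One way to approach the detection step is to ask Wikipedia itself and check whether the best-matching page reads like a company article; a rough sketch using the public REST summary endpoint, where the "company" keyword test is purely an illustrative heuristic:

```python
import requests

def wikipedia_company_summary(term):
    # The REST summary endpoint returns the lead extract of the best match.
    url = "https://en.wikipedia.org/api/rest_v1/page/summary/" + term
    resp = requests.get(url, headers={"User-Agent": "search-demo/0.1"})
    if resp.status_code != 200:
        return None
    extract = resp.json().get("extract", "")
    # Crude heuristic: only treat the hit as a company if the lead says so.
    return extract if "company" in extract.lower() else None

print(wikipedia_company_summary("Google"))
```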
I am trying to scrape some data from Wikipedia from about 100 pages (the pages have the same format). Wikipedia has made its API available, which gives the content in XML format, or I can get the data directly from the page using jsoup. Which method sh
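For roughly 100 identically formatted pages, the API route is usually the more robust of the two, since rendered HTML can change while the API contract stays stable; a minimal sketch that fetches raw wikitext through the action API:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(title):
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return page["revisions"][0]["*"]

for title in ["Python (programming language)", "Java (programming language)"]:
    print(title, len(fetch_wikitext(title)))
```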
I have seen that some pages on Wikipedia are updated within a few minutes of the event happening. Example: highest individual scores in ODI; I observed that the page was updated a few minutes after Rohit hit 264. http://en.wikipedia.org/wiki/List_of_One_Day_Int
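Those quick updates come from human editors, but the edit stream itself is queryable, so you can watch them arrive; a sketch that lists a page's latest revisions through the action API (the title below is a placeholder, since the URL above is cut off):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Some ODI records page",  # placeholder title
    "rvprop": "timestamp|user|comment",
    "rvlimit": 5,
    "format": "json",
}
data = requests.get(API, params=params).json()
page = next(iter(data["query"]["pages"].values()))
for rev in page.get("revisions", []):
    print(rev["timestamp"], rev["user"], rev.get("comment", ""))
```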
How can I extract the infobox data for a Wikipedia page using DBpedia? It would be great if anyone could directly provide me with the query I can run at the DBpedia endpoint to get the infobox contents as key-value pairs of property and value. For example,
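A sketch of what such a query could look like, sent from Python with SPARQLWrapper to the public DBpedia endpoint; the example resource is illustrative, and the namespace filter reflects the convention that raw infobox facts live under dbpedia.org/property/:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
# All property/value pairs attached to one resource, restricted to the
# infobox-derived "raw property" namespace.
sparql.setQuery("""
    SELECT ?property ?value WHERE {
        <http://dbpedia.org/resource/Google> ?property ?value .
        FILTER(STRSTARTS(STR(?property), "http://dbpedia.org/property/"))
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["property"]["value"], "=", row["value"]["value"])
```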
Update: I have changed the encoding to: with open("../data/enwiki-20131202-pagelinks.sql", encoding="ISO-8859-1") ...and the program is now chewing through the file without complaint. Maybe the SQL dumps aren't UTF-8 and don't contain such literals, a
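The ISO-8859-1 fix works because that codec never raises on any byte; a different option, sketched below, is to keep decoding as UTF-8 but replace undecodable byte runs, which also avoids the crash while preserving correctly encoded multibyte text:

```python
# Read the dump as bytes and decode lazily; errors="replace" turns any
# malformed byte sequence into U+FFFD instead of aborting the whole pass.
with open("../data/enwiki-20131202-pagelinks.sql", "rb") as f:
    for raw in f:
        line = raw.decode("utf-8", errors="replace")
        if line.startswith("INSERT INTO"):
            pass  # parse the VALUES tuples here
```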
I'm trying to download an image dump of Wikipedia to put into my local MediaWiki. Where can I download this? After googling, I don't see any simple answer. It would also be good if there is a way to take only a thumbnail dump of these ima
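There is no single official thumbnail dump, but one workaround (a sketch of an assumption-laden approach, not a bulk mechanism) is to fetch scaled-down copies file by file through Special:FilePath, which accepts a width parameter:

```python
import requests

def download_thumb(filename, width=200):
    # Special:FilePath redirects to the file; width requests a thumbnail.
    url = f"https://commons.wikimedia.org/wiki/Special:FilePath/{filename}"
    resp = requests.get(url, params={"width": width},
                        headers={"User-Agent": "thumb-fetch/0.1"})
    resp.raise_for_status()
    with open(filename, "wb") as out:
        out.write(resp.content)

download_thumb("Example.jpg")  # illustrative filename
```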
I would like to make a MySQL database with every Wikipedia article ID and its category ID (the most general category). I saw that Wikipedia provides an entire dump, and a few other dumps such as links between categories. I also saw there is MediaWiki, but I can't m
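The article-to-category mapping lives in the categorylinks.sql dump; a rough sketch that pulls (page_id, category_name) pairs out of its INSERT statements with a regex, assuming the standard dump layout where cl_from and cl_to are the first two columns:

```python
import re

# Each categorylinks row starts with (cl_from, 'cl_to', ...):
# cl_from is the page ID, cl_to the category title.
ROW = re.compile(r"\((\d+),'((?:[^'\\]|\\.)*)'")

with open("enwiki-categorylinks.sql", encoding="utf-8", errors="replace") as f:
    for line in f:
        if line.startswith("INSERT INTO"):
            for page_id, category in ROW.findall(line):
                print(page_id, category)
```

Note that this only yields the directly assigned categories; finding the "most general" one still means walking up the category tree afterwards.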
I copy/pasted the Wikipedia page http://en.wikipedia.org/wiki/User:BocasThunder/Playa_Verde to another wiki as http://panamanana.com/wiki/index.php?title=Playa_Verde, but the settlement box in the upper right doesn't show up on the second site. Why does
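The usual cause is that copy/paste moves only the page text, not the templates it invokes: the settlement box is a template that doesn't exist on the target wiki. Special:Export can bundle a page together with its templates for re-import; a hedged sketch, where the templates parameter mirrors the "Include templates" checkbox on the export form:

```python
import requests

# Fetch the page plus every template it transcludes, ready for Special:Import.
resp = requests.post(
    "https://en.wikipedia.org/wiki/Special:Export",
    data={
        "pages": "User:BocasThunder/Playa_Verde",
        "templates": "1",  # include transcluded templates
        "curonly": "1",    # latest revision only
    },
)
with open("playa_verde_export.xml", "w", encoding="utf-8") as f:
    f.write(resp.text)
```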
I am testing out BigQuery in the Google API and would like to run some queries on the Wikipedia full-text dump. Google's sample data doesn't include the full-text dump (only revision history). There are a few sources for Wikipedia dumps, such as this one on Amaz
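Once a dump has been converted to something tabular (say, CSV of title and text) and staged in Cloud Storage, loading it into BigQuery is a couple of client calls; a sketch with the google-cloud-bigquery client, where the bucket, table, and schema names are all placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    schema=[
        bigquery.SchemaField("title", "STRING"),
        bigquery.SchemaField("text", "STRING"),
    ],
)
# Bucket and destination table are placeholders.
job = client.load_table_from_uri(
    "gs://my-bucket/wikipedia-fulltext.csv",
    "my_project.wikipedia.fulltext",
    job_config=job_config,
)
job.result()  # block until the load job finishes
```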
I can't find out whether it is possible to download images from Wikipedia with the MediaWiki API. --------------Solutions------------- No, it is not possible to get the image files themselves via the API. Images in a MediaWiki are stored in folders, not in a database, and are
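Although the API won't hand over the file bytes, it will hand over the file's URL, which you can then download with a plain HTTP request; a sketch of that two-step pattern using prop=imageinfo:

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"
params = {
    "action": "query",
    "titles": "File:Example.jpg",  # illustrative file title
    "prop": "imageinfo",
    "iiprop": "url",
    "format": "json",
}
data = requests.get(API, params=params).json()
page = next(iter(data["query"]["pages"].values()))
file_url = page["imageinfo"][0]["url"]

# The API supplied the location; the bytes come from an ordinary GET.
image = requests.get(file_url, headers={"User-Agent": "img-fetch/0.1"}).content
```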
I would like to print a Wikipedia page as-is, with the header and the sidebar. By default, when you print them, articles are styled specially for the print medium. I am making material for a programming course and I specifically DON'T want that. check
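On older MediaWiki skins the print-specific styling arrives via stylesheet links marked media="print"; a hedged sketch (the regex and the assumption about how the skin loads its print CSS are both mine) that saves a copy of the page with those links stripped, so printing the saved file keeps the screen layout:

```python
import re
import requests

html = requests.get(
    "https://en.wikipedia.org/wiki/Main_Page",
    headers={"User-Agent": "print-as-is/0.1"},
).text
# Remove stylesheet links that only apply to the print medium.
html = re.sub(r'<link[^>]*media="print"[^>]*/?>', "", html)
with open("page_screen_styles.html", "w", encoding="utf-8") as f:
    f.write(html)
```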
Within infoboxes on Wikipedia, some attribute values are themselves inside curly braces {{}}. Sometimes they contain links as well. I need the values inside the braces as they are displayed on the Wikipedia web page. I have read that these are templates. Can anyone give me som
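The displayed values can be obtained by letting MediaWiki expand the templates for you; a sketch using action=expandtemplates on a template call copied out of an infobox (the input wikitext here is illustrative):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "expandtemplates",
    "text": "{{convert|100|km|mi}}",  # illustrative template call
    "prop": "wikitext",
    "format": "json",
}
data = requests.get(API, params=params).json()
# Prints the expanded form, e.g. "100 kilometres (62 mi)".
print(data["expandtemplates"]["wikitext"])
```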
I know there are many questions on this topic, but after 6 hours of try-this-tool-and-try-that-tool, I still can't find a single tool that takes wikitext of the form '<center>Welcome to the world's foremost open content<br><big><big
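One more tool worth trying is mwparserfromhell, whose strip_code() drops templates and markup and keeps the readable text; a small sketch, assuming the package is installed (pip install mwparserfromhell) and behaves this way on simple HTML tags:

```python
import mwparserfromhell

wikitext = "<center>Welcome to the world's foremost open content</center>"
code = mwparserfromhell.parse(wikitext)
# strip_code() removes markup nodes and returns the plain-text contents.
print(code.strip_code())
```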
I was using the MediaWiki OpenSearch API for Wikipedia, like http://en.wikipedia.org/w/api.php?action=opensearch&search=a&limit=10&namespace=0&format=jsonfm e.g. for the query string "a", which returns [ "a", [ "Animal", "Association
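If prefix matching is the limitation here, the full-text search list returns results ranked by relevance rather than by leading characters; a sketch with list=search:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "list": "search",
    "srsearch": "a",
    "srlimit": 10,
    "format": "json",
}
data = requests.get(API, params=params).json()
for hit in data["query"]["search"]:
    print(hit["title"])
```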
I need to design a program that finds certain four- or five-word phrases across the entire Wikipedia collection of articles (yes, I know it's a lot of pages, and I don't need answers calling me an idiot for doing this). I haven't programmed much stuff l
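A workable shape for this is a single streaming pass with a sliding word window, checking each window against the set of target phrases; a sketch over plain text (extracting that text from the dump is a separate step, and punctuation handling is deliberately naive):

```python
PHRASES = {
    ("to", "be", "or", "not", "to"),  # illustrative 5-word target
}

def find_phrases(text, sizes=(4, 5)):
    words = text.lower().split()
    hits = []
    for n in sizes:
        # Slide an n-word window across the article.
        for i in range(len(words) - n + 1):
            window = tuple(words[i:i + n])
            if window in PHRASES:
                hits.append(" ".join(window))
    return hits

print(find_phrases("To be or not to be, that is the question"))
```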
Is there a way to get all the subcategories of some category? I mean, say I want to store only pages in the category Computer Science and all its subcategories. I hope I am clear enough. --------------Solutions------------- Subcategories in Medi
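The usual approach is list=categorymembers with cmtype=subcat, applied recursively, and with a visited set because the category graph contains cycles; a sketch (continuation via cmcontinue is omitted for brevity):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def subcategories(category, seen=None):
    seen = set() if seen is None else seen
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "subcat",
        "cmlimit": "500",
        "format": "json",
    }
    data = requests.get(API, params=params).json()
    for member in data["query"]["categorymembers"]:
        title = member["title"]
        if title not in seen:       # the graph has cycles, hence `seen`
            seen.add(title)
            subcategories(title, seen)
    return seen

print(len(subcategories("Category:Computer science")))
```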
I want to index XML files of Wikipedia into Solr, but I am getting an error: it is unable to index them. Solr expects a specific format for XML files. I changed the schema.xml and data-config.xml files to suit the tags of the Wikipedia files, but it is still unab
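Because Solr's update format is not the MediaWiki export format, the pages have to be transformed into Solr documents first; a sketch that posts one page per document to the update handler as JSON, where the core name and field names are assumptions that must match your schema:

```python
import requests

SOLR = "http://localhost:8983/solr/wikipedia/update"  # core name is an assumption

docs = [
    {"id": "1", "title": "Example page", "text": "Example body"},
]
# commit=true makes the documents searchable immediately after the post.
resp = requests.post(SOLR, params={"commit": "true"}, json=docs)
resp.raise_for_status()
```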
I have downloaded the latest Wikipedia dump and parsed it into a MySQL database. Now I have a database table that contains only title and content. My requirement is to extract all biography contents from this table, so I want a dump file that has all biography t
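One heuristic for picking out biographies is that most of them transclude a person-related template such as {{Persondata}} or {{Infobox person}}; a rough sketch filtering on those markers, where the table and column names follow the question and both markers are assumptions rather than a guarantee of completeness:

```python
import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="root", password="...", database="wiki")
cur = conn.cursor()
# Heuristic filter: pages carrying a person template are likely biographies.
cur.execute(
    "SELECT title FROM pages "
    "WHERE content LIKE '%{{Persondata%' "
    "   OR content LIKE '%{{Infobox person%'"
)
for (title,) in cur:
    print(title)
conn.close()
```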
I'm writing a simple query to find URLs on commons.wikimedia.org, but I can't seem to work out which specific sanitizing rules I should use to get the exact file names used there. E.g., the flag of Ivory Coast is listed in French as Drapeau_de_la_Côt
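Commons file names follow MediaWiki's title normalization: spaces become underscores, the first character is uppercased, and the rest of the name (accented characters included) is kept verbatim, then percent-encoded as UTF-8 in URLs; a sketch of that normalization, using a hypothetical filename since the one above is cut off:

```python
from urllib.parse import quote

def commons_url(filename):
    # MediaWiki normalization: underscores for spaces, first letter
    # uppercased, everything else (accents included) left alone.
    name = filename.replace(" ", "_")
    name = name[:1].upper() + name[1:]
    # Percent-encode as UTF-8 for the URL.
    return "https://commons.wikimedia.org/wiki/File:" + quote(name)

print(commons_url("Drapeau de la Côte d'Ivoire.svg"))  # hypothetical filename
```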