Last week I spent some time working on the WikipediaHI activity for the Sugar Desktop Environment. I must say it is one of the most impressive activities I have come across. The best part is that it serves data offline: even if you don't have the internet connection normally required to access Wikipedia, the WikipediaHI activity will still serve your purpose.
Many developers and contributors work collaboratively on projects like this, and they continuously inspire you to take up new things and create something that others around the world can use. The Sugar developers and contributors are the epitome of such a group.
I came across two such developers, Anish Mangal and Gonzalo Odiard, whose contributions to Sugar are significant. I took up the task of creating WikipediaHI using the freely available Wikipedia dump for Hindi. I followed the steps specified on this page [hosted by Gonzalo] for creating a Wikipedia activity in your own language.
I will quickly explain the steps I took to create WikipediaHI:
1) Downloaded the Wikipedia dump file for Hindi:
NOTE: [Make sure you pick the latest valid file from http://dumps.wikimedia.org/hiwiki/. This location lists the dumps by date; pick the latest one and proceed.]
I also downloaded WikipediaBase from this link
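As a rough sketch of the note above, the snippet below picks the newest dated entry from a directory listing and builds the dump file URL for it. The helper names `latest_dump_date` and `dump_url` are hypothetical, the example listing is made up, and the URL layout (base location, then the date, then a file named like the one used in step 3) is an assumption; check the listing itself before relying on it.

```python
import re

BASE_URL = "http://dumps.wikimedia.org/hiwiki/"  # listing location from the note above


def latest_dump_date(entries):
    """Return the newest YYYYMMDD entry from a directory listing (hypothetical helper)."""
    dates = [e.strip("/") for e in entries if re.fullmatch(r"\d{8}/?", e)]
    if not dates:
        raise ValueError("no dated dump directories found")
    # YYYYMMDD strings sort chronologically when compared as text.
    return max(dates)


def dump_url(date, wiki="hiwiki"):
    """Build the pages-articles dump URL; the path layout is an assumption."""
    return f"{BASE_URL}{date}/{wiki}-{date}-pages-articles.xml.bz2"


if __name__ == "__main__":
    listing = ["20121201/", "20121225/", "latest/"]  # illustrative listing, not real data
    print(dump_url(latest_dump_date(listing)))
```

Non-date entries such as `latest/` are skipped, so the result always names a specific dated dump.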
2) Created a “hi” directory for Hindi under the WikipediaBase directory and moved the downloaded dump into it.
3) Extracted contents of this file using:
bzip2 -d hiwiki-20121225-pages-articles.xml.bz2
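If `bzip2` is not available on your machine, the same extraction can be sketched with Python's standard `bz2` module. This is only an alternative to the command above, not part of the WikipediaBase tools, and the function name `decompress_bz2` is hypothetical.

```python
import bz2
import shutil


def decompress_bz2(src, dest=None, chunk_size=1024 * 1024):
    """Decompress src (a .bz2 file) to dest, streaming in chunks to limit memory use."""
    if dest is None:
        if not src.endswith(".bz2"):
            raise ValueError("expected a file name ending in .bz2")
        dest = src[: -len(".bz2")]  # same naming as bzip2 -d: drop the suffix
    with bz2.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dest


if __name__ == "__main__":
    # File name taken from the command above.
    print(decompress_bz2("hiwiki-20121225-pages-articles.xml.bz2"))
```

Unlike `bzip2 -d`, this sketch leaves the compressed file in place, so make sure you have enough disk space for both files.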
4) Processed the dump using page parser:
This operation generates these files:
5) You can then include selected articles, or all articles, from this dump in your activity by using this command:
* Make sure you have favorites.txt and blacklist.txt filled with appropriate keywords.
Now if you want to include all articles, use this command:
6) Then proceed to create the index for these articles:
7) To test the index created in the previous step, you can use this command:
8) The next step is to expand the templates of the articles:
9) Go back to the hi directory and re-create the index:
mv hiwiki-20121225-pages-articles.xml.processed_expanded hiwiki-20121225-pages-articles.xml.processed
10) Download the images for the articles you selected:
If you want to download the images for the pages you selected in the previous step:
11) Create the files specific to your language:
(a) activity/activity.info.lang : activity info file for your language activity
(b) activity/activity-wikipedia-lang.svg : activity icon for your language
(c) activity_lang.py : activity file for your language
(d) static/about_lang.html : about page for Wikipedia in your language.
(e) static/index_lang.html : index page for Wikipedia in your language. This is the page displayed when the activity is launched, so it is important to know which articles are included in search.db (generated when the index is created) before you create the index page.
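Before building the bundle, it can help to check that all five files are in place. The sketch below assumes that "lang" in the file names above stands for your language code (for example "hi"); the helper name `missing_language_files` is hypothetical and not part of WikipediaBase.

```python
from pathlib import Path

# Patterns copied from the list above, with "lang" replaced by the language code (an assumption).
LANG_FILE_PATTERNS = [
    "activity/activity.info.{lang}",
    "activity/activity-wikipedia-{lang}.svg",
    "activity_{lang}.py",
    "static/about_{lang}.html",
    "static/index_{lang}.html",
]


def missing_language_files(base_dir, lang):
    """Return the language-specific files that do not exist yet under base_dir."""
    base = Path(base_dir)
    expected = [pattern.format(lang=lang) for pattern in LANG_FILE_PATTERNS]
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    for name in missing_language_files(".", "hi"):
        print("missing:", name)
```

Run it from the WikipediaBase directory; an empty result means every file in the list was found.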
12) Create the XO file for Wikipedia in your language:
I went through the search.db file to identify the articles present in it and created the index page accordingly.
This gave me the idea of writing a script that can generate the index page (in part or in whole) from search.db, to be used as the home page of the activity. [Stay tuned for the next blog post on this idea.]
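To give a feel for that idea, here is a minimal sketch that turns a list of article titles into a simple index page. It does not read search.db; it assumes you have already collected the titles some other way. The function name `build_index_html` and the `/wiki/` link prefix are assumptions for illustration, so adjust them to whatever your activity actually expects.

```python
from html import escape
from urllib.parse import quote


def build_index_html(titles, heading="Wikipedia", link_prefix="/wiki/"):
    """Build a minimal HTML index page from article titles (link format is an assumption)."""
    items = "\n".join(
        f'  <li><a href="{link_prefix}{quote(title.replace(" ", "_"))}">{escape(title)}</a></li>'
        for title in sorted(set(titles))  # drop duplicates and keep a stable order
    )
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(heading)}</title>\n</head>\n<body>\n"
        f"<h1>{escape(heading)}</h1>\n<ul>\n{items}\n</ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Illustrative titles only; a real list would come from your own selection.
    page = build_index_html(["भारत", "दिल्ली"], heading="WikipediaHI")
    with open("index_hi.html", "w", encoding="utf-8") as out:
        out.write(page)
```

Titles are HTML-escaped for display and percent-encoded in the links, so Hindi titles and characters such as `&` are handled safely.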
Here you go: you can see WikipediaHI
On launching it, you can see the index page listing the articles you can view offline using WikipediaHI
If you want to play with WikipediaHI, you can download it: WikipediaHI-35.xo
I must thank Gonzalo for his amazing help and guidance in getting this done. I have to mention here that Wikipedia changed the XML format of its dumps, which resulted in an error when I was creating the index. I took Gonzalo’s help to get it resolved.
Thanks to Anish, who motivated me to pick this up and guided me to complete it.
Thanks guys !! 😀
About the Author: Kartik is a graduate student at Carnegie Mellon University specializing in mobile computing, machine learning, and natural language processing. He worked at LinkedIn before going to CMU. To know more: http://linkedin.com/in/kartikperisetla
If you also wish to showcase your blog here, please see GBlog for guest blog writing on GeeksforGeeks.