“StudentDaily” is an iOS news app. I built it to solve the problems WeChat users face when reading public accounts.
WeChat is a social networking app that connects people. It has almost 600 million users around the world. Besides traditional text, voice, and video chat, WeChat provides an Official Public Account Platform that lets individuals and media outlets publish news for subscribers. The problem is that readers cannot read posts from different public accounts in one place. They have to open one account, read its news, and then switch to another account. Also, among millions of public accounts, it is difficult to find the ones worth following. Moreover, WeChat does not offer open APIs to search engines, and posts from these accounts have no stable URLs, which makes scraping very difficult.
I built a crawling script that collects data from the Sogou WeChat search engine, which is currently WeChat's only search partner. The script parses the XML and JSON data, uses semantic analysis to extract clean text from it, and then exports the result through APIs I built myself. In addition, because Sogou bans IP addresses that make too many requests, I route the crawling through proxies. The crawling and processing tools I used include PhantomJS, Scrapy, and BeautifulSoup 4.
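The proxy rotation and text clean-up steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Blocked`, `fetch_with_rotation`, `TextExtractor`, and `extract_text` are hypothetical, the `fetch` callable stands in for whatever HTTP client does the real request, and the standard library's `html.parser` is used in place of BeautifulSoup 4 so the example has no dependencies.

```python
from html.parser import HTMLParser
from itertools import cycle


class Blocked(Exception):
    """Hypothetical error a fetch function raises when the server rejects it."""


def fetch_with_rotation(url, proxies, fetch, max_attempts=5):
    """Call fetch(url, proxy) with successive proxies until one succeeds.

    `fetch` is supplied by the caller (an assumption of this sketch), so the
    rotation logic stays independent of any particular HTTP library.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = cycle(proxies)  # loop over the proxy list indefinitely
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Blocked as error:
            last_error = error  # this proxy was rejected; try the next one
    raise RuntimeError(f"all {max_attempts} attempts were blocked") from last_error


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html):
    """Return the visible text of an HTML fragment, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Passing the fetch function in as a parameter keeps the retry logic testable without network access. A real crawler would likely also add delays between attempts and drop proxies that fail repeatedly.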
I also registered the “studentdaily.org” domain and built the APIs for the app. Visitors can register an account, subscribe to their favorite public accounts in the app, and read them seamlessly every day.
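The core idea of reading several subscribed accounts in one place can be sketched as a merged feed. This is only an illustration under assumptions: the `Post` fields, the `build_feed` function, and the `limit` parameter are hypothetical and do not describe the real API behind the app.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """Hypothetical shape of a crawled post."""
    account: str
    title: str
    published: datetime


def build_feed(posts, subscriptions, limit=20):
    """Return the newest posts from the subscribed accounts, newest first."""
    subscribed = set(subscriptions)
    matching = [post for post in posts if post.account in subscribed]
    matching.sort(key=lambda post: post.published, reverse=True)
    return matching[:limit]
```

With a feed like this, a reader sees posts from every subscribed account in a single timeline instead of opening each account in turn.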
The app can be downloaded from the App Store.
The source code can be found here.