How to: download the most popular Android apps!

By | March 1, 2017

Downloading the top Android apps is useful for many tasks: one might want to automate the installation of the most popular apps, and researchers might want to study app internals or store metadata (e.g., number of downloads).

A few projects already exist for this purpose:

  • egirault/googleplay-api: provides a Python API to search, browse and download apps from Google Play (no longer maintained, but working for our purposes)
  • facundoolano/google-play-scraper: a Node.js scraper that gets data from Google Play; it is more recent and can expose its API via REST.

While the first would suffice for our demo, the second provides a very interesting and more up-to-date set of APIs. So let’s start scripting!

Since the Node.js scraper exposes its API on port 3000, let's start by checking whether that port is already in use:

if [ $(netstat -lp 2> /dev/null | grep 3000 | wc -l) -gt 0 ]
then
    echo "Port 3000 already in use by $(netstat -lp 2> /dev/null | grep 3000 | awk '{print $7}')"
    exit 1 # non-zero status signals that the script could not continue
fi

Now let's create a folder for our intermediate and final results and install the dependencies:

mkdir -p top-apps/apps # creates both folders; no error if they already exist
cd top-apps

read -r -p "Install dependencies? [y/N] " response
if [[ "$response" =~ ^([yY][eE][sS]|[yY])+$ ]]
then

    echo "Installing nodejs, jq and git..."
    sudo apt-get install nodejs jq git # jq is used for json processing

    echo "Installing protobuf..."
    sudo pip install protobuf

    echo "Installing nodejs module to scrape app data from google play.."
    npm install google-play-scraper

    echo "Cloning web-api.."
    git clone https://github.com/facundoolano/google-play-api.git web-api

    echo "Cloning python api... (allows downloads)"
    git clone https://github.com/egirault/googleplay-api py-api

    echo "Installing the web-api..."
    cd web-api
    npm install
    cd ..
fi

Now, since googleplay-api requires your credentials to download apps, let's read them from the console:

if [ -z "$ANDROID_ID" ]; then read -r -p "ANDROID ID: " ANDROID_ID; fi
if [ -z "$GOOGLE_LOGIN" ]; then read -r -p "GOOGLE LOGIN: " GOOGLE_LOGIN; fi
unset GOOGLE_PASSWORD
set +a #avoid exporting
read -rsp "GOOGLE PASSWORD: " GOOGLE_PASSWORD
echo ""

Note that while GOOGLE_LOGIN and GOOGLE_PASSWORD can be from any Google account, the ANDROID_ID must be retrieved from a device associated with that account. There are two ways to retrieve this ID:

  • Connect your mobile device to your PC and type in the terminal: adb shell settings get secure android_id
  • Using the following app: https://play.google.com/store/apps/details?id=com.evozi.deviceid&hl=en
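
A mistyped ID only surfaces later as a failed login, so a quick sanity check right after reading it can save a round trip. The sketch below assumes the ID is a 64-bit value written in hexadecimal (1 to 16 hex digits); that format and the helper name is_hex_id are assumptions, not something googleplay-api enforces.

```shell
# Hypothetical helper: succeeds when its argument looks like a 64-bit ID
# written in hexadecimal (1 to 16 hex digits). The format is an assumption.
is_hex_id() {
    case "$1" in
        "" | *[!0-9a-fA-F]*) return 1 ;;  # empty, or contains a non-hex character
    esac
    [ "${#1}" -le 16 ]                    # at most 16 digits fit in 64 bits
}

# Assumed usage after the read prompts:
# is_hex_id "$ANDROID_ID" || echo "ANDROID_ID does not look like a hex ID" >&2
```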

We now need to generate the configuration file required by googleplay-api. Since we do not want a configuration file lying around with our password in it, let's have the configuration read it from standard input:

echo "import sys" > py-api/config.py
echo "SEPARATOR = \";\"" >> py-api/config.py
echo "LANG = \"en_US\"" >> py-api/config.py
echo "ANDROID_ID = \"$ANDROID_ID\"" >> py-api/config.py
echo "GOOGLE_LOGIN = \"$GOOGLE_LOGIN\"" >> py-api/config.py
echo "GOOGLE_PASSWORD = sys.stdin" >> py-api/config.py
echo "AUTH_TOKEN = None" >> py-api/config.py

With everything set up, we start the scraper web server, wait for it to boot and retrieve information about the most popular apps:

cd web-api
npm start & # starts the Node.js web server and respective REST API
WEB_API_PID=$! #stores the process id to kill later
echo "Web-api pid $WEB_API_PID"
cd ..

while [ $(netstat -lp 2> /dev/null | grep 3000 | wc -l) -lt 1 ]
do
   echo "Waiting for web api boot..."
   sleep 1
done

echo "Downloading top apps json..."
wget localhost:3000/api/apps -O top.json
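
The wait loop above never gives up if the server fails to start. A minimal sketch of a bounded wait, assuming one check per second is acceptable; the helper name wait_until is hypothetical:

```shell
# Hypothetical helper: run a check command once per second until it succeeds
# or the given number of attempts is used up.
# Usage: wait_until ATTEMPTS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0                      # the check passed
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1                       # pause before the next attempt
        fi
    done
    return 1                              # gave up
}
```

For example, wait_until 30 sh -c 'netstat -lp 2> /dev/null | grep -q 3000' would stop waiting after roughly 30 seconds, at which point the script could kill $WEB_API_PID and exit instead of looping forever.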

If you inspect top.json later, you will see that it contains, for each app, its Google Play URL and other details.
Now let's process this JSON file to extract the app IDs (package names) and download each APK via googleplay-api:

echo "Downloading apps from json..."
jq -r '.results | .[] | .appId' top.json | while read -r app_id; do # printf is a shell builtin, so the password never shows on ps
    printf '%s\n' "$GOOGLE_PASSWORD" | python py-api/download.py "$app_id" "apps/$app_id.apk"
done

Finally, kill the running web API so it stops consuming resources:

echo "Killing web api..."
kill $WEB_API_PID
echo "Done"
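
If the script aborts before reaching this point, the web API keeps running in the background. One possible safeguard is to wrap the kill in a small function and register it with trap as soon as the server is started. This is only a sketch; stop_web_api is a hypothetical name.

```shell
# Hypothetical helper: stop the process with the given PID if it is still running.
stop_web_api() {
    if kill -0 "$1" 2>/dev/null; then     # kill -0 only tests whether the PID exists
        kill "$1"
        wait "$1" 2>/dev/null || true     # reap it and ignore its non-zero exit status
    fi
}

# Assumed usage, right after WEB_API_PID is set:
# trap 'stop_web_api "$WEB_API_PID"' EXIT
```

Because the function checks the PID first, calling it again at the end of the script after the trap is registered does no harm.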

In top-apps/apps you should have the top 50 most popular apps of the Google Play market! Enjoy! 😉

You can find the full snippet here!

Top apps across categories

Similarly, one can get the top apps across categories:

cd top-apps/web-api
npm start & # starts the Node.js web server and respective REST API
WEB_API_PID=$! #stores the process id to kill later
echo "Web-api pid $WEB_API_PID"
cd ../..

while [ $(netstat -lp 2> /dev/null | awk '{if($4 == "[::]:3000"){print $0}}' | wc -l) -lt 1 ]
do
   echo "Waiting for web api boot..."
   sleep 1
done

cat categories.txt | xargs -I{} bash -c 'wget localhost:3000/api/apps/?collection=topselling_free\&category={}\&country=en -O {}.json'

: > apps.txt # start from an empty file (echo "" would leave a blank first line)

echo "Collecting app ids from json..."
cat categories.txt | xargs -I{} bash -c "cat {}.json | jq -r '.results | .[] | .appId' " >> apps.txt

echo "Killing web api..."
echo $WEB_API_PID

kill $WEB_API_PID
echo "Done"

Just make sure to create the “categories.txt” file beforehand. The category names were taken from the facundoolano repo; create the file with the following content:

ANDROID_WEAR
ART_AND_DESIGN
AUTO_AND_VEHICLES
BEAUTY
BOOKS_AND_REFERENCE
BUSINESS
COMICS
COMMUNICATION
DATING
EDUCATION
ENTERTAINMENT
EVENTS
FINANCE
FOOD_AND_DRINK
HEALTH_AND_FITNESS
HOUSE_AND_HOME
LIBRARIES_AND_DEMO
LIFESTYLE
MAPS_AND_NAVIGATION
MEDICAL
MUSIC_AND_AUDIO
NEWS_AND_MAGAZINES
PARENTING
PERSONALIZATION
PHOTOGRAPHY
PRODUCTIVITY
SHOPPING
SOCIAL
SPORTS
TOOLS
TRAVEL_AND_LOCAL
VIDEO_PLAYERS
WEATHER
APP_WIDGETS
GAME
GAME_ACTION
GAME_ADVENTURE
GAME_ARCADE
GAME_BOARD
GAME_CARD
GAME_CASINO
GAME_CASUAL
GAME_EDUCATIONAL
GAME_MUSIC
GAME_PUZZLE
GAME_RACING
GAME_ROLE_PLAYING
GAME_SIMULATION
GAME_SPORTS
GAME_STRATEGY
GAME_TRIVIA
GAME_WORD
FAMILY
FAMILY_ACTION
FAMILY_BRAINGAMES
FAMILY_CREATE
FAMILY_EDUCATION
FAMILY_MUSICVIDEO
FAMILY_PRETEND

After running it, you should have a file named “apps.txt” listing the top apps (likely over 2.5K entries), which you can then download with the egirault tool.
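
The list may contain blank lines, and the same app is likely to show up in more than one category (for instance GAME and one of its subcategories), so it is worth deduplicating before downloading. A minimal sketch; unique_ids and apps-unique.txt are hypothetical names:

```shell
# Hypothetical helper: print the unique, non-blank lines of a file, sorted.
unique_ids() {
    grep -v '^[[:space:]]*$' "$1" | sort -u
}

# Assumed usage before feeding the list to the download loop:
# unique_ids apps.txt > apps-unique.txt
```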

Common Problems

The Google Play repositories used here are far from perfect. For example, egirault’s repo is no longer maintained, although it still works for downloading apps.

If you get an error like “Too many values to unpack”, I have posted the fix in: https://github.com/egirault/googleplay-api/issues/73

Authentication for downloading can occasionally fail as well, but it should work if you retry.
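
Retrying can be automated with a small wrapper. The sketch below is one way to do it; the name retry, the number of tries and the delay are all assumptions to adapt to your setup.

```shell
# Hypothetical helper: run a command up to MAX_TRIES times, pausing between attempts.
# Usage: retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
retry() {
    tries=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$tries" ]; then
            return 1                      # every attempt failed
        fi
        n=$((n + 1))
        sleep "$delay"                    # wait before trying again
    done
    return 0
}
```

Wrapping the per-app download step in a shell function and calling it as retry 3 5 that_function would give each app three attempts, five seconds apart.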

A port check based on a plain “grep 3000” is incomplete, because it also matches ports like 23000. Filtering on the local-address column instead, e.g. with awk '{if($4 == "[::]:3000"){print $0}}', prevents such ports from being considered.
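
The awk filter above only matches a server bound to the IPv6 wildcard address. A slightly more general sketch compares just the port part of the local address, assuming netstat prints that address in the fourth column; port_is_listed is a hypothetical name.

```shell
# Hypothetical helper: read `netstat -l`-style lines on stdin and succeed only if
# some line's local address (4th column) ends in exactly the given port.
port_is_listed() {
    awk -v port="$1" '
        { n = split($4, parts, ":") }        # "[::]:3000" -> last piece is "3000"
        n > 1 && parts[n] == port { found = 1 }
        END { exit !found }
    '
}
```

It could be used as netstat -ltn 2> /dev/null | port_is_listed 3000, where -n keeps ports numeric so that they are not replaced by service names.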
