ANLY 501 Portfolio Data Gathering

Fall 2020, Bo Yang

Downloaded Data

For this topic, several very useful datasets for both YouTube and Bilibili were found on the websites below:
https://www.kaggle.com/datasnaek/youtube-new
https://www.dropbox.com/s/zra8ct97jxqv9sb/bilibili.zip?dl=0
https://sites.google.com/view/bilibilidataset
These websites already provide enough data to start this project. They contain data from two years ago as well as the most recent data. Below are two pictures that best describe the format of these two datasets.



Data Using API

The YouTube API was also used to gather specific data when needed. For example, if a certain type of video gets the most views at some point during this project, it is easy in both Python and R to gather a list of video IDs and look up their details through the API. Below is one example of getting data through the API.
I obtained access to the API from: https://console.developers.google.com/apis/dashboard?project=anly501-portfolio&supportedpurview=project
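As a rough illustration of looking up video details by ID, here is a minimal Python sketch. It is not the code from youtube.py. It assumes the YouTube Data API v3 `videos` endpoint with the `snippet` and `statistics` parts. The function names (`build_videos_url`, `flatten_items`, `fetch_video_details`, `write_rows_to_csv`) and the output column names are hypothetical, chosen only for this example.

```python
import csv
import json
import urllib.parse
import urllib.request

# YouTube Data API v3 endpoint for looking up videos by ID.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_ids, api_key):
    """Build the request URL for a list of video IDs (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "part": "snippet,statistics",   # title/channel info plus view counts
        "id": ",".join(video_ids),      # comma-separated list of video IDs
        "key": api_key,                 # your own API key goes here
    })
    return f"{API_URL}?{query}"


def flatten_items(response):
    """Turn the JSON response into flat rows suitable for a CSV file."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "channel_title": snippet.get("channelTitle"),
            "published_at": snippet.get("publishedAt"),
            # Counts arrive as strings; a missing count is treated as 0 here.
            "views": int(stats.get("viewCount", 0)),
            "likes": int(stats.get("likeCount", 0)),
            "comments": int(stats.get("commentCount", 0)),
        })
    return rows


def fetch_video_details(video_ids, api_key):
    """Call the API once and return flattened rows (needs network access)."""
    with urllib.request.urlopen(build_videos_url(video_ids, api_key)) as resp:
        return flatten_items(json.load(resp))


def write_rows_to_csv(rows, path):
    """Write the flattened rows to a CSV file with a header line."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

With a valid key, something like `write_rows_to_csv(fetch_video_details(ids, key), "videos.csv")` would produce a CSV of the requested videos. Long lists of IDs may need to be split across several requests.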

This picture illustrates my CSV file when I use Python to gather the data.
This picture illustrates my CSV file when I use R to gather the data.

Below are the links to my complete datasets.
https://bellayang.georgetown.domains/USvideos.csv
https://bellayang.georgetown.domains/videos.txt

All of my code can be found below.
https://bellayang.georgetown.domains/youtube.py
https://bellayang.georgetown.domains/Assignment1.Rmd