ANLY 560 Time Series Project EDA

Spring 2022, Bo Yang

by151@georgetown.edu

Plotting Time Series







This graph shows the overall time series of maximum and minimum temperature for Ohio between 2019 and 2021. I chose this time range because I would also like to find out whether COVID has changed temperatures. The plot shows a clear yearly seasonality, which is reasonable since temperatures in summer are higher than temperatures in winter.







Lag Plots





The two lag plots show maximum temperature and minimum temperature separately. The purpose of a lag plot is to show the relationship between a series and its own lagged values. Both plots show a clear linear correlation, especially at lag 1. This is reasonable, since temperature is unlikely to change by many degrees over a short period of time; it rises or falls gradually.

Moreover, the two lag plots show that maximum and minimum temperature follow similar patterns and give similar results. Therefore, from now on, I will focus only on maximum temperature for further exploration.
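
A lag plot can also be summarized numerically as the correlation between the series and a shifted copy of itself. The following is a minimal sketch in Python using only the standard library; it is not the R code behind the report, and the `temps` series is synthetic (a hypothetical yearly cycle plus a small wiggle), not the Ohio data.

```python
import math

def lag_correlation(series, lag):
    """Pearson correlation between series[t] and series[t - lag]."""
    current = series[lag:]
    lagged = series[:-lag]
    n = len(current)
    mean_c = sum(current) / n
    mean_l = sum(lagged) / n
    cov = sum((c - mean_c) * (l - mean_l) for c, l in zip(current, lagged))
    var_c = sum((c - mean_c) ** 2 for c in current)
    var_l = sum((l - mean_l) ** 2 for l in lagged)
    return cov / math.sqrt(var_c * var_l)

# Synthetic stand-in for daily maximum temperature (assumption, not real data):
# a yearly sine cycle around 60 degrees plus a small deterministic wiggle.
temps = [60 + 25 * math.sin(2 * math.pi * (d - 100) / 365) + 3 * math.sin(1.7 * d)
         for d in range(3 * 365)]

for lag in (1, 7, 30, 90):
    print("lag", lag, "correlation", round(lag_correlation(temps, lag), 3))
```

On a series that changes gradually from day to day, the lag-1 correlation should be close to 1 and should weaken as the lag grows toward a quarter of the yearly cycle.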






Decomposition









Based on the decomposition graph, there is no clear trend, since temperature keeps moving up and down, but there is clear seasonality. Moreover, this is an additive model. We will know more after the ACF and PACF plots.
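
To make the additive model concrete, here is a rough sketch of a classical additive decomposition, where each observation is split as trend + seasonal + remainder. It is written in plain Python rather than R, assumes an odd seasonal period so that a simple centered moving average can serve as the trend, and runs on a small made-up series; the function names are illustrative and are not taken from the report's code.

```python
def centered_moving_average(series, window):
    """Centered moving average for an odd window; None where the window does not fit."""
    half = window // 2
    out = [None] * len(series)
    for t in range(half, len(series) - half):
        out[t] = sum(series[t - half:t + half + 1]) / window
    return out

def additive_decompose(series, period):
    """Split series into trend + seasonal + remainder (assumes an odd period)."""
    trend = centered_moving_average(series, period)
    # Average the detrended values at each position within the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            sums[t % period] += y - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    center = sum(raw) / period
    pattern = [r - center for r in raw]  # force seasonal effects to sum to zero
    seasonal = [pattern[t % period] for t in range(len(series))]
    remainder = [None if tr is None else y - tr - s
                 for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder

# Made-up example: a linear trend plus a repeating pattern of length 5.
pattern_true = [2, -1, 0, -3, 2]
series = [10 + 0.1 * t + pattern_true[t % 5] for t in range(30)]
trend, seasonal, remainder = additive_decompose(series, 5)
print([round(s, 3) for s in seasonal[:5]])
```

Because the example series is built exactly as trend plus a zero-sum pattern, the recovered seasonal component should match the pattern and the remainder should be essentially zero wherever the trend is defined.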











ACF and PACF Plots



Alongside the time plot of the data, the ACF plot is also useful for identifying non-stationary time series. For a stationary time series, the ACF drops to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value of r1 (the lag-1 autocorrelation) is often large and positive. Our ACF shows this slow decay, so we can tell that our time series is not stationary.
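
The contrast between a quickly vanishing and a slowly decaying ACF can be illustrated with a short sketch. The code below is plain Python, not the R code used for the plots; it computes the sample autocorrelation directly and compares seeded white noise (stationary) with its cumulative sum, a random walk (non-stationary). Both series are simulated and hypothetical.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

random.seed(560)
noise = [random.gauss(0, 1) for _ in range(500)]

# A random walk is the running sum of the noise terms.
walk = []
level = 0.0
for step in noise:
    level += step
    walk.append(level)

for name, data in (("white noise", noise), ("random walk", walk)):
    r = acf(data, 20)
    print(name, "r1 =", round(r[1], 3), "r20 =", round(r[20], 3))
```

For the white noise series, r1 should already be near zero, while for the random walk r1 should be large and positive and the autocorrelations should fall off only slowly.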


Augmented Dickey-Fuller Test





The Augmented Dickey-Fuller (ADF) test results for both maximum temperature and minimum temperature are shown on the left. Both p-values are well above 0.05, so we fail to reject the null hypothesis of a unit root. The test therefore gives no evidence of stationarity, and we treat the time series as non-stationary.
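
The idea behind the test can be sketched with the plain (non-augmented) Dickey-Fuller regression, which regresses the first difference on the lagged level plus a constant and looks at the t-statistic of the slope. This is a simplified illustration in standard-library Python, not the augmented test reported here: it includes no lagged differences, returns no p-value, and compares against an approximate large-sample 5% critical value of -2.86 for the constant-only case. The simulated series are hypothetical.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in: y[t] - y[t-1] = alpha + gamma * y[t-1] + error."""
    x = series[:-1]                                   # lagged level y[t-1]
    z = [b - a for a, b in zip(series, series[1:])]   # first difference
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    gamma = sxz / sxx
    alpha = mean_z - gamma * mean_x
    ssr = sum((b - alpha - gamma * a) ** 2 for a, b in zip(x, z))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err

random.seed(560)
noise = [random.gauss(0, 1) for _ in range(500)]
walk = []
level = 0.0
for step in noise:
    level += step
    walk.append(level)

CRITICAL_5PCT = -2.86  # approximate asymptotic value, constant-only case
for name, data in (("white noise", noise), ("random walk", walk)):
    stat = dickey_fuller_stat(data)
    verdict = "reject unit root" if stat < CRITICAL_5PCT else "fail to reject unit root"
    print(name, round(stat, 2), verdict)
```

The more negative the statistic, the stronger the evidence against a unit root, so the stationary white noise series should produce a far more negative value than the random walk.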











Detrend and Log Transformation





There are a few ways to make a non-stationary time series stationary. The two methods I used were detrending by differencing and log transformation. From the plots on the left, I could see that for the differenced series, the value is relatively constant. But for the log-transformed series, the value goes from large to small and then large again.
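
Both transformations are simple to write down. The sketch below is plain Python rather than the report's R code, with illustrative function names and a synthetic series. One assumption is worth flagging: a log transformation is only defined for strictly positive values, so the helper refuses a series that touches zero or goes negative.

```python
import math

def difference(series, lag=1):
    """Differencing: y[t] - y[t - lag]; the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def log_transform(series):
    """Natural log of each value; only valid for strictly positive data."""
    if min(series) <= 0:
        raise ValueError("log transformation needs strictly positive values")
    return [math.log(v) for v in series]

# Synthetic stand-in for daily maximum temperature (assumption, not real data).
temps = [60 + 25 * math.sin(2 * math.pi * (d - 100) / 365) + 3 * math.sin(1.7 * d)
         for d in range(3 * 365)]

diffed = difference(temps)
logged = log_transform(temps)
print("original range:", round(max(temps) - min(temps), 2))
print("differenced range:", round(max(diffed) - min(diffed), 2))
print("logged range:", round(max(logged) - min(logged), 2))
```

Differencing removes the slowly moving level, so the differenced values stay in a narrow band, while the log transformation only rescales the series and keeps its seasonal shape.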

















Here is an overall comparison plot of the original time series and the two transformed series. We can already see a clear difference after differencing. However, the ACF and PACF plots and the ADF test still need to be run to check whether the transformed series are stationary.













Both the ACF and PACF multi-panel plots indicate that the first-order differenced series and the log-transformed series are stationary.





The results of the ADF test for our temperature time series after differencing (to remove the trend) and log transformation (to reduce heteroskedasticity) are shown. Both p-values are below 0.01, so we reject the null hypothesis at the 1% level, and both transformed series are now treated as stationary. The negative Dickey-Fuller statistics are consistent with this, since the more negative the statistic, the stronger the evidence against the null hypothesis.













Moving Average Smoothing





In the moving average smoothing plot on the left, I chose three windows: 7, 31, and 365, which represent a week, a month, and a year. Since my data is temperature data, these calendar-based windows should give results that are easier to interpret.

The 7-MA plot, which represents weekly smoothing of temperature over time, follows essentially the same path as the original data and shows little visible smoothing, so a bigger window is worth looking at. The 31-MA plot, which represents monthly smoothing, shows all the ups and downs of the temperature over the given time period. The 365-MA plot, which represents yearly smoothing, is nearly a straight line: it cannot show any seasonal pattern, but it does show the average temperature over time.
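
The effect of the window size can be reproduced with a small sketch. The code below is standard-library Python, not the R code behind the plot, and it uses a trailing moving average (the mean of the most recent values); whether the report's smoothing is trailing or centered is not stated, so that choice is an assumption. The `temps` series is again synthetic.

```python
import math

def moving_average(series, window):
    """Trailing moving average; output starts once a full window is available."""
    out = []
    running = 0.0
    for t, value in enumerate(series):
        running += value
        if t >= window:
            running -= series[t - window]  # drop the value leaving the window
        if t >= window - 1:
            out.append(running / window)
    return out

# Synthetic stand-in for daily maximum temperature (assumption, not real data).
temps = [60 + 25 * math.sin(2 * math.pi * (d - 100) / 365) + 3 * math.sin(1.7 * d)
         for d in range(3 * 365)]

ranges = {}
for window in (7, 31, 365):
    smoothed = moving_average(temps, window)
    ranges[window] = max(smoothed) - min(smoothed)
    print(f"{window}-MA range: {ranges[window]:.2f}")
```

A window that spans a full yearly cycle averages the seasonality away, which is why the 365-day result is almost flat while the shorter windows still follow the seasonal swings.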






The R code that produced these graphs is linked below.
EDA code