Split the datetime column to filter through different time variable
data2 <-separate(file, col = datetime, into =c("Date", "Time"), sep =" ")data <-separate(data2, col = Date, into =c("Year", "Month", "Day"), sep ="-")
Basic descriptive statistic
data |>summary()
Year Month Day Time
Length:24394 Length:24394 Length:24394 Length:24394
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
temp_wew rh
Min. :15.00 Min. :30.40
1st Qu.:20.00 1st Qu.:70.50
Median :21.50 Median :85.00
Mean :22.52 Mean :81.07
3rd Qu.:25.50 3rd Qu.:94.20
Max. :32.50 Max. :99.90
NA's :1583
# Distribution of TemperatureTemperature <- file$temp_wewhist(Temperature)
Temperature distribution is normally distributed, but there is appearance of almost two peaks.
# Distributio of Humidityrelative_humidity <- file$rhhist(relative_humidity)
Humidity distribution is not normally distributed over the year. higher humidity is prevalent
Temperature Distribution indicating different month of the year
file |>ggplot(aes(x= datetime, y = temp_wew, colour = month)) +geom_point()+geom_hline(yintercept =23) +labs(title ="Daily Distribution of Temperature",x ="Day Time",y ="Average Temperature (°C)",colour ="Month" )
The Colors represents different months, starting from Feb to Dec. There is trend showing a bell distribution. However, some months (summer) appear to experience higher temperature above 23, while some are below.
For better visualization The monthly average temperature over the year, signify a bell shape.
data |>group_by(Month) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Month,y =AverageT, group =1)) +geom_line(colour ="blue") +geom_point() +labs(title ="Monthly Distribution of Temperature",x ="Month of the year",y ="Average Temperature (°C)" )
BoxPlot emphasis on the bell shape and the means of monthly temperature. There is very few data available for Dec
data |>ggplot(aes(y = temp_wew, x = Month))+geom_boxplot() +labs(title ="Monthly Distribution of Temperature",x ="Month of the year",y ="Average Temperature (°C)" )
Humidty Distribuion indicating different month of the year
file |>ggplot(aes(x= datetime, y = rh, colour = month)) +geom_point() +labs(title ="Daily Distribution of Temperature",x ="Day Time",y ="Average Temperature (°C)",colour ="Month" )
Warning: Removed 1583 rows containing missing values or values outside the scale range
(`geom_point()`).
For Humidity, the pattern is different.. higher humidity towards the end of the year and higher variance at the mid months.
data |>group_by(Month) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Month,y =AverageH, group =1)) +geom_line(colour ="green") +geom_point() +labs(title ="Monthly Distribution of Relative Humidity",x ="Month of the year",y ="Average Relative Humidity (%)" )
Plotting the mean of the distribution, it shows increasing trend of humidity.
data |>ggplot(aes(y = rh, x = Month))+geom_boxplot() +labs(title ="Monthly Distribution of Relative Humidity",x ="Month of the year",y ="Average Relative Humidity (%)" )
Warning: Removed 1583 rows containing non-finite outside the scale range
(`stat_boxplot()`).
The box_plot emphasis the increasing trends of the rh mean over the year.
Distribution during the day (Hourly)
The average hourly temperature of the day, shows rising and falling pattern Temperature decreasing from early hour(00:00) and to lowest temperature at 06:00 and beginning to rise from that hour to the highest temperature at region between 16-19:00 and after which it started to decrease again.
data |>group_by(Time) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Time,y =AverageT, group =1)) +geom_line(colour ="blue") +geom_point() +labs(title ="Hourly Distribution of Temperature",x ="Day Time (Hour)",y ="Average Temperature (°C)" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
Filtering through different months to see changes in the distribution, the pattern still follow a similar trend.
data |>filter(Month =="07") |>group_by(Time) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Time,y =AverageT, group =1)) +geom_line(colour ="blue") +geom_point() +labs(title ="Hourly Distribution of Temperature",x ="Day Time (Hour)",y ="Average Temperature (°C)" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
Hour humidity districution of the day reveal a U shape. with the highest rh at around 6:00 and lowest around 16:00 of the day. This is opposite to temperature distributuion
data |>group_by(Time) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Time,y =AverageH, group =1)) +geom_line(colour ="red") +geom_point() +labs(title ="Hourly Distribution of Relative Humidity",x ="Day Time (Hour)",y ="Average Relative Humidity (%)" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
Filtering through the months, The distribution is more stable compare to the temperature pattern of the year.
data |>filter(Month =="09") |>group_by(Time) |>summarise(AverageT =mean(temp_wew), AverageH =mean(rh, na.rm =TRUE)) |>ggplot(aes(x = Time,y =AverageH, group =1)) +geom_line(colour ="red") +geom_point() +labs(title ="Hourly Distribution of Relative Humidity",x ="Day Time (Hour)",y ="Average Relative Humidity (%)" ) +theme(axis.text.x =element_text(angle =45, hjust =1))
Daily Max and Min distribution
The above graph represent, daily max and minimum temperature.Both follow the same trends. Red points are the maximum temperature of the day and the blue points are the minimum
data2 |>group_by(Date) |>summarise(Tmax =max(temp_wew), Tmin =min(temp_wew)) |>ggplot() +geom_point(aes(x= Date,y = Tmax), color ="red") +geom_point(aes(x= Date,y = Tmin), color ="blue") +labs(title ="Distribution of Maximum and Minimum Daily Temperature",x ="Day Time",y ="Temperature (°C)",caption ="Red points indicates the daily maximum temperature while the blue is the minimum temperature" )
The above graph represent, daily max and minimum humidity. No interesting pattern, months between the year experience varying rh, low and high, unlike the extreme months with higher rh. Red points are the maximum temperature of the day and the blue points are the minimum
data2 |>group_by(Date) |>summarise(Hmax =max(rh), Hmin =min(rh)) |>ggplot() +geom_point(aes(x= Date,y = Hmax), color ="red") +geom_point(aes(x= Date,y = Hmin, group =1), color ="blue") +labs(title ="Distribution of Maximum and Minimum Daily Relative Humidity",x ="Day Time",y ="Relative humidity (%)",caption ="Red points indicates the daily maximum Relative humidity while the blue is the minimum humidity" )
Warning: Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).
Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).
Daily change (Hmax-Hmin) in temperature over the period.
data2 |>group_by(Date) |>summarise(ddT =max(temp_wew)-min(temp_wew), ddH =max(rh)-min(rh)) |>ggplot(aes(x= Date, y = ddT, group =1)) +geom_point() +labs(title ="Distribution of Daily Change in Temperature (Max - Min)",x ="Day Time",y ="Change in Temperature (°C)", )
Daily change (Tmax-Tmin) in humidity over the period. The pattern is like a bell-shape, this explain the differences in the rh variation, with higher var situated at the middle months showing a higher daily rh changes, than the left and right rea/extreme months
data2 |>group_by(Date) |>summarise(ddT =max(temp_wew)-min(temp_wew), ddH =max(rh)-min(rh)) |>ggplot(aes(x= Date, y = ddH, group =1)) +geom_point() +labs(title ="Distribution of Daily Change in Humidity (Max - Min)",x ="Day Time",y ="Change in Humidity (%)", )
Warning: Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).