Analysis with T&H data

Simple statistics and plots

Data and Packages

file <- read.csv("tempch4.txt")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Take a glimpse

glimpse(file)
Rows: 24,394
Columns: 3
$ datetime <chr> "2016-02-03 11:00:00", "2016-02-03 12:00:00", "2016-02-03 13:…
$ temp_wew <dbl> 19.5, 20.5, 20.5, 20.0, 19.5, 18.5, 18.0, 17.5, 17.5, 17.0, 1…
$ rh       <dbl> 60.1, 57.6, 55.0, 53.9, 53.8, 55.0, 59.6, 61.5, 66.3, 66.9, 6…

Split the datetime column so the data can be filtered by different time variables (year, month, day, time)

data2 <- separate(file, col = datetime, into = c("Date", "Time"), sep = " ")
data <- separate(data2, col = Date, into = c("Year", "Month", "Day"), sep = "-")
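
As an alternative to splitting strings, the timestamp can be parsed into a real date-time and the components extracted from it. The sketch below is not the approach used in this report; it assumes the lubridate functions `ymd_hms()`, `year()`, `month()`, `day()` and `hour()`, and uses a hypothetical three-row sample (`sample_df`) laid out like the input file.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample with the same columns as the input file
sample_df <- data.frame(
  datetime = c("2016-02-03 11:00:00", "2016-02-03 12:00:00", "2016-07-15 06:00:00"),
  temp_wew = c(19.5, 20.5, 24.0),
  rh       = c(60.1, 57.6, NA)
)

parsed <- sample_df |>
  mutate(
    datetime = ymd_hms(datetime),  # character -> POSIXct (UTC unless tz is given)
    Year     = year(datetime),
    Month    = month(datetime),    # numeric month, e.g. 2 for February
    Day      = day(datetime),
    Hour     = hour(datetime)
  )

print(parsed)
```

With this layout the components are numeric, so a filter would read `Month == 7` rather than `Month == "07"`.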

Basic descriptive statistics

data |> 
summary()
     Year              Month               Day                Time          
 Length:24394       Length:24394       Length:24394       Length:24394      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
    temp_wew           rh       
 Min.   :15.00   Min.   :30.40  
 1st Qu.:20.00   1st Qu.:70.50  
 Median :21.50   Median :85.00  
 Mean   :22.52   Mean   :81.07  
 3rd Qu.:25.50   3rd Qu.:94.20  
 Max.   :32.50   Max.   :99.90  
                 NA's   :1583   

Generating the variance and standard deviation

data |> 
  summarise(
    Var_Temp = var(temp_wew),
    Var_Hum  = var(rh, na.rm = TRUE),
    Std_Temp = sd(temp_wew),
    Std_Hum  = sd(rh, na.rm = TRUE)
  )
  Var_Temp  Var_Hum Std_Temp  Std_Hum
1 11.76777 253.5742 3.430418 15.92401
# Add the month to the original data frame so plots can be coloured by month
file["month"] <- data[["Month"]]
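
As a quick check on the variance and standard deviation reported above, the sketch below recomputes the sample variance by hand and confirms that the standard deviation is its square root (in the output above, sqrt(11.76777) is about 3.4304). The temperature values are the first six shown by `glimpse()`; the `rh_demo` vector is hypothetical and only illustrates why `na.rm = TRUE` is needed for humidity.

```r
# First six temp_wew values shown by glimpse()
x <- c(19.5, 20.5, 20.5, 20.0, 19.5, 18.5)

# Sample variance: sum of squared deviations divided by (n - 1)
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)

# var() and sd() agree with the manual calculation
stopifnot(isTRUE(all.equal(manual_var, var(x))))
stopifnot(isTRUE(all.equal(sqrt(manual_var), sd(x))))

# Hypothetical humidity vector with one missing value
rh_demo <- c(60.1, 57.6, NA, 53.9)
var(rh_demo)                # NA, because one value is missing
var(rh_demo, na.rm = TRUE)  # computed from the three observed values
```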

Distribution of the Temperature and Humidity

# Distribution of Temperature
Temperature <- file$temp_wew
hist(Temperature)

The temperature distribution is roughly bell-shaped, although the histogram shows what look like two peaks, so it may be bimodal rather than strictly normal.
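
A histogram alone makes it hard to judge whether two peaks are real. One way to look closer, sketched here on simulated data rather than on the measurements, is a kernel density estimate together with a normal Q-Q plot. The two-component mixture `sim_temp` is a hypothetical stand-in; with the real data, `file$temp_wew` would take its place.

```r
set.seed(1)

# Simulated (hypothetical) bimodal sample: two normal components
sim_temp <- c(rnorm(500, mean = 20, sd = 1.5),
              rnorm(500, mean = 26, sd = 1.5))

# Kernel density estimate: separate modes show up as separate humps
dens <- density(sim_temp)
plot(dens, main = "Kernel density of simulated temperatures")

# Normal Q-Q plot: points bending away from the line suggest non-normality
qqnorm(sim_temp)
qqline(sim_temp)
```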

# Distribution of Humidity
relative_humidity <- file$rh
hist(relative_humidity)

The humidity distribution is not normal over the year; high humidity values are prevalent.

Temperature distribution by month of the year

file |> 
  ggplot(aes(x= datetime, y = temp_wew, colour = month)) +
  geom_point()+
  geom_hline(yintercept = 23) +
  labs(
    title = "Daily Distribution of Temperature",
      x = "Day Time",
      y = "Temperature (°C)",
      colour = "Month"
  )

The colours represent different months, from February to December. The trend over the year is bell-shaped. Some months (summer) appear to have temperatures mostly above 23 °C (the horizontal line), while others are mostly below it.

For a clearer view, the monthly average temperature over the year also traces a bell shape.

data |> 
  group_by(Month) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Month, y = AverageT, group = 1)) +
  geom_line(colour = "blue") +
  geom_point() +
  labs(
    title = "Monthly Distribution of Temperature",
    x = "Month of the year",
    y = "Average Temperature (°C)"
  )

The box plot emphasises the bell shape and the monthly central values (medians) of temperature. Very few data are available for December.

data |> 
  ggplot(aes(y = temp_wew, x = Month))+
  geom_boxplot()  +
  labs(
    title = "Monthly Distribution of Temperature",
      x = "Month of the year",
      y = "Temperature (°C)"
  )

Humidity distribution by month of the year

file |> 
  ggplot(aes(x= datetime, y = rh, colour = month)) +
  geom_point() +
  labs(
    title = "Daily Distribution of Relative Humidity",
      x = "Day Time",
      y = "Relative Humidity (%)",
      colour = "Month"
  )
Warning: Removed 1583 rows containing missing values or values outside the scale range
(`geom_point()`).

For humidity the pattern is different: humidity is higher towards the end of the year, and the variance is higher in the middle months.

data |> 
  group_by(Month) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Month, y = AverageH, group = 1)) +
  geom_line(colour = "green") +
  geom_point() +
  labs(
    title = "Monthly Distribution of Relative Humidity",
    x = "Month of the year",
    y = "Average Relative Humidity (%)"
  )

Plotting the monthly mean shows an increasing trend in humidity.

data |> 
  ggplot(aes(y = rh, x = Month))+
  geom_boxplot()  +
  labs(
    title = "Monthly Distribution of Relative Humidity",
      x = "Month of the year",
      y = "Relative Humidity (%)"
  )
Warning: Removed 1583 rows containing non-finite outside the scale range
(`stat_boxplot()`).

The box plot emphasises the increasing trend in monthly rh over the year.

Distribution during the day (Hourly)

The average hourly temperature shows a daily rise-and-fall pattern. Temperature decreases from midnight (00:00) to its lowest value at about 06:00, then rises to its highest value between 16:00 and 19:00, after which it decreases again.

data |> 
  group_by(Time) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Time,y =AverageT, group = 1)) +
  geom_line(colour = "blue") +
  geom_point() +
  labs(
    title = "Hourly Distribution of Temperature",
      x = "Day Time (Hour)",
      y = "Average Temperature (°C)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Filtering by individual months to look for changes in the distribution (July is shown here), the pattern still follows a similar trend.

data |> 
  filter(Month == "07") |> 
  group_by(Time) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Time,y =AverageT, group = 1)) +
  geom_line(colour = "blue") +
  geom_point() +
  labs(
    title = "Hourly Distribution of Temperature",
      x = "Day Time (Hour)",
      y = "Average Temperature (°C)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The hourly humidity distribution over the day has a U shape, with the highest rh at around 06:00 and the lowest at around 16:00. This is the opposite of the temperature distribution.

data |> 
  group_by(Time) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Time,y =AverageH, group = 1)) +
  geom_line(colour = "red") +
  geom_point()  +
  labs(
    title = "Hourly Distribution of Relative Humidity",
      x = "Day Time (Hour)",
      y = "Average Relative Humidity (%)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
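
The opposite hourly shapes of temperature and humidity can also be summarised with a correlation coefficient. The sketch below uses only the first eight readings shown by `glimpse()`, so the result is an illustration of the call and says nothing about the full record. `use = "complete.obs"` drops rows where either value is missing, which would matter for the real `rh` column.

```r
# First eight readings shown by glimpse() (not the full data set)
temp_demo <- c(19.5, 20.5, 20.5, 20.0, 19.5, 18.5, 18.0, 17.5)
rh_demo   <- c(60.1, 57.6, 55.0, 53.9, 53.8, 55.0, 59.6, 61.5)

# Pearson correlation; a negative value means rh tends to fall as temperature rises
r <- cor(temp_demo, rh_demo, use = "complete.obs")
print(r)
```

On the full data the equivalent call would be `cor(data$temp_wew, data$rh, use = "complete.obs")`.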

Filtering through the months, the hourly humidity distribution is more stable than the temperature pattern over the year.

data |> 
  filter(Month == "09") |> 
  group_by(Time) |> 
  summarise(AverageT = mean(temp_wew), AverageH = mean(rh, na.rm = TRUE)) |> 
  ggplot(aes(x = Time,y =AverageH, group = 1)) +
  geom_line(colour = "red") +
  geom_point() +   
  labs(
    title = "Hourly Distribution of Relative Humidity",
      x = "Day Time (Hour)",
      y = "Average Relative Humidity (%)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Daily Max and Min distribution

The plot shows the daily maximum and minimum temperature. Both follow the same trend. Red points are the daily maximum temperature and blue points are the daily minimum.

data2 |> 
  group_by(Date) |> 
  summarise(Tmax = max(temp_wew), Tmin = min(temp_wew)) |> 
  ggplot() +
  geom_point(aes(x= Date,y = Tmax), color = "red") +
  geom_point(aes(x= Date,y = Tmin), color = "blue")  +
  labs(
    title = "Distribution of Maximum and Minimum Daily Temperature",
      x = "Day Time",
      y = "Temperature (°C)",
    caption = "Red points indicate the daily maximum temperature;\nblue points indicate the daily minimum temperature"
  )
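
In the code above `Date` is a character column, which ggplot2 treats as a discrete axis with one tick per day. A possible refinement, sketched here on a hypothetical three-day table (`daily_t`) rather than the real summary, is to convert the column with `as.Date()` so the axis becomes continuous and can be labelled with `scale_x_date()`.

```r
library(ggplot2)

# Hypothetical daily summary; with the real data this would come from
# the group_by(Date) |> summarise(...) step, followed by as.Date(Date)
daily_t <- data.frame(
  Date = as.Date(c("2016-02-03", "2016-02-04", "2016-02-05")),
  Tmax = c(20.5, 21.0, 22.5),
  Tmin = c(17.0, 16.5, 18.0)
)

p <- ggplot(daily_t, aes(x = Date)) +
  geom_point(aes(y = Tmax), colour = "red") +   # daily maximum
  geom_point(aes(y = Tmin), colour = "blue") +  # daily minimum
  scale_x_date(date_labels = "%d %b %Y") +      # readable date labels
  labs(x = "Date", y = "Temperature (°C)")

print(p)
```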

The plot shows the daily maximum and minimum relative humidity. There is no clear pattern: the months in the middle of the record show widely varying rh, both low and high, whereas the months at either end show consistently higher rh. Red points are the daily maximum humidity and blue points are the daily minimum.

data2 |> 
  group_by(Date) |> 
  summarise(Hmax = max(rh), Hmin = min(rh)) |> 
  ggplot() +
  geom_point(aes(x= Date,y = Hmax), color = "red") +
  geom_point(aes(x= Date,y = Hmin, group = 1), color = "blue")   +
  labs(
    title = "Distribution of Maximum and Minimum Daily Relative Humidity",
      x = "Day Time",
      y = "Relative humidity (%)",
    caption = "Red points indicate the daily maximum relative humidity;\nblue points indicate the daily minimum relative humidity"
  )
Warning: Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).
Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).
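
The 71 removed rows are days on which at least one `rh` reading is missing: without `na.rm = TRUE`, `max()` and `min()` return `NA` for the whole day. The sketch below shows the difference on a hypothetical two-day table (`daily_rh`). It is an optional variation, not what the plot above does. Note that on a day where every reading is missing, `max(..., na.rm = TRUE)` returns `-Inf` with a warning, so such days would still need handling.

```r
library(dplyr)

# Hypothetical readings: the first day has one missing rh value
daily_rh <- data.frame(
  Date = c("2016-02-03", "2016-02-03", "2016-02-04", "2016-02-04"),
  rh   = c(60.1, NA, 70.0, 75.5)
)

# As in the report: any missing reading makes the daily value NA
strict <- daily_rh |>
  group_by(Date) |>
  summarise(Hmax = max(rh), Hmin = min(rh))

# Variation: ignore missing readings and record how many were dropped
lenient <- daily_rh |>
  group_by(Date) |>
  summarise(
    Hmax      = max(rh, na.rm = TRUE),
    Hmin      = min(rh, na.rm = TRUE),
    n_missing = sum(is.na(rh))
  )

print(strict)
print(lenient)
```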

Daily change (Tmax-Tmin) in temperature over the period.

data2 |> 
  group_by(Date) |> 
  summarise(ddT = max(temp_wew)-min(temp_wew), ddH = max(rh)-min(rh)) |> 
  ggplot(aes(x= Date, y = ddT, group = 1)) +
  geom_point()   +
  labs(
    title = "Distribution of Daily Change in Temperature (Max - Min)",
      x = "Day Time",
      y = "Change in Temperature (°C)",
  )

Daily change (Hmax-Hmin) in humidity over the period. The pattern is bell-shaped, which explains the differences in rh variance: the middle months, where the variance is higher, show larger daily rh changes than the months at either end of the record.

data2 |> 
  group_by(Date) |> 
  summarise(ddT = max(temp_wew)-min(temp_wew), ddH = max(rh)-min(rh)) |> 
  ggplot(aes(x= Date, y = ddH, group = 1)) +
  geom_point()  +
  labs(
    title = "Distribution of Daily Change in Humidity (Max - Min)",
      x = "Day Time",
      y = "Change in Humidity (%)",
  )
Warning: Removed 71 rows containing missing values or values outside the scale range
(`geom_point()`).