I am new to Hopsworks and I'm building an air quality prediction project. I used the air quality example on GitHub as a reference, but I still have some doubts.
I want the feature store to contain the following, and the pipeline to work like this:
- Historical data (all cleaned and preprocessed data as a CSV with 4 columns: date, temp, humidity, and pm 2.5)
- Current data (my API job runs every day at 09:00 GMT and extracts the last 2 days of temp and humidity data)
- We then predict AQI for the next, say, 3 days and show it in a Streamlit UI.
- The next day, this observation moves to the historical data, new data arrives at 09:00 GMT again, and the process repeats.
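To make the schedule concrete, this is roughly how I picture the date windows for each 09:00 GMT run. It is only a sketch: the function name `daily_windows` is made up, and I am assuming that "last 2 days" means the two days before the run date and that the forecast starts the day after the run date.

```python
from datetime import date, timedelta

def daily_windows(run_date, lookback_days=2, horizon_days=3):
    """Return (observation_dates, forecast_dates) for one daily run."""
    # Days to pull from the API, oldest first (assumed to exclude run_date)
    observed = [run_date - timedelta(days=d) for d in range(lookback_days, 0, -1)]
    # Days to predict AQI for (assumed to start the day after run_date)
    forecast = [run_date + timedelta(days=d) for d in range(1, horizon_days + 1)]
    return observed, forecast

# Example: the run on 2024-01-10
obs, fc = daily_windows(date(2024, 1, 10))
```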
I created the feature group as follows:
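import datetime

import pandas as pd

df = pd.read_csv("data.csv")

def convert_date_to_unix(x):
    # Parse 'YYYY-MM-DD' and return milliseconds since the Unix epoch.
    # The datetime is naive, so timestamp() uses the machine's local timezone.
    dt_obj = datetime.datetime.strptime(str(x), '%Y-%m-%d')
    return int(dt_obj.timestamp() * 1000)

df.date = df.date.apply(convert_date_to_unix)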
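import hopsworks

# Log in and get a handle to the project's feature store
project = hopsworks.login()
fs = project.get_feature_store()

# Create the feature group, or fetch it if it already exists
aqi_fg = fs.get_or_create_feature_group(
    name="aqi",
    version=1,
    primary_key=["date"],
    description="AQI dataset",
    online_enabled=True,
)
aqi_fg.insert(df)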
Now how do I do the second step, ingesting the live data every day? And how do I move it into the historical data each day so that a new observation can come in as live data?
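What I have in mind so far is roughly the sketch below: shape the day's API output like the CSV and insert it into the same `aqi` feature group. As far as I understand, `insert()` upserts on the primary key, so there may be no separate "move to historical" step at all. The helper names, the sample rows, and the UTC conversion are my own assumptions (my snippet above uses the machine's local timezone), and I have not handled the missing pm 2.5 values for the new rows.

```python
import pandas as pd

def to_unix_ms(day):
    """Convert a 'YYYY-MM-DD' string to milliseconds since the epoch (UTC)."""
    return int(pd.Timestamp(day, tz="UTC").timestamp() * 1000)

def prepare_daily_rows(raw_rows):
    """Shape one day's API output to match the feature group's columns."""
    df = pd.DataFrame(raw_rows, columns=["date", "temp", "humidity"])
    df["date"] = df["date"].apply(to_unix_ms)
    return df

def append_to_feature_group(df):
    """Insert the new rows into the existing "aqi" feature group."""
    import hopsworks  # imported here so the helpers above run without it

    project = hopsworks.login()
    fs = project.get_feature_store()
    aqi_fg = fs.get_feature_group(name="aqi", version=1)
    aqi_fg.insert(df)

# Hypothetical API output for one 09:00 GMT run (values are made up)
sample = [
    {"date": "2024-01-08", "temp": 21.5, "humidity": 60.0},
    {"date": "2024-01-09", "temp": 22.0, "humidity": 58.0},
]
daily_df = prepare_daily_rows(sample)
# append_to_feature_group(daily_df)  # would run once per day from the scheduled job
```

Is this the right direction, or should the live data go into a separate feature group?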
I’m really confused. Appreciate the help. Thank you!