I’ve signed up for the 30-day free Hopsworks demo trial on hopsworks.ai and I’m getting an error in the Feature Store demo/tour in the Jupyter notebook “FeatureStoreQuickStart.ipynb”. The last cell, which launches the ML experiment, produces the following error: “… RecursionError: maximum recursion depth exceeded …”.
Hey again! I figured out the issue. The train_fn function in that example does not import tensorflow inside the function. Code outside train_fn runs on the Spark driver, so tensorflow was imported on the driver and then had to be serialized over the network to the Spark executors, which failed. Hence the RecursionError during pickling.
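For anyone hitting the same thing, here is a minimal sketch of the pattern (the model, layers, and return value are illustrative, not the exact notebook code): do the tensorflow import inside train_fn so it runs on the Spark executors, then launch it with the hops experiment module.

```python
from hops import experiment

def train_fn():
    # Import tensorflow *inside* the function so the import happens on the
    # Spark executor instead of being pickled and shipped from the driver.
    import tensorflow as tf

    # Illustrative model; the real notebook trains on feature store data.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.compile(optimizer='adam', loss='mse')
    # ... fit the model on your training data here ...
    return 0.0  # metric value the experiment records (illustrative)

experiment.launch(train_fn)
```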
I’ve signed up for the 30-day free Hopsworks demo trial on hopsworks.ai and hit an error in the Feature Store demo/tour in the Jupyter notebook “FeatureStoreQuickStart.ipynb”.
The first cell in the notebook fails with the error below.
**YARN Diagnostics:** [Tue Oct 06 00:57:38 +0000 2020] Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2432, vCores:1, gpus:0>; Queue Resource Limit for AM = <memory:0, vCores:0, gpus:0>; User AM Resource Limit of the queue = <memory:0, vCores:0, gpus:0>; Queue AM Resource Usage = <memory:2048, vCores:1, gpus:0>; .
Some things to try: a) Make sure Spark has enough available resources for Jupyter to create a Spark context. b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly. c) Restart the kernel.
Thank you for reporting this. I identified an issue with our trial cluster and resolved it. Could you try again? Let me know if you encounter any more issues!
# Read the data CSV into a Spark DataFrame.
# `spark` is the SparkSession provided by the Hopsworks Jupyter notebook.
from hops import hdfs

csv_path = hdfs.abs_path('') + 'Resources/my.csv'
raw_df = spark.read.csv(csv_path, header=True)  # set header=False if the file has no header row
I tried to reproduce the feature store write error, but it worked fine for me. Could you run FeatureStoreQuickStart.py up to the point where it writes to the feature store and confirm whether that works? In the meantime, I’ll look deeper into the logs.
UPDATE: I’m seeing some issues related to your project in the logs. While I investigate how to fix them, you could try creating a new project and running your code there.
I am trying to create a feature group in the feature store from the UI, based on an existing Hive table that I created in the Hops database, but I couldn’t find any relevant documentation on how to do this. Can you please suggest how to do it?
Also, please let me know whether we can register features / create feature groups directly from the UI rather than from Jupyter.
From the Hopsworks UI, go to Feature Store --> Feature Groups --> New. There you can register a new feature group with the desired schema. If you want to create a feature group from an existing Hive table, use the on-demand feature group tab, where you can enter a SQL query.
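If you want to sanity-check the query before pasting it into the UI, one option is to run it from a notebook first; a small sketch, with an illustrative table name:

```python
# Illustrative database and table name; substitute your own Hive table.
query = "SELECT * FROM my_project_featurestore.my_hive_table"
spark.sql(query).show(5)  # verify the expected columns come back
```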
I have created a feature group from Jupyter using the commands below, and it was successful.
from hops import featurestore

featurestore.create_featuregroup(
    houses_for_sale_features_df,
    "houses_for_sale_featuregroup",
    description="aggregate features of houses for sale per area",
    descriptive_statistics=False,
    feature_correlation=False,
    feature_histograms=False,
    cluster_analysis=False
)
But now what I want to try is this: I have a table in the Hive DB, feature_store_hops_praveena.employee_orc, and I want to create a feature group on top of this table from the UI rather than from Jupyter, with all the columns in the table showing up as features.
For this I tried giving a SQL query, but it fails with the error below:
2020-10-15 22:45:20,403 ERROR feature_store_hops_praveena,create_featuregroup_emp_features_1602801879784,265,application_1591705641534_0866 FailoverProxyHelper: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalStateException: Shutdown in progress, cannot add a shutdownHook; Host Details : local host is: "ip-10-0-0-175/10.0.0.175"; destination host is: "ip-10-0-0-175.us-east-2.compute.internal":8020;
2020-10-15 22:45:20,404 WARN feature_store_hops_praveena,create_featuregroup_emp_features_1602801879784,265,application_1591705641534_0866 HopsRandomStickyFailoverProxyProvider: HopsRandomStickyFailoverProxyProvider (1148816695) no new namenodes were found
Also, before running the job it asks me to pass “Input Arguments”, and I’m not sure what to pass there.
The same happens when creating on-demand feature groups; the job fails.
Which version of Hopsworks are you using? Do you get the same error for both on-demand and cached feature groups? Also, did you provide a schema when you created the feature group?
This says that your Spark session is no longer active. Restart the Jupyter kernel and it should work. I will try to reproduce the issue with the on-demand feature group.