Creating feature group - Server Error

Hi all,

I am following the pdf.ai workshop by Jim Dowling.
I have created a feature group using the following script:

    # Get or create the 'cqa_fg' feature group
    cqa_fg = fs.get_or_create_feature_group(
        name="cqa_fg",
        version=1,
        description="Context-Question-Response Data",
        primary_key=["record_id"],
    )

    cqa_fg.insert(df_expanded)

On the Hopsworks platform, I can see that the job executed successfully and its state is Finished.
However, when I open the data preview, there is no data and I get an error message.

Any ideas why I might be getting this?

Thank you.

I have traced the error back, and it seems to be caused by the embeddings.
The error is "Unable to map type ArrayType(DoubleType,true)". The embeddings column is of type array(double). Any suggestions? I am a bit stuck now!
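The error names the Spark type of the embeddings column, so a quick sanity check on that column before re-inserting can at least rule out ragged or non-numeric vectors. The sketch below is a minimal, standard-library-only check; `embedding_dimension` is a hypothetical helper (not part of hsfs), and a well-formed column is not confirmed to avoid the mapping error itself.

```python
import math
import numbers
from typing import Iterable, Sequence


def embedding_dimension(embeddings: Iterable[Sequence[float]]) -> int:
    """Return the common length of all embedding vectors.

    Hypothetical helper: raises ValueError if there are no vectors, a vector
    is empty, the lengths differ, or a value is not a finite real number.
    """
    dim = None
    for i, vec in enumerate(embeddings):
        if len(vec) == 0:
            raise ValueError(f"row {i}: empty embedding")
        if dim is None:
            dim = len(vec)  # first vector fixes the expected dimension
        elif len(vec) != dim:
            raise ValueError(f"row {i}: length {len(vec)} != {dim}")
        for x in vec:
            # numbers.Real also covers NumPy floats; bool is excluded explicitly
            if isinstance(x, bool) or not isinstance(x, numbers.Real) or not math.isfinite(x):
                raise ValueError(f"row {i}: non-numeric or non-finite value {x!r}")
    if dim is None:
        raise ValueError("no embeddings found")
    return dim
```

With a pandas DataFrame, iterating the column passes each row's vector, e.g. `embedding_dimension(df_expanded["embeddings"])`.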

Hi,

Can you provide a full stacktrace?

Hi Kenneth,
Can you tell me how to do that, please? Do you mean the full error message in the logs?

Yes, please paste the full error message. Also, which hsfs version are you using? Are you running in Python or PySpark?
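One way to answer the version question is to ask the Python environment that runs the workshop (for example, the notebook kernel) directly. This is a minimal sketch using the standard library's `importlib.metadata`; it assumes hsfs was installed as a regular package (e.g. via pip), and `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the client library versions relevant to this thread.
    for name in ("hsfs", "hopsworks"):
        print(name, installed_version(name) or "not installed")
```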

Container: container_e02_1713182081581_0375_01_000002 on ip-172-16-4-15.us-east-2.compute.internal_9000_1713246812430

Log Type: prelaunch.out
Log Length: 100
Log Contents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 212
Log Contents:

WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

Log Type: stdout.txt
Log Length: 0
Container: container_e02_1713182081581_0375_01_000001 on ip-172-16-4-215.us-east-2.compute.internal_9000_1713246813417

Log Type: prelaunch.out
Log Length: 100
Log Contents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 6154
Log Contents:
2024-04-16 05:52:34,985 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 HopsworksInternalClient: Trust store path: t_certificate
2024-04-16 05:52:35,049 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 HopsworksConnectionBase: Getting information for project name: llmPdf
2024-04-16 05:52:35,064 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 ProjectApi: Sending metadata request: /hopsworks-api/api/project/getProjectInfo/llmPdf
2024-04-16 05:52:35,214 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 FeatureStoreApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/llmpdf_featurestore
2024-04-16 05:52:35,286 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 FeatureGroupApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/605255/featuregroups/documents_fg?version=1
2024-04-16 05:52:35,497 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 MainClass: Hsfs utils write options: {wait_for_job=false, initialCheckPointString=llmPdf_onlinefs,0:10321,1:9235,2:12391,3:10059,4:10240,5:11613,6:10564,7:10583,8:10219,9:11159}
2024-04-16 05:52:43,811 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 StorageConnectorApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/605255/storageconnectors/kafka_connector/byok?external=false
2024-04-16 05:52:43,851 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 KafkaApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/605255/kafka/subjects/documents_fg_1/versions/latest
2024-04-16 05:52:46,836 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 DeltaStreamerKafkaSource: About to read 364 from Kafka for topic: llmPdf_onlinefs from lastCheckpointStr Option{val=llmPdf_onlinefs,0:10321,1:9235,2:12391,3:10059,4:10240,5:11613,6:10564,7:10583,8:10219,9:11159} Offset range: llmPdf_onlinefs,0:10354,1:9280,2:12425,3:10091,4:10269,5:11648,6:10596,7:10617,8:10273,9:11195
2024-04-16 05:52:53,444 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 log: Logging initialized @22098ms to io.hops.hudi.org.eclipse.jetty.util.log.Slf4jLog
2024-04-16 05:52:53,558 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin:

2024-04-16 05:52:53,559 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: Starting Javalin …
2024-04-16 05:52:53,564 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: You are running Javalin 4.6.7 (released October 24, 2022. Your Javalin version is 539 days old. Consider checking for a newer version.).
2024-04-16 05:52:53,635 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Server: jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 1.8.0_402-8u402-ga-2ubuntu1~22.04-b06
2024-04-16 05:52:53,704 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Server: Started @22358ms
2024-04-16 05:52:53,704 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: Listening on http://localhost:32965/
2024-04-16 05:52:53,704 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: Javalin started in 145ms \o/

WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

2024-04-16 05:53:01,331 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 MetricRegistries: Loaded MetricRegistries class io.hops.hudi.org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
2024-04-16 05:53:19,934 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: Stopping Javalin …
2024-04-16 05:53:19,945 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 Javalin: Javalin has stopped
2024-04-16 05:53:20,042 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 FeatureGroupApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/605255/featuregroups/688486/commits
2024-04-16 05:53:20,082 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 QueryConstructorApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/query
2024-04-16 05:53:20,248 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 QueryBase: Executing query: SELECT fg0.file_name file_name, fg0.file_link file_link, fg0.page_number page_number, fg0.paragraph paragraph, fg0.text text, fg0.embeddings embeddings, fg0.context_id context_id FROM llmpdf_featurestore.documents_fg_1 fg0
2024-04-16 05:53:20,704 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 SessionState: Hive Session ID = a7629725-5c90-4d8c-948d-1846cc91c03c
Unable to map type ArrayType(DoubleType,true)
2024-04-16 05:53:26,042 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 StatisticsApi: Sending metadata request: /hopsworks-api/api/project/609432/featurestores/605255/featuregroups/688486/statistics

Log Type: stdout.txt
Log Length: 0
Container: container_e02_1713182081581_0375_01_000003 on ip-172-16-4-215.us-east-2.compute.internal_9000_1713246813417

Log Type: prelaunch.out
Log Length: 100
Log Contents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 447
Log Contents:

WARNING: Unable to attach Serviceability Agent. You can try again with escalated privileges. Two options: a) use -Djol.tryWithSudo=true to try with sudo; b) echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope

2024-04-16 05:53:14,284 INFO llmpdf,documents_fg_1_offline_fg_materialization,693318,application_1713182081581_0375 MetricRegistries: Loaded MetricRegistries class io.hops.hudi.org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl

Log Type: stdout.txt
Log Length: 0

{
  "_index": "onlinefs_609432-2024.04.16",
  "_type": "_doc",
  "_id": "iHF25Y4BnZB3_wI5xJdh",
  "_score": 1,
  "_source": {
    "@timestamp": "2024-04-16T05:52:28.919Z",
    "@version": "1",
    "tags": [
      "beats_input_codec_plain_applied",
      "project_log"
    ],
    "log_message": "Could not commit the row for feature group",
    "service": "onlinefs",
    "logdate": "2024-04-16T05:52:22.053Z",
    "host": "ip-172-16-4-33",
    "log_arguments": {
      "feature_group_id": "688486",
      "table_name": "llmpdf_documents_fg_1",
      "project_id": "609432",
      "subject_id": "678914",
      "thread_name": "pool-3-thread-5"
    },
    "logger_name": "com.logicalclocks.onlinefs.rondb.Committer",
    "priority": "ERROR"
  },
  "fields": {
    "@timestamp": [
      "2024-04-16T05:52:28.919Z"
    ],
    "logdate": [
      "2024-04-16T05:52:22.053Z"
    ]
  }
}

I am using version 3.7.1rc0,
and I am running in Python.

Thank you for your help, Kenneth!

Currently, if your feature group has an embeddings column, you won't be able to preview it in the UI. To check whether the data is available, you can try cqa_fg.show(10, online=True).

Oh, I see. Thanks, Kenneth.

Unfortunately, that doesn't work and raises an error. It is very long, so I cannot paste all of it, but here is part of it. I would very much appreciate your help with this:
"name": "ProgrammingError",
"message": "(pymysql.err.ProgrammingError) (1146, \"Table 'llmpdf.cqa_fg_1' doesn't exist\")\n[SQL: SELECT fg0.record_id record_id, fg0.questions questions, fg0.answers answers, fg0.context context\nFROM llmpdf.cqa_fg_1 fg0]\n(Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation)",
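The table names in this thread follow a visible pattern for the project llmPdf: the materialization job queries `llmpdf_featurestore.documents_fg_1` (offline), and the failing online read targets `llmpdf.cqa_fg_1`. The sketch below reproduces that pattern so the expected table for a given feature group and version can be checked. It is inferred only from these log lines, not from Hopsworks documentation, so the lowercasing rule and both helper names are assumptions. MySQL error 1146 means no table with that name exists in the online database; one possible (unconfirmed) explanation is that cqa_fg has no online table, since the creation script above does not pass `online_enabled=True`.

```python
def offline_table(project: str, fg_name: str, version: int) -> str:
    """Hypothetical: offline (Hive) table name as seen in the job log above."""
    return f"{project.lower()}_featurestore.{fg_name}_{version}"


def online_table(project: str, fg_name: str, version: int) -> str:
    """Hypothetical: online (MySQL/RonDB) table name as seen in the SQL error above."""
    return f"{project.lower()}.{fg_name}_{version}"


if __name__ == "__main__":
    # The table the failing cqa_fg.show(10, online=True) call tried to read.
    print(online_table("llmPdf", "cqa_fg", 1))
```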

I just used this, and it seems to work:

    feature_view = fs.get_feature_view(
        name="cqa",
        version=1,
    )

Good that it works!
Which feature group did you use to create the feature view? Try running .show(10, online=True) on that feature group.