Hi @therapon,
Currently we do not have integrations for third-party Data Catalogues. We have an internal CDC (Change Data Capture) mechanism that is pluggable, so in principle you could export/import data from other Data Catalogues/Metastores. However, this mechanism is not currently exposed.
We already have all the mechanisms you mentioned. In addition, our Metastore is integrated with our file system, providing eventual strong consistency: when files on the file system are created or deleted, the corresponding metadata is automatically created or deleted as well.
Data Discovery:
We provide full-text search over title, description, and custom metadata through Elasticsearch (we will soon switch to OpenSearch). Our platform is multi-tenant, and this is reflected in the search mechanism as well: you decide whether your metadata and/or data should be discoverable/accessible. You can thus make metadata public for search while the data remains private and becomes available only on approval from the data owner.
https://hopsworks.readthedocs.io/en/latest/user_guide/hopsworks/search.html?highlight=search
Custom metadata is based on keywords and schematised tags.
https://hopsworks.readthedocs.io/en/latest/user_guide/hopsworks/tags.html?highlight=schematized%20tags
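To make the idea of a schematised tag a bit more concrete, here is a minimal sketch in plain Python. It is not our API: the schema layout, the field names, and the `validate_tag` helper are all made up for illustration. It only shows the general principle that a tag value is checked against a declared schema before it is attached.

```python
# Illustration only: this schema format and helper are hypothetical,
# not the Hopsworks API.
# Schema: field name -> (expected type, required?)
DATA_QUALITY_SCHEMA = {
    "owner": (str, True),
    "pii": (bool, True),
    "retention_days": (int, False),
}


def validate_tag(schema, value):
    """Return a list of problems; an empty list means the tag conforms."""
    problems = []
    for field, (expected_type, required) in schema.items():
        if field not in value:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        # Exact type match, so that a bool is not accepted where an int is expected.
        if type(value[field]) is not expected_type:
            problems.append(f"{field} should be {expected_type.__name__}")
    # Reject fields that the schema does not declare.
    for field in value:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


print(validate_tag(DATA_QUALITY_SCHEMA, {"owner": "alex", "pii": False}))
print(validate_tag(DATA_QUALITY_SCHEMA, {"owner": 1}))
```

Plain keywords, by contrast, are free-form and need no schema.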
Data curation is enabled through Spark data engineering as well as data validation rules.
Data Governance:
Lineage is provided for the main machine learning abstractions: feature groups (on-demand/cached), training datasets, experiments, and models. You can thus follow which user/application created each of these and which inputs it used.
https://hopsworks.readthedocs.io/en/latest/hopsml/provenance.html?highlight=provenance
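As a rough picture of what following lineage means, the sketch below walks a small provenance graph upstream from a model. The record layout, the artifact names, and the `upstream` function are assumptions for the example, not how our provenance service stores or exposes the data.

```python
from collections import deque

# Illustration only: hypothetical provenance records, artifact -> (creator, inputs).
PROVENANCE = {
    "model:churn_v1": ("app:train_job", ["training_dataset:churn_td"]),
    "training_dataset:churn_td": (
        "user:alice",
        ["feature_group:customers", "feature_group:payments"],
    ),
    "feature_group:customers": ("user:bob", []),
    "feature_group:payments": ("user:bob", []),
}


def upstream(artifact, provenance):
    """Breadth-first walk over inputs; returns (artifact, creator) pairs."""
    seen = set()
    result = []
    queue = deque(provenance[artifact][1])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an input shared by several artifacts is reported once
        seen.add(current)
        creator, inputs = provenance[current]
        result.append((current, creator))
        queue.extend(inputs)
    return result


for name, creator in upstream("model:churn_v1", PROVENANCE):
    print(f"{name} (created by {creator})")
```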
Access control is enforced by our RBAC (role-based access control), which is built on HopsFS (our file system) ACLs (access control lists).
https://hopsworks.readthedocs.io/en/latest/user_guide/hopsworks/projectMembers.html?highlight=row%20based%20access%20control
https://hopsworks.readthedocs.io/en/latest/user_guide/hopsworks/dataSetShare.html?highlight=row%20based%20access%20control
https://hopsworks.readthedocs.io/en/latest/user_guide/hopsfs/acls.html?highlight=row%20based%20access%20control
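If it helps to see how roles and file system ACLs fit together, here is a simplified sketch. The paths, group names, and the `is_allowed` function are hypothetical and only illustrate the general pattern (a role maps to a group, and the group has an ACL entry on a path); the real checks are done by HopsFS.

```python
# Illustration only: hypothetical paths, groups, and permission strings.
# ACL: path -> {group: POSIX-style permission string}
ACLS = {
    "/projects/demo/training_datasets": {
        "group:demo_data_owner": "rwx",
        "group:demo_data_scientist": "r-x",
    },
}

# Project role membership expressed as groups.
MEMBERSHIP = {
    "alice": ["group:demo_data_owner"],
    "bob": ["group:demo_data_scientist"],
}

ACTION_BITS = {"read": "r", "write": "w", "execute": "x"}


def is_allowed(user, path, action):
    """True if any of the user's groups has the requested bit on the path."""
    wanted = ACTION_BITS[action]
    entries = ACLS.get(path, {})
    return any(wanted in entries.get(group, "") for group in MEMBERSHIP.get(user, []))


print(is_allowed("alice", "/projects/demo/training_datasets", "write"))
print(is_allowed("bob", "/projects/demo/training_datasets", "write"))
```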
Data auditing: we have an audit log that records CRUD (create/read/update/delete) operations on feature store data (available to users through the UI). We also log access to REST endpoints (available as logs).
We are currently improving our documentation and migrating it from the old location to the new one. I will get back to you with another message once the pages relevant to you have been moved to the new documentation.
I hope this was helpful. Let me know if you have further questions.
Regards,
Alex