
Croissant (metadata format)

Croissant is a metadata format designed to support the sharing of datasets for machine learning applications. It is a platform-agnostic schema used to standardize metadata in data repositories such as Hugging Face, Kaggle, Dataverse and OpenML.

Structure

Croissant builds upon schema.org, is expressed primarily in JSON-LD, and divides metadata into four “layers”: Dataset Metadata, Resource, Structure and Semantic:

  • The Dataset Metadata layer constrains which schema.org properties should be used and adds further properties, linking the resources (files) of the dataset with general metadata such as licensing and citation information.
  • The Resource layer describes individual files and sets of files using two new classes, FileObject and FileSet. A FileSet may, for example, be a collection of related images.
  • The Structure layer specifies how the files are organized in the dataset. A RecordSet class describes how the data in the resources is presented, a configuration that can vary considerably between modalities. This specification facilitates interoperability between datasets.
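
The first three layers can be illustrated with a small JSON-LD document built in Python. This is a simplified sketch, not a validated Croissant file: the dataset name, URLs, identifiers and license are hypothetical, the "@context" is abbreviated, and the property names (such as "distribution", "recordSet", "containedIn" and "includes") reflect one reading of the Croissant vocabulary and should be checked against the specification.

```python
import json

# Hypothetical dataset; all names, URLs and identifiers are made up.
metadata = {
    # Abbreviated context (assumption): real Croissant files declare more terms.
    "@context": {
        "@vocab": "https://schema.org/",
        "sc": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    # Dataset Metadata layer: general schema.org-style properties.
    "@type": "sc:Dataset",
    "name": "toy_images",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "citeAs": "Doe, J. (2024). Toy Images (hypothetical citation).",
    # Resource layer: one archive (FileObject) and the images in it (FileSet).
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "archive.zip",
            "contentUrl": "https://example.org/toy_images.zip",
            "encodingFormat": "application/zip",
        },
        {
            "@type": "cr:FileSet",
            "@id": "image-files",
            "containedIn": {"@id": "archive.zip"},
            "encodingFormat": "image/jpeg",
            "includes": "*.jpg",
        },
    ],
    # Structure layer: a RecordSet exposing each image file as one record.
    "recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "images",
            "field": [
                {
                    "@type": "cr:Field",
                    "@id": "images/image",
                    "dataType": "sc:ImageObject",
                    "source": {"fileSet": {"@id": "image-files"}},
                }
            ],
        }
    ],
}


def resource_ids(doc, type_name):
    """Return the @id of every resource in the distribution with the given @type."""
    return [r["@id"] for r in doc["distribution"] if r["@type"] == type_name]


# Serialize to JSON-LD text, as it would be stored alongside the dataset.
serialized = json.dumps(metadata, indent=2)
```

Calling resource_ids(metadata, "cr:FileSet") returns the identifiers of the file sets, which is the kind of lookup a loader would perform to resolve the "source" of a field back to the files it draws from.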

Source: Wikipedia
