
Source Data Explanation

Weikai Huang edited this page Jul 4, 2024 · 2 revisions


Task-Me-Anything comprises various types of source data, including:

  • 2D images
  • 3D assets
  • Real images and videos with scene graphs
  • Human annotations detailing angles, materials, colors, and shapes of 3D objects.
  • A taxonomy that reflects the relationships between different concepts within the source data.

This page explains the source data stored in the TaskMeAnything/annotations folder.

Annotations

These four files contain all the annotations for the source data:

  1. attribute_category.json
  2. cateid_to_concept.json
  3. cateid_to_objects.json
  4. taxonomy.json

attribute_category.json contains human-annotated classifications of all attributes in the scene graphs, grouping them into categories such as "color" and "size" to support more fine-grained scene graph question generation.

cateid_to_concept.json, cateid_to_objects.json, and taxonomy.json together contain the human-annotated knowledge graph (taxonomy) and all annotations of angles, materials, colors, and shapes of the 3D assets used in Task-Me-Anything.

  1. cateid_to_concept.json:
    • Collects all concepts from the scene graphs and 3D assets (e.g. apple, glass, dog) and normalizes each one to its corresponding Wikidata page.
    • For example, "eyeglass", "eyeglasses", "glasses", and "spectacles" from the 3D assets, and "eye glasses" and "glasses" from the scene graphs, are all normalized to QID Q37501, which corresponds to the Wikidata concept page https://www.wikidata.org/wiki/Q37501.
      "Q37501": {
          "surface_name": [
              "eyeglasses",
              "glasses",
              "spectacles"
          ],
          "wikipedia": "Glasses",
          "wikidata": "glasses",
          "wikidata_description": "accessories that improve human vision",
          "objaverse": [
              "eyeglass",
              "eyeglasses",
              "glasses",
              "spectacles"
          ],
          "scene_graph": [
              "eye glasses",
              "glasses"
          ]
      },
      
    • surface_name lists the normalized names used when generating questions. For example, the 3D assets contain a concept named orange_(fruit), which is normalized to the surface name orange so that generated questions are easier to read.
  2. cateid_to_objects.json:
    • Contains the annotations of angles, materials, colors, and shapes for the 3D assets under each QID.
    • For example, QID Q37501 contains the 3D asset a709eff74e544fd6b9390bb2bae0f77e. Its images field lists the images of the asset's visible perspectives used in 2D sticker image scenarios, attributes records the asset's color, material, and shape, and angles lists the angles at which the asset is visible in 3D scenarios.
      "a709eff74e544fd6b9390bb2bae0f77e": {
          "images": [
              "000.png",
              "001.png",
              "002.png",
              "003.png",
              "004.png",
              "005.png",
              "006.png",
              "007.png",
              "008.png"
          ],
          "attributes": {
              "color": [
                  "blue"
              ],
              "material": [],
              "shape": []
          },
          "angles": [
              0,
              120,
              240
          ]
      },
      
  3. taxonomy.json:
    • Leverages the concept network in Wikidata to build a concept graph (taxonomy) over all concepts (QIDs) in Task-Me-Anything.
    • Includes information like “glasses (Q37501) is a subclass of optical instrument (Q1751850)”.
    • In taxonomy.json, an edge such as
        [
            "Q11422",
            "Q682582"
        ]
      
      means Q682582 is a subclass of Q11422; that is, the first QID is the parent concept and the second is its subclass.
    • The nodes listed below the edges are concepts (QIDs) that do not appear in the scene graphs or 3D assets but are needed to build the taxonomy.
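As a rough illustration of how the cateid_to_concept.json entries can be used, the Python sketch below inverts them into a lookup from raw 3D-asset or scene-graph names to QIDs, then maps a raw name to a surface name. It assumes the file is a single JSON object keyed by QID, as in the Q37501 excerpt. The helper names build_name_index and normalize, and the choice of the first surface_name entry, are hypothetical and not part of Task-Me-Anything.

```python
import json
from typing import Dict


def build_name_index(concepts: Dict[str, dict]) -> Dict[str, str]:
    """Map every raw name from the 3D assets or scene graphs to its QID (hypothetical helper)."""
    index: Dict[str, str] = {}
    for qid, entry in concepts.items():
        # Missing source lists are treated as empty.
        for source in ("objaverse", "scene_graph"):
            for name in entry.get(source, []):
                # If a raw name appears under several QIDs, the last one wins.
                index[name] = qid
    return index


def normalize(raw_name: str, index: Dict[str, str], concepts: Dict[str, dict]) -> str:
    """Return the first surface_name of the concept a raw name maps to (an assumed convention)."""
    qid = index[raw_name]
    return concepts[qid]["surface_name"][0]


# Excerpt of the Q37501 entry; the full file could be loaded with
#   with open("TaskMeAnything/annotations/cateid_to_concept.json") as f:
#       concepts = json.load(f)
concepts = json.loads("""
{
  "Q37501": {
    "surface_name": ["eyeglasses", "glasses", "spectacles"],
    "wikipedia": "Glasses",
    "wikidata": "glasses",
    "wikidata_description": "accessories that improve human vision",
    "objaverse": ["eyeglass", "eyeglasses", "glasses", "spectacles"],
    "scene_graph": ["eye glasses", "glasses"]
  }
}
""")

index = build_name_index(concepts)
print(index["eye glasses"])                    # Q37501
print(normalize("eyeglass", index, concepts))  # eyeglasses
```

Because both "objaverse" and "scene_graph" names point to the same QID, questions generated from either source can share one surface name.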
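Similarly, here is a minimal sketch for querying cateid_to_objects.json. It assumes the file maps each QID to a dictionary keyed by 3D asset ID, where each asset has the images, attributes, and angles fields from the a709eff74e544fd6b9390bb2bae0f77e example, and that angles are in degrees. The functions find_assets and visible_at are hypothetical helpers, not part of the released code.

```python
from typing import Dict, List

# Assumed layout: QID -> asset ID -> {"images", "attributes", "angles"}.
Objects = Dict[str, Dict[str, dict]]


def find_assets(objects: Objects, qid: str, attr_type: str, value: str) -> List[str]:
    """IDs of the 3D assets under `qid` whose attributes[attr_type] contains `value`."""
    return [
        asset_id
        for asset_id, info in objects.get(qid, {}).items()
        if value in info.get("attributes", {}).get(attr_type, [])
    ]


def visible_at(objects: Objects, qid: str, angle: int) -> List[str]:
    """IDs of the 3D assets under `qid` annotated as visible at `angle` (assumed degrees)."""
    return [
        asset_id
        for asset_id, info in objects.get(qid, {}).items()
        if angle in info.get("angles", [])
    ]


# Excerpt mirroring the example asset; the real file would be read with json.load.
objects: Objects = {
    "Q37501": {
        "a709eff74e544fd6b9390bb2bae0f77e": {
            "images": [f"{i:03d}.png" for i in range(9)],  # "000.png" .. "008.png"
            "attributes": {"color": ["blue"], "material": [], "shape": []},
            "angles": [0, 120, 240],
        }
    }
}

print(find_assets(objects, "Q37501", "color", "blue"))      # ['a709eff74e544fd6b9390bb2bae0f77e']
print(find_assets(objects, "Q37501", "material", "metal"))  # []
print(visible_at(objects, "Q37501", 120))                   # ['a709eff74e544fd6b9390bb2bae0f77e']
```

Empty attribute lists, such as material and shape for this asset, simply never match, so an asset is only returned for attributes that were actually annotated.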
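Because each taxonomy edge reads as [parent, child], subclass questions reduce to graph reachability. The sketch below assumes the edges have already been read from taxonomy.json into a plain list of [parent, child] pairs. The exact top-level layout of the file is not described here, so the code does not parse it directly. The helper names build_parents, ancestors, and is_subclass are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set


def build_parents(edges: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Index each child QID to its direct parent QIDs; each edge is [parent, child]."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return parents


def ancestors(qid: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All QIDs reachable by following subclass-of edges upward (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([qid])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:  # the seen set also guards against cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def is_subclass(child: str, ancestor: str, parents: Dict[str, Set[str]]) -> bool:
    """True if `ancestor` is a direct or indirect parent of `child`."""
    return ancestor in ancestors(child, parents)


# Only the two relations mentioned on this page, written as [parent, child] pairs.
edges = [
    ["Q11422", "Q682582"],   # Q682582 is a subclass of Q11422
    ["Q1751850", "Q37501"],  # glasses (Q37501) is a subclass of optical instrument (Q1751850)
]
parents = build_parents(edges)

print(ancestors("Q37501", parents))               # {'Q1751850'}
print(is_subclass("Q682582", "Q11422", parents))  # True
print(is_subclass("Q37501", "Q11422", parents))   # False
```

Traversing upward from a concept, rather than downward from the roots, keeps each query proportional to the number of ancestors. This matters little for two edges but helps once the helper nodes that exist only to connect the taxonomy are included.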