Article Extraction Schema

This model has been tailored based on customer feedback and usage. If this model is not general enough and you need a more specific one, you can contact us through the support link below. If some fields are missing, you can also contact us to have them added.

Contact us
Article Schema object
  • url [string, null]

    URL of the article

  • headline [string, null]

    Headline of the article

  • date_published [string, null]

    COMPUTE: Convert date_published_raw to YYYY-MM-DD format. If date_published_raw is null or cannot be parsed, set to null.

  • date_published_raw [string, null]

    Raw published date of the article as it appears on the page

  • date_modified [string, null]

    COMPUTE: Convert date_modified_raw to YYYY-MM-DD format. If date_modified_raw is null or cannot be parsed, set to null.

  • date_modified_raw [string, null]

    Raw modified date of the article as it appears on the page

  • author [string, null]

    Author of the article

  • authors_list [array, null]

    List of authors

    Items object
    • author_name string

      Name of the author

  • language [string, null]

    Language of the article (ISO 639 code)

  • breadcrumbs [array, null]

    Breadcrumbs for navigation

    Items object
    • name string

      Name of the breadcrumb

    • link string

      Link of the breadcrumb

  • main_image [string, null]

    URL of the main image. If no main image is available, the first image in the images list is used instead.

  • images [array, null]

    List of image URLs extracted from the document

    Items object
    • image_url string

      URL of the image

  • guessed_topics [array, null]

    List of guessed topics

    Items string

  • sentiment [string, null]

    Sentiment of the article

  • sentiment_probability [number, null]

    Probability of the sentiment, from 0 to 1

  • description [string, null]

    Description of the article

  • article_body [string, null]

    Body of the article in Markdown, with links reduced to their text only and with spacing and punctuation fixed

  • article_body_html [string, null]

    HTML body of the article

  • video_urls [array, null]

    List of video URLs

    Items object
    • video_url string

      URL of the video

  • audio_urls [array, null]

    List of audio URLs

    Items object
    • audio_url string

      URL of the audio

  • related_articles [array, null]

    List of related articles

    Items object
    • headline string

      Headline of the related article

    • description string

      Description of the related article

    • url string

      URL of the related article

  • canonical_url [string, null]

    Canonical URL of the article

  • corpus [array, null]

    Structured content of the article

    Items object
    • type string

      Type of the content segment

    • content string

      Content of the segment
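As a rough illustration of the structure above, the schema can be mirrored with Python TypedDict classes. This is a sketch, not an official client type: the class names (Article, AuthorItem, BreadcrumbItem and so on) are hypothetical, and only the field names and types come from the schema. Fields typed [x, null] map to Optional[...], item fields without null are plain required strings, and "number" is assumed to map to float.

```python
from typing import List, Optional, TypedDict


# Item objects used inside the array fields (hypothetical class names).
class AuthorItem(TypedDict):
    author_name: str


class BreadcrumbItem(TypedDict):
    name: str
    link: str


class ImageItem(TypedDict):
    image_url: str


class VideoItem(TypedDict):
    video_url: str


class AudioItem(TypedDict):
    audio_url: str


class RelatedArticleItem(TypedDict):
    headline: str
    description: str
    url: str


class CorpusItem(TypedDict):
    type: str     # type of the content segment
    content: str  # content of the segment


# Top-level Article object: every field may be null, hence Optional.
class Article(TypedDict):
    url: Optional[str]
    headline: Optional[str]
    date_published: Optional[str]       # computed, YYYY-MM-DD
    date_published_raw: Optional[str]
    date_modified: Optional[str]        # computed, YYYY-MM-DD
    date_modified_raw: Optional[str]
    author: Optional[str]
    authors_list: Optional[List[AuthorItem]]
    language: Optional[str]             # ISO 639 code
    breadcrumbs: Optional[List[BreadcrumbItem]]
    main_image: Optional[str]
    images: Optional[List[ImageItem]]
    guessed_topics: Optional[List[str]]
    sentiment: Optional[str]
    sentiment_probability: Optional[float]  # assumed float, 0 to 1
    description: Optional[str]
    article_body: Optional[str]
    article_body_html: Optional[str]
    video_urls: Optional[List[VideoItem]]
    audio_urls: Optional[List[AudioItem]]
    related_articles: Optional[List[RelatedArticleItem]]
    canonical_url: Optional[str]
    corpus: Optional[List[CorpusItem]]
```

TypedDict adds no runtime validation; it only lets a type checker flag misspelled keys or wrong value types when you consume extraction results.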
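The COMPUTE rule for date_published and date_modified (convert the raw value to YYYY-MM-DD, or null if it is missing or unparseable) can be sketched with the standard library. The schema does not say which raw formats are recognised, so the ISO 8601 attempt and the CANDIDATE_FORMATS tuple below are assumptions, and normalize_date is a hypothetical helper name.

```python
from datetime import datetime
from typing import Optional

# Hypothetical raw formats to try after ISO 8601; the schema does not
# specify which formats the extractor actually recognises.
CANDIDATE_FORMATS = (
    "%B %d, %Y",  # e.g. "March 5, 2024"
    "%d %B %Y",   # e.g. "5 March 2024"
    "%b %d, %Y",  # e.g. "Mar 5, 2024"
)


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Return raw as YYYY-MM-DD, or None if it is null or cannot be parsed."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    # ISO 8601 values such as "2024-03-05T10:00:00+00:00" are common in page metadata.
    try:
        return datetime.fromisoformat(text).strftime("%Y-%m-%d")
    except ValueError:
        pass
    # Fall back to the human-readable formats listed above.
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

For example, normalize_date("March 5, 2024") gives "2024-03-05", while normalize_date("yesterday") gives None. Month names are parsed with the default C locale, so non-English dates would need additional handling.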
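The main_image fallback (use the first entry of images when no main image is available) can be expressed as a small helper. This is a sketch: resolve_main_image is a hypothetical name, and treating an empty string the same as null is an assumption the schema does not state.

```python
from typing import List, Optional, TypedDict


class ImageItem(TypedDict):
    image_url: str


def resolve_main_image(
    main_image: Optional[str], images: Optional[List[ImageItem]]
) -> Optional[str]:
    """Use main_image if present, else the first image URL, else None."""
    # Assumption: an empty string counts as "not available", like null.
    if main_image:
        return main_image
    # Fall back to the first item of the images list, if there is one.
    if images:
        return images[0]["image_url"]
    return None
```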