Article Extraction Schema
This model has been tailored based on customer feedback and usage. If you need a more specific model because this general-purpose one does not fit your use case, you can contact us through the support link below. If some fields are missing, you can also contact us to add them.
Contact us
Article Schema object
-
url
[string, null]
URL of the article
-
headline
[string, null]
Headline of the article
-
date_published
[string, null]
COMPUTE: Convert date_published_raw to YYYY-MM-DD format. If date_published_raw is null or cannot be parsed, set to null.
-
date_published_raw
[string, null]
Raw published date of the article as it appears on the page
-
date_modified
[string, null]
COMPUTE: Convert date_modified_raw to YYYY-MM-DD format. If date_modified_raw is null or cannot be parsed, set to null.
-
date_modified_raw
[string, null]
Raw modified date of the article as it appears on the page
-
author
[string, null]
Author of the article
-
authors_list
[array, null]
List of authors
Items object
-
author_name
string
Name of the author
-
language
[string, null]
Language of the article (ISO 639 code)
-
breadcrumbs
[array, null]
Breadcrumbs for navigation
Items object
-
name
string
Name of the breadcrumb
-
link
string
Link of the breadcrumb
-
main_image
[string, null]
URL of the main image. If no main image is available, this is the first image in the images list
-
images
[array, null]
List of image URLs extracted from the document
Items object
-
image_url
string
URL of the image
-
guessed_topics
[array, null]
List of guessed topics
Items string
-
sentiment
[string, null]
Sentiment of the article
-
sentiment_probability
[number, null]
Probability of the sentiment from 0 to 1
-
description
[string, null]
Description of the article
-
article_body
[string, null]
Body of the article, with markdown links reduced to their text and with spacing and punctuation fixed
-
article_body_html
[string, null]
HTML body of the article
-
video_urls
[array, null]
List of video URLs
Items object
-
video_url
string
URL of the video
-
audio_urls
[array, null]
List of audio URLs
Items object
-
audio_url
string
URL of the audio
-
related_articles
[array, null]
List of related articles
Items object
-
headline
string
Headline of the related article
-
description
string
Description of the related article
-
url
string
URL of the related article
-
canonical_url
[string, null]
Canonical URL of the article
-
corpus
[array, null]
Structured content of the article
Items object
-
type
string
Type of the content segment
-
content
string
Content of the segment
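
Two rules in the schema above are computed rather than extracted: date_published and date_modified are derived from their _raw counterparts, and main_image falls back to the first entry of images. The following is a minimal Python sketch of those two rules for readers who want to reproduce or check them on their own side. The function names and the list of accepted date formats are assumptions made for illustration; the extraction service's actual parser is not documented here and may accept more formats.

```python
from datetime import datetime
from typing import Optional

# Hypothetical set of raw date layouts; the real service may handle others.
ASSUMED_FORMATS = ("%d %B %Y", "%B %d, %Y", "%b %d, %Y")


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Convert a raw date string to YYYY-MM-DD, or return None if it cannot be parsed."""
    if not raw:
        return None
    text = raw.strip()
    # ISO 8601 strings such as "2024-03-05" or "2024-03-05T10:00:00+00:00".
    try:
        return datetime.fromisoformat(text).date().isoformat()
    except ValueError:
        pass
    # Fall back to the assumed human-readable layouts.
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def pick_main_image(record: dict) -> Optional[str]:
    """Return main_image, or the first image_url in images when main_image is missing."""
    if record.get("main_image"):
        return record["main_image"]
    images = record.get("images") or []
    return images[0]["image_url"] if images else None


if __name__ == "__main__":
    record = {
        "date_published_raw": "March 5, 2024",
        "main_image": None,
        "images": [{"image_url": "https://example.com/a.jpg"}],
    }
    print(normalize_date(record["date_published_raw"]))
    print(pick_main_image(record))
```

Both helpers return None rather than raising, which mirrors the nullable types ([string, null]) declared for these fields.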