CSS selectors have no combinator for preceding siblings: the + and ~ combinators only match siblings that follow an element.
However, depending on your scraping stack, there are several ways to work around this:
- Using BeautifulSoup and Python to select the preceding siblings:
from bs4 import BeautifulSoup
html = """
<div>
<h2>Heading 1</h2>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Heading 2</h2>
<p>Paragraph 3</p>
<p>Paragraph 4</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
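# Find the reference element (the second h2):
second_h2_element = soup.find_all("h2")[1]
# Select the preceding sibling tags, nearest first.
# (The .previous_siblings property also yields whitespace text nodes,
# so find_previous_siblings() is used here to get tags only.)
preceding_siblings = second_h2_element.find_previous_siblings()
for sibling in preceding_siblings:
    print(sibling.text)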
- Using Cheerio and JavaScript to select the preceding siblings:
const cheerio = require("cheerio");
const html = `
<div>
<h2>Heading 1</h2>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Heading 2</h2>
<p>Paragraph 3</p>
<p>Paragraph 4</p>
</div>
`;
const $ = cheerio.load(html);
// Get the second h2 element
const second_h2_element = $("h2").eq(1);
// Select the preceding siblings of the h2 element
const preceding_siblings = second_h2_element.prevAll();
// Loop over the preceding siblings and print their text content
preceding_siblings.each(function () {
  console.log($(this).text());
});
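Both examples return every preceding sibling, nearest first. When only some of them are wanted, the BeautifulSoup lookup can be narrowed by tag name. The following is a minimal sketch that reuses the same sample HTML; the variable names are illustrative, and it assumes the siblings of interest can be identified by tag name alone.

```python
from bs4 import BeautifulSoup

html = """
<div>
<h2>Heading 1</h2>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<h2>Heading 2</h2>
<p>Paragraph 3</p>
<p>Paragraph 4</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
second_h2 = soup.find_all("h2")[1]

# Only the preceding <p> siblings, nearest first (reverse document order)
preceding_paragraphs = [p.get_text() for p in second_h2.find_previous_siblings("p")]

# Only the nearest preceding <h2> sibling
nearest_heading = second_h2.find_previous_sibling("h2").get_text()

# Reverse the list to restore document order
in_document_order = preceding_paragraphs[::-1]

print(preceding_paragraphs)
print(nearest_heading)
print(in_document_order)
```

Passing a tag name keeps whitespace text nodes and unrelated elements out of the result, which is usually what a scraper wants when walking back from a heading.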