Hello, this is Haengbujang (행부장).
This post covers how to use BeautifulSoup 4 in Python (Python 3.x).
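1. Install the package: pip install beautifulsoup4 (pip install bs4 also works, since that package simply installs beautifulsoup4)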
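A quick way to confirm the install worked, assuming pip installed into the same interpreter you run, is to import the package, print its version, and parse a tiny document with the built-in html.parser backend:

```python
import bs4
from bs4 import BeautifulSoup

# Version string; the exact value depends on your install
print(bs4.__version__)

# Parse a tiny document to confirm the built-in html.parser backend works
soup = BeautifulSoup('<p>ok</p>', 'html.parser')
print(soup.p.string)  # ok
```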
2. Using the functions
###################################################################
1) find('tag') - returns only the first matching element
from bs4 import BeautifulSoup
myHtml01 = '''
<html>
<head>
<title> My HTML </title>
</head>
<body>
<H1> 안녕, 세상아! Hello, World!</H1>
<p align="center"> center </p>
<p align="right"> right </p>
<p align="left"> left </p>
</body>
</html> '''
soup = BeautifulSoup(myHtml01, 'html.parser')
soup.find('p')
>>> <p align="center"> center </p>
soup.find('p', align='center')
>>> <p align="center"> center </p>
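find() also accepts an attrs dictionary, which helps when an attribute name cannot be written as a Python keyword argument (for example data-id). When nothing matches, find() returns None instead of raising, so it is worth checking the result before using it. A minimal sketch on a trimmed copy of myHtml01 (the variable names are illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the myHtml01 sample above
html = '''
<html><body>
<H1> Hello, World!</H1>
<p align="center"> center </p>
<p align="right"> right </p>
<p align="left"> left </p>
</body></html>'''

soup = BeautifulSoup(html, 'html.parser')

# Same filter as find('p', align='right'), written with an attrs dict
right_p = soup.find('p', attrs={'align': 'right'})
print(right_p.get_text(strip=True))  # right

# find() returns None when there is no match, so guard before using it
table = soup.find('table')
if table is None:
    print('no <table> found')
```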
###################################################################
2) find_all('tag') - returns every matching element as a list
find_all('tag')[number] - returns the element at that index
find_all(['tag1', 'tag2']) - returns elements matching any of the listed tags (OR)
soup.find_all('p')
>>> [<p align="center"> center </p>,
<p align="right"> right </p>,
<p align="left"> left </p>]
soup.find_all('p')[0]
>>> <p align="center"> center </p>
soup.find_all('p')[2]
>>> <p align="left"> left </p>
soup.find_all(['p','h1'])
>>> [<h1> 안녕, 세상아! Hello, World!</h1>,
<p align="center"> center </p>,
<p align="right"> right </p>,
<p align="left"> left </p>]
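Because find_all() returns a list-like ResultSet, len() and indexing work on it, and its limit parameter stops the search after that many matches. Unlike find(), it returns an empty list rather than None when nothing matches. A short sketch on the same sample; note that the matches come back in document order, not in the order the tag names were listed:

```python
from bs4 import BeautifulSoup

html = '''
<html><body>
<H1> Hello, World!</H1>
<p align="center"> center </p>
<p align="right"> right </p>
<p align="left"> left </p>
</body></html>'''

soup = BeautifulSoup(html, 'html.parser')

paragraphs = soup.find_all('p')
print(len(paragraphs))  # 3

# limit=2 stops after the first two matches
first_two = soup.find_all('p', limit=2)
print([p.get_text(strip=True) for p in first_two])  # ['center', 'right']

# No match gives an empty list, not None
print(soup.find_all('table'))  # []

# html.parser lowercases <H1>, and results follow document order
print([t.name for t in soup.find_all(['p', 'h1'])])  # ['h1', 'p', 'p', 'p']
```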
data = soup.find_all('p')
for i in data:
    print(i.string)
>>> center
right
left
for i in data:
    print(i.get_text())
>>> center
right
left
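.string and get_text() print the same thing here only because each <p> holds a single piece of text. When a tag has more than one child, .string is None, whereas get_text() joins all the text inside it. A small sketch with a hypothetical snippet that mixes text and a <b> tag:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: one <p> with a text child plus a <b> child,
# and one <p> whose only child is a <b> tag
soup = BeautifulSoup('<p>price: <b>7000</b></p><p><b>only bold</b></p>', 'html.parser')
mixed, single = soup.find_all('p')

print(mixed.string)      # None (two children: a string and a <b> tag)
print(mixed.get_text())  # price: 7000

# With exactly one child tag, .string descends into that child
print(single.string)     # only bold
```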
###################################################################
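3) select('parent_tag > child_tag > grandchild_tag') - returns the matching elements as a list
select('parent_tag.class > child_tag.class') - same, with each step also restricted by class
select('#id') - returns the element with that id (in a list)
select('#id > tag.class') - returns the matching direct children of the element with that id
select('tag[attribute]') - returns the tags of that type that have the given attribute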
myHtml02 = '''
<html>
<head>
잡지, 책 정보
</head>
<body>
<h1> 잡지, 책 목록 </h1>
<div>
<p id='magzine1' class='magzines' title='잡지1'> 잡지1
<span class = 'price'> 7000원 </span>
<span class = 'count'> 10개 </span>
<span class = 'store'> 잡지1 </span>
<a href="www.magazine.xxx">www.magazine1.xxx</a>
</p>
</div>
<div>
<p id='magzine2' class='magzines' title='잡지2'> 잡지2
<span class = 'price'> 8500원 </span>
<span class = 'count'> 20개 </span>
<span class = 'store'> 잡지2</span>
<a href="www.magazine.xxx">www.magazine2.xxx</a>
</p>
</div>
<div>
<p id='book1' class='books' title='책1'> 책1
<span class = 'price'> 13500원 </span>
<span class = 'count'> 30개 </span>
<span class = 'store'> 서점1</span>
<a href="www.book.xxx">www.book1.xxx</a>
</p>
</div>
<div>
<p id='book2' class='books' title='책2'> 책2
<span class = 'price'> 18000원 </span>
<span class = 'count'> 10개 </span>
<span class = 'store'> 서점2</span>
<a href="www.book.xxx">www.book2.xxx</a>
</p>
</div>
</body>
</html> '''
soup02 = BeautifulSoup(myHtml02, 'html.parser')
soup02.select('div > p > span')
>>>[<span class="price"> 7000원 </span>,
<span class="count"> 10개 </span>,
<span class="store"> 잡지1 </span>,
<span class="price"> 8500원 </span>,
<span class="count"> 20개 </span>,
<span class="store"> 잡지2</span>,
<span class="price"> 13500원 </span>,
<span class="count"> 30개 </span>,
<span class="store"> 서점1</span>,
<span class="price"> 18000원 </span>,
<span class="count"> 10개 </span>,
<span class="store"> 서점2</span>]
soup02.select('div > p > span')[3]
>>><span class="price"> 8500원 </span>
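In a CSS selector, > matches direct children only, while a plain space matches descendants at any depth. Since each <span> sits inside a <p>, 'div > span' finds nothing while 'div span' finds every span. select_one() returns just the first match (or None), which avoids indexing into the list. A minimal sketch on one entry trimmed from myHtml02:

```python
from bs4 import BeautifulSoup

# One entry trimmed from myHtml02
html = '''
<div>
<p id='magzine1' class='magzines' title='잡지1'> 잡지1
<span class = 'price'> 7000원 </span>
<span class = 'count'> 10개 </span>
</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

print(soup.select('div > span'))     # [] (<span> is not a direct child of <div>)
print(len(soup.select('div span')))  # 2 (the descendant combinator reaches inside <p>)

# select_one() returns the first match, or None if nothing matches
price = soup.select_one('div > p > span.price')
print(price.get_text(strip=True))      # 7000원
print(soup.select_one('div > table'))  # None
```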
soup02.select('#book1')
>>>[<p class="books" id="book1" title="책1"> 책1
<span class="price"> 13500원 </span>
<span class="count"> 30개 </span>
<span class="store"> 서점1</span>
<a href="www.book.xxx">www.book1.xxx</a>
</p>]
soup02.select('#book1 > span.store')
>>>[<span class="store"> 서점1</span>]
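These selectors combine naturally into structured records: select the <p> elements, read attributes with p['id'] and p['title'], and call select_one() on each <p> to search only inside it. A sketch assuming the same markup pattern as myHtml02 (two entries shown; the rows list and its keys are illustrative):

```python
from bs4 import BeautifulSoup

# Two entries in the same shape as myHtml02
html = '''
<div>
<p id='magzine1' class='magzines' title='잡지1'> 잡지1
<span class = 'price'> 7000원 </span>
<span class = 'count'> 10개 </span>
</p>
</div>
<div>
<p id='book1' class='books' title='책1'> 책1
<span class = 'price'> 13500원 </span>
<span class = 'count'> 30개 </span>
</p>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

rows = []
for p in soup.select('div > p'):
    rows.append({
        'id': p['id'],
        'title': p['title'],
        # select_one() on a tag searches only inside that <p>
        'price': p.select_one('span.price').get_text(strip=True),
        'count': p.select_one('span.count').get_text(strip=True),
    })

for row in rows:
    print(row)

# A class selector narrows the search to one kind of entry
book_ids = [p['id'] for p in soup.select('p.books')]
print(book_ids)  # ['book1']
```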
soup02.select('a[href]')
>>>[<a href="www.magazine.xxx">www.magazine1.xxx</a>,
<a href="www.magazine.xxx">www.magazine2.xxx</a>,
<a href="www.book.xxx">www.book1.xxx</a>,
<a href="www.book.xxx">www.book2.xxx</a>]
soup02.select('a[href]')[1]
>>><a href="www.magazine.xxx">www.magazine2.xxx</a>
soup02.select('#book2 > a[href]')
>>>[<a href="www.book.xxx">www.book2.xxx</a>]
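select('a[href]') returns the <a> tags themselves; to get the link targets, read the attribute from each tag. tag['href'] raises KeyError when the attribute is missing, while tag.get('href') returns None. An attribute selector can also match a specific value. A short sketch on the four links from myHtml02 plus one hypothetical <a> without an href:

```python
from bs4 import BeautifulSoup

# The four links from myHtml02, plus a hypothetical <a> with no href
html = '''
<a href="www.magazine.xxx">www.magazine1.xxx</a>
<a href="www.magazine.xxx">www.magazine2.xxx</a>
<a href="www.book.xxx">www.book1.xxx</a>
<a href="www.book.xxx">www.book2.xxx</a>
<a>no link</a>'''

soup = BeautifulSoup(html, 'html.parser')

# a[href] skips the <a> that has no href attribute
links = soup.select('a[href]')
print(len(links))  # 4

# Attribute values vs. link text
hrefs = [a['href'] for a in links]
print(hrefs)  # ['www.magazine.xxx', 'www.magazine.xxx', 'www.book.xxx', 'www.book.xxx']
print([a.get_text() for a in links])

# Match on the attribute value itself
book_links = soup.select('a[href="www.book.xxx"]')
print([a.get_text() for a in book_links])  # ['www.book1.xxx', 'www.book2.xxx']

# .get() returns None instead of raising KeyError when the attribute is absent
bare = soup.find_all('a')[-1]
print(bare.get('href'))  # None
```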
The Jupyter Notebook file is also attached.
Thank you.