2020. 3. 16. 09:58ㆍPython programming
Converts JSON data held in a string into a set of Python objects
To read a file in JSON format
<Comparison>
json.loads() : for loading JSON data from a string ("loads" is short for "load string")
json.load() : for loading from a file object
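A minimal sketch of the difference between the two functions. The sample JSON text is made up for illustration, and io.StringIO stands in for a real file object that would normally come from open():

```python
import io
import json

# Hypothetical JSON text, used only for illustration
json_string = '[{"name": "Sabine", "age": 36}, {"name": "Zoe", "age": 40}]'

# json.loads() parses JSON text that is already held in a string
from_string = json.loads(json_string)

# json.load() parses JSON read from a file object;
# io.StringIO plays the role of the object returned by open("some_file.json")
file_like = io.StringIO(json_string)
from_file = json.load(file_like)

print(type(from_string))         # a list of dictionaries
print(from_string == from_file)  # both calls produce the same Python objects
```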
3. pandas.read_json() in pandas
4. The pandas.DataFrame() constructor in pandas
- Pass the list of dictionaries directly to it to convert the JSON to a dataframe
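As a rough sketch of the constructor approach: the records below are hypothetical stand-ins for what json.load() would return, and the column names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical list of dictionaries, shaped like parsed JSON
records = [
    {"title": "Post A", "points": 12, "tags": ["story", "author_a"]},
    {"title": "Post B", "points": 3, "tags": ["story", "author_b"]},
]

# Each dictionary becomes a row; each key becomes a column
df = pd.DataFrame(records)

print(df.shape)
print(df["points"].sum())
```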
ex)
1. Use Series.apply() and len() to create a boolean mask based on whether each item in tags has a length of 4
2. Use the boolean mask to filter tags. Assign the result to four_tags.
> tags = hn_df['tags']
tags_apply = tags.apply(len) == 4
four_tags = tags[tags_apply]
print(four_tags.head())
>> 43 [story, author_alamgir_mand, story_7813869, show_hn]
86 [story, author_cweagans, story_7812404, ask_hn]
104 [story, author_nightstrike789, story_7812099, ask_hn]
107 [story, author_ISeemToBeAVerb, story_7812048, ask_hn]
109 [story, author_Swizec, story_7812018, show_hn]
Name: tags, dtype: object
List Comprehensions
1) When to use them?
- When we iterate over the values in a list,
- perform a transformation on those values, and
- assign the result to a new list.
2) How to use them?
We do the following within brackets [ ]:
- Start with the code that transforms each item.
- Continue with our for statement (without a colon).
ex 1)
# LOOP VERSION
ints = [1, 2, 3, 4]
plus_one = []
for i in ints:
    plus_one.append(i + 1)
print(plus_one)
# LIST COMPREHENSION VERSION
plus_one = [i + 1 for i in ints]
ex 2)
# LOOP VERSION
#
# hn_clean = []
#
# for d in hn:
#     new_d = del_key(d, 'createdAtI')
#     hn_clean.append(new_d)
# LIST COMPREHENSION VERSION
hn_clean = [del_key(d, 'createdAtI') for d in hn]
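The del_key() helper is not defined in these notes. One plausible version, written here purely as an assumption, copies the dictionary and deletes the key from the copy so the original data is left untouched. The sample hn list is also hypothetical.

```python
def del_key(dict_, key):
    """Hypothetical helper: return a copy of dict_ without the given key."""
    modified = dict_.copy()  # shallow copy, so the original dictionary is unchanged
    del modified[key]
    return modified

# Hypothetical stand-in for the parsed JSON list
hn = [
    {"title": "Post A", "createdAtI": 1401200000, "points": 5},
    {"title": "Post B", "createdAtI": 1401300000, "points": 9},
]

hn_clean = [del_key(d, "createdAtI") for d in hn]
print(hn_clean)
```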
5. Transforming a list
6. Creating a new list
7. Reducing a list
ex 1) Remove the integers smaller than 50
- First, the loop version
- Then, the list comprehension version
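The two versions named above might look like this. The numbers in ints are made up for the example.

```python
# Hypothetical input values
ints = [25, 14, 13, 84, 43, 6, 77, 56]

# LOOP VERSION: keep only the values that are 50 or larger
big_ints_loop = []
for i in ints:
    if i >= 50:
        big_ints_loop.append(i)

# LIST COMPREHENSION VERSION: the if clause goes after the for clause
big_ints = [i for i in ints if i >= 50]

print(big_ints)  # [84, 77, 56]
```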
ex 2) Keep only the items that have comments
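A sketch of this filter, assuming each dictionary stores its comment count under a key called numComments. Both the key name and the sample data are assumptions, not taken from the notes.

```python
# Hypothetical stand-in for the cleaned JSON list
hn_clean = [
    {"title": "Post A", "numComments": 0},
    {"title": "Post B", "numComments": 4},
    {"title": "Post C", "numComments": 12},
]

# Keep only the dictionaries whose comment count is greater than zero
has_comments = [d for d in hn_clean if d["numComments"] > 0]

print([d["title"] for d in has_comments])  # ['Post B', 'Post C']
```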
8. min(), max(), sorted()
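- Without parentheses, a function name refers to the function itself instead of calling it.
This is the principle at work here.
What happens if we remove the parentheses? Written without them, a function can be treated just like a variable.
Functions show other variable-like behaviors too, such as being assigned to a new name.
Turning the value lookup into a function gives a key function that can be passed to sorted(), min(), and max().
- sorted() works like this (see the related documentation): each item in the JSON list is passed through the function, which returns the corresponding value, and the items are ordered by those returned values
- min(): the same mechanism, returning the item whose returned value is smallest
- max(): the same mechanism, returning the item whose returned value is largest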
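The ideas in this section can be sketched as follows. The greet() and get_age() functions and the people list are hypothetical names invented for this example.

```python
def greet():
    return "hello"

# With parentheses the function is called; without them we get the function object
called = greet()     # the string "hello"
not_called = greet   # the function itself

# A function can be assigned to another name, like any variable
say = greet
print(say())         # hello

# Hypothetical JSON-like list
people = [
    {"name": "Sabine", "age": 36},
    {"name": "Zoe", "age": 40},
    {"name": "Heidi", "age": 29},
]

# Key function: takes one item and returns the value to compare on
def get_age(d):
    return d["age"]

# Note that get_age is passed WITHOUT parentheses
by_age = sorted(people, key=get_age)
youngest = min(people, key=get_age)
oldest = max(people, key=get_age)

print([p["name"] for p in by_age])  # ['Heidi', 'Sabine', 'Zoe']
```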
9. lambda functions
Lambda functions can be defined in a single line, which allows you to define a function you want to pass as an argument at the time you need it.
- Use the lambda keyword, followed by
- The parameter and a colon, and then
- The transformation we wish to perform on our argument
- Learned how to use a lambda function to pass an argument in place when calculating minimums, maximums, and sorting lists of lists.
- Let's apply it to sorted() on the JSON data
ex) Using sorted() and a lambda function, sort the hn_clean JSON list by the number of points (dictionary key points) from highest to lowest:
>> hn_sorted_points = sorted(hn_clean, key=lambda d : d['points'], reverse=True)
- Let's apply it to min() on the JSON data
- Let's apply it to max() on the JSON data
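A sketch of the min() and max() cases, using a small made-up hn_clean list in place of the real data. It also shows that a lambda is just a one-line equivalent of a named function.

```python
# Hypothetical stand-in for the cleaned JSON list
hn_clean = [
    {"title": "Post A", "points": 5},
    {"title": "Post B", "points": 21},
    {"title": "Post C", "points": 9},
]

# A named key function...
def get_points(d):
    return d["points"]

# ...and a lambda that does the same thing: parameter, colon, expression
get_points_lambda = lambda d: d["points"]

# Passing the lambda in place, at the moment it is needed
fewest_points = min(hn_clean, key=lambda d: d["points"])
most_points = max(hn_clean, key=lambda d: d["points"])

print(fewest_points["title"])  # Post A
print(most_points["title"])    # Post B
```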
10. ternary operator
Example)
- Check the length of the list.
- If the length of the list is equal to four, return the last value.
- If the length of the list isn't equal to four, return a null value.
> def extract_tag(l):
      if len(l) == 4:
          return l[-1]
      else:
          return None
ex)
Problem:
- Use Series.apply() and a lambda function to extract the tag data from tags:
- Where the item is a list with length four, return the last item.
- In all other cases, return None.
- Assign the result to cleaned_tags.
- Assign the cleaned_tags series to the tags column of the hn_df dataframe.
Answer:
# def extract_tag(l):
#     return l[-1] if len(l) == 4 else None
cleaned_tags = tags.apply(lambda l : l[-1] if len(l) == 4 else None)
hn_df['tags'] = cleaned_tags
>> Series (<class 'pandas.core.series.Series'>)
0 None
1 None
2 None
3 None
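For a version that runs without the original dataset, here is a small sketch using a made-up tags series. The tag values are hypothetical. The ternary form reads as: value_if_true if condition else value_if_false.

```python
import pandas as pd

# Hypothetical tags column: each item is a list of tags
tags = pd.Series([
    ["story", "author_a", "story_1"],
    ["story", "author_b", "story_2", "show_hn"],
    ["story", "author_c", "story_3", "ask_hn"],
])

# Ternary operator inside a lambda: last item if the list has four items, else None
cleaned_tags = tags.apply(lambda l: l[-1] if len(l) == 4 else None)

print(cleaned_tags)
```

Only the rows whose list has exactly four items keep a value; the first row comes back as a missing value.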