[Solved] AI4I-3-Q How many n-grams  

0

Hi, does anyone know how to derive the correct answer (7047) to the last question of the quiz? I'm currently getting 8138. Below are the last few lines I ran before getting 8183:

print(df['age'].describe())

## count    2355.000000
## mean       42.582335
## std        23.854559
## min         1.000000
## 25%        34.000000
## 50%        41.000000
## 75%        50.000000
## max      1000.000000
## Name: age, dtype: float64

mean = df['age'].mean()
std = df['age'].std()
cutoff = std*2
lower,upper = mean - cutoff, mean + cutoff
new_df = df[(df['age'] < upper) & (df['age'] > lower)]

new_df['age'].hist()
plt.show()

cv=CountVectorizer(ngram_range=(2,2))
print(cv)

cv.fit_transform(new_df['Q9: OTHER COMMENTS'].values.astype('str'))

 

Just trying to figure out at which point I went wrong. Any advice is appreciated! 🙂
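
The two-standard-deviation filter in the question can be sketched on toy data as below. The ages are invented for illustration and are not taken from the quiz dataset; the point is only to show how the boolean mask keeps rows strictly inside `mean ± 2 * std`.

```python
import pandas as pd

# Toy ages, invented for illustration (not from candy.csv).
df = pd.DataFrame({"age": [34, 41, 50, 38, 45, 29, 1000]})

mean = df["age"].mean()
std = df["age"].std()          # pandas uses the sample standard deviation (ddof=1)
cutoff = std * 2
lower, upper = mean - cutoff, mean + cutoff

# Keep only rows whose age lies strictly between the two bounds.
filtered = df[(df["age"] > lower) & (df["age"] < upper)]
print(filtered["age"].tolist())
```

Note that a single extreme value such as 1000 inflates both the mean and the standard deviation, so the bounds themselves depend on the outliers being removed.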

5 Answers
0
cv=CountVectorizer(ngram_range=(2,2))
cv.fit_transform(df['Q9: OTHER COMMENTS'].values.astype('str'))

 

Hi ellyn! This gives you the right answer of 7047. You shouldn't filter the dataframe down to new_df as in your code; fit the vectorizer on the original df.
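
As a rough sketch of why the choice of rows matters: the number reported by `CountVectorizer` is the number of columns in the fitted matrix, i.e. the number of distinct bigrams found in whatever rows are passed in. The comments below are made up for illustration and are not from the quiz data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy comments, invented for illustration (not from candy.csv).
comments = [
    "I love candy corn",
    "candy corn is great",
    "nan",                 # what a missing value becomes after astype('str')
    "I love candy corn",
]

# Fit once on every row, and once on a subset of rows.
full = CountVectorizer(ngram_range=(2, 2)).fit_transform(comments)
subset = CountVectorizer(ngram_range=(2, 2)).fit_transform(comments[:1])

# shape is (number of rows, number of distinct bigrams)
print(full.shape)
print(subset.shape)
```

The second element of each shape is the bigram count, so fitting on a different set of rows can give a different answer.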

0

Hey, I got 7047 using your method as well.

Did you modify the column, or are you sure you downloaded the right CSV? It might also be that you're fitting on new_df, the dataframe you removed rows from, although I'm not sure how that would lead to more bigrams.

Here's the relevant code I used:

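import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv('C:/Users/nicho/Desktop/candy.csv', encoding='latin')
df3 = df.loc[:, 'Q9: OTHER COMMENTS']
vec = CountVectorizer(ngram_range=(2, 2))
cv = vec.fit_transform(df3.values.astype('str'))
cv_array = cv.toarray()
# get_feature_names_out() replaces get_feature_names(), which newer scikit-learn releases removed
cv_df = pd.DataFrame(cv_array, columns=vec.get_feature_names_out()).add_prefix('Counts_')
print(cv_df.shape)

Output:

(2460, 7047)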
This post was modified 1 month ago by Yi Sheng

@nicholasleong

Thanks, this worked too! Appreciate the help 🙂

0

I got a different count (2351). Have you tried dropping missing values after converting the 'age' column to numeric?
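
A minimal sketch of that conversion, assuming the 'age' column is read in as text with some non-numeric entries (the values below are invented for illustration): `pd.to_numeric` with `errors="coerce"` turns anything unparseable into NaN, and `dropna` then removes those rows.

```python
import pandas as pd

# Toy column, invented for illustration (not from candy.csv).
df = pd.DataFrame({"age": ["34", "41", "old enough", None, "1000"]})

# Unparseable strings and missing values both become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Drop the rows where age could not be converted.
clean = df.dropna(subset=["age"])
print(len(clean))
```

The row count after this step is what `describe()` reports as `count`, so differences in how ages are cleaned would show up there.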

0

I tried using nltk.util.ngrams and got 8256.
Are we expected to use sklearn.feature_extraction.text.CountVectorizer for this quiz?

 

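from nltk.util import ngrams

count = []
comments_col = df['Q9: OTHER COMMENTS'].dropna()
for comment in comments_col:
    # ngrams() yields pairs of whitespace-separated words lazily
    bigrams = ngrams(comment.split(), 2)
    for gram in bigrams:
        # list(bigrams) drains the remaining pairs, so this body runs at most
        # once per comment and the first pair is not included in the length
        count.append(len(list(bigrams)))
ngram_count = sum(count)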
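
One possible reason the two approaches disagree, sketched here with the standard library only: summing bigrams over whitespace-split comments counts every occurrence, while `CountVectorizer` reports the size of its vocabulary, i.e. distinct bigrams after lowercasing and tokenizing with its default pattern `(?u)\b\w\w+\b` (tokens of two or more word characters). The comments and the `bigrams` helper below are invented for illustration; this is not a claim about how the quiz answer was produced.

```python
import re

# Toy comments, invented for illustration (not from candy.csv).
comments = ["I love candy corn", "Candy corn is great!", "I love candy corn"]

def bigrams(tokens):
    """Return adjacent token pairs (hypothetical helper for this sketch)."""
    return list(zip(tokens, tokens[1:]))

# Approach 1: whitespace split, every occurrence counted.
total_occurrences = sum(len(bigrams(c.split())) for c in comments)

# Approach 2: mimic CountVectorizer defaults (lowercase, 2+ character tokens)
# and count distinct bigrams only.
token_pattern = re.compile(r"(?u)\b\w\w+\b")
vocabulary = set()
for c in comments:
    vocabulary.update(bigrams(token_pattern.findall(c.lower())))

print(total_occurrences, len(vocabulary))
```

On real data the two numbers answer different questions (total bigram occurrences versus distinct bigrams), so they would not be expected to match.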

0
cv=CountVectorizer(ngram_range=(2,2))
cv.fit_transform(df['Q9: OTHER COMMENTS'].values.astype('str'))

This is the intended method. However, we note that there are multiple methods of deriving the n-grams and the question will be made less specific.