[Solved] AI4I-3-Q How many n-grams
Hi, does anyone know how to derive the correct answer (7047) to the last question of the quiz? I'm currently getting 8138. Below are the last few lines I ran before getting that result:
## count    2355.000000
## mean       42.582335
## std        23.854559
## min         1.000000
## 25%        34.000000
## 50%        41.000000
## 75%        50.000000
## max      1000.000000
## Name: age, dtype: float64
mean = df['age'].mean()
std = df['age'].std()
cutoff = std*2
lower,upper = mean - cutoff, mean + cutoff
new_df = df[(df['age']
cv.fit_transform(new_df['Q9: OTHER COMMENTS'].values.astype('str'))
Just trying to figure out at which point I went wrong. Any advice is appreciated! 🙂
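For anyone following along: the filter line in the post above is cut off, so here is only a rough sketch of what a "keep values within two standard deviations of the mean" filter does. It uses the standard library instead of pandas so it runs anywhere. The sample ages and the inclusive bounds are assumptions for illustration, not the quiz data or the poster's exact code. `statistics.stdev` is the sample standard deviation, which is also what pandas' `Series.std()` computes by default.

```python
import statistics

# Made-up ages for illustration; 1000 plays the role of the outlier
# visible as "max" in the describe() output above.
ages = [34, 41, 50, 38, 45, 29, 1000]

mean = statistics.mean(ages)
std = statistics.stdev(ages)  # sample standard deviation (n - 1 denominator)
cutoff = std * 2
lower, upper = mean - cutoff, mean + cutoff

# Assumption: bounds are inclusive; the original post's comparison is truncated.
kept = [age for age in ages if lower <= age <= upper]
print(kept)
```

Note that the outlier itself inflates both the mean and the standard deviation, so it is worth checking how many rows are left after the filter before fitting the vectorizer.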
Hey, I got 7047 using your method as well.
Did you modify the column, and did you download the right CSV? It might also be that you're fitting on the same df (new_df) that you removed rows from, although I'm not sure how that would lead to more 2-grams.
Here's the relevant code I used:
df = pd.read_csv('C:/Users/nicho/Desktop/candy.csv',encoding='latin')
df3 = df.loc[:, 'Q9: OTHER COMMENTS']
vec = CountVectorizer(ngram_range=(2, 2))
cv = vec.fit_transform(df3.values.astype('str'))
cv_array = cv.toarray()
cv_df = pd.DataFrame(cv_array, columns=vec.get_feature_names()).add_prefix('Counts_')
print(cv_df.shape)
I had a different count value (I got 2351). Have you tried dropping missing values after converting the 'age' column to numeric data?
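To illustrate why the row count can change after converting to numeric: entries that cannot be parsed as numbers become missing and are then dropped. In pandas this is usually done with `pd.to_numeric(..., errors='coerce')` followed by `dropna()`. The snippet below is just a standard-library sketch of the same idea; `to_number` is a hypothetical helper and the sample values are made up.

```python
import math

def to_number(value):
    """Hypothetical helper: return value as a float, or None if it cannot be parsed."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    # Treat NaN as missing too, so it is dropped with the unparseable entries.
    return None if math.isnan(number) else number

# Made-up raw survey answers: some numeric strings, some junk, some missing.
raw_ages = ["34", "41", "old enough", None, "50", ""]

# Coerce first, then drop the missing values.
ages = [n for n in map(to_number, raw_ages) if n is not None]
print(len(raw_ages), len(ages))
```

If the count you see after this step differs from someone else's, the difference is probably in which entries were treated as missing.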
I tried using nltk.util.ngrams and got 8256.
Are we expected to use sklearn.feature_extraction.text.CountVectorizer for this quiz?
comments_col = df['Q9: OTHER COMMENTS'].dropna()
for comment in comments_col:
    bigrams = ngrams(comment.split(), 2)
    for gram in bigrams:
ngram_count = sum(count)
cv.fit_transform(df['Q9: OTHER COMMENTS'].values.astype('str'))
This is the intended method. However, we note that there are multiple methods of deriving the n-grams and the question will be made less specific.
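A likely reason the methods disagree is tokenization. Splitting on whitespace keeps case and punctuation, while `CountVectorizer` by default lowercases the text and keeps only tokens of two or more word characters. The sketch below imitates both approaches with the standard library so the difference is easy to see. The regex is meant to mirror `CountVectorizer`'s default `token_pattern`, and the comments are made-up examples, so treat this as an approximation and not the exact quiz pipeline.

```python
import re

# Assumed to mirror CountVectorizer's default token_pattern (2+ word characters).
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def bigrams(tokens):
    """Return the consecutive token pairs of a token list."""
    return list(zip(tokens, tokens[1:]))

def whitespace_bigrams(comments):
    """Distinct bigrams from str.split(): case and punctuation are preserved."""
    vocab = set()
    for comment in comments:
        vocab.update(bigrams(comment.split()))
    return vocab

def regex_bigrams(comments):
    """Distinct bigrams after lowercasing and regex tokenization."""
    vocab = set()
    for comment in comments:
        tokens = TOKEN_PATTERN.findall(comment.lower())
        vocab.update(bigrams(tokens))
    return vocab

# Made-up comments for illustration.
comments = ["I love candy!", "i love candy", "A big bag of candy."]
split_vocab = whitespace_bigrams(comments)
regex_vocab = regex_bigrams(comments)
print(len(split_vocab), len(regex_vocab))
```

Here the whitespace version counts "candy!", "candy" and "candy." as different tokens and keeps single-letter words, so it finds more distinct bigrams than the regex version. That would be consistent with the nltk-based count coming out higher than 7047.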