Additional Resources for NLP and Elasticsearch with Python

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Below are three interesting reads I came across today. They inspired a GitHub commit on my 'nlpfun' repo.

The first is about using Elasticsearch and Python to import, index, and search your Gmail messages.
https://github.com/oliver006/elasticsearch-gmail

The second is another take on using Python with Elasticsearch.
http://bitquabit.com/post/having-fun-python-and-elasticsearch-part-1/

The third spends most of its length on using NLTK with Python and only touches on Elasticsearch toward the end.
http://engineroom.trackmaven.com/blog/monthly-challenge-natural-language-processing/

The commit inspired by these posts changes how I measure the reading score for text, as I am now using stopwords appropriately. The script also prints the top 10 bigrams and trigrams after the readability statistics.
https://github.com/rarmknecht/nlpfun/commit/6298f53d7b202446b5afd75029eb4c66664f5348
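
For readers who want to try the idea without installing anything, here is a minimal sketch of stopword filtering followed by n-gram counting. It uses only the Python standard library and is not the code from the commit: the STOPWORDS set, tokenize(), top_ngrams(), and the sample sentence are all hypothetical names and data made up for illustration, and the score is a plain relative frequency.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used in nlpfun).
STOPWORDS = {"a", "an", "and", "of", "the", "to", "in", "is", "was"}


def tokenize(text):
    """Split on anything that is not a letter, so "I'm" becomes ["I", "m"]."""
    return re.findall(r"[A-Za-z]+", text)


def top_ngrams(text, n=2, k=10):
    """Return the k most common n-grams with their relative frequency."""
    # Drop stopwords first, so n-grams are built from the remaining words only.
    tokens = [t for t in tokenize(text) if t.lower() not in STOPWORDS]
    # Zip n shifted copies of the token list to form consecutive n-grams.
    grams = list(zip(*(tokens[i:] for i in range(n))))
    total = len(grams)
    counts = Counter(grams)
    return [(gram, count / total) for gram, count in counts.most_common(k)]


sample = "The Wheel of Time turns. The Wheel of Time turns, and Ages come and pass."
for gram, freq in top_ngrams(sample, n=2, k=3):
    print((gram, freq))
```

Because stopwords are removed before the n-grams are built, words that were separated by a stopword in the text end up adjacent, which is why a phrase like "Wheel of Time" surfaces as ('Wheel', 'Time'). Passing n=3 gives trigrams in the same way.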

Below is the output for The Gathering Storm:

[nlpfun (master)]$ ./basic_info.py samples/Gathering\ Storm\,\ The\ -\ Robert\ Jordan\;\ Brandon\ Sanderson.txt
{'grade_level': 5.280620877501001,
 'missed_count': 748,
 'missed_pct': 0.004213965803780175,
 'reading_ease': 66.94434753592452,
 'sentences': 32319,
 'syllables': 280679,
 'words': 176757}

Bigrams
====================
(('Aes', 'Sedai'), 0.002985425087633967)
(('Egwene', 'said'), 0.0012746758801133792)
(('I', 'm'), 0.0012746758801133792)
(('al', 'Thor'), 0.0012243597269510089)
(('White', 'Tower'), 0.0011852249411580542)
(('said', 'I'), 0.0009280534916614952)
(('Rand', 'said'), 0.0008945093895532484)
(('Mat', 'said'), 0.0007826957158590924)
(('I', 'know'), 0.0007491516137508456)
(('Dragon', 'Reborn'), 0.0006932447769037675)

Trigrams
====================
(('Tel', 'aran', 'rhiod'), 25.609375317031827)
(('Daughter', 'Nine', 'Moons'), 24.89567950218847)
(('In', 'Old', 'Tongue'), 22.487696760443576)
(('Blood', 'bloody', 'ashes'), 21.01314272463045)
(('Dark', 'One', 'prison'), 19.634089416376085)
(('took', 'deep', 'breath'), 19.374140616577723)
(('Lews', 'Therin', 'whispered'), 18.546228180716582)
(('The', 'Wheel', 'Time'), 17.623100920894437)
(('short', 'distance', 'away'), 17.37556393273867)
(('Egwene', 'al', 'Vere'), 16.368587037186575)
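
The readability numbers above are consistent with the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. That is an inference from the printed counts, not a statement about how basic_info.py is written. The sketch below recomputes both scores from the words, sentences, and syllables totals. The function name flesch_scores() is hypothetical, and the missed_pct calculation (missed words divided by counted plus missed words) is likewise a guess that happens to match the output.

```python
def flesch_scores(words, sentences, syllables):
    """Return (reading_ease, grade_level) from raw text counts."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    # Flesch Reading Ease: higher means easier to read.
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: approximate US school grade.
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level


# Counts taken from the output for The Gathering Storm.
ease, grade = flesch_scores(words=176757, sentences=32319, syllables=280679)
print(round(ease, 2), round(grade, 2))  # 66.94 5.28

# Assumed definition: share of all words for which no syllable count was found.
missed_pct = 748 / (176757 + 748)
print(round(missed_pct, 6))
```

A reading ease in the 60s and a grade level a little above 5 both follow from the short average sentence length, roughly 5.5 words per sentence once stopwords are excluded from the word count.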

I'm interested in diving further into NLP to make searching, filtering, and summarizing content more automated.
-----BEGIN PGP SIGNATURE-----
Version: Keybase OpenPGP v2.0.1
Comment: https://keybase.io/crypto

wsBcBAABCgAGBQJUpiTzAAoJEBRcFxCgapM2xS4IAIgZ6lVACawCGLIjYMo55zWB
Ax4KTVr61PBlZGabrkjWyn8myhBobwHrrv7fWOi4U6tdox8boDmBzLkBwRhG7BXY
bwp6aHMxAzZ9fODePDT49C1WspWd99bDBWsxwBI5AMy+UkiYZYuC4x9yFhJ5p1gx
vkp0FuISBjPn1O81ciufQLDg1/0+9h6AAHW4GElQQmO8rdjVOZgVQmvZPqXO0TGs
DJqiSr/hbZkwq40U7i31I+QyeVkR8XgQQ3QMnutj65GoPXkWHKolrt7vtQWJiDar
JG2Cskf0IyKOR2ln0MLzrhh5fReRxgbzE2ekT+UhiefCyU+JsDwNrGPH7aEDU80=
=owh8
-----END PGP SIGNATURE-----