A Deep Dive into Hadoop - TechWise Episode 1 Transcript


Anonymous

Editor's Note: This is a transcript of a live webcast. You can view the full webcast here.


Eric Kavanagh: Ladies and gentlemen, it's time to get wise! It's time for TechWise, a brand new show! My name is Eric Kavanagh. I will be your moderator for our inaugural episode of TechWise. That's right. This is a partnership of Techopedia and the Bloor Group, of course, of Inside Analysis fame.


My name is Eric Kavanagh. I'll be moderating this really interesting and involved event, folks. We're going to be digging deep into the weeds to understand what's going on with this big thing called Hadoop. What's the elephant in the room? It's called Hadoop. We're going to try to figure out what it means and what's going on with it.


First of all, big thanks to our sponsors, GridGain, Actian, Zettaset and DataTorrent. We'll get a brief few words from each of them near the end of this event. We'll also have a Q&A, so don't be shy - send in your questions at any time.


We'll dig into the details and throw the hard questions at our experts. And speaking of experts, hey, there they are. So, we'll be hearing from our very own Dr. Robin Bloor, and folks, I'm thrilled to have the legendary Ray Wang, principal analyst and founder of Constellation Research. He's online today to give us his thoughts, and he's like Robin in that he's incredibly diverse, really focuses on a lot of different areas, and has the ability to synthesize them and really understand what's going on out there in the whole field of information technology and data management.


So, there's that cute little elephant. He's at the beginning of the road, as you can see. It's just starting now, it's only kind of beginning, this whole Hadoop thing. Of course, back in 2006 or 2007, I suppose, is when it was released into the open-source community, but there have been a lot of things happening, folks. There have been huge developments. In fact, I want to bring up the story, so let me do a quick desktop share, at least I think I can. Let's do a quick desktop share.


I'm showing you this because it's just crazy, crazy stuff going on in this story, folks. So Intel invested $740 million to buy 18 percent of Cloudera. I thought about it and I was like, "Holy Christmas!" I started doing the math and I was like, "That's a valuation of $4.1 billion." Let's think about that for a second. I mean, if WhatsApp is worth $2 billion, I suppose Cloudera might as well be worth $4.1 billion, right? I mean, why not? Some of these numbers are just out the window these days, folks. I mean, typically in terms of investment, you have EBITDA and all these various other mechanisms, multiples of revenue and so on. Well, it's going to be one heck of a multiple of revenue to get to $4.1 billion for Cloudera, which is an amazing company. Don't get me wrong - there are some very, very smart people over there, including the guy who started the whole Hadoop craze, Doug Cutting, he's over there - a lot of very intelligent people doing a lot of really cool things, but the bottom line is that $4.1 billion, that's a lot of money.


So here's kind of a Captain Obvious moment going through my head right now, which is chips, Intel. Their chip designers are being brought in to come up with some Hadoop-optimized chip - I have to think that, folks. That's just my guess. That's just a rumor, coming from me, if you will, but it kind of makes sense. And what does it all mean?


So here's my theory. What's happening? A lot of this stuff is not new. Massively parallel processing is not new. Distributed processing certainly is not new. I've been in the world of supercomputing for a long time. A lot of these things happening are not new, but there's sort of a general awareness that there's a new way to attack some of these problems. What I see happening, if you look at some of the big vendors like Cloudera or Hortonworks and some of these other guys, what they're really doing, if you boil it down to the most granular, distilled level, is application development. That's what they're doing.


They're designing new applications - some of them involve business analytics; some of them just involve supercharging systems. One of our vendors who talked about that, they do that kind of thing all day, on the show today. But if you ask whether it's terribly new, again the answer is "not really," but there's big stuff happening, and personally, I think what's going on with Intel making this huge investment is a market-making move. They look at the world today and see that it's kind of a monopoly world today. There's Facebook, and they beat the snot out of poor MySpace. LinkedIn beat the snot out of poor Who's Who. So you look around, and it's one service that dominates each of these different spaces in our world today, and I think the idea is Intel is going to throw all their chips on Cloudera and try to raise it to the top of the stack - that's just my theory.


So folks, like I said, we're going to have a long Q&A session, so don't be shy. Send in your questions at any time. You can do so using that Q&A component of your webcast console. And with that, I want to get to our content because we've got a lot of stuff to get through.


So, Robin Bloor, let me hand the keys over to you, and the floor is yours.


Robin Bloor: OK, Eric, thanks for that. Let's bring on the dancing elephants. It's a curious thing, actually, that elephants are the only land mammals that can't actually jump. All of the elephants in this particular graphic have got at least one foot on the ground, so I suppose it's feasible, but to a certain extent, these are obviously Hadoop elephants, so very capable.


The question, really, is the one that I think has to be discussed, and discussed in all honesty. It has to be discussed before you go anywhere else, which is to really start talking about what Hadoop actually is.


One of the things that it absolutely is, from the ground up, is a key-value store. We used to have key-value stores. We used to have them on the IBM mainframe. We had them on minicomputers; the DEC VAX had IMS files. There were ISAM capabilities on pretty much every minicomputer you could get your hands on. But around the late '80s, Unix came in, and Unix didn't really have any key-value store on it. When Unix got built up, everything developed very quickly. What actually happened was that the database vendors, particularly Oracle, went steaming in there and sold their databases to look after any data you cared to manage on Unix. Windows and Linux turned out the same way. So, the industry went for the best part of 20 years without a general-purpose key-value store. Well, it's back now. Not only is it back, it's scalable.


Now, I actually think this is the foundation of what Hadoop really is and, to a certain degree, it determines where it's going to go. What do we like about key-value stores? Those of you who are as old as I am and actually remember working with key-value stores realize that you could use them to informally set up a database, but only informally. You know, the metadata of a key-value store quickly ends up in the program code, but you could actually make that an external file, and you could, if you wanted to, start treating a key-value store a little bit like a database. But of course it didn't have all the recovery capability that a database has, and it didn't have an awful lot of the things that databases have got, but it was a genuinely useful feature for developers, and that's one of the reasons, I think, that Hadoop has proved so popular - simply because it was coders, programmers and developers who were quick to see it. They realized it's not only a key-value store but a scale-out key-value store. It scales out pretty much indefinitely. It scales out to thousands of servers, so that's the really big thing about Hadoop, that's what it is.
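The informal "treat a key-value store a bit like a database" pattern Robin describes can be sketched in a few lines of Python. This is a hypothetical illustration, not Hadoop code: a plain dict stands in for the key-value engine, and all names here are made up.

```python
import json

# The "key-value store": here just a dict, standing in for any KV engine.
store = {}

# Metadata kept externally (here, parsed from a JSON string) rather than
# hard-coded in the program - the step Robin notes makes it database-like.
schema = json.loads('{"table": "users", "fields": ["id", "name", "city"]}')

def put_record(record):
    """Store a record under a composite key: table name plus primary key."""
    key = f"{schema['table']}:{record['id']}"
    store[key] = json.dumps([record[f] for f in schema['fields']])

def get_record(rec_id):
    """Rebuild a dict from the stored field list using the external schema."""
    values = json.loads(store[f"{schema['table']}:{rec_id}"])
    return dict(zip(schema['fields'], values))

put_record({"id": 1, "name": "Ada", "city": "London"})
print(get_record(1))  # {'id': 1, 'name': 'Ada', 'city': 'London'}
```

What's missing is exactly what Robin points out next: no recovery, no transactions, no optimizer - the schema is just a convention the code agrees to follow.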


It also has MapReduce on top of it, which is a parallelization algorithm, but actually that, in my opinion, isn't the important part. So, you know, Hadoop's a chameleon. It's not just a file system. I've seen various kinds of claims made for Hadoop: it's a secret database; it's no secret database; it's a common store; it's an analytical toolbox; it's an ELT environment; it's a data-cleansing tool; it's a streaming-platform data warehouse; it's an archive store; it's a cure for cancer, and so on. Most of these things are really not true of vanilla Hadoop. Hadoop is probably a prototyping environment - certainly a prototyping environment for a SQL database, but it doesn't really have one. If you put a namespace with a catalog over Hadoop, you've got something that looks like a database, but it isn't really what anyone would call a database in terms of capability. A lot of these capabilities, you can certainly get on Hadoop. There are certainly a lot of them. As a matter of fact, you can get hold of various Hadoop components, but Hadoop itself is not what I would call operationally hardened, and therefore the deal about Hadoop - really, I wouldn't bet on anything else - is that you kind of need to have third-party products to enhance it.


So, talking about it, I can only throw in a few lines as I go through the Hadoop overlay. First of all, real-time query capability: well, you know, real-time is kind of business time, really, and almost always performance critical at that. I mean, why would you engineer for real time otherwise? Hadoop doesn't really do this. It does something that's near real-time, but it doesn't really do real-time stuff. It does streaming, but it doesn't do streaming the way that what I'd call really mission-critical application-streaming platforms can do it. There's a difference between a database and a queryable store. Syncing it over Hadoop gives you a queryable data store. That's kind of like a database, but it isn't the same as a database. Hadoop in its native form, in my opinion, doesn't really qualify as a database at all, because it's short of quite a few of the things a database should have. Hadoop does a lot, but it doesn't do it particularly well. Again, the capability is there, but we're a ways away from actually having a fast capability in all these areas.


The other thing to understand about Hadoop is that it has come kind of a long way since it was developed. It was developed in the early days; it was developed when we had servers that really only had one processor per server. We never had multi-core processors then, and it was built to run over grids - to launch over grids of servers. One of the design goals of Hadoop was never to lose the work. And that was really about disk failure, because if you've got hundreds of servers, then the likelihood is, if you've got disks on those servers, you'll get an uptime availability of something like 99.8 percent. That means you'll get, on average, a failure of one of those servers once every 300 or 350 days - one day in a year. So if you've got hundreds of them, the likelihood is that on any day of the year you'll get a server failure.
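Robin's arithmetic here can be checked directly. A short sketch, taking the per-server rate of one failure per ~350 days from the transcript (the cluster sizes are illustrative):

```python
# Probability a given server fails on a given day, per Robin's figure of
# roughly one failure per server every 350 days.
p_daily = 1 / 350

for n_servers in (1, 100, 300, 1000):
    # P(at least one server in the cluster fails today)
    p_any = 1 - (1 - p_daily) ** n_servers
    # Expected number of server failures per day across the cluster
    expected = n_servers * p_daily
    print(f"{n_servers:5d} servers: P(failure today) = {p_any:.1%}, "
          f"expected failures/day = {expected:.2f}")
```

With hundreds of servers, a failure on any given day becomes more likely than not, which is exactly why Hadoop's design treats failure as the normal case.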


Hadoop was built specifically to address that problem - so that, in the event of a failure, it takes snapshots of everything going on, on each particular server, and it can recover the batch job that's running. And that was all that actually ran on Hadoop: batch jobs, and that's a really useful capability, it has to be said. Some of the batch jobs being run - particularly at Yahoo, where I think Hadoop was kind of born - would run for two or three days, and if one failed after a day, you really didn't want to lose the work that had already been done. So that was the design point behind Hadoop's availability. You wouldn't call that high availability, but you could call it high availability for serial batch jobs. That's probably the way to look at it. High availability is always configured according to the characteristics of the line of work. At the moment, Hadoop can only be configured for really serial batch jobs as far as that kind of recovery is concerned. Enterprise high availability is probably best thought of in terms of transactional LLP. I believe that if you're not looking at it as kind of a real-time thing, Hadoop doesn't do that yet. It's probably a long way away from doing that.
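The "never lose the work" design point can be illustrated with a toy checkpoint-and-resume loop. This is only a sketch of the general technique, not Hadoop's actual snapshot mechanism, and every name in it is invented for illustration:

```python
# Toy illustration of checkpoint-based recovery for a long batch job:
# progress is recorded after each unit of work, so a restart resumes
# from the last checkpoint instead of redoing days of computation.
checkpoint = {"next_item": 0, "partial_sum": 0}  # would be persisted to disk

def run_batch(items, fail_at=None):
    """Process items from the last checkpoint; optionally simulate a crash."""
    i = checkpoint["next_item"]
    while i < len(items):
        if i == fail_at:
            raise RuntimeError(f"simulated server failure at item {i}")
        checkpoint["partial_sum"] += items[i]
        i += 1
        checkpoint["next_item"] = i  # commit progress after each item
    return checkpoint["partial_sum"]

items = list(range(10))
try:
    run_batch(items, fail_at=6)      # crash partway through
except RuntimeError as e:
    print("recovering:", e)
total = run_batch(items)             # resumes at item 6, not item 0
print(total)  # 45 - same answer, without redoing items 0 through 5
```

The win is exactly the one Robin cites from Yahoo: a job that dies a day in only repeats the work since the last checkpoint.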


But here's the great thing about Hadoop. That graphic on the right-hand side has got a list of vendors around the edge, and all the lines on it indicate connections between those vendors and other products in the Hadoop ecosystem. If you look at that, it's an incredibly impressive ecosystem. It's quite remarkable. Obviously, we talk to a lot of vendors in terms of their capabilities. Among the vendors I've spoken to, there are some really extraordinary capabilities of using Hadoop in-memory, ways of using Hadoop as a compressed archive, of using Hadoop as an ETL environment, and so on and so forth. But really, if you add product to Hadoop itself, it works extremely well in a particular space. So while I'm critical of native Hadoop, I'm not critical of Hadoop when you actually add some power to it. In my opinion, Hadoop's kind of popularity guarantees its future. By that I mean, even if every line of code written so far on Hadoop were to disappear, I don't believe the HDFS API would disappear. In other words, I think the file system, the API, is here to stay, and possibly YARN, the scheduler that looks over it.


If you actually look at that, it's a very important capability, and I'll kind of wax on about that in a minute, but the other thing that, let's say, excites people about Hadoop is the whole open-source picture. So it's worth going through what the open-source picture is in terms of what I regard as real capability. While Hadoop and all its components can certainly do what we call data lakes - or, as I prefer to call it, a data reservoir - it's certainly a very good staging area to drop data into the organization or to collect data in the organization - extremely good for sandboxes and for data wrangling. It's very good as a prototyping development platform that you could implement at the end of the day, and you know, as a development environment, pretty much everything you want is there. As an archive store, it's pretty much got everything you need, and of course it's not expensive. I don't think we should divorce either of these two things from Hadoop, even though they're not formally, if you like, components of Hadoop. Open source has brought a vast amount of analytics along, and a lot of those analytics are now run on Hadoop, because it gives you a convenient environment in which you can actually take a lot of external data and just start playing in an analytical sandbox.


And then you've got the open-source capabilities, both of which are machine learning. Both of those are extremely powerful in the sense that they implement powerful analytic algorithms. If you put these things together, you've got the kernels of some very, very important capability, which in one way or another is quite likely - whether it develops on its own or whether vendors come in to fill in the missing pieces - quite likely to carry on for a long time, and certainly I think machine learning is already having a really big impact on the world.


In the evolution of Hadoop, YARN changed everything. What happened was that MapReduce was pretty much welded onto the early file system, HDFS. When YARN was introduced, it created a scheduling capability in its first release. You don't expect extremely sophisticated scheduling from a first release, but it did mean that Hadoop was no longer necessarily just a batch environment. It was an environment in which multiple jobs could be scheduled. As soon as that happened, there was a whole series of vendors that had kept away from Hadoop - they just came in and connected to it, because then they could simply treat it as a scheduling environment over a file system and they could address stuff to it. There are even database vendors that have implemented their databases on HDFS, because they just take the engine and put it onto HDFS. With cascading and with YARN, it becomes a very interesting environment, because you can create complex workflows over HDFS, and it means you can start thinking of it as really a platform that can run multiple jobs concurrently and is pushing itself toward the point of doing mission-critical things. If you're going to do that, you'll probably need to buy in some third-party components for things like security and auditing, and so on and so forth, to fill in the gaps that Hadoop doesn't really cover, but you get to the point where even with native open source you can do some interesting things.


In terms of where I think Hadoop is actually going, I personally believe that HDFS is going to become a default scale-out file system and will therefore become the OS, the operating system, for the grid for data flow. I think it's got a huge future in that, and I don't think it will stop there. And I think in actual fact the ecosystem just helps, because pretty much everybody, all the vendors in the space, are actually integrating with Hadoop in one way or another, and they're just enabling it. In terms of another point worth making regarding the Hadoop overlay, it isn't a very good platform as far as parallelization goes. If you actually look at what it's doing, what it's really doing is taking a snapshot regularly on every server as it executes its MapReduce jobs. If you were going to design for really fast parallelism, you wouldn't be doing anything like that. In actual fact, you probably wouldn't be using MapReduce on its own. MapReduce is only what I would describe as half capable of parallelism.


There are two approaches to parallelism: one is by pipelining processes and the other is by partitioning the data, and MapReduce does the data partitioning, so there are a lot of jobs where MapReduce wouldn't actually be the fastest way to do it, but it will give you parallelism, and there's no taking that away from it. When you've got a lot of data, that kind of power isn't usually as useful. YARN, as I've already said, is a very young scheduling capability.
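The data-partitioning style of parallelism that MapReduce uses can be sketched as a word count: the input is split into partitions, each mapped independently, and the partial results reduced together. This is a minimal single-process sketch of the pattern, not real Hadoop code, which would distribute the partitions across machines:

```python
from collections import Counter
from functools import reduce

def map_partition(lines):
    """Map step: count words within one partition of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(a, b):
    """Reduce step: merge the partial counts from two partitions."""
    return a + b

data = ["the quick brown fox", "the lazy dog", "the fox again"]
# Partition the data - in Hadoop this split is what spreads work over servers.
partitions = [data[0:1], data[1:2], data[2:3]]
partials = [map_partition(p) for p in partitions]  # independently parallel
totals = reduce(reduce_counts, partials)
print(totals["the"], totals["fox"])  # 3 2
```

Pipelining, by contrast, would overlap the stages of a computation rather than splitting the data, which is Robin's point: MapReduce delivers only one of the two forms of parallelism.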


Hadoop is - and I'm kind of drawing a line in the sand here - Hadoop is not a data warehouse. It's so far from being a data warehouse that it's almost an absurd suggestion to say that it is. In this diagram, what I'm showing along the top is a kind of data flow, going from a Hadoop data reservoir into a gargantuan scale-out database, which is what we'll actually do - an enterprise data warehouse. I'm showing legacy databases feeding data into the data warehouse, and offload activity creating offload databases from the data warehouse, but that's actually a picture I'm starting to see emerge, and I'd say this is like the first generation of what happens to the data warehouse with Hadoop. But if you look at the data warehouse itself, you realize that underneath the data warehouse, you've got an optimizer. You've got distributed query workers over very many processes sitting over perhaps very many disks. That's what happens in a data warehouse. That's actually the kind of architecture that's built for a data warehouse, and it takes a good long time to build something like that, and Hadoop has none of that at all. So Hadoop is not a data warehouse, and it's not going to become one, in my opinion, anytime soon.


It does have this data reservoir capability, and it kind of looks interesting if you just look at the world as a series of events flowing into the organization. That's what I'm showing on the left-hand side of this diagram. It goes through a filtering and routing capability, and the stuff that needs to go for streaming gets siphoned off by the streaming apps, while everything else goes straight into the data reservoir, where it's prepared and cleansed, and then passed by ETL to either a single data warehouse or a logical data warehouse consisting of multiple engines. This is, in my opinion, a natural development line for Hadoop.


In terms of the ETW, one of the things worth kind of pointing out is that the data warehouse itself is actually moving - it isn't what it was. Certainly, nowadays, you expect there to be a hierarchical capability for the hierarchical data of what people, or some people, call documents in the data warehouse. That's JSON. Possibly, network queries that are graph databases; possibly, analytics. So, what we're moving toward is an ETW that actually has a more complex workload than the ones we've been used to. So that's kind of interesting, because in a way it means the data warehouse is getting even more sophisticated, and because of that, it's going to be an even longer time before Hadoop gets anywhere near it. The meaning of data warehouse is stretching, but it still includes optimization. You have to have an optimization capability, not just over queries now, but over all of these activities.


That's it, really. That's all I wanted to say about Hadoop. I think I can hand over to Ray, who hasn't got any slides, but he's always good at talking.


Eric Kavanagh: I'll take the slides. There's our friend, Ray Wang. So, Ray, what are your thoughts on all this?


Ray Wang: Now, I think that was probably one of the most succinct and great histories of key-value stores and where Hadoop has gone in relation to the enterprise, so I always learn a lot when listening to Robin.


Actually, I do have one slide. I can pop up a slide here.


Eric Kavanagh: Just go ahead and click on, click start and go to share your desktop.


Ray Wang: Got it, there you go. I'll actually share. You can see the app itself. Let's see how it goes.


All this conversation about Hadoop - and then we go deep into the discussion about the technologies that are out there and where Hadoop is headed - a lot of times I just like to take it back up to really have the business discussion. A lot of the stuff happening on the technology side is this piece where we've been talking about data warehouses, information management, data quality, mastering that data, and so we tend to see this. So if you look at this graph here at the very bottom, it's very interesting the types of individuals we bump into who talk about Hadoop. We have the technologists and data scientists who are geeking out, having a lot of excitement, and it's typically about data sources, right? How do we master the data sources? How do we get this to the right levels of quality? What do we do about governance? What can we do to match different types of sources? How do we keep lineage? And all that kind of discussion. And how do we get more SQL out of our Hadoop? So that part is happening at this level.


Then on the information and orchestration side, this is where it gets interesting. Are we starting to tie together the outputs of the insight we're getting, and are we pulling it back into business processes? How do we tie it back into any kind of metadata models? Are we connecting the dots between objects? And so the new verbs and discussions about how we use that data: moving from the traditional world of CRUD - create, read, update, delete - to a world that's discussing how we engage or share or collaborate or like or pull something.


That's where we're starting to see a lot of the excitement and innovation, especially about how to pull this information out and bring it to value. That's the technology-driven discussion below the red line. Above that red line, we're getting the very questions we've always wanted to ask, and one we always bring up is, for example, maybe the retail question for you is something like, "Why are red sweaters selling better in Alabama than blue sweaters in Michigan?" You could think about it and say, "That's kind of interesting." You see that pattern. We ask that question, and we wonder, "Hey, what are we doing?" Maybe it's about state schools - Michigan versus Alabama. OK, I get it, I see where we're going. And so we're starting to get the business side of the house - the people in finance, the people with traditional BI capabilities, the people in marketing, and the people in HR - saying, "Where are my patterns?" How do we get to those patterns? And so we see another way of innovating on the Hadoop side. It's really about how we surface updated insights faster. How do we make these kinds of connections? It goes all the way to the people doing, like, ad:tech, who are basically trying to connect ads and relevant content, from anything from real-time bidding networks to contextual ads and ad placement, and doing that.


So it's interesting. You see the progression of Hadoop from, "Hey, here's the technology solution. Here's what we need to do to get this information out to people." Then as it crosses over the line to the business side, this is where it gets interesting. It's the insight. Where's the performance? Where's the deduction? How are we predicting things? How do we take influence? And then bring that to the last level, where we actually see another set of Hadoop innovations happening around decision systems and actions. What's the next best action? So you know the blue sweaters are selling better in Michigan. You're sitting on a ton of blue sweaters in Alabama. The obvious thing is, "Yeah, well, let's get these shipped out there." How do we do it? What's the next step? How do we tie that back in? Maybe it's the next best action, maybe it's a suggestion, maybe it's something that helps you prevent an issue, maybe it's no action at all, which is an action in itself. So we're starting to see these kinds of patterns emerge. And the beauty of this, back to what you were saying about key-value stores, Robin, is that it's happening so fast. It's happening in ways that we haven't been thinking about it.


I'd probably say it's in the last five years that we've picked this up. We started thinking in terms of how we can leverage key-value stores again, but it's in the last five years that people have been looking at this very differently, and it's like technology cycles repeat themselves in 40-year patterns, so it's kind of a funny thing where we look at cloud and I'm like, that's mainframe time sharing. We look at Hadoop and it's like the key-value store - maybe it's a data mart, less than a data warehouse - and so we're starting to see these patterns again. What I'm trying to do now is think about what people were doing 40 years ago. What were the approaches and techniques and methodologies being applied that were limited by the technologies people had? That's what's kind of driving this thought process. So as we go through the bigger picture of Hadoop as a tool, when we go back and think about the business implications, this is kind of the path we usually walk people through so you can see which pieces, which parts sit in the data decision paths. It's just something I wanted to share. It's kind of thinking we've been using internally, and hopefully it adds to the discussion. So I'll turn it back over to you, Eric.


Eric Kavanagh: That's fantastic. If you could stick around for some Q&A. But I loved that you brought it back to the business level, because at the end of the day, it's all about the business. It's about getting things done and making sure you're spending money wisely, and that's one of the questions I saw already, so the speakers may want to think about what the TCO is of going the Hadoop route. There's a sweet spot in between, for example, using off-the-shelf tools to do things in some traditional way versus using the new sets of tools, because again, think about it, a lot of this stuff isn't new; it's just coalescing in a new way is, I think, the best way to put it.


So let's go ahead and introduce our friend, Nikita Ivanov. He is the founder and CEO of GridGain. Nikita, I'm going to go ahead and hand the keys to you, and I believe you're out there. Can you hear me, Nikita?


Nikita Ivanov: Yes, I'm here.


Eric Kavanagh: Excellent. So the floor is yours. Click on that slide. Use the down arrow, and take it away. Five minutes.


Nikita Ivanov: Which slide do I click?


Eric Kavanagh: Just click anywhere on that slide and then use the down arrow on your keyboard to move along. Just click on the slide itself and use the down arrow.


Nikita Ivanov: Alright, so just a few quick slides about GridGain. What do we do in the context of this conversation? GridGain basically makes in-memory computing software, and part of the platform we built is an in-memory Hadoop accelerator. In terms of Hadoop, we tend to think of ourselves as Hadoop performance specialists. What we do, essentially, on top of our core in-memory computing platform - which consists of technologies like a data grid, in-memory streaming and compute grids - is a plug-and-play Hadoop accelerator. It's very simple. The idea was to build a plug-and-play solution that can be installed right into an existing Hadoop installation. If you, the MapReduce developer, need a boost without having to write any new software or change any code - with, at most, a minimal configuration change to the Hadoop cluster - that's what we built.


Fundamentally, the in-memory Hadoop accelerator is based on optimizing two components of the Hadoop ecosystem. If you think about Hadoop, it's predominantly built on HDFS, which is the file system, and MapReduce, which is the framework for running computations in parallel on top of the file system. To optimize Hadoop, we optimize both of these systems. We developed an in-memory file system that is completely compatible - 100% compatible, plug-and-play - with HDFS. You can run it instead of HDFS, or you can run it on top of HDFS. And we also developed an in-memory MapReduce that is plug-and-play compatible with Hadoop MapReduce, but with a lot of optimizations in how the MapReduce workflow operates and how scheduling works in MapReduce.
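The plug-and-play claim boils down to interface compatibility: the in-memory tier exposes the same file-system contract as HDFS, so existing jobs run unchanged whether the fast tier replaces the slow one or caches in front of it. Here is a minimal Python sketch of that idea - the class and method names are illustrative stand-ins, not GridGain's actual API:

```python
class DiskFileSystem:
    """Stand-in for HDFS: durable, but every access pays a 'disk' penalty."""
    def __init__(self):
        self.blocks = {}
    def write(self, path, data):
        self.blocks[path] = data
    def read(self, path):
        return self.blocks[path]

class InMemoryFileSystem:
    """Stand-in for the in-memory tier: same read/write interface, RAM-backed.
    With backing=None it runs *instead of* HDFS; with a backing file system
    it runs *on top of* HDFS, writing through and caching reads."""
    def __init__(self, backing=None):
        self.cache = {}
        self.backing = backing
    def write(self, path, data):
        self.cache[path] = data
        if self.backing is not None:
            self.backing.write(path, data)
    def read(self, path):
        if path not in self.cache and self.backing is not None:
            self.cache[path] = self.backing.read(path)
        return self.cache[path]

def word_count(fs, path):
    """A toy 'job': written once, it never needs to know which tier it runs on."""
    return len(fs.read(path).split())

hdfs = DiskFileSystem()
fast = InMemoryFileSystem(backing=hdfs)    # the plug-and-play swap
fast.write("/logs/day1.txt", b"events from the wire")
print(word_count(fast, "/logs/day1.txt"))  # 4 - served from memory, HDFS copy intact
```

Because both tiers honor the same contract, `word_count` needs no code change - which is the whole pitch of a drop-in accelerator.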


If you look, for example, at this slide, we show the kind of stack we sit in. On the left side, you have your typical operating system with GDM, and at the top of this diagram you have the application center. In the middle you have Hadoop, and Hadoop, again, is based on HDFS and MapReduce. So what this diagram represents is how we embed into the Hadoop stack. Again, it's plug-and-play; you don't have to change any code. It just works the same way. On the next slide, we show essentially how we optimized the MapReduce workflow. That's probably the most interesting part, because it gives you the biggest advantage when you run MapReduce jobs.


In typical MapReduce, when you submit a job - and on the left there is a diagram - it works like a typical application. You usually submit the job and the job goes to a job tracker. It interacts with the Hadoop name node, and the name node is actually the piece of software that manages the interaction with the distributed files and keeps the directory of files. The job tracker then interacts with the task tracker on each individual node, and the task tracker interacts with a Hadoop data node to get data from it. So that's basically a very high-level overview of how your MapReduce job gets executed. As you can see, with our in-memory Hadoop MapReduce we completely bypass all this complex scheduling, which takes a lot of time off your execution, and go directly from the client to the GridGain data node, and the GridGain data node keeps everything in memory for blazingly fast execution.


So all in all, we basically deliver anywhere from a 5x all the way up to a 100x performance increase on certain types of loads, especially for short-lived payloads where you literally measure every second. We can give you a dramatic boost in performance with literally no code change.


Alright, that's all for me.


Eric Kavanagh: Yes, stick around for the Q&A. No doubt about it.


Let me hand it off to John Santaferraro. John, just click on that slide. Use the down arrow to move on.


John Santaferraro: Alright. Thanks a lot, Eric.


My perspective, and Actian's perspective, really is that Hadoop is about creating value, and so this is an example from digital media. A lot of the data that is pumping into Hadoop right now has to do with digital media, digital marketing, and the customer, so there is great opportunity - 226 billion dollars of retail purchases will be made online next year. Big data and Hadoop are about capturing new data to give you the insight to get your share of that. How do you drive 14% higher marketing return and profits? By figuring out the right media mix and the right channels and the right digital marketing plan. How do you improve overall return on marketing investment? By the way, in 2017, what we ought to be thinking about when we look at Hadoop is the fact that CMO - chief marketing officer - spending will outpace IT spending, so it really is about driving value. Our view is that there is all kinds of noise being made on the left-hand side of this diagram, the data pouring into Hadoop.


Ultimately, our customers are wanting to create customer delight, competitive advantage, world-class risk management, disruptive new business models, and to do all of that to deliver transformational value. They are looking to capture all of this data in Hadoop and be able to do best-in-class kinds of things like discovery on that data without any limitations, no latency at any scale of the data that lives in there - moving from reactive to predictive kinds of analytics and doing everything dynamically instead of looking at data just as static. What pours into Hadoop? How do you analyze it when it arrives? Where do you put it to get the high-performance analytics? And ultimately moving everything down to a segment of one.


So what we've done at Actian in the Actian Analytics Platform, we have built an exoskeleton around Hadoop to give it all of these capabilities that you need so you are able to connect to any data source bringing it into Hadoop, delivering it as a data service wherever you need it. We have libraries of analytics and data blending and data enrichment kinds of operators that you literally drag and drop them so that you can build out these data and analytic workflows, and without ever doing any programming, we will push that workload via YARN right down to the Hadoop nodes so you can do high-performance data science natively on Hadoop. So all of your data prep, all of your data science happening on Hadoop highly parallelized, highly optimized, highly performance and then when you need to, you move it to the right via a high-speed connection over to our high-performance analytic engine, where you can do super-low latency kinds of analytics, and all of that delivering out these real-time kinds of analytics to users, machine-to-machine kinds of communication, and betting those on analytics and business processes, feeding big data apps or applications.


This is an example of telco churn, where at the top of this chart, if you're just building telco churn analysis, for example, where you have captured one kind of data and poured it into Hadoop, you'd be able to identify about 5% of your potential churn audience. As you move down this chart and add additional kinds of data sources, you do more complex kinds of analytics, in the center column there, that allow you to act against that churn. You move from 5% identification up to 70% identification. So this matters for telecommunications companies, for retail organizations, for any of the fast providers - anybody that has a customer base where there is a fear of, and damage caused by, churn.


This kind of analytics running on top of that exoskeleton-enabled version of Hadoop is what drives real value, and what you can see here is that kind of value. This is an example taken from the annual report of a telecommunications company that shows their actual total subscribers: 32 million. Their existing churn rate, which every telco reports, is 1.14 - 4.3 million subscribers lost every year, costing them 1.14 billion dollars in reacquisition as well as 2.1 billion in revenue. This is a very modest example of how you generate value out of the data that lives in Hadoop, where you can see the potential cost of reacquisition, and where the potential is to use Hadoop with the exoskeleton running analytics to help this telecommunications company save 160 million dollars as well as avoid 294 million in loss. That's the kind of example that we think is driving Hadoop forward.
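To make the arithmetic behind those figures concrete, here is a short sketch. It assumes the quoted 1.14 churn rate is percent per month - an assumption on my part, but one that roughly reproduces the 4.3 million annual figure in the transcript:

```python
subscribers = 32_000_000        # total subscribers from the annual report
monthly_churn_rate = 0.0114     # assuming the quoted 1.14 is percent per month

# Simple annualization (ignoring compounding):
lost_per_year = subscribers * monthly_churn_rate * 12
print(f"{lost_per_year / 1e6:.1f} million subscribers lost per year")  # 4.4 million

# Cost per lost subscriber implied by the quoted $1.14B reacquisition cost:
reacquisition_cost = 1.14e9
cost_per_sub = reacquisition_cost / lost_per_year
print(f"${cost_per_sub:.0f} per subscriber")
```

Even cutting churn identification from 5% to 70%, as the previous slide describes, only pays off because each retained subscriber avoids a few hundred dollars of reacquisition cost at this scale.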


Eric Kavanagh: Alright, fantastic. And Jim, let me go ahead and give the keys to you. So, Jim Vogt. If you would, click on that slide and use the down arrow on your keyboard.


Jim Vogt: I got it. Great picture. OK, thank you very much. I'll tell you a little bit about Zettaset. We've been talking about Hadoop all afternoon here. What's interesting about our company is that we've basically spent our careers hardening new technology for the enterprise - being able to plug the gaps, if you will, in new technology to allow it to be widely deployed within an enterprise operational environment. There are a couple of things happening in the market right now. It's kind of like a big open pool party, right? But now the parents have come home. And basically we're trying to bring this thing back to some sense of reality in terms of how you build a real infrastructure piece here that can be scalable, repeatable, non-resource intensive, and secure - most importantly, secure. In the marketplace today, most people are still kicking the tires on Hadoop. There are a couple of reasons. One is that the open source itself, although it does some very useful things in terms of being able to blend data sources, to find structure in data and very useful data sources, really lacks a lot of the hardening and enterprise features around security, high availability and repeatability that people need to deploy not just a 10- or 20-node cluster, but 2,000- and 20,000-node clusters - there are multiple clusters. What has been monetized in the last two years has been mainly pro-services around setting up these eval clusters. So there is not a repeatable software process to actually actively deploy this into the marketplace.


So what we built into our software is a couple of things. We're actually transparent across the distributions. At the end of the day, we don't care if it's CDH or HDP; it's all open source. If you look at the raw Apache components that built those distributions, there is really no reason why you have to lock yourself into any one distribution. And so, we work across distributions.


The other thing is that we fill in the gaps transparently in terms of some of the things that are missing within the code itself, the open source. So we talked about HA. HA is great in terms of name node failover, but what happens if any of the other active processes that you're running on these clusters fail? That could take the cluster down or create a security hole, if you will. When we built software components into our solution, they all fall under an HA umbrella where we're actively monitoring all the processes running on the cluster. If one of those roles goes down, it takes the cluster down - so basically, name node failover is great, but unless you're actively monitoring all the processes running on the cluster, you don't have true HA. And that is the essence of what we developed here at Zettaset. And along the way, we've actually got a patent on this HA approach, issued and granted last November, which is quite novel, different from the open-source version, and much more hardened for the enterprise.
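The active-monitoring point can be sketched as a simple watchdog loop: every process under the HA umbrella gets checked and restarted on failure, rather than relying on name node failover alone. This is an illustrative toy, not Zettaset's implementation; the process names are made up:

```python
class Process:
    """Toy stand-in for a service running on a cluster node (e.g. a tracker)."""
    def __init__(self, name):
        self.name = name
        self.alive = True
    def crash(self):
        self.alive = False

def ha_watchdog(processes, restart):
    """One monitoring pass: anything under the HA umbrella that has died
    gets restarted instead of silently taking the cluster down."""
    restarted = []
    for p in processes:
        if not p.alive:
            restart(p)
            restarted.append(p.name)
    return restarted

def restart(p):
    p.alive = True   # a real system would respawn the daemon, possibly elsewhere

cluster = [Process("name-node"), Process("job-tracker"), Process("hbase-master")]
cluster[1].crash()                      # simulate a non-name-node failure

print(ha_watchdog(cluster, restart))    # ['job-tracker']
assert all(p.alive for p in cluster)    # cluster healthy again after one pass
```

The failover-only view would have missed the job-tracker death entirely; the watchdog catches any monitored role, which is the "true HA" distinction being made above.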


The second piece is being able to do real RBAC. People are talking about RBAC. They talk about other open-source projects. Why should you have to recreate all those entries and all those users and roles when they already exist in LDAP or in Active Directory? So we link to those transparently, and we fold all our processes not only under this RBAC umbrella, but also under the HA umbrella. Then they start to layer encryption into this infrastructure - encryption of data at rest, data in motion, all the hardened security pieces that you really need to secure the information.


What is really driving this is our industries, which I have on the next slide - finance and healthcare, which have their compliance requirements. You have to be able to protect these sets of data, and you have to be able to do it in a very dynamic fashion, because this data can be sitting anywhere across these parallel nodes and clusters, and it can be duplicated and so forth - so essentially that's the big umbrella that we built. The last piece that people need is to be able to put the pieces together. So having the analytics that John talked about, and being able to get value out of data and do that through an open interface tapped into this infrastructure - that's what we built into our software.


So the three cases that I had in here - and you guys are popping me along here - were really around finance, healthcare and also cloud, where you're having to deal with multi-tenant environments and essentially have to separate people's sensitive data, so security and performance are key to this type of application, whether it's cloud or a sensitive data environment.


The last slide here really speaks to the fact that this infrastructure we put together as a company is not just specific to Hadoop. It's something that we can equally apply to other NoSQL technologies, and that's where we're taking our company forward. And then we're also going to pull in other open-source components, HBase and so forth, and secure those within that infrastructure in a way that you're not tied to any one distribution. It's like you truly have an open, secure and robust infrastructure for the enterprise. So that's what we're about, and that's what we're doing to basically accelerate adoption of Hadoop, so people get away from standing up twenty-node eval clusters and actually have the confidence to deploy a much larger environment - which puts more eyes on Hadoop and speeds the market along. Thank you.


Eric Kavanagh: That's fantastic, great. Stick around for the Q&A. Finally, last but not least, we've got Phu Hoang, CEO of DataTorrent. Let me go ahead and hand the keys to you. The keys are now yours. Click anywhere on that slide and use the down arrow on your keyboard to move along.


Phu Hoang: Thank you so much.


So yes, I'm here to talk about DataTorrent, and I actually think the story of DataTorrent is a great example of what Robin and Ray have been talking about through this session, where they say that Hadoop is a great body of work, a great foundation - but it has a lot of holes. The future is bright, though, because in the Hadoop ecosystem more players are coming in who are able to build and add value on top of that foundation, to really take it from storage to insights to action, and really that's the story of DataTorrent.


What I'm going to talk about today is really real-time big data stream processing. As I interact with customers, I've never met a single customer who says to me, "Hey, my goal is to take action hours or days after my business events arrive." In fact, they all say they want to take action immediately after the events occur. The problem is the delay - and that is what Hadoop is today, with its MapReduce paradigm. To understand why, it's worth revisiting the history of Hadoop.


I was leading much of Yahoo engineering when we hired Doug Cutting, the creator of Hadoop, and assigned over a hundred engineers to build out Hadoop to power our web search, advertising and data science processing. But Hadoop was built really as a batch system to read, write and process these very large files. So while it's great disruptive technology because of its massive scalability and high availability at low cost, it has a hole in that there is a lot of latency in processing these large files. Now, it is fair to say that Hadoop is becoming the de facto operating system for data computing and is gaining wide adoption across many enterprises. They are still using that same process of collecting events into large files and running batch Hadoop jobs to get their insight the next day. What enterprise customers now want is those exact same insights, but much earlier, and this will enable them to really act on these events as they happen, not hours later after they have been batch processed.


Eric Kavanagh: Do you want to be moving your slides forward, just out of curiosity?


Phu Hoang: Yeah, it's coming now. Let me illustrate with one example. In this example, using Hadoop in batch mode where you're constantly working with files, first an organization might accumulate all the events for a full day - 24 hours' worth of data. Then they batch process it, which may take another eight hours using MapReduce, so now there are 32 hours of elapsed time before they get any insight. But with real-time stream processing, the events come in and get processed immediately; there is no accumulation time. Because we do all this processing in memory, the processing itself is also sub-second. All told, you are reducing the elapsed time from 30-plus hours to something very small. If you're reducing 30 hours to 10 hours, that's valuable, but if we can reduce it to a second, something profound happens. You can now act on your event while the event is still happening, and this gives enterprises the ability to understand what their products are doing, what their business is doing, what their users are doing in real time, and react to it.
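The elapsed-time arithmetic in that example is worth spelling out:

```python
# Batch path: accumulate a full day of events, then run an 8-hour MapReduce job.
accumulate_hours = 24
batch_process_hours = 8
batch_latency_hours = accumulate_hours + batch_process_hours
print(batch_latency_hours)       # 32 hours before any insight

# Streaming path: no accumulation window; per-event, in-memory processing.
stream_latency_seconds = 0.5     # "sub-second", as claimed above
speedup = batch_latency_hours * 3600 / stream_latency_seconds
print(f"~{speedup:,.0f}x shorter time-to-insight")
```

The 0.5-second figure is an assumed placeholder for "sub-second"; the point is that the gap is five orders of magnitude, not an incremental tuning win.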


Let's take a look at how this happens. Really, a combination of market forces and technology has enabled a solution like DataTorrent to come together. From a market perspective, Hadoop is really becoming the de facto big data architecture, as we said, right? In an IDC study in 2013, they say that by the end of this year, two-thirds of enterprises will have deployed Hadoop, and for DataTorrent - whether that's Apache Hadoop or any of our certified partners like Cloudera or Hortonworks - Hadoop is really clearly the choice for the enterprise. From a technology perspective, and I think Robin and Ray alluded to this, Hadoop 2.0 was created to enable Hadoop to extend to much more general cases than the batch MapReduce paradigm, and my co-founder, Amal, who was at Yahoo leading the development of Hadoop 2.0, really allowed this OS layer to support many more computation paradigms on top of it, and real-time streaming is what we chose. By putting this layer of real-time streaming on top of YARN, you can really think of DataTorrent as the real-time equivalent of MapReduce. Whatever you can do in batch with MapReduce, you can now do in streaming with DataTorrent, and we can process massive amounts of data. We can slice and dice data in multiple dimensions. We have distributed computing and use YARN to give us resources. We have the full ecosystem of open-source Hadoop to enable fast application development.


Let me talk a little bit about the key capabilities of DataTorrent. In five minutes, it is hard for me to give you much detail, but let me just discuss what differentiates it. First of all, sub-second scalable ingestion, right? This refers to DataTorrent's platform being able to take data in real time from hundreds of data sources and begin to process it immediately. This is in direct contrast to the batch processing of MapReduce in Hadoop 1.0, and events can vary in size. They may be as simple as a line in a log file, or they may be much more complex, like a CDR - a call data record in the telecom industry. DataTorrent is able to scale ingestion dynamically up or down depending on the incoming load, and we can deal with tens of millions of incoming events per second. The other major thing here, of course, is the processing itself, which is real-time ETL logic. Once the data is in motion, it goes into the ETL logic, where you are doing extract, transform and load, and so on. The logic is executed by combining a series of what we call operators, connected together in a data flow graph. We have an open-source library of over 400 operators today to allow you to build applications very quickly, and they cover everything from input connectors, to all kinds of message processing, to database drivers and connectors for loading all kinds of information downstream.
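The operator idea can be illustrated with plain Python generators: each operator consumes a stream and yields a stream, and an application is just operators wired into a graph. This is a conceptual sketch, not DataTorrent's actual Java operator API, and the operator names are invented for the example:

```python
# Each "operator" consumes a stream and yields a stream, so a dataflow
# application is just operators chained end to end.
def parse(lines):                      # input operator: raw log line -> record
    for ln in lines:
        user, amount = ln.split(",")
        yield {"user": user, "amount": float(amount)}

def filter_large(records, threshold):  # transform operator: keep big events
    for r in records:
        if r["amount"] >= threshold:
            yield r

def tag(records, label):               # enrichment operator: annotate records
    for r in records:
        yield {**r, "tag": label}

raw = ["alice,120.0", "bob,3.5", "carol,88.0"]
pipeline = tag(filter_large(parse(raw), threshold=50), label="big-spender")
print([r["user"] for r in pipeline])   # ['alice', 'carol']
```

Because each stage is independent, a streaming engine can place different operators on different nodes and run them in parallel - the library of 400 operators is essentially a library of such composable stages.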


The combination of doing all this in memory and being able to scale across hundreds of nodes really drives the superior performance. DataTorrent is able to process billions of events per second with sub-second latency.


The last piece that I'd like to highlight is the high-availability architecture. DataTorrent's platform is fully fault tolerant; that means the platform automatically buffers the events and regularly checkpoints the state of the operators to disk to ensure that there is no data loss. The applications can fail over in seconds with no data loss and no human intervention. Simply put, DataTorrent processes billions of events and lots of data in seconds; it runs 24/7 and it never, ever goes down. These capabilities really set DataTorrent apart from the market and really make it the leading mission-critical, real-time analytics platform for the enterprise. With that, we invite you to come visit our website and check us out.


Thank you.


Eric Kavanagh: Yeah, thank you so much. I'll throw a question over to you, really a comment, and let you kind of expound upon it. I really think you're on the ball here with this concept of turning over these operators and letting people use these operators almost like Legos to build big data applications. Can you kind of talk about what goes into the process of taking these operators and stitching them together, how do you actually do that?


Phu Hoang: That's a great question. First of all, these operators are just standard Java logic. We supply 400 of them. They do all kinds of processing, so to build your application, you really are just connecting operators together into a data flow graph. Among our customers, we find that they use a number of the operators that we have in our library, and they also take their own chunks of custom logic and make them operators so that they can instantiate those into a graph.


Eric Kavanagh: OK, good. I think it's a good segue to bring in John Santaferraro from Actian because you guys have a slightly similar approach, it seems to me, in opening up a sort of management layer to be able to play around with different operators. Can you talk about what you do with respect to what tools we're just talking about, John?


John Santaferraro: Yeah, exactly. We have a library of analytics operators as well as transformational operators - operators for blending and enriching data - and it is very similar. You use a drag-and-drop interface to be able to stitch together these data flows or workflows, and even analytic workflows. So it's everything from being able to connect to data, to being able to blend and enrich data, to being able to run data science or machine learning algorithms, and then even being able to push that into a high-performance, low-latency analytic engine. What we find is that it's all built on the open-source KNIME project. So we capture a lot of the operators that they are developing, and then we take all of that and, via YARN - very similar to what Phu described at DataTorrent - we push it down so that it is parallelized against all of the nodes in a Hadoop cluster. A lot of it is about making the data in Hadoop much more accessible to business users and less-skilled workers - somebody besides a data scientist.


Eric Kavanagh: OK, let me bring in Nikita once again. I'm going to throw your slide up as well. Can you talk about how you approach this solution vis-à-vis what these two gentlemen just talked about? How does someone actually put this stuff together and get value from GridGain?


Nikita Ivanov: Well, I think the biggest difference between us and practically everyone else is that we don't require you to do any recoding - you don't have to do anything; it's plug-and-play. If you have an application today, it's going to work faster. You don't have to change code; you don't have to do anything; you just have to install GridGain alongside the Hadoop cluster, and that's it. So that's the biggest difference, and we've talked to our customers. There are myriad solutions today that ask you to change something: programming against their API, using their interfaces and whatnot. Ours is very simple. You don't need to invest a lot of time into the Hadoop ecosystem, and whatever you used to do - MapReduce or any of the tools - you continue to use. With GridGain, you don't have to change a single line of code; it's just going to work faster. That's the biggest difference and that's the biggest message for us.


Eric Kavanagh: Let's get Jim back in here too. Jim, your quote is killing me. I had to write it down in the middle of that. I'll put it into some kind of deck, but the Hadoop ecosystem right now is like a pool party and the parents just came home. That is funny stuff, man; that is brilliant. Can you talk about how you guys come onto the scene? How do you actually implement this? How long does that take? How does all that work?


Jim Vogt: Yes. So there are a couple of varieties depending on the target customer, but typically these days, you see evaluations where security is factored in, in some of those hardening requirements that I talked about. What has happened in some other cases, especially last year when people had big plans to deploy, is that there was kind of a science project, if you will - somebody was playing with the technology, had a cluster up and running and was working with it - but then the security guy shows up, and if it is going to go into a live data center, it basically has to comply with the same requirements that we have for other equipment running in the data center, if it is going to be an infrastructure that we build out. Last year, we even had some banks that told us they were going to deploy 400 to 1,000 nodes, and they're still sitting on a 20-node cluster, mainly because now a security person has been plugged in. They've got to be worried about financial compliance, about the sets of information that are sitting on a cluster, and so forth. It varies by customer, but typically this is what elongates the cycles, and this is typical of a new technology: if you really want to deploy it in a production environment, it has to have some of these other pieces, including the very valuable open-source pieces, right?


Eric Kavanagh: OK, good. Let's see. I'm going to bring Phu back into the equation here. We've got a good question for you. One of the attendees is asking how DataTorrent is different from Storm or Kafka or the Redis infrastructure. Phu, are you out there? Hey, Phu, can you hear me? Maybe I'm on mute.


Let's bring Ray Wang back into this. Ray, you've seen a lot of these technologies and looked at how they worked. I really love this concept of turning over control or giving control to end users of the operators. I like to think of them as like really powerful Legos that they can use to kind of build some of these applications. Can you comment on that? What do you think about all that?


Ray Wang: Coming from my technical background, I'd say I'm scared - I was scared shitless! But honestly, I think it's important, I mean, in order to get scale. You can only put in so many requests. Think about the old way we did data warehousing. In the business, I had to file a request for a report so that they could match all the schemas. I mean, it's ridiculous. So we do have to find a way for the business side of the house to step up and definitely become data jocks. We actually think that in this world, we're going to see more digital artisans - people that have the right skills but also understand how to take that data and translate it into business value. And so these digital artisans, data artisans depending on how you look at it, are going to need both: first the curiosity and the right set of questions, but also the knowledge to know when the data set stinks. If I'm getting a false positive or a false negative, why is that happening?


I think a basic level of stats, a basic level of analytics - understanding that there's going to be some training required. But I don't think it's going to be too hard. I think if you get the right folks, it should be able to happen. You can't democratize the whole decision-making process. I see that happening. We see it in a lot of companies. Some of our financial services clients are doing that. Some of our retail folks are doing that, especially with the razor-thin margins that you are seeing in retail. I was definitely seeing that in high tech just around here in the valley. That's just kind of how people are. It's emerging that way, but it's going to take some time because these basic data skills are still lacking. And I think we need to combine that with some of the stuff that some of these guys are doing here on this webinar.


Eric Kavanagh: Well, you bring up a really good point - like how much control you want to give to the average end user. You don't want to give an airplane cockpit to someone who's driving a car for the first time. You want to be able to closely control what they have control over. I guess my excitement stems from being able to do things yourself, but the key is you've got to put the right person in that cockpit. You've got to have someone who really knows what they're doing. No matter what you hear from the vendor community folks, some of these more powerful tools are extremely complex. I mean, if you are talking about putting together a string of 13, 14, 15 operators to do a particular type of transformation on your data, there are not many people who could do that well. That said, I think we're going to have many, many more people who do it well, because the tools are out there now and you can play with the stuff, and there is going to be a drive to perfect that process or at least get good at it.


We did actually lose Phu, but he's back on the line now. So, Phu, the question for you is how is DataTorrent different from, like, Storm or Kafka or Redis or some of these others?


Phu Hoang: I think that's a great question. So, Redis of course is really an in-memory data store, and we connect to Redis. We see ourselves as really a processing engine for data - for streaming data. Kafka, again, is a great messaging bus that we use. It's actually one of our favorite messaging buses, but someone has to do the big data processing across hundreds of nodes in a way that is fault tolerant and scalable, and again, that is the job that we play. So, yes, we are similar to Storm, but I think that Storm was really developed a long time ago, even before Hadoop, and it doesn't have the enterprise-level thinking about scalability to the hundreds of millions, now even billions, of events, nor does it really have the HA capability that I think the enterprise requires.
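The division of labor Phu describes - a messaging bus that only delivers events versus a processing engine that maintains state over them - can be sketched in a few lines. This is illustrative Python only, not DataTorrent's actual API; the bus is simulated by a plain list standing in for something like Kafka.

```python
# Toy sketch of bus vs. engine: the bus (a list here, standing in for a
# messaging bus like Kafka) only delivers events; the engine consumes
# them and maintains running state. Not DataTorrent's real API.
from collections import defaultdict

class SumOperator:
    """A stateful stream operator: keeps a running total per key."""
    def __init__(self):
        self.totals = defaultdict(float)

    def process(self, event):
        key, amount = event
        self.totals[key] += amount
        return key, self.totals[key]

bus = [("acct-1", 10.0), ("acct-2", 5.0), ("acct-1", 2.5)]
engine = SumOperator()
for event in bus:
    engine.process(event)
print(dict(engine.totals))  # {'acct-1': 12.5, 'acct-2': 5.0}
```

The hard part Phu is pointing at is everything this toy omits: running such stateful operators across hundreds of nodes, fault tolerantly, at billions of events.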


Eric Kavanagh: Great. And you know, speaking of HA, I'll use that as an excuse to bring Robin Bloor back into the conversation. We just talked about this yesterday. What do you mean by high availability? What do you mean by fault tolerance? What do you mean by real time, for example? These are terms that can be bent. We see this all the time in the world of enterprise technology. A good term is something other people kind of glom onto and use and co-opt and move around, and then suddenly things don't mean quite what they used to. You know, Robin, one of my pet peeves is this whole universe of VoIP. It's like, "Why would we go down in quality? Isn't it important to understand what people say to you and why that matters?" But I'll just ask you to kind of comment on what you think. I'm still laughing about Ray's comment that he's scared shitless about giving these people these capabilities. What do you think about that?


Robin Bloor: Oh, I think it's a Spider-Man problem, isn't it? With great power comes great responsibility. Really, in terms of the capabilities out there - I mean, it would have changed things for me a long time ago. You know, I would have given my eye teeth for some of the capabilities that they have gotten now. We used to do extraordinary amounts of what I would call grunt work that the machines do right now, and do in parallel. They do things that we could never have imagined. I mean, we would have understood it mathematically, but we could never imagine doing it. But only some people understand data, and Ray is completely right about this. The reason to be scared is that people will actually start drawing wrong conclusions - they will wrangle with the data, they will apply something extremely powerful, it will appear to suggest something, and they will believe it without even doing anything as simple as having somebody audit whether their result is actually a valid result. We used to do this all the time in the insurance company I used to work for. If anybody did any work, somebody always checked. Everything was checked by at least one person against the person who did it. In these environments, the software is extremely strong, but you've got to have the discipline around it to use it properly. Otherwise, there'll be tears before bedtime, won't there?


Eric Kavanagh: I love that quote - that's awesome. Let me see. I'm going to go ahead and throw up this slide here from GridGain. Can you talk about, Nikita, where you come into play? How do you actually get these applications supercharged? I mean, I understand what you are doing, but what does the process look like to actually get you embedded, to get you woven in and to get all that stuff running?


Nikita Ivanov: Well, the process is relatively simple. You essentially just need to install GridGain and make a small configuration change, just to let Hadoop know that there is now an in-memory file system available if you want to use it in place of HDFS, and you have to set up which way you want to use it. You can get it from BigTop, by the way. That's probably the easiest way to install it if you're using Hadoop. That's about it. With the new version coming up in a few weeks from now, by the end of May, we're going to have an even more simplified process for this. So the whole point of the in-memory Hadoop accelerator is that you do not code. Do not make any changes to your code. The only thing that you need to do is install it and have enough RAM in the cluster, and off you go, so the process is very simple.


Eric Kavanagh: Let me bring John Santaferraro back in. We'll take a couple more questions here. You know, John, we've been watching you guys from various perspectives, of course. You were over at ParAccel; that got folded into Actian. Of course, Actian used to be called Ingres, and you guys made a couple of other acquisitions. How are you stitching all of that stuff together? I realize you might not want to get too technical with this, but you guys have a lot of stuff now. You've got DataRush. I'm not sure if it's still the same name, but you've got a whole bunch of different products that have been kind of woven together to create this platform. Talk about what's going on there and how that's coming along.


John Santaferraro: The good news is, Eric, that separately, the companies we acquired - Pervasive and ParAccel - and even Actian itself, everybody developed their products with very similar architectures. Number one, they were open with regard to data and interacting with other platforms. Number two, everything was parallelized to run in a distributed environment. Number three, everything was highly optimized. What that allowed us to do is to very quickly make integration points, so that you can be creating these data flows already today. We have established the integration, so you create the data flows. You do your data blending and enriching right on Hadoop, everything parallelized, everything optimized. When you want, you move that over into our high-performance engines. Then there's already a high-performance connection between Hadoop and our massively parallel analytic engine that does these super-low-latency things, like helping a bank recalculate and recast their entire risk portfolio every two minutes and feeding that into their real-time trading system, or feeding it into some kind of a desktop for the wealth manager so they can respond to the most valuable customers for the bank.


We have already put those pieces together. There's additional integration to be done. But today, we have the Actian Analytics Platform as our offering because a lot of that integration was ready to go. It has already been accomplished, so we're stitching those pieces together to drive this entire analytic value chain from connecting the data, all of the processing that you do of it, any kind of analytics you want to run, and then using it to feed into these automated business processes so that you're actually improving that activity over time. It's all about this end-to-end platform that already exists today.


Eric Kavanagh: That's pretty good stuff. And I guess, Jim, I'll bring you back in for another couple of comments, and Robin, I want to bring you in for just one big question, I suppose. Folks, we will keep all these questions - we do pass them on to the people who participated in the event today. If you ever feel a question you asked was not answered, feel free to email yours truly. You should have some information on me and how to get ahold of me. Also, just now I put up a link to the full deck with slides from non-sponsoring vendors. So we put the word out to all the vendors out there in the whole Hadoop space. We said, "Tell us what your story is; tell us what's going on." It's a huge file. It's about 40-plus megabytes.


But Jim, let me bring you back in and just kind of talk about - again, I love this concept - where you're talking about the pool party that comes to an end. Could you talk about how it is that you manage to stay on top of what's happening in the open-source community? Because it's a very fast-moving environment. But I think you guys have a pretty clever strategy of serving as this sort of enterprise-hardening vendor that sits on top of or kind of around that. Can you talk about your development cycles and how you stay on top of what's happening?


Jim Vogt: Sure. It is pretty fast moving if you look at just the snapshot of updates, but what we're shipping in functionality today is about a year to a year and a half ahead of what the community can get in security capabilities today. It's not that they're not going to get there; it just takes time. It's a different process - it has contributors and so forth - and it just takes time. When we go to a customer, we need to be very well versed in the open source, and very well versed mainly in the security things that we're bringing. The reason that we're actually issuing patents and submitting patents is that there is some real value in IP - intellectual property - around hardening these open-source components. When we support a customer, we have to support all the varying open-source components and all the varying distributions, as we do, and we also need to have the expertise around the specific features that we're adding to that open source to create the solution that we create. As a company, although we don't want the customer to have to be a Hadoop expert - we don't think you need to be a mechanic to drive the car - we need to be the mechanic that understands the car and how it works, and understands what's happening between our code and the open-source code.


Eric Kavanagh: That's great. Phu, I'll give you one last question. Then Robin, I have one question for you and then we'll wrap up, folks. We will archive this webcast. As I suggested, we'll be up on insideanalysis.com. We'll also go ahead and have some stuff up on Techopedia. A big thank you to those folks for partnering with us to create this cool new series.


But Phu … I remember watching the demo of the stuff and I was just frankly stunned at what you guys have done. Can you explain how it is that you achieve that level of fault tolerance and failover?


Phu Hoang: Sure, I think it's a great question. Really, the problem for us had three components. Number one is, you can't lose the events that are moving from operator to operator in the Hadoop cluster, so we have to have event buffering. But even more importantly, inside your operators, you may have state that you're calculating. Let's say you're actually counting money. There's a subtotal in there, so if that node goes down and that subtotal is in memory, that number is gone, and you can't just restart from some arbitrary point. Where would you start from?


So today, you have to actually do a regular checkpoint of your operator state down to disk. You tune that interval so it does not become a big overhead, but when a node goes down, it can come back up, go back to exactly the right state where you last checkpointed, and bring in the events starting from that state. That allows you to continue as if the failure had actually never happened. Of course, the last piece is to make sure that your application manager is also fault tolerant so that it doesn't go down. All three factors need to be in place for you to say that you're fully fault tolerant.
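The recovery scheme Phu outlines - checkpoint operator state at intervals, then after a failure restore the last checkpoint and replay the buffered events recorded since it - can be sketched in miniature. This is purely illustrative Python, not DataTorrent code; the counter, interval, and event log are all made up for the example.

```python
# Minimal sketch of checkpoint-and-replay recovery for a stateful
# streaming operator. Illustrative only; not DataTorrent's implementation.

class CheckpointedCounter:
    def __init__(self, interval=2):
        self.total = 0
        self.seen = 0
        self.interval = interval      # checkpoint every N events
        self.checkpoint = (0, 0)      # (events_seen, total) last saved

    def process(self, amount):
        self.seen += 1
        self.total += amount
        if self.seen % self.interval == 0:
            self.checkpoint = (self.seen, self.total)  # persist state

    def recover(self, event_log):
        """Restore the last checkpoint, then replay buffered events."""
        self.seen, self.total = self.checkpoint
        for amount in event_log[self.seen:]:
            self.process(amount)

events = [1, 2, 3, 4, 5]              # stand-in for the buffered event stream
op = CheckpointedCounter(interval=2)
for e in events[:4]:
    op.process(e)                     # checkpoint lands after event 4
op.total, op.seen = 0, 0              # simulate the node crashing
op.recover(events)                    # restore checkpoint, replay the rest
print(op.total)                       # 15, as if the crash never happened
```

The interval trade-off Phu mentions shows up directly here: a smaller `interval` means less to replay after a crash but more checkpointing overhead during normal operation.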


Eric Kavanagh: Yeah, that's great. Let me go ahead and throw one last question over to Robin Bloor. So one of the attendees is asking, does anyone think that Hortonworks or another will get soaked up/invested in by a major player like Intel? I don't think there's any doubt about that. I'm not surprised, but I'm fascinated, I guess, that Intel jumped in before like an IBM or an Oracle, but I guess maybe the guys at IBM and Oracle think they've already got it covered by just co-opting what comes out of the open-source movement. What do you think about that?


Robin Bloor: It's a very curious move. We should see it in light of the fact that Intel already had its own Hadoop distribution, and what it has effectively done is just pass that over to Cloudera. There aren't many powers in the industry as large as Intel, and it is difficult to know what your business model actually is if you have a Hadoop distribution, because it is difficult to know exactly what it is going to be used for in the future. In other words, we don't know where the revenue streams are necessarily coming from.


With somebody like Intel, they just want a lot of processors to be sold. The more that Hadoop is used, the more it is going to support their main business plan. So it's kind of easy to have a simplistic explanation of what Intel are up to. It's not so easy to guess what they might choose to do in terms of putting code on chips. I'm not 100% certain whether they're going to do that. I mean, it's a very difficult thing to call. Their next move at the hardware level, I think, is the system on a chip. When we go to the system on a chip, you may actually want to put some basic software on the chip, so to speak. Putting HDFS on there might make some sense. But I don't think that's what that investment was about. I think all that investment was about was just making sure that Intel has a hand in the game going forward.


In terms of who else is going to buy, that is also difficult to say. I mean, certainly the SAPs and Oracles of this world have got enough money to buy into this or IBM has got enough money to buy into it. But, you know, this is all open source. IBM never bought a Linux distribution, even though they plowed a lot of money into Linux. It didn't break their hearts that they didn't actually have a Linux distribution. They're very happy to cooperate with Red Hat. I would say maybe Red Hat will buy one of these distributions, because they know how to make that business model work, but it's difficult to say.


Eric Kavanagh: Yeah, great point. So folks, I'm going to go ahead and just share my desktop one last time here and just show you a couple of things. So after the event, check out Techopedia - you can see that on the left-hand side. Here's a story that yours truly wrote, I guess a month and a half or a couple of months ago. It really kind of spun out of a lot of the experience that we had talking with various vendors and trying to dig in and understand what exactly is going on in the space, because sometimes it can be kind of difficult to navigate the buzzwords and the hype and the terminology and so forth.


Also a very big thank you to all of those who have been Tweeting. We had one heck of a Tweet stream here going today. So, thank you, all of you. You see that it just goes on and on and on. A lot of great Tweets on TechWise today.


This is the first of our new series, folks. Thank you so much for tuning in. We will let you know what's going on for the next series sometime soon. I think we're going to focus on analytics probably in June sometime. And folks, with that, I think we're going to go ahead and close up our event. We will email you tomorrow with a link to the slides from today and we're also going to email you the link to that full deck, which is a huge deck. We've got about twenty different vendors with their Hadoop story. We're really trying to give you a sort of compendium of content around a particular topic. So for bedtime reading or whenever you're interested, you can kind of dive in and try to get that strategic view of what's going on here in the industry.


With that, we bid you farewell, folks. Thank you again so much. Go to insideanalysis.com and Techopedia to find more information about all this in the future, and we'll catch up to you next time. Goodbye.
