Editor's Note: This is a transcript of one of our past webcasts. The next episode is coming up quickly, click here to register.
Eric Kavanagh: Ladies and gentlemen, hello and welcome back once again to Episode 2 of TechWise. Yes, indeed, it's time to get wise! I've got a bunch of really smart people on the line today to help us in that endeavor. My name is Eric Kavanagh, of course. I'll be your host, your moderator, for this lightning-round session. We have a lot of content here, folks. We have some big names in the business, who have been analysts in our space, and four of the most interesting vendors. So we're going to have a lot of good action on the call today. And of course, you out there in the audience play a significant role in asking questions.
So once again, the show is TechWise and the topic today is "How Can Analytics Improve Business?" Obviously, it's a hot topic where we're going to try to understand the different kinds of analytics you can do and how that can improve your operations, because that's what it's all about at the end of the day.
So you can see myself up there at the top, that's yours truly. Kirk Borne, a good friend from George Mason University. He's a data scientist with a tremendous amount of experience, very deep expertise in this space of data mining and big data and all that fun stuff. And, of course, we've got our very own Dr. Robin Bloor, Chief Analyst here at the Bloor Group. Who trained as an actuary many, many years ago. And he's been really focused on this whole big data space and the analytics space quite intently for the last half decade. It's been almost five years since we launched the Bloor Group per se. So time flies when you're having fun.
We're also going to hear from Will Gorman, Chief Architect of Pentaho; Steve Wilkes, CCO of WebAction; Frank Sanders, Technical Director at MarkLogic; and Hannah Smalltree, Director at Treasure Data. So like I said, that's a lot of content.
So how can analytics help your business? Well, how can it not help your business, quite frankly? There are all kinds of ways that analytics can be used to do things that improve your organization.
So, streamlining operations. That's one you don't hear as much about as you do about things like marketing or raising revenue or even identifying opportunities. But streamlining your operations is this really, really powerful thing that you can do for your organization, because you can identify places where you can either outsource something or add data to a particular process, for example. And that can streamline it by not requiring someone to pick up the phone to call, or someone else to email. There are so many different ways you can streamline your operations. And all of that really helps drive your cost down, right? That's the key, it drives the cost down. But it also allows you to better serve your customers.
And if you think about how impatient people have become, and I see this every single day in terms of how people interact online, even with our shows and the service providers that we use. The patience that people have, the attention span, gets shorter and shorter by the day. And what that means is that you, as an organization, need to respond in faster and faster periods of time to be able to satisfy your customers.
So, for example, if someone is on your webcast site or browsing around trying to find something, if they get frustrated and they leave, well, you may have just lost a customer. And depending on how much you charge for your product or service, maybe that's a big deal. So the bottom line is that streamlining operations, I think, is one of the hottest spaces for applying analytics. And you do that by looking at the numbers, by crunching the data, by figuring out, for example, "Hey, why are we losing so many people on this page of our website?" "Why are we getting some of these phone calls right now?"
And the more real time you can respond to that kind of stuff, the better chance you're going to have of getting on top of the situation and doing something about it before it's too late. Because there is that window of time when someone gets upset about something, when they're dissatisfied or they're trying to find something but get frustrated; you've got a window of opportunity there to reach out to them, to grab them, to interact with that customer. And if you do so in the proper way with the right data or a nice customer picture - understanding who this customer is, what their profitability is, what their preferences are - if you can really get a handle on that, you're going to do a great job of holding on to your customers and getting new customers. And that's what it's all about.
So with that, I'm going to hand it off, actually, to Kirk Borne, one of our data scientists on the call today. And they are pretty rare these days, folks. We've got two of them at least on the call, so that's a big deal. With that, Kirk, I'll hand it over to you to talk about analytics and how it helps business. Go for it.
Dr. Kirk Borne: Well, thank you very much, Eric. Can you hear me?
Eric: We can, go ahead.
Kirk: Okay, good. I'll just say, if I talk past my five minutes, people can wave at me. So the opening remarks, Eric, that you made really tie into this topic I'm going to talk about briefly in the next few minutes, which is this use of big data and analytics for data-to-decisions support, there. The comment you made about operational streamlining, to me, sort of falls into this concept of operational analytics, in which you see just about every application across the world, whether it's a science application, a business, a cyber security and law enforcement and government, health care. Any number of places where we have a stream of data and we're making some kind of response or decision in reaction to events and alerts and behaviors that we see in that data stream.
And so one of the things that I want to talk about today is sort of how you extract the knowledge and insights from big data to get to that point where we can actually make decisions to take actions. And frequently we talk about that in an automation context. And today I want to blend the automation with the human analyst in the loop. So by this I mean, the business analyst plays an important role here in terms of vetting, qualifying, validating specific actions or machine learning rules that we extract from the data. But if we get to a point where we're pretty much convinced that the business rules we've extracted and the mechanisms for alerting us are valid, then we can pretty much turn it over to an automated process. We'd actually be doing that operational streamlining that Eric was talking about.
So I have a little play on words here but I hope, if it works for you, I've talked about the D2D challenge. And D2D is not just data to decisions in all contexts; we're looking at this at the sort of bottom of this slide, hopefully you can see it, making discoveries and increasing revenue dollars from our analytics pipelines.
So in this context, I actually have this role of marketer for myself here now that I'm working with; the first thing you want to do is characterize your data, extract the features, extract the characteristics of your customers or whatever entity it is you're tracking in your space. Maybe it's a patient in a health analytics environment. Maybe it's a Web user if you're looking at a sort of cyber security issue. But characterize and extract the characteristics, and then extract some context about that individual, about that entity. And then you gather those pieces that you've just created and put them into some sort of a collection from which you can apply machine learning algorithms.
The reason I say it this way is, let's just say, you have a surveillance camera at an airport. The video itself is an enormous, large volume and it's also very unstructured. But you can extract facial biometrics from video surveillance and identify individuals in the surveillance cameras. So for example in an airport, you can identify specific individuals, you can track them through the airport by cross-identifying the same individual in multiple surveillance cameras. So the extracted biometric features that you're really mining and tracking are not the actual detailed video itself. But once you have those extractions, then you can apply machine learning rules and analytics to make decisions as to whether you need to take an action in a particular case, or whether something occurred incorrectly, or whether you have an opportunity to make an offer. If you, for example, have a shop in the airport and you see that customer coming your way, and you know from other information about that customer that maybe he's really interested in buying things in the duty-free shop or something like that, make that offer.
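Editor's note: to make Kirk's cross-camera example concrete, here is a minimal Python sketch of the matching step. The face-embedding model itself is assumed (a hypothetical embed_face() producing a vector per detected face); matching by cosine similarity against previously seen people is one simple approach, not any specific product's method.

    import numpy as np

    # Assumed upstream: embed_face(frame_crop) -> 128-d vector for a detected face.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    gallery = {}  # person_id -> embedding captured on an earlier camera

    def identify(embedding, threshold=0.8):
        """Match a new face embedding against individuals already seen."""
        best_id, best_score = None, threshold
        for person_id, known in gallery.items():
            score = cosine(embedding, known)
            if score > best_score:
                best_id, best_score = person_id, score
        if best_id is None:  # nobody close enough: register a new individual
            best_id = "person-%d" % len(gallery)
            gallery[best_id] = embedding
        return best_id

Note that only these small feature vectors, not the raw video, are stored and mined, which is exactly the point Kirk makes.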
So what kinds of things do I mean by characterization and potentially crowdsourcing? By characterization I mean, again, extracting the features and characteristics in the data. And this can be machine generated, so its algorithms can actually extract, for example, biometric signatures from video, or sentiment analysis. You can extract the customer sentiment through online reviews or social media. Some of these things can be human generated, so that the human, the business analyst, can extract additional features, which I'll show in the next slide.
Some of these can be crowdsourced. And by crowdsourced, there are a lot of different ways you can think about that. But very simply, for example, your users come to your website and they put in search words, keywords, and they end up on a certain page and actually spend time there on that page. And they actually, really, at least understand that they're either viewing, browsing, clicking on things on that page. What that says to you is that the keyword they typed at the very beginning is the descriptor of that page, because it landed the customer on the page they anticipated. And so you can add that additional piece of information, that is, customers who use this keyword actually identified this webpage within our information architecture as the place where the content matches that keyword.
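Editor's note: the keyword-to-page "crowdsourcing" Kirk describes can be sketched in a few lines. This is an illustrative toy, with an assumed dwell-time threshold standing in for "the customer found what they anticipated":

    from collections import defaultdict

    tag_counts = defaultdict(lambda: defaultdict(int))  # keyword -> page -> votes
    MIN_DWELL_SECONDS = 30  # assumed cutoff for a "successful" landing

    def record_visit(keyword, landing_page, dwell_seconds):
        # A long dwell after a search is treated as a crowdsourced tag.
        if dwell_seconds >= MIN_DWELL_SECONDS:
            tag_counts[keyword][landing_page] += 1

    def best_page_for(keyword):
        pages = tag_counts.get(keyword)
        return max(pages, key=pages.get) if pages else None

    record_visit("duty free", "/shops/airport", 95)
    record_visit("duty free", "/shops/airport", 42)
    print(best_page_for("duty free"))  # -> /shops/airport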
And so crowdsourcing is another aspect of this that people sometimes forget, that sort of following your customers' breadcrumbs, so to speak; how do they move through their space, whether it's an online property or a real property. And then use that sort of pathway that the customer takes as additional information about the things that we're looking at.
So I want to say that things like human generated, or machine generated, end up having a context in sort of annotating or tagging specific data granules or entities. Whether those entities are patients in a hospital setting, customers or whatever. And so there are different types of tagging and annotations. Some of it is about the data itself. That's one of the things: what type of information, what are the features, the shapes, maybe the textures and the patterns, anomaly, non-anomaly behaviors. And then extract some semantics, that is, how does this relate to other things that I know, or: this customer is an electronics customer. This customer is a clothing customer. Or this customer likes to buy music.
So identifying some semantics around that - these customers who like music tend to like entertainment. Maybe we could offer them some other entertainment property. So understanding the semantics and also some provenance, which is basically saying: where did this come from, who provided this assertion, what time, what date, under what circumstance?
So once you have all those annotations and characterizations, add to that the next step, which is the context, sort of the who, what, when, where and why of it. Who is the user? What was the channel they came in on? What was the source of the information? What kind of reuse have we seen of this particular piece of information or data product? And what is its, sort of, value in the business process? And then collect those things and manage them, and actually help create a knowledge base, if you want to think of it that way. Make it searchable, reusable, by other business analysts or by an automated process that, the next time I see these sets of features, can take this automatic action. And so we get to that sort of operational analytics efficiency, the more we collect useful, comprehensive information and then curate it for those use cases.
Then we get down to business. We do the data analytics. We look for interesting patterns, surprises, novelty, outliers, anomalies. We look for new classes and segments in the population. We look for associations and correlations and links among the various entities. And then we use all of that to drive our discovery, decision and dollar-making process.
So again, here we've got the last data slide I have, which really just summarizes: keeping the business analyst in the loop; again, you're not pulling that human out, and it's all-important to keep that human in there.
So these features, they're all provided by machines or by human analysts or even by crowdsourcing. We apply that combination of things to improve the training sets for our models and end up with more accurate predictive models, fewer false positives and negatives, more efficient behavior, better interventions with our customers or whomever.
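Editor's note: a minimal sketch of that human-in-the-loop idea, assuming hypothetical label sources; analyst corrections simply override machine labels before the next retraining pass, which is how false positives get squeezed out of the training set over time:

    def merge_labels(machine_labels, human_labels):
        """Human review is treated as the more trusted source."""
        merged = dict(machine_labels)
        merged.update(human_labels)  # analyst corrections win
        return merged

    machine_labels = {"tx-1": "fraud", "tx-2": "ok", "tx-3": "fraud"}
    human_labels = {"tx-3": "ok"}  # analyst overturned a false positive

    training_set = merge_labels(machine_labels, human_labels)
    print(training_set)  # {'tx-1': 'fraud', 'tx-2': 'ok', 'tx-3': 'ok'}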
So, at the end of the day, we're really just combining machine learning and big data with this power of human cognition, and that's where that sort of annotation-tagging piece comes in. And that can lead through visualization and visual analytics-type tools or immersive data environments or crowdsourcing. And, at the end of the day, what this is really doing is generating our discovery, insights and D2D. And those are my comments, so thank you for listening.
Eric: Hey, that sounds great, and let me go ahead and hand the keys over to Dr. Robin Bloor to give his perspective as well. Yeah, I'd like to hear you comment on that operational streamlining concept, and you're talking about operational analytics. I think that's a big area that needs to be explored quite thoroughly. And real quick, I guess, before Robin, I'll bring you back in, Kirk. It does require that you have some pretty significant collaboration among various players in the company, right? You've got to talk to operations people; you've got to get your technical people. Sometimes you get your marketing people or your Web interface people. These are typically different groups. Do you have any best practices or suggestions on how to kind of get everybody to put their skin in the game?
Dr. Kirk: Well, I think this comes with the business culture of collaboration. In fact, I talk about the three C's of, sort of, the analytics culture. One is creativity; another is curiosity and the third is collaboration. So you want creative, curious people, but you also have to get these people collaborating. And it really starts from the top, that sort of building of that culture with people who should be openly sharing and working together toward the common goals of the business.
Eric: It all makes sense. And you really do have to get good leadership at the top to make that happen. So let's go ahead and hand it over to Dr. Bloor. Robin, the floor is yours.
Robin Bloor: Okay. Thanks for that intro, Eric. Okay, the way these play out, these shows, because we have two analysts; I get to see the analysts' presentation that the other guys don't. I knew what Kirk was going to say and I'll just go at a completely different angle so we don't overlap too much.
So what I'm actually talking about, or intending to talk about, here is the role of the data analyst versus the role of the business analyst. And the way I'd characterize it, well, tongue-in-cheek to a certain extent, is kind of a Jekyll and Hyde thing. The difference being, specifically, that data scientists, in theory at least, know what they're doing. Whereas business analysts aren't so okay with the way the math works, with what can be trusted and what can't be trusted.
So let's just come down to the reason that we're doing this, the reason that data analysis has suddenly become a big deal, aside from the fact that we can actually analyze very large amounts of data and pull in data from outside the organization; it's that it pays. The way I look at this - and I think this is only becoming a case, but I certainly think it is a case - data analysis is really business R&D. What you're actually doing in one way or another with data analysis is you're looking at a business process of some kind, whether that's the interaction with a customer, whether it's the way your retail operation runs, the way that you deploy your stores. It doesn't really matter what the issue is. You're looking at a given business process and you're trying to improve it.
The outcome of successful research and development is a change process. And you can think of manufacturing, if you like, as a usual example of this. Because in manufacturing, people gather information about everything to try and improve the manufacturing process. But I think what's happened, or what's happening, in big data is that all of this is now being applied to all businesses of any kind in any way that anyone can think of. So pretty much any business process is up for examination if you can gather data about it.
So that's one thing. If you like, that goes to the question of data analysis. What can data analytics do for the business? Well, it can change the business completely.
This particular diagram, which I'm not going to describe in any depth, is a diagram that we came up with as the culmination of a research project we did for the first six months of this year. It's a way of representing a big data architecture. And there are a number of things worth pointing out before I go on to the next slide. There are two data flows here. One is a real-time data stream, which goes along the top of the diagram. The other is a slower data stream that goes along the bottom of the diagram.
Look at the bottom of the diagram. We've got Hadoop as a data reservoir. We've got various databases. We've got whole bodies of data there with a whole bunch of activity happening on them, most of which is analytical activity.
The point I'm making here, and the only point I really want to make here, is that the technology is hard. It isn't simple. It isn't easy. It isn't something that anybody who's new to the game can actually just put together. This is fairly complex. And if you're going to instrument a business for doing reliable analytics across all of these processes, then it isn't something that's going to happen particularly quickly. It's going to require a lot of technology to be added to the mix.
Okay. The question of what is a data scientist - I could claim to be a data scientist because I was trained in statistics before I was ever trained in computing. And I did an actuarial job for a period of time, so I know the way that an organization organizes statistical analysis in order to run itself. This is not a trivial thing. And there's an awful lot of best practice involved, both on the human side and on the technology side.
So in asking the question "what is a data scientist," I've put the Frankenstein picture up simply because it's a combination of things that have to be knitted together. There's project management involved. There's deep understanding of statistics. There's domain business expertise, which is more of a problem for a business analyst than the data scientist, necessarily. There's experience, or the need to understand data architecture and to be able to build data architecture, and there's software engineering involved. In other words, it's probably a team. It's probably not an individual. And that means that it's probably a department that needs to be organized, and its organization needs to be thought about fairly extensively.
Throwing into the mix the fact of machine learning. Machine learning is not new, in the sense that most of the statistical techniques used in machine learning have been known about for decades. There are a few new things, I mean neural networks are relatively new, I think they're only about 20 years old, so some of it is relatively new. But the problem with machine learning was that we actually didn't really have the computer power to do it. And what's happened, apart from anything else, is that the computer power is now in place. And that means an awful lot of what, say, data scientists have done before in terms of modeling situations, sampling data and then marshalling that in order to produce a deeper analysis of the data - actually, we can just throw computer power at it in some cases. Just choose machine learning algorithms, throw them at the data and see what comes out. And that's something that a business analyst can do, right? But the business analyst needs to understand what they're doing. I mean, I think that's the issue really, more than anything else.
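Editor's note: Robin's "throw compute at it" point, as a hedged sketch using scikit-learn on stand-in data; in practice the feature table would be the business's own, and understanding why one model wins is exactly the part that can't be automated away:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-in for a real business feature table.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier()):
        scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
        print(type(model).__name__, round(scores.mean(), 3))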
Well, this is to know more about the business from its data than by any other means. Einstein didn't say that, I said that. I just put his picture up for credibility. But the situation that's actually starting to develop is one where the technology, if used properly, and the math, if used properly, will be able to run a business as well as any individual. We've watched this with IBM. First of all, it could beat the best guys at chess, and then it could beat the best guys at Jeopardy; but eventually it'll be able to beat the best guys at running a company. The statistics will eventually triumph. And it's hard to see how that won't happen; it just hasn't happened yet.
So what I'm saying, and this is kind of the complete message of my presentation, is these two issues of the business. The first one is, can you get the technology right? Can you make the technology work for the team that's actually going to be able to preside over it and get benefits for the business? And then secondly, can you get the people right? And both of these are issues. And they're issues that have not, to this point in time, been resolved, they say.
Okay Eric, I'll pass it back to you. Or perhaps I should pass it on to Will.
Eric: Actually, yes. Thank you very much, Will Gorman. Yeah, there you go, Will. So let's see. Let me give you the key to the WebEx. So what have you got going on? Pentaho, obviously, you guys have been around for a while, and the open-source kind of BI is where you started. But you've got a lot more than you used to have, so let's see what you've got these days for analytics.
Will Gorman: Absolutely. Hi, everybody! My name is Will Gorman. I'm the Chief Architect at Pentaho. For those of you who haven't heard of us, I just mentioned Pentaho is a big data integration and analytics company. We've been in the business for ten years. Our products have evolved side by side with the big data community, starting as an open-source platform for data integration and analytics, innovating with technology like Hadoop and NoSQL even before commercial entities formed around those technologies. And now we have over 1500 commercial customers and many more production deployments as a result of our innovation around open source.
Our architecture is highly embeddable and extensible, purpose-built to be flexible as big data technology in particular is evolving at a very rapid pace. Pentaho offers three main product areas that work together to address big data analytics use cases.
The first product in the breadth of our architecture is Pentaho Data Integration, which is geared toward data technologists and data engineers. This product offers a visual, drag-and-drop experience for defining data pipelines and processes for orchestrating data within big data environments and traditional environments as well. This product is a lightweight, metadata-based data integration platform built on Java that can be deployed as a process within MapReduce or YARN or Storm and many other batch and real-time platforms.
Our second product area is around visual analytics. With this technology, organizations and OEMs can offer a rich drag-and-drop visualization and analytics experience for business analysts and business users through modern browsers and tablets, allowing ad hoc creation of reports and dashboards. As well as the presentation of pixel-perfect dashboarding and reports.
Our third product area focuses on predictive analytics targeted at data scientists; machine learning algorithms, as mentioned before, things like neural networks and such, can be incorporated into a data transformation environment, allowing data scientists to go from modeling into the production environment, giving access to the predictions, and that can impact business processes very immediately.
All of these products are tightly integrated into a single agile experience and give our enterprise customers the flexibility they need to address their business problems. We're seeing a rapidly evolving landscape of big data within traditional technologies. Everything we hear from some companies in the big data space is that the EDW is near its end. In fact, what we see in our enterprise customers is that they need to introduce big data into existing business and IT processes, not replace those processes.
This simple diagram shows the point in the architecture that we often see, which is a type of EDW deployment architecture with data integration and BI use cases. Now this diagram is similar to Robin's slide on big data architecture; it incorporates real-time and historical data. As new data sources and real-time requirements emerge, we see big data as an additional part of the overall IT architecture. These new data sources include machine-generated data, unstructured data, the standard volume and velocity and variety requirements that we hear about in big data; they don't fit into traditional EDW processes. Pentaho works closely with Hadoop and NoSQL to simplify the ingestion, data processing and visualization of this data, as well as blending this data with traditional sources to give customers a full view into their data environment. We do this in a governed manner so that IT can offer a complete analytics solution to their line of business.
In closing, I'd like to highlight our philosophy around big data analytics and integration; we believe these technologies are better off working together in a single unified architecture, enabling a number of use cases that otherwise wouldn't be possible. Our customers' data environments are more than just big data, Hadoop and NoSQL. Any data is fair game. And big data sources need to be available and work together to impact business value.
Finally, we believe that in order to solve these business problems in enterprises very effectively through data, IT and lines of business need to work together on a governed, blended approach to big data analytics. Well, thanks a lot for giving us the time to talk, Eric.
Eric: You bet. No, that's good stuff. I want to get back to that side of your architecture when we get to the Q&As. So let's move through the rest of the presentation, and thank you very much for that. You guys have definitely been moving quickly the last couple of years, I have to say that for sure.
So Steve, let me go ahead and hand it over to you. And just click there on the down arrow and go for it. So Steve, I'm giving you the keys. Steve Wilkes, just click on that farthest down arrow there on your keyboard.
Steve Wilkes: There we go.
Eric: There you go.
Steve: That's a great intro you gave me, though.
Eric: Yeah.
Steve: So I'm Steve Wilkes. I'm the CCO at WebAction. We've only been around for the last year, and we've definitely been moving fast as well, since then. WebAction is a real-time big data analytics platform. Eric mentioned earlier, kind of, how important real time is and how real time your applications are getting. Our platform is designed to build real-time apps. And to enable the next generation of data-driven apps that can be built on incrementally, and to allow people to build dashboards from the data generated by those apps, but focusing on real time.
Our platform is actually a full end-to-end platform, doing everything from data acquisition, data processing, all the way through to data visualization. And it enables multiple different types of people within our enterprises to work together to create true real-time apps, giving them insight into things happening in their business as they happen.
And this is a little different from what most people have been seeing in big data, so the traditional approach - well, traditional for the last couple of years - the approach with big data has been to capture it from a whole bunch of different sources and then pile it up into a big reservoir or lake or whatever you want to call it. And then process it when you need to run a query on it; to run large-scale historical analysis or even just ad hoc querying of large amounts of data. Now that works for certain use cases. But if you want to be proactive in your business, if you want to actually be told what's going on rather than finding out that something went wrong kind of at the end of the day or the end of the week, then you really need to move to real time.
And that switches things around a little bit. It moves the processing to the middle. So effectively you're taking those streams of lots of data that are continuously being generated within the enterprise, and you're processing it as you get it. And because you're processing it as you get it, you don't have to store everything. You can just store the important information or the things that you need to remember actually happened. So if you're tracking the GPS location of vehicles moving down the road, you don't really care where they are every second, you don't need to store where they are every second. You just need to care: have they left this place? Have they arrived at this place? Have they driven, or not, on the freeway?
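Editor's note: Steve's GPS example reduces to a boundary-crossing filter. A minimal sketch, with the region simplified to a bounding box (a real system would use proper geofencing):

    REGION = {"lat": (37.40, 37.45), "lon": (-122.10, -122.05)}  # assumed geofence

    def in_region(lat, lon):
        return (REGION["lat"][0] <= lat <= REGION["lat"][1]
                and REGION["lon"][0] <= lon <= REGION["lon"][1])

    last_state = {}  # vehicle_id -> was it inside the region last time?

    def on_gps_fix(vehicle_id, lat, lon):
        """Emit an event only when a vehicle crosses the boundary."""
        inside = in_region(lat, lon)
        if last_state.get(vehicle_id) != inside:
            last_state[vehicle_id] = inside
            return ("ENTERED" if inside else "LEFT", vehicle_id)
        return None  # every-second fixes are dropped, not stored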
So it's really important to consider that, as more and more data is generated, those three Vs come in. Velocity basically determines how much data is being generated every day. The more data that's generated, the more you have to store. And the more you have to store, the longer it takes to process. But if you can process it as you get it, then you get a really big benefit and you can react to it. You can be told that things are happening rather than having to search for them later.
So our platform is designed to be massively scalable. It has three major pieces - the acquisition piece, the processing piece and then the delivery visualization pieces of the platform. On the acquisition side, we're not just looking at machine-generated log data like Web logs or applications that have all the other logs being generated. We can also go in and do change data capture from databases. So that basically enables us to - we've seen the ETL side that Will presented, and with traditional ETL you have to run queries against the databases. We can be told when things happen in the database. We capture that change and receive those events. And then there's obviously the social feeds and live device data that's being pumped to you over TCP or ACDP sockets.
There's tons of different ways of getting data. And talking of volume and velocity, we're seeing volumes that are billions of events per day, right? So it's large, large amounts of data that is coming in and needs to be processed.
That is processed by a cluster of our servers. The servers all have the same architecture and are all capable of doing the same things. But you can configure them to, sort of, do different things. And within the servers we have a high-speed query processing layer that enables you to do some real-time analytics on the data, to do enrichments of the data, to do event correlation, to track things happening within time windows, to do predictive analytics based on patterns that are being seen in the data. And that data can then be stored in a variety places - the traditional RDBMS, enterprise data warehouse, Hadoop, big data infrastructure.
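Editor's note: one of the window-based checks described above, sketched in plain Python. This illustrates the general technique (counting events per key over a sliding time window), not WebAction's actual engine:

    import time
    from collections import deque

    class SlidingWindow:
        def __init__(self, seconds):
            self.seconds = seconds
            self.events = {}  # key -> deque of timestamps

        def add(self, key, ts=None):
            """Record an event and return the count within the window."""
            ts = time.time() if ts is None else ts
            q = self.events.setdefault(key, deque())
            q.append(ts)
            while q and q[0] < ts - self.seconds:
                q.popleft()  # expire events that fell out of the window
            return len(q)

    logins = SlidingWindow(60)
    if logins.add("user-42") > 5:  # more than 5 logins in the last minute
        print("alert: possible brute-force attempt")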
And the same live data can also be used to power real-time data-driven apps. Those apps can have a real-time view of what's going on and people can also be alerted when important things happen. So rather than having to go in at the end of the day and find out that something bad really happened earlier in the day, you could be alerted about it the second we spot it, and go straight to the page to drill down and find out what's going on.
So it changes the paradigm completely from having to analyze data after the fact to being told when interesting things are happening. And our platform can then be used to build data-driven applications. And this is really where we're focusing, is building out these applications. For customers, with customers, with a variety of different partners to show true value in real-time data analysis. So that allows people that, or companies that do site applications, for example, to be able track customer usage over time and ensure that the quality of service is being met, to spot real-time fraud or money laundering, to spot multiple logins or hack attempts and those kind of security events, to manage things like set-top boxes or other devices, ATM machines to monitor them in real time for faults, failures that have happened, could happen, will happen in the future based on predictive analysis. And that goes back to the point of streamlining operations that Eric mentioned earlier, to be able to spot when something's going to happen and organize your business to fix those things rather than having to call someone out to actually do something after the fact, which is a lot more expensive.
Consumer analytics is another piece, to be able to know when a customer is doing something while they're still there in your store. Data center management, to be able to monitor resource usage in real time and change where things are running, and to be able to know about when things are going to fail in a much more timely fashion.
So that's our products in a nutshell and I'm sure we'll come back to some of these things in the Q&A session. Thank you.
Eric: Yes, indeed. Great job. Okay good. And now next stop in our lightning round, we've got Frank Sanders calling in from MarkLogic. I've known about these guys for a number of years, a very, very interesting database technology. So Frank, I'm turning it over to you. Just click anywhere in that. Use the down arrow on your keyboard and you're off to the races. There you go.
Frank Sanders: Thank you very much, Eric. So as Eric mentioned, I'm with a company called MarkLogic. And what MarkLogic does is we provide an enterprise NoSQL database. And perhaps, the most important capability that we bring to the table with regards to that is the ability to actually bring all of these disparate sources of information together in order to analyze, search and utilize that information in a system similar to what you're used to with traditional relational systems, right?
And some of the key features that we bring to the table in that regard are all of the enterprise features that you'd expect from a traditional database management system - your security, your HA, your DR, your backup and restore, your ACID transactions. As well as the design that allows you to scale out either on the cloud or on commodity hardware so that you can handle the volume and the velocity of the information that you're going to have to handle in order to build and analyze this sort of information.
And perhaps the most important capability is the fact that we're schema agnostic. What that means, practically, is that you don't have to decide what your data is going to look like when you start building your applications or when you start pulling that information together. But over time, you can incorporate new data sources, pull additional information in, and then leverage and query and analyze that information just as you would with anything that was there from the time that you started the design. Okay?
So how do we do that? How do we actually enable you to load different sorts of information, whether it be text, RDF triples, geospatial data, temporal data, structured data and values, or binaries? And the answer is that we've actually built our server from the ground up to incorporate search technology, which allows you to put information in, and that information self-describes and allows you to query, retrieve and search that information regardless of its source or format.
And what that means practically is that - and why this is important when you're doing analysis - is that analytics and information are most valuable when they're properly contextualized and targeted, right? So a very important key part of any sort of analytics is search, and the converse is search analytics. You can't really have one without the other and successfully achieve what you set out to achieve. Right?
And I'm going to talk briefly about three and a half different use cases of customers that we have in production that are using MarkLogic to power this sort of analytics. Okay. So the first such customer is Fairfax County. And Fairfax County has actually built two separate applications. One is based around permitting and property management. And the other, which is probably a bit more interesting, is the Fairfax County police events application. What the police events application actually does is it pulls information together like police reports, citizen reports and complaints, Tweets, other information they have such as sex offenders and whatever other information that they have access to from other agencies and sources. Then they allow them to visualize that and present this to the citizens so they can do searches and look at various crime activity, police activity, all through one unified geospatial index, right? So you can ask questions like, "what is the crime rate within five miles" or "what crimes occurred within five miles of my location?" Okay.
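Editor's note: the "within five miles" question is, at bottom, a great-circle distance filter over a geospatial index. A self-contained sketch with made-up event records (a real deployment would query the index rather than scan a list):

    import math

    def miles_between(lat1, lon1, lat2, lon2):
        """Haversine great-circle distance in miles."""
        r = 3958.8  # Earth radius in miles
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    events = [  # hypothetical records
        {"type": "burglary", "lat": 38.846, "lon": -77.306},
        {"type": "theft", "lat": 38.950, "lon": -77.450},
    ]
    me = (38.850, -77.300)
    nearby = [e for e in events
              if miles_between(me[0], me[1], e["lat"], e["lon"]) <= 5]
    print(nearby)  # only the events within five miles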
Another user that we've got, another customer that we have, is OECD. Why OECD is important to this conversation is because, in addition to everything that we've enabled for Fairfax County in terms of pulling information together, right, it's all the information that you would get from the various countries that are members of the OECD that they report on from an economic perspective. They've actually built a drill-down into that, right. So you can see on the left-hand side we're taking the view of Denmark specifically, and you can kind of see a flower petal above it that rates it on different axes. Right? And that's all well and good. But what the OECD has done is they've gone a step further.
In addition to these beautiful visualizations and pulling all this information together, they're actually allowing you in real time to create your own better life index, right, which you can see on the right-hand side. So what you have there is a set of sliders that actually allow you to do things like rank how important housing is to you, or income, jobs, community, education, environment, civic engagement, health, life satisfaction, safety and your work/life balance. And dynamically, based on how you're actually inputting that information and weighting those things, MarkLogic is using its real-time indexing capability and query capability to then change how each and every one of these countries is ranked, to give you an idea of how well your country or your lifestyle maps to a given country. Okay?
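Editor's note: the slider mechanics come down to a weighted re-ranking, sketched here with invented scores; the real index computes this against live data on every slider move:

    scores = {  # made-up per-country scores on three of the axes
        "Denmark":   {"housing": 6.0, "income": 5.0, "work_life": 9.0},
        "Australia": {"housing": 7.5, "income": 6.5, "work_life": 7.0},
        "Mexico":    {"housing": 5.0, "income": 3.0, "work_life": 6.5},
    }

    def rank(weights):
        total = sum(weights.values()) or 1.0
        def weighted(country):
            return sum(scores[country][k] * w for k, w in weights.items()) / total
        return sorted(scores, key=weighted, reverse=True)

    print(rank({"housing": 1, "income": 1, "work_life": 1}))
    # Push the work/life slider up and the ranking changes:
    print(rank({"housing": 1, "income": 1, "work_life": 5}))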
And the final example that I'm going to share is MarkMail. And what MarkMail really tries to demonstrate is that we can provide these capabilities, and you can do this sort of analysis not only on structured information or information that's coming in that's numerical, but actually on more loosely structured, unstructured information, right? Things like emails. And what we've seen here is we're actually pulling information like geolocation, sender, company, stacks and concepts like Hadoop being mentioned within the context of an email, and then visualizing it on the map as well as looking at who those individuals are and what lists they're across, a sender and a date. This is where you're looking at things that are traditionally not structured, that may be loosely structured, but you're still able to derive some structured analysis from that information without having to go to great lengths to actually try and structure it or process it ahead of time. And that's it.
Eric: Hey, okay good. And we've got one more. We've got Hannah Smalltree from Treasure Data, a very interesting company. And this is a lot of great content, folks. Thank you so much for all of you for bringing such good slides and such good detail. So Hannah, I just gave the keys to you, click anywhere and use the down arrow on your keyboard. You got it. Take it away.
Hannah Smalltree: Thank you so much, Eric. This is Hannah Smalltree from Treasure Data. I'm a director with Treasure Data but I have a past as a tech journalist, which means that I appreciate two things. First of all, these can be long to sit through a lot of different descriptions of technology, and it can all sound like it runs together so I really want to focus on our differentiator. And the real-world applications are really important so I appreciate that all of my peers have been great about providing those.
Treasure Data is a new kind of big data service. We're delivered entirely on the cloud in a software as a service or managed-service model. So to Dr. Bloor's point earlier, this technology can be really hard and it can be very time consuming to get up and running. With Treasure Data, you can get all of these kinds of capabilities that you might get in a Hadoop environment or a complicated on-premise environment in the cloud very quickly, which is really helpful for these new big data initiatives.
Now we talk about our service in a few different phases. We offer some very unique collection capabilities for collecting streaming data - particularly event data, other kinds of real-time data. We'll talk a little bit more about those data types. That is a big differentiator for our service. As you get into big data, or if you are already in it, then you know that collecting this data is not trivial. When you think about a car with 100 sensors sending data every minute - even if those 100 sensors are only sending data every ten minutes - that adds up really quickly as you start to multiply the amount of products that you have out there with sensors, and it quickly becomes very difficult to manage. So we are talking with customers who have millions - we have customers who have billions of rows of data a day that they're sending us. And they're doing that as an alternative to trying to manage that themselves in a complicated Amazon infrastructure or even trying to bring it into their own environment.
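Editor's note: the back-of-envelope arithmetic behind "adds up really quickly", under assumed fleet numbers:

    sensors_per_car = 100
    readings_per_sensor_per_day = 24 * 60  # one reading per minute
    cars = 1_000_000                        # assumed fleet size

    rows_per_day = sensors_per_car * readings_per_sensor_per_day * cars
    print(f"{rows_per_day:,} rows/day")     # 144,000,000,000 - billions per day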
We have our own cloud storage environment. We manage it. We monitor it. We have a team of people that's doing all that tuning for you. And so the data flows in, it goes into our managed storage environment.
Then we have embedded query engines so that your analyst can go in and run queries and do some initial data discovery and exploration against the data. We have a couple of different query engines for it actually now. You can use SQL syntax, which your analysts probably know and love, to do some basic data discovery, to do some more complex analytics that are user-defined functions or even to do things as simple as aggregate that data and make it smaller so that you can bring it into your existing data warehouse environment.
You can also connect your existing BI tools - your Tableau, which is a big partner of ours; but really most BI, visualization or analytics tools can connect via our industry-standard JDBC and ODBC drivers. So it gives you this complete set of big data capabilities. You're allowed to export your query results or data sets anytime for free, so you can easily integrate that data. Treat this as a data refinery. I like to think of it more as a refinery than a lake because you can actually do stuff with it. You can go through, find the valuable information and then bring it into your enterprise processes.
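Editor's note: a hedged sketch of what "connect via ODBC and aggregate it down" might look like from a script; the DSN, table and column names are placeholders, and pyodbc is just one common ODBC client:

    import pyodbc

    # Hypothetical DSN configured for the service's ODBC driver.
    conn = pyodbc.connect("DSN=treasure_data;UID=analyst;PWD=secret")
    cur = conn.cursor()

    # Aggregate raw event logs down to a daily summary small enough for an EDW.
    cur.execute("""
        SELECT event_date, COUNT(*) AS events
        FROM app_logs            -- hypothetical table of raw events
        GROUP BY event_date
    """)
    for day, events in cur.fetchall():
        print(day, events)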
The next slide, we talk about the three Vs of big data - some people say four or five. Our customers tend to struggle with the volume and velocity of the data coming at them. And so to get specific about the data types - Clickstream, Web access logs, mobile data is a big area for us, mobile application logs, application logs from custom Web apps or other applications, event logs. And increasingly, we have a lot of customers dealing with sensor data, so from wearable devices, from products, from automotive, and other types of machine data. So when I say big data, that's the type of big data that I'm talking about.
Now, a few use cases in perspective for you - we work with a retailer, a large retailer. They are very well known in Asia. They're expanding here in the US. You'll start to see stores; they're often called Asian IKEA, so, simple design. They have a loyalty app and a website. And in fact, using Treasure Data, they were able to deploy that loyalty app very quickly. Our customers get up and running within days or weeks because of our software and our service architecture and because we have all of the people doing all of that hard work behind the scenes to give you all of those capabilities as a service.
So they use our service for mobile application analytics, looking at the behavior, what people are clicking on in their mobile loyalty application. They look at the website clicks and they combine that with their e-commerce and POS data to design more efficient promotions. They actually wanted to drive people into stores because they found that people, when they go into stores, spend more money - and I'm like that; when you go in to pick things up, you spend more money.
Another use case that we're seeing is in digital video games, where they need incredible agility. They want to see exactly what is happening in their game, and make changes to that game even within hours of its release. So for them, that real-time view is incredibly important. We just released a game but we noticed in the first hour that everyone is dropping off at Level 2; how are we going to change that? They might change that within the same day. So real time is very important. They're sending us billions of event logs per day. But that could be any kind of mobile application where you want some kind of real-time view into how somebody's using it.
And finally, a big area for us is our product behavior and sensor analytics. So with sensor data that's in cars, that's in other kinds of machines, utilities, that's another area for us, in wearable devices. We have research and development teams that want to quickly know what the impact of a change to a product is or people interested in the behavior of how people are interacting with the product. And we have a lot more use cases which, of course, we're happy to share with you.
And then finally, just show you how this can fit into your environment, we offer again the capability to collect that data. We have very unique collection technology. So again, if real-time collection is something that you're struggling with or you anticipate struggling with, please come look at the Treasure Data service. We have really made capabilities for collecting streaming data. You can also bulk load your data, store it, analyze it with our embedded query engines and then, as I mentioned, you can export it right to your data warehouse. I think Will mentioned the need to introduce big data into your existing processes. So not go around or create a new silo, but how do you make that data smaller and then move it into your data warehouse and you can connect to your BI, visualization and advanced analytics tools.
But perhaps, the key points I want to leave you with are that we are managed service, that's software as a service; it's very cost effective. A monthly subscription service starting at a few thousand dollars a month and we'll get you up and running in a matter of days or weeks. So compare that with the cost of months and months of building your own infrastructure and hiring those people and finding it and spending all that time on infrastructure. If you're experimenting or if you need something yesterday, you can get up and running really quickly with Treasure Data.
And I'm just pointing you to our website and to our starter service. If you're a hands-on person who likes to play, please check out our starter service. You can get on, no credit card required, just name and email, and you can play with our sample data, load up your own data and really get a sense of what we're talking about. So thanks so much. Also, check our website. We were named the Gartner Cool Vendor in Big Data this year, very proud of that. And you can also get a copy of that report for free on our website as well as many other analyst white papers. So thanks so much.
Eric: Okay, thank you very much. We've got some time for questions here, folks. We'll go a little bit long too because we've got a bunch of folks still on the line here. And I know I've got some questions myself, so let me go ahead and take back control and then I'm going to ask a couple of questions. Robin and Kirk, feel free to dive in as you see fit.
So let me go ahead and jump right to one of these first slides that I checked out from Pentaho. So here, I love this evolving big data architecture, can you kind of talk about how it is that this kind of fits together at a company? Because obviously, you go into some fairly large organization, even a mid-size company, and you're going to have some people who already have some of this stuff; how do you piece this all together? Like what does the application look like that helps you stitch all this stuff together and then what does the interface look like?
Will: Great question. The interfaces vary depending on the personas involved. But as an example, we like to tell the story of - one of the panelists mentioned the data refinery use case - we see that a lot in customers.
One of our customer examples that we talk about is Paytronix, where they have that traditional EDW data mart environment. They are also introducing Hadoop, Cloudera in particular, and with various user experiences in that. So first there's an engineering experience, so how do you wire all these things up together? How do you create the glue between the Hadoop environment and EDW?
And then you have the business user experience which we talked about, a number of BI tools out there, right? Pentaho has a more embeddable OEM BI tool but there are great ones out there like Tableau and Excel, for instance, where folks want to explore the data. But usually, we want to make sure that the data is governed, right? One of the questions in the discussions, what about a single version of the truth, how do you manage that; you need technology like Pentaho data integration to blend that data together, not on the glass but in the IT environments. So it really protects and governs the data and allows for a single experience for the business analyst and business users.
Eric: Okay, good. That's a good answer to a difficult question, quite frankly. And let me just ask the question to each of the presenters, and then maybe Robin and Kirk, if you guys want to jump in too. So I'd like to go ahead and push this slide for WebAction, which I do think is really a very interesting company. Actually, I know Sami Akbay, one of the co-founders, as well. I remember talking to him a couple of years ago and saying, "Hey man, what are you doing? What are you up to? I know you've got to be working on something." And of course, he was. He was working on WebAction, under the covers here.
A question came in for you, Steve, so I'll throw it over to you: it's about data cleansing. Can you talk about these components of this real-time capability? How do you deal with issues like data cleansing or data quality? How does that even work?
Steve: So it really depends on where you're getting your feeds from. Typically, if you're getting your feeds from a database via change data capture, then it depends on how the data was entered. Data cleansing really becomes a problem when you're getting your data from multiple sources, or people are entering it manually, or you have arbitrary text that you have to try to pull things out of. That can certainly be part of the process, although that type of processing doesn't lend itself to true high-speed, real-time processing. Data cleansing, typically, is an expensive process.
So it may well be that that could be done after the fact on the storage side. But the other thing the platform is really, really good at is correlation, that is, correlation and enrichment of data. You can, in real time, correlate the incoming data and check whether it matches a certain pattern, or matches data that's being retrieved from a database or Hadoop or some other store. So correlating it with historical data is one thing you can do.
The other thing you can do is basically run analysis on that data and see whether it matches certain required patterns. That's something you can also do in real time. But the traditional kind of data cleansing, where you're correcting company names or correcting addresses and all those types of things, should probably be done at the source or after the fact, because it's very expensive, and you probably wouldn't do it in real time.
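Editor's note: a toy Python illustration of the real-time correlation and enrichment Steve describes, with heavyweight cleansing deferred to a later, offline step. The event shapes and reference table are illustrative assumptions, not WebAction's actual platform API.

```python
# Enrich each incoming event against reference data in real time, and flag
# events matching a pattern; expensive cleansing happens elsewhere, later.

reference = {"ACME": {"segment": "enterprise"}}  # e.g., loaded from a database or Hadoop

def process(event):
    # Correlation/enrichment: join the stream against historical/reference data.
    enriched = {**event, **reference.get(event.get("account"), {})}
    # A lightweight pattern check that is cheap enough to run in real time.
    if enriched.get("latency_ms", 0) > 500:
        enriched["alert"] = "slow-response pattern"
    return enriched

for event in [{"account": "ACME", "latency_ms": 730}]:
    print(process(event))
# -> {'account': 'ACME', 'latency_ms': 730, 'segment': 'enterprise',
#     'alert': 'slow-response pattern'}
```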
Eric: Yeah. And you guys are really trying to address, of course, the real-time nature of things, but also reach people in time. And as I mentioned at the top of the hour, there's this whole window of opportunity, and you're really targeting specific applications at companies where you can pull data together, not by the usual route but by this alternate route, with such low latency that you can keep customers satisfied. And it's interesting: when I talked to Sami at length about what you guys are doing, he made a really good point. He said, if you look at a lot of the new Web-based applications, things like Twitter or Bitly or some of these other apps, they're very different from the old applications that we looked at from, say, Microsoft, like Microsoft Word.
I often use Microsoft as sort of a whipping boy, and specifically Word, to talk about the evolution of software. Because Microsoft Word started out, of course, as a word processing program. I'm one of those people who remember WordPerfect. I loved being able to use Reveal Codes, where you could see the actual codes in the document, so if your bulleted list was wrong, you could clean it up. Well, Word doesn't let you do that. And I can tell you that Word embeds a mountain of code inside every page that you produce. If anyone doesn't believe me, go to Microsoft Word, type "Hello World" and then do "Save as" .html. Then open that document in a text editor, and it will be about four pages of code, just for two words.
So you guys, I thought, were very interesting, and I'm glad we talked about that. That's what you guys focus on, right: identifying what you might call cross-platform or cross-enterprise or cross-domain opportunities to pull data together so quickly that you can change the game, right?
Steve: Yeah, absolutely. And one of the keys that I think you did allude to, anyway, is that you really want to know about things happening before your customers do, or before they really, really become a problem. Set-top boxes are an example. Cable boxes emit telemetry all the time, loads and loads of telemetry, and not just about the health of the box, but what you're watching and all that kind of stuff, right? The typical pattern is you wait till the box fails, and then you call your cable provider, and they'll say, "Well, we will get to you sometime between the hours of 6am and 11pm in the entire month of November." That isn't a really good customer experience.
But if they could analyze that telemetry in real time, then they could start to say: we know these boxes are likely to fail in the next week based on historical patterns, therefore we'll schedule our cable repair guy to turn up at this person's house before it fails. And we'll do that in a way that suits us, rather than having to send him from Santa Cruz up to Sunnyvale; we'll schedule everything in a nice order, a traveling salesman pattern, and so on, so that we can optimize our business. The customer is happy because they don't have a failing cable box. And the cable provider is happy because they have just streamlined things and they don't have to send people all over the place. That's just a very quick example, but there are tons and tons of examples where knowing about things as they happen, or before they happen, can save companies a fortune and really, really improve their customer relations.
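Editor's note: a deliberately simple Python sketch of the set-top-box idea: score each box's telemetry against its own history and schedule a visit before it fails. A real system would use trained models; the threshold and error counts here are assumptions for illustration.

```python
# Crude failure predictor: a rising error rate over the last few days,
# relative to the box's own baseline, flags a likely imminent failure.

def failure_risk(error_counts):
    recent = sum(error_counts[-3:]) / 3                       # mean of last 3 days
    baseline = sum(error_counts[:-3]) / max(len(error_counts) - 3, 1)
    return recent / (baseline + 1e-9)

boxes = {"box-17": [1, 0, 2, 1, 9, 14, 22],                   # errors per day
         "box-42": [1, 2, 1, 1, 2, 1, 1]}

to_visit = [b for b, hist in boxes.items() if failure_risk(hist) > 3.0]
print(to_visit)   # ['box-17'] -> schedule a truck roll before it fails
```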
Eric: Yeah, right. No doubt about it. Let's go ahead and move right on to MarkLogic. As I mentioned before, I've known about these guys for quite some time, so I'll bring you into this, Frank. You guys were far ahead of the whole big data movement in terms of building out your application; it's really a database. But you built it out, and you talked about the importance of search.
So a lot of people who follow the space know that a lot of the NoSQL tools out there are now bolting on search capabilities, whether through third parties or by trying to build their own. But to have that search already embedded, baked in so to speak, really is a big deal. Because if you think about it, if you don't have SQL, how do you go in and search the data? How do you pull from that data resource? The answer is typically to use search to get to the data you're looking for, right?
So I think that's one of the key differentiators for you guys, aside from being able to pull data from all these different sources, store that data and really facilitate this sort of hybrid environment. I'm thinking that search capability is a big deal for you, right?
Frank: Yeah, absolutely. In fact, that's the only way to solve the problem consistently when you don't know what all the data is going to look like, right? If you cannot possibly imagine all the possibilities, then the only way to make sure you can locate all the information you want, locate it consistently, and locate it regardless of how you evolve your data model and your data sets, is to give people generic tools that allow them to interrogate that data. And the easiest, most intuitive way to do that is through a search paradigm, right? It's the same approach search engines take: we create an inverted index, with entries you can look into to find the records, documents and rows that actually contain the information you're looking for, and then return them to the customer to process as they see fit.
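Editor's note: a minimal Python sketch of the inverted index Frank mentions: map each term to the documents containing it, then intersect posting lists at query time. MarkLogic's real indexes are far richer; this shows only the core idea.

```python
# Build a term -> document-ids index so records can be located without
# knowing their schema in advance.

from collections import defaultdict

docs = {
    1: "quarterly revenue report for ACME",
    2: "ACME sensor telemetry archive",
    3: "customer churn analysis",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    # Intersect the posting lists for each query term.
    results = [index[t.lower()] for t in terms]
    return set.intersection(*results) if results else set()

print(search("acme"))             # {1, 2}
print(search("acme", "revenue"))  # {1}
```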
Eric: Yeah, and we talked about this a lot, but you're giving me a really good opportunity to dig into the whole search and discovery side of this equation. First of all, it's a lot of fun. For anyone who likes that stuff, this is the fun part, right? But the other side of the coin, I should say, is that it really is an iterative process. And you've got to be able to, and here I'll be using some of the marketing language, have that conversation with the data, right? In other words, you need to be able to test a hypothesis, play around with it and see how it works. Maybe the answer isn't there; so you test something else, constantly change things, iterate, search, research and just think about stuff. And that's a process. And if you have big hurdles, meaning long latencies or a difficult user interface, or you've got to go ask IT, that just kills the whole analytical experience, right?
So it's important to have this kind of flexibility and to be able to use search. And I like the way you depicted it here, because we're looking at searching around different concepts, or keys if you will, key values, across different dimensions. You want to be able to mix and match that stuff in order to enable your analysts to find useful things, right?
Frank: Yeah, absolutely. I mean, hierarchy is an important thing as well, right? So that when you include something like a title, or a specific term or value, you can actually point to the correct one. If you're looking for the title of an article, you're not getting titles of books, right? And you're not getting titles of blog posts. The ability to distinguish between those through the hierarchy of the information is important as well.
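Editor's note: a small Python sketch of Frank's hierarchy point: scope a query to where a value sits in the document structure, so that searching article titles doesn't return book or blog-post titles. The document shapes here are illustrative, not a real MarkLogic schema.

```python
# "title" means different things at different places in the document tree,
# so the match is scoped to a document type, not just a bare keyword.

docs = [
    {"type": "article",   "title": "Big Data Refineries", "journal": "TechWise"},
    {"type": "book",      "title": "Big Data Refineries", "publisher": "Example Press"},
    {"type": "blog_post", "title": "My Weekend Project"},
]

def search_titles(query, doc_type):
    return [d for d in docs
            if d["type"] == doc_type and query.lower() in d["title"].lower()]

print(search_titles("big data", "article"))  # only the article, not the book
```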
You pointed out the development angle earlier, absolutely right. The ability for our customers to pull in new data sources in a matter of hours, start to work with them, evaluate whether or not they're useful and then either continue to integrate them or leave them by the wayside is extremely valuable. Compare that to a more traditional application development approach, where you have to figure out what data you want to ingest, source the data, figure out how you're going to fit it into your existing data model or change that data model to incorporate it, and only then actually begin the development. We turn that on its head and say: just bring it to us, start doing the development with it, and then decide later, or almost immediately, whether or not it's of value.
Eric: Yeah, it's a really good point. That's a good point. So let me go ahead and bring in our fourth presenter here, Treasure Data. I love these guys. I didn't know much about them, so I'm kind of kicking myself. And then Hannah came to us and told us what they were doing. And Hannah mentioned she was a media person who went over to the dark side.
Hannah: I did, I defected.
Eric: That's okay, though, because you know what we like in the media world. So it's always nice when a media person goes over to the vendor side, because you understand, hey, this stuff is not that easy to articulate, and it can be difficult to ascertain from a website exactly what this product does versus what that product does. And what you guys are talking about is really quite interesting. Now, you are a cloud-managed service. So any data that someone wants to use, they upload to your cloud, is that right? And then you'll ETL or CDC additional data up to the cloud; is that how it works?
Hannah: Well, yeah. So let me make an important distinction. Most of the data, the big data, that our customers are sending us is already outside the firewall: mobile data, sensor data from products. So we're often used as an interim staging area. Data is not so much coming from somebody's enterprise into our service as it's flowing from a website, a mobile application, or a product with lots of sensors in it, into our cloud environment.
Now if you'd like to enrich that big data in our environment, you can definitely bulk upload some application data or some customer data to enrich it and do more of the analytics directly in the cloud. But a lot of our value is around collecting the data that's already outside the firewall and bringing it together into one place. So even if you do intend to bring this data behind your firewall and do more of your advanced analytics, or bring it into your existing BI or analytics environment, it's a really good staging point. Because you don't want to bring a billion rows of data into your data warehouse; it's not cost effective. It's even difficult if you're planning to store it somewhere and then batch upload it.
So we're often the first point where data that's already outside the firewall gets collected.
Eric: Yeah, that's a really good point, too. Because a lot of companies are going to be nervous about taking their proprietary customer data, putting it up in the cloud and managing the whole process.
Hannah: Yeah.
Eric: And what you're talking about is really giving people a resource for crunching those heavy-duty numbers on, as you suggest, data that's third party, like mobile data and social data and all that kind of fun stuff. That's pretty interesting.
Hannah: Yeah, absolutely. And they're probably less nervous about this data because it's already outside the firewall. And so yeah, before bringing it in, and I really like that refinery term, as I mentioned, versus the lake: can you do some basic refining? Get the good stuff out and then bring it behind the firewall into your other systems and processes for deeper analysis. It also lets data scientists do real-time exploration of this new big data as it's flowing in.
Eric: Yeah, that's right. Well, let me go ahead and bring in our analysts and we'll kind of go back in reverse order. I'll start with you, Robin, with respect to Treasure Data and then we'll go to Kirk for some of the others. And then back to Robin and back to Kirk just to kind of get some more assessment of this.
And you know the data refinery concept, Robin, that Hannah is talking about here. I love it. I've heard only a few people talk about it that way, but I do think you've certainly mentioned it before. And it really does speak to what is actually happening to your data. Because, of course, a refinery basically distills stuff down to its root level; think about oil refineries. I actually studied this for a while, and it's pretty basic, but the engineering that goes into it needs to be exactly right or you don't get the product you want. So I think it's a great analogy. What do you think about this whole concept of the Treasure Data cloud service helping you tackle some of those very specific analytical needs without having to bring stuff in-house?
Robin: Well, I mean, obviously it depends on the circumstances and how convenient that is. But anybody that's got an already-made process is going to put you ahead of the game if you haven't got one yourself. That's the first takeaway for something like this. If somebody has assembled something, they've done it, it's proven in the marketplace, and therefore there's some kind of value in effect; the work has already gone into it. And there's also the very general fact that refining data is going to be a much bigger issue than it ever was before. In my opinion, anyway, it's not talked about as much as it should be. Quite apart from the fact that the size of the data has grown, the number of sources and the variety of those sources have grown quite considerably. And then there's the reliability of the data, in terms of whether it's clean, the need to disambiguate the data, all sorts of issues that arise just in terms of the governance of the data.
So before you can actually get around to doing reliable analysis on it, you know, if your data's dirty, then your results will be skewed in some way or another. So that is something that has to be addressed, that has to be known about. And Treasure Data, as far as I can see, are providing a very viable service to assist with that.
Eric: Yes, indeed. Well, let me go ahead and bring Kirk back into the equation here real quickly. I wanted to take a look at one of these other slides and get your impression of things, Kirk. So let's go back to this MarkLogic slide. And by the way, Kirk provided the link, if you didn't see it folks, to some of his class discovery slides, because that's a very interesting concept. And I think this was brewing at the back of my mind, Kirk, as I was talking about this a moment ago: this whole question one of the attendees posed about how you go about finding new classes. I love this topic because it really does speak to the difficult side of categorizing things, because I've always had a hard time categorizing stuff. I'm like, "Oh, god, it could fit in five categories, where do I put it?" So I just don't want to categorize anything, right?
And that's why I love search: you don't have to categorize it, you don't have to put it in a folder. Just search for it and you'll find it, if you know how to search. But if you're in that process of trying to segment, because that's basically what categorization is, it's segmenting, then finding new classes is an interesting problem. Can you speak to the power of search and semantics and hierarchies, for example, as Frank was talking about with respect to MarkLogic, and the role that plays in finding new classes? What do you think about that?
Kirk: Well, first of all, I'd say you are reading my mind. Because that was the question I was thinking of even before you spoke, this whole semantic piece that MarkLogic presented. If you go back to slide five of what I presented this afternoon, you don't have to, but I talked there about the semantics of the data needing to be captured.
So this whole idea of search, there you go. I firmly believe in that, and I've always believed in that with big data. Take the analogy of the Internet, just the Web: having the world's knowledge and information and data in a Web browser is one thing. But to have it searchable and retrievable efficiently, as one of the big search engine companies provides for us, that's where the real power of discovery is. Because you're connecting the search terms, the user's areas of interest, to the particular data granule: the particular webpage if you want to think of the Web example, or the particular document if you're talking about a document library, or a particular customer segment if that's your space.
And semantics gives you that knowledge layering on top of just a word search. If you're searching for a particular type of thing, and you understand that a member of a class of such things can have a certain relationship to other things, you can include that relationship information, that class hierarchy information, to find things that are similar to what you're looking for. Or sometimes even the exact opposite of what you're looking for, because that in a way gives you an additional core of understanding: well, this is probably the opposite of that.
Eric: Yeah.
Kirk: So you actually understand this, and you can see something that's the opposite of it. And so the semantic layer is a valuable component that's frequently missing, and it's interesting that this would come up here in this context. Because I've taught a graduate course in database, data mining, learning from data, data science, whatever you want to call it, for over a decade; and one of my units in this semester-long course is on semantics and ontology. And frequently my students would look at me like, what does this have to do with what we're talking about? And of course at the end, I think we do come to understand that it's about putting that data in some kind of knowledge framework. So that, for example, when I'm looking for information about a particular customer behavior, I understand the context in which that behavior occurs, say, what people buy at a sporting event. What kind of products do I offer my customers when I notice on their social media, on Twitter or Facebook, that they say they're going to a sporting event like football, baseball, hockey, the World Cup, whatever it might be?
Okay, so a sporting event. Say they're going to a baseball game. Okay, I understand that baseball is a sporting event. I understand that it's usually social, you go with people. I understand that it's usually in an outdoor space. Understanding all those contextual features enables much more powerful segmentation of the customers involved, and personalization of the experience you're giving them when, for example, they're interacting with your space through a mobile app while sitting in a stadium.
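Editor's note: a toy Python version of Kirk's example: a tiny ontology lets the system infer inherited contextual features (a baseball game is a sporting event, hence social, outdoor, in a stadium) from a single social-media mention. The ontology contents are illustrative assumptions.

```python
# Walk the is_a hierarchy, accumulating the properties each ancestor
# contributes, to infer context from one mention.

ontology = {
    "baseball":       {"is_a": "sporting_event"},
    "football":       {"is_a": "sporting_event"},
    "sporting_event": {"social": True, "outdoor": True, "venue": "stadium"},
}

def infer_context(term):
    features = {}
    while term in ontology:
        node = ontology[term]
        features.update({k: v for k, v in node.items() if k != "is_a"})
        term = node.get("is_a")
    return features

# Tweet says: "heading to a baseball game!"
print(infer_context("baseball"))
# -> {'social': True, 'outdoor': True, 'venue': 'stadium'}
```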
So all that kind of stuff brings so much more power and discovery potential to the data. That idea of indexing data granules by their semantic place in the knowledge space is really pretty significant. And I was really impressed that it came out today. I think it's a fundamental thing to talk about.
Eric: Yeah, it sure is. It's very important in the discovery process, and it's very important in the classification process. And if you think about it, Java works in classes. It's an object-oriented form of programming, more or less, you could say, and Java works in classes. So if you're actually designing software, this whole concept of trying to find new classes is actually pretty important in terms of the functionality you're trying to deliver. Because especially in this new wild, woolly world of big data, where you have so much Java out there running so many of these different applications, there are 87,000 ways or more to get anything done with a computer, to get any bit of functionality done.
One of my running jokes is that when people say, "Oh, you can build a data warehouse using NoSQL," I'm like, "Well, you could, yeah, that's true. You could also build a data warehouse using Microsoft Word." It's not the best idea, and it's not going to perform very well, but you can actually do it. So the key is, you have to find the best way to do something.
Go ahead.
Kirk: Let me just respond to that. It's interesting you mentioned the Java class example, which didn't come to my mind until you said it. One of the aspects of Java and classes and that sort of object orientation is that there are methods that bind to specific classes. And this is really the message I was trying to send in my presentation: once you understand some of these data granules, these knowledge nuggets, these tags, these annotations and these semantic labels, then you can bind a method to them. They basically carry a reaction or a response, and your system can provide an automated, proactive response to that thing the next time it appears in the data stream.
So that concept of binding actions and methods to specific classes is really one of the powers of automated real-time analytics. And I think you've hit on something there.
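Editor's note: a minimal Python sketch of Kirk's idea of binding methods to semantic classes: register a handler per class label, then dispatch it automatically whenever a labeled granule appears in the stream. The registry, labels and actions are illustrative.

```python
# Bind an automated, proactive action to each semantic class label.

handlers = {}

def on_label(label):
    """Register an action bound to a semantic class label."""
    def register(fn):
        handlers[label] = fn
        return fn
    return register

@on_label("imminent_failure")
def schedule_repair(event):
    print(f"scheduling repair for {event['device']}")

@on_label("sporting_event")
def push_offer(event):
    print(f"offering stadium promo to user {event['user']}")

def dispatch(event):
    # Fire the bound method whenever a labeled granule shows up in the stream.
    fn = handlers.get(event.get("label"))
    if fn:
        fn(event)

for e in [{"label": "imminent_failure", "device": "box-17"},
          {"label": "sporting_event", "user": 42}]:
    dispatch(e)
```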
Eric: Good, good, good. Well, this is good stuff. So let's see, Will, I want to hand it back to you and throw a question to you from the audience. We got a few of those in here, too. And folks, we're going long because we want to get to some of these great concepts and good questions.
So let me throw a question over to you from one of the audience members, who's saying, "I'm not really seeing how business intelligence is distinguishing cause and effect." In other words, as the systems make decisions based on observable information, how do they develop new models to learn more about the world? It's an interesting point: I'm hearing cause-and-effect correlation here, root cause analysis, and that's some of that higher-end stuff in the analytics that you guys talk about, as opposed to traditional BI, which is really just reporting and understanding what happened. And of course, your whole direction, just looking at your slide here, is moving toward that predictive capability, toward making those decisions or at least making those recommendations, right? So the idea is that you guys are trying to serve the whole range of what's going on, and you understand that the key, the real magic, is in the analytical goal component there on the right.
Will: Absolutely. I think that question is somewhat peering into the future, in the sense that data science, as I mentioned before, we saw the slide with the requirements of the data scientist; it's a pretty challenging role for someone to be in. They have to have that rich knowledge of statistics and science, and you need the domain knowledge to apply your mathematical knowledge to the domain. So what we're seeing today is that there aren't out-of-the-box predictive tools that a business user could pull up in Excel and use to automatically predict their future, right?
It does require that advanced knowledge of the technology at this stage. Now someday in the future, it may be that some of these scale-out systems become sentient and start doing some wild stuff. But I would say at this stage, you still have to have a data scientist in the middle to continue to build the models. These predictive models around data mining and such are highly tuned and built by the data scientist. They're not generated on their own, if you know what I mean.
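Editor's note: Will's point in miniature, as a Python sketch: the predictive model doesn't build itself; a person chooses the features, the algorithm and the training data. Requires scikit-learn; the data and features here are made up for illustration.

```python
# A data scientist hand-picks features and fits a model; nothing here is
# generated automatically from raw data.

from sklearn.linear_model import LogisticRegression

# Feature engineering is a human judgment call: which signals matter?
X = [[5, 2], [0, 40], [6, 1], [1, 35], [4, 5], [2, 30]]
#    [support_tickets, monthly_usage_hours] per customer
y = [1, 0, 1, 0, 1, 0]   # 1 = churned

model = LogisticRegression().fit(X, y)
print(model.predict([[5, 3]]))   # likely flags this high-ticket, low-usage customer
```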
Eric: Yeah, exactly. That's exactly right. And one of my lines is "Machines don't lie, at least not yet."
Will: Not yet, exactly.
Eric: I did read an article - I have to write something about this - about some experiment that was done at a university where they said that these computer programs learned to lie, but I got to tell you, I don't really believe it. We'll do some research on that, folks.
And for the last comment, Robin, I'll bring you back in to take a look at this WebAction platform, because this is very interesting. This is what I love about this whole space: you get such different perspectives and different angles taken by the various vendors to serve very specific needs. And I love this format for our show, because we got four really interesting vendors that are, frankly, not really stepping on each other's toes at all, because they're all doing different bits and pieces of the same overall need, which is to use analytics to get stuff done.
But I just want to get your perspective on this specific platform and their architecture, how they're going about doing things. I find it pretty compelling. What do you think?
Robin: Well, I mean, it's pointed at extremely fast results from streaming data, and as such, you have to architect for that. You're not going to get away with doing anything amateurish with any of that stuff. I think this is extremely interesting, and it's one of the things we've witnessed over the past couple of years; I mean, your jaw and mine have been dropping more and more as we saw more and more stuff emerge that was just extraordinarily fast, extraordinarily smart and pretty much unprecedented.
This obviously isn't WebAction's first rodeo, so to speak; they've actually been out there taking names to a certain extent. So I don't suppose we should be surprised that the architecture is fairly polished, but it surely is.
Eric: Well, I'll tell you what, folks. We burned through a solid 82 minutes here. Thank you to all those folks who have been listening the whole time. If you have any questions that were not answered, don't be shy: send an email to yours truly. You should have an email from me lying around somewhere. And a big, big thank you to our two analysts today, Dr. Kirk Borne and Dr. Robin Bloor.
Kirk, I'd like to further explore some of that semantic stuff with you, perhaps in a future webcast, because I do think that we're at the beginning of a very new and interesting stage now. We're going to be able to leverage a lot of the ideas that people have and make them happen much more easily because, guess what, the software is getting less expensive, it's getting more usable, and we're getting all this data from all these different sources. And I think it's going to be a very interesting and fascinating journey over the next few years as we really dig into what this stuff can do and how it can improve our businesses.
So a big thank you to Techopedia as well and, of course, to our sponsors: Pentaho, WebAction, MarkLogic and Treasure Data. And folks, wow, with that we're going to conclude, but thank you so much for your time and attention. We'll catch you in about a month and a half for the next show. And of course, the Briefing Room keeps on going; radio keeps on going; all our other webcast series keep on rocking and rolling, folks. Thank you so much. We'll catch you next time. Goodbye.