Introducing 原點科技 (bluetangstudio) and Our 2011 Year-End Review

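原點科技 (bluetangstudio) was founded in October 2010. Over the past year and more, we have developed and learned many technologies for Location Based Service (LBS) and mobile.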

Json WebService API with Jersey


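As a SaaS provider, we need to offer many different WebService APIs to our customers. To convert internal Java objects into external Json objects, we chose Jersey, an implementation of the JAX-RS standard, together with Jackson, a Json library.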

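We presented this work at TWJUG in November 2010; see the related slides. We also published several blog posts about using Jackson: how to use Jackson with Scala, how to handle polymorphism, tips and gotchas when using Jackson, and how to use Jackson on Android.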

Our experience with Jersey and Jackson as a Json WebService framework has been very satisfying, and we use them extensively between our internal and external systems.

Search with Lucene


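On top of Lucene, we built our own WebService implementation, which is different from Solr. Through our own Search API, clients can run full-text search and criteria matching against our Json document store, with support for paging and for selecting a subset of fields.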

Location Based(Spatial) Search


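When we started working on LBS, we first used lucene-spatial, the official library. We quickly ran into the problems of doing LBS with map tiles. The map tile approach divides the map into blocks of different sizes and numbers them, then tags every coordinate inside a block with the same tag value, for example t16m2345, in the form t(tile_size)m(tile_id). At search time, we only need to compute the tile id of the current coordinate and then look for points with the same tile id.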

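The problem with this approach is that the points it finds are not sorted by distance. When Lucene scores the selected documents, all points in the same block share the same tile_id, so they all get the same score; the score does not vary with distance.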
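The tile approach and its ordering problem can be sketched in a few lines of Python. This is a minimal illustration, not lucene-spatial's actual tiling: the grid used in tile_tag, the helper names, and the sample coordinates are all assumptions made up for the example. It shows that a tile match only yields an unordered candidate set, and that ranking by distance needs a separate step such as a haversine sort.

```python
import math


def tile_tag(lat, lon, zoom):
    """Return a tag in the t(tile_size)m(tile_id) style.

    Assumption: a plain equirectangular grid with 2**zoom tiles per axis.
    """
    n = 2 ** zoom
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((90.0 - lat) / 180.0 * n), n - 1)
    return "t%dm%d" % (zoom, y * n + x)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def same_tile(here, places, zoom):
    """Tile lookup only: every hit matches equally, so the order is arbitrary."""
    tag = tile_tag(here[0], here[1], zoom)
    return [p for p in places if tile_tag(p["lat"], p["lon"], zoom) == tag]


def same_tile_by_distance(here, places, zoom):
    """Tile lookup followed by an explicit sort on distance."""
    hits = same_tile(here, places, zoom)
    return sorted(
        hits,
        key=lambda p: haversine_km(here[0], here[1], p["lat"], p["lon"]),
    )


# Hypothetical sample data around Taipei.
HERE = (25.0330, 121.5654)
PLACES = [
    {"name": "far", "lat": 25.1000, "lon": 121.5000},
    {"name": "near", "lat": 25.0340, "lon": 121.5660},
    {"name": "other tile", "lat": 25.2000, "lon": 121.5700},
]
```

Calling same_tile(HERE, PLACES, 10) returns the two in-tile places in whatever order they were stored, while same_tile_by_distance puts the nearer one first.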

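We also found that most search technologies, when handling geographic locations, only handle place data such as restaurants and department stores, that is, small geographic areas. These place-search techniques are not suitable for data that covers large areas, such as administrative districts and school districts.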

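Take GeoNames, which many foreign websites use to map coordinates to administrative districts. When it looks up the district a coordinate belongs to, it guesses by computing the distance between the coordinate and the centers of nearby districts. But districts are rarely perfect circles; they are mostly divided along natural features such as rivers. So even if a coordinate is closer to district A, in practice it may be assigned to district B. School districts are a similar example.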
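The difference between guessing by nearest center and testing real boundaries can be shown with a small sketch. The two rectangular districts, their names, and the coordinates are hypothetical, and the standard ray-casting test used here is one common way to do point-in-polygon, not necessarily the method our platform uses.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def center(polygon):
    """Average of the vertices; good enough for the rectangles below."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def nearest_center(x, y, districts):
    """Guess the district by distance to each district's center."""
    def distance_to(name):
        cx, cy = center(districts[name])
        return math.hypot(x - cx, y - cy)
    return min(districts, key=distance_to)


def containing_district(x, y, districts):
    """Find the district whose boundary actually contains the point."""
    for name, polygon in districts.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Two hypothetical districts separated by a river along x = 3.
DISTRICTS = {
    "A": [(0, 0), (3, 0), (3, 2), (0, 2)],
    "B": [(3, 0), (10, 0), (10, 2), (3, 2)],
}
```

For the point (3.5, 1), nearest_center picks A because A's center is closer, while containing_district correctly reports B.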

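Another kind of data that cannot be described by points is linear data, such as hiking trails and bike paths. When we go hiking, we do not have to start from the trailhead; we can join from a point in the middle. So when looking for all nearby hiking trails, we should compute the shortest distance between each line and the current coordinate, rather than the distance between the trail's starting point and the current coordinate.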
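A minimal sketch of that shortest-distance computation, assuming planar (x, y) coordinates for simplicity; real latitude and longitude would need a projection or a geodesic formula. The function names and the sample trail are made up for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_linestring_distance(p, line):
    """Shortest distance from p to any segment of the line."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for i in range(len(line) - 1)
    )


# A hypothetical trail and a hiker standing next to its middle.
TRAIL = [(0, 0), (10, 0), (10, 10)]
HIKER = (5, 1)
```

Here the hiker is 1 unit away from the trail but about 5.1 units away from its starting point, so ranking trails by their starting points would place this one far too low.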

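To address these two problems, we implemented search over geographic shapes, so that users can search data such as points (Point), lines (LineString), and polygons (Polygon), and have the results sorted by distance.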

Using this feature, we published two demos: looking up the administrative district of a coordinate, and a Taipei flood map.


Search Analytics


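Speaking of search, we also have to mention user behavior analysis (Analytics). Our Search API has built-in tracking of user behavior. When a user opens the app, sends a first search, flips to the second page, clicks the third link, runs another search, selects the first result, and leaves, this whole sequence of actions is recorded automatically by our backend.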

With our in-house Mobile Analytics technology, we can turn these user behaviors into the following analysis reports:

  • Client Profiles
    • number of clients by device family
    • number of clients by device
    • number of active clients by device family
    • number of active clients by hour/day/week/month
    • location of clients
    • location of active clients by hour/day/week/month
    • average usage frequency (how often clients use your app per week)
  • Visit Trends
    • number of requests by hour/day/week/month
    • unique visits by hour/day/week/month
    • average length of visits
    • new vs. returning clients by hour/day/week/month
  • Search Usage
    • top search keywords by hour/day/week/month
    • top search keywords by location/country
    • location of search usage by hour/day/week
    • average search usage per visit
    • top-exit searches
  • Content Usage
    • popular contents
    • popular result document ids
    • top-exit results
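Two of the reports above, unique clients per day and top search keywords, can be derived from a recorded event log along these lines. The event fields (client, day, action, keyword) and the sample events are hypothetical; the real log format of our backend is not shown here.

```python
from collections import Counter, defaultdict

# Hypothetical tracked events, one per user action.
EVENTS = [
    {"client": "c1", "day": "2011-12-01", "action": "search", "keyword": "pizza"},
    {"client": "c1", "day": "2011-12-01", "action": "click", "keyword": None},
    {"client": "c2", "day": "2011-12-01", "action": "search", "keyword": "pizza"},
    {"client": "c2", "day": "2011-12-02", "action": "search", "keyword": "sushi"},
    {"client": "c3", "day": "2011-12-02", "action": "search", "keyword": "pizza"},
]


def unique_clients_per_day(events):
    """Count distinct clients seen on each day."""
    clients = defaultdict(set)
    for event in events:
        clients[event["day"]].add(event["client"])
    return {day: len(seen) for day, seen in clients.items()}


def top_keywords(events, limit):
    """Most frequent search keywords, most common first."""
    counts = Counter(
        event["keyword"] for event in events if event["action"] == "search"
    )
    return counts.most_common(limit)
```

The same grouping idea extends to the other reports by changing the key, for example grouping by device family or by hour instead of by day.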

Large-Scale Computing


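Over the past year and more, we put a lot of effort into learning cloud technologies and evaluating which ones suit us and which do not. For Analytics, for example, we chose MySQL rather than the popular Hadoop.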

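Because we have groupware requirements, we need to know which machines in the cluster are responsible for the data of Domain X. We tried using JGroups and ZooKeeper to implement service registry/discovery. We also presented this experience at CloudTW, with slides.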

To use the Producer and Consumer pattern and a durable message box in our indexing, we also spent a lot of time learning Akka, the Actor Model, Amazon SQS, ActiveMQ, and Apache Camel. We presented our Akka results at COSCUP (video on YouTube).
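As a rough illustration of the Producer and Consumer pattern in an indexing pipeline, the sketch below uses Python's standard queue and threading modules. These stand in for the Akka actors and message queues named above and are not our actual implementation; in particular, an in-memory queue is not durable, and the document shape and function name are made up.

```python
import queue
import threading


def run_indexing(documents, worker_count=2):
    """Producer puts documents on a queue; consumers take them off and index them."""
    tasks = queue.Queue()
    indexed_ids = []
    lock = threading.Lock()

    def consumer():
        while True:
            doc = tasks.get()
            if doc is None:  # sentinel: no more work for this worker
                break
            with lock:
                indexed_ids.append(doc["id"])  # stand-in for real indexing

    workers = [threading.Thread(target=consumer) for _ in range(worker_count)]
    for worker in workers:
        worker.start()

    # Producer side: enqueue every document, then one sentinel per worker.
    for doc in documents:
        tasks.put(doc)
    for _ in workers:
        tasks.put(None)

    for worker in workers:
        worker.join()
    return sorted(indexed_ids)
```

The producer never waits for indexing to finish, which is the main appeal of the pattern; a durable message box adds the guarantee that queued documents survive a crash.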

Community Service


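While focusing on software development, bluetangstudio has not forgotten to give back to society and the community. Our founder has joined several communities and given a number of talks, including: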
  • TWJUG: Introduction to Tapestry
  • CloudTW: Groupware - JGroups and Zookeeper.
  • COSCUP: Introduction to Actor Model and Akka
  • OpenData: Lecture Series #2 - 第一次爬資料就上手?! 碰壁! (Scraping Data Right the First Time?! Hitting a Wall!)

At the end of this year, bluetangstudio will help run Scala Taipei.

About the Future


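In the coming year, we will focus on promoting our product, Search Cloud, and raise our media profile through exhibitions and competitions. We will also start taking on contract projects related to LBS, Analytics, and distributed systems.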

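In addition, we will devote part of our effort to promoting enterprise use of Scala and Akka, through training, contract projects, magazine articles, and conferences, so that Taiwan's software community learns more about both.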

Introduction to Search Cloud

When I first started out as an entrepreneur, I worked on location-based applications. I soon realized that location-based applications have three things in common.
  • UI: A Mobile UI and/or Web UI is required.
  • Data: Either the developer or the users have to contribute valuable data to the application.
  • Backend: Besides a UI and valuable data, you still need a powerful backend to manage data and serve requests. The backend also needs to track usage and analyze user behavior.


On the backend side, all most startups need at first is a simple data model, much like an Excel spreadsheet. The backend needs to be able to handle data types like String, Number, Date, Location, Array, and Map.

With these simple data types, we can build powerful search applications. For example, we can ask the backend for a list of "fast food restaurants" "within a 2-mile radius" "which offer coupons" "by 7/30/2011".
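The restaurant query above could be expressed roughly as follows. This is a sketch over an in-memory list, not the SearchCloud API: the field names (category, location, coupon_date), the sample restaurants, and the reading of "by 7/30/2011" as a date range filter are all assumptions.

```python
import math
from datetime import date


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_miles * math.asin(math.sqrt(a))


def find(docs, here, category, radius_miles, coupon_by):
    """Combine value, location, and date-range criteria; nearest match first."""
    hits = []
    for doc in docs:
        lat, lon = doc["location"]
        distance = miles_between(here[0], here[1], lat, lon)
        if (doc["category"] == category
                and distance <= radius_miles
                and doc["coupon_date"] is not None
                and doc["coupon_date"] <= coupon_by):
            hits.append((distance, doc["name"]))
    return [name for _, name in sorted(hits)]


# Hypothetical documents around San Francisco.
DOCS = [
    {"name": "Burger Stop", "category": "fast food restaurant",
     "location": (37.7790, -122.4190), "coupon_date": date(2011, 7, 15)},
    {"name": "Taco Hut", "category": "fast food restaurant",
     "location": (37.8044, -122.2712), "coupon_date": date(2011, 7, 1)},
    {"name": "Fry Shack", "category": "fast food restaurant",
     "location": (37.7760, -122.4200), "coupon_date": None},
    {"name": "Noodle Bar", "category": "noodle shop",
     "location": (37.7750, -122.4195), "coupon_date": date(2011, 7, 10)},
]
```

Only Burger Stop satisfies all four criteria: Taco Hut is too far away, Fry Shack has no coupon, and Noodle Bar is in a different category.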

Another good example is "The Paparazzis", a demo application my company is building. The idea was inspired by a paparazzi website whose name I can't remember.

The idea is that when you, the paparazzo, see a famous star shopping on Rodeo Drive in Beverly Hills, you can take a picture of the star and upload it to "The Paparazzis". When another user wants to see where this star has been during the past few weeks, he or she can just open the application and check. Likewise, if you want to know whether any famous stars are shopping on Rodeo Drive right now, you can just open the application and check.

The SearchCloud Platform


Our LBS platform, SearchCloud, provides a programming interface for storing, managing, and querying data. SearchCloud allows developers to store their business data in SearchCloud and build a GUI on top of their data and the rich SearchCloud API.

The data model in SearchCloud is simply a Json object with an id. The developer has to provide a schema first and then upload Json objects to SearchCloud. SearchCloud will store the objects and build search indices for the developer. When the developer performs a CRUD operation on an object, SearchCloud updates the search indices as well.
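The store-then-index behavior described above can be sketched with a tiny in-memory store. Everything here is illustrative: the class name, the schema format (field name mapped to a Python type), and the whitespace tokenizer are assumptions, and the real SearchCloud API is not shown.

```python
class TinySearchStore:
    """Keeps Json-like objects by id and a keyword index that follows every write."""

    def __init__(self, schema):
        self.schema = schema  # field name -> expected Python type
        self.docs = {}        # id -> object
        self.index = {}       # keyword -> set of ids

    def _validate(self, doc):
        for field, field_type in self.schema.items():
            if not isinstance(doc.get(field), field_type):
                raise ValueError("field %r must be %s" % (field, field_type.__name__))

    def _keywords(self, doc):
        words = set()
        for field, field_type in self.schema.items():
            if field_type is str:
                words.update(doc[field].lower().split())
        return words

    def put(self, doc_id, doc):
        """Create or update: drop the old index entries, then add the new ones."""
        self._validate(doc)
        self.delete(doc_id)
        self.docs[doc_id] = doc
        for word in self._keywords(doc):
            self.index.setdefault(word, set()).add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for word in self._keywords(old):
                self.index[word].discard(doc_id)

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))
```

Updating an object through put removes it from the keywords it no longer contains, so a search never returns stale hits.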

SearchCloud provides a rich search interface. Data objects can be searched by keyword, by value, by a range of values, or by location. Combining multiple criteria into a single query is supported as well.

Tracking variables can also be added to search requests, for example the location of the user or the machine id of the mobile device. When tracking variables are integrated into the application, SearchCloud will be able to tell the developer where the customers are, what they care about, what the popular search results are, and which results the users click.

In the restaurant example above, if the developer sees that most customers look for Italian restaurants near San Francisco, he or she can do one of two things: stop collecting restaurant data for other cities, or create another application for this target group.

Other Offerings


On top of the SearchCloud platform, we will offer SDKs and UI widgets to developers. We are building a Java client library and Android UI widgets. An Objective-C client library and iOS UI widgets are coming soon as well. Our target is to enable developers to build an LBS application in five minutes.

Other than that, an interesting application is under way. I call it 'Google AppEngine's Missing Piece'. GAE provides a powerful datastore without support for search. Prior to GAE's Backend API, it was hard to do search on GAE.

The 'GAE's missing piece' offers a JDO lifecycle listener. When the datastore performs a CRUD operation on an object through JDO, the library sends a copy of the object to SearchCloud. As a result, the developer is able to perform search operations against the data.

Pricing

Our target customers are small startups in their early stages. The SearchCloud platform will help them bring their products into production in a short amount of time.

Our pricing plans are:

Developer Plan
- Free
- Store 5,000 Json objects
- 10,000 Requests per day

Plus Plan
- 50 USD per Month
- Store 25,000 Json objects
- 250,000 Requests per day

Premium Plan
- 200 USD per Month
- Store 250,000 Json objects
- 250,000 Requests per day

* The pricing model for exceeding daily requests is under development.

Introduction to Search Cloud, Chinese Version.

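Before founding 原點, I originally wanted to develop location-based applications to sell. During development, however, I found that most LBS apps need three things at the start:
  • UI: Whether it is a mobile UI or a web UI, you need to provide an interface for user input.
  • Data: The data may be collected by the developer or contributed by users; either way, it is an indispensable piece.
  • Backend: Once you have the user interface and the data, maintaining and collecting the data still requires server-side processing. User analytics also has to go through the backend.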


On the backend side, the data model most startups need at the beginning is actually very simple, much like a spreadsheet, with data types such as String, Number, Date, Location, Array, and Map.

With these data types, we can already run simple searches, for example: find "American restaurants" "within a two-kilometer radius" that "offer coupons" "before 7/30".

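Another example is the demo app my company is building, 『全民狗仔隊』 ("The Paparazzis"). When users spot a celebrity, they can take a photo and upload it together with the celebrity's name. Then, when other users want to know where a celebrity has been seen over the past two weeks, or which celebrities are near their current location, they can just open the app and check.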

The Platform


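In Search Cloud (SC), our LBS platform product, the data model is very simple: a Json object. The developer first provides a data schema and then uploads Json objects; SC stores the data and builds the search index. When the developer performs CRUD on an ID, SC updates the index as well.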

For search, SC offers three methods: Keyword Search, Range Search (for numeric or date fields), and Location Based Search.

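At search time, the LBS app can attach extra parameters to the search criteria, such as the user's current location or the phone's machine id, and we will do the data analysis for you. We can tell the developer where their users are, what they care about, what the most popular search results are, and which items users clicked.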

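Taking the earlier restaurant example, if the developer finds that most customers look for American restaurants in Taichung, he or she can make one of two decisions: stop collecting data for other areas, or build a dedicated app for this target group.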

Value-Added Services


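On top of the search platform, we will also offer various SDKs and UI widgets to software developers. We are currently developing the Java client and Android UI widgets, and we are looking for people to develop an Obj-C client and iOS UI widgets. The goal is to let developers build a prototype of an LBS app within five minutes.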

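In addition, an interesting application is under development, called Google AppEngine's Missing Piece. GAE provides a datastore but does not support search; before the GAE Backend API appeared, doing search on GAE was difficult.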

So we will provide a JDO Lifecycle Listener. When the developer performs CRUD on data through JDO, a copy is also sent to SC, so the developer can do search through us.

Working with Us


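Our target customers are startups of three to five people; we help them bring their products online in the shortest possible time. If you are interested in working with us, please write to yho@bluetangstudio.com. If you are interested in joining us, please send mail to jobs@bluetangstudio.com.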

Our pricing plans are:

Developer Plan
- Free
- Store 5,000 Json objects
- 10,000 Requests per day

Plus Plan
- 50 USD per Month
- Store 25,000 Json objects
- 250,000 Requests per day

Premium Plan
- 200 USD per Month
- Store 250,000 Json objects
- 250,000 Requests per day

* Per-day pricing for requests beyond the quota is under development.

How to enable ephemeral storage on Amazon Beanstalk instances.

Amazon's Beanstalk is a great deployment environment for startups like us. Occasionally we run into issues with Beanstalk, but luckily Beanstalk allows us to use a custom AMI for its instances.

One of the issues we have seen is that Beanstalk instances only come with 8GB of EBS storage. In fact, most EC2 instance types also come with ephemeral (local) storage; the size varies by instance type.

Our goal is to enable this missing ephemeral storage for the Tomcat application. To do so, we create a custom AMI with ephemeral storage.

I. Disable Auto Mount


With its default settings, CloudInit always mounts the first ephemeral drive on /media/ephemeral. To make the mount point customizable, we have to disable the automatic mounting of ephemeral0 first.

To do so, you have to launch a new instance with the beanstalk AMI first.
ec2-run-instances ami-b8c539d1 -t m1.large

Log in to the server.
Edit /etc/sysconfig/cloudinit, and set CONFIG_MOUNTS=no.
Edit /etc/fstab and add entries for the two ephemeral drives:

/dev/sdb /var/cache/tomcat6 auto defaults 0 2
/dev/sdc /tmp auto defaults 0 2

Then delete everything in /var/cache/tomcat6.

rm -rf /var/cache/tomcat6/*

II. Create a Temporary AMI


Log in to http://aws-portal.amazon.com and create a new AMI based on the instance we just used.

III. Create the Cloud Init Script


Open your text editor and create a tomcat.init script on your local machine:

#!/bin/sh
# Runs at first boot (passed as EC2 user data) to prepare the freshly
# mounted ephemeral drives for Tomcat.

# The mount points are empty after mounting, so reset owner and mode.
chown root:root /var/cache/tomcat6
chmod 755 /var/cache/tomcat6
# /tmp must be world-writable with the sticky bit set.
chmod 1777 /tmp

# Recreate the temp and work directories that Tomcat expects.
for dir in temp work
do
  if [ ! -d "/var/cache/tomcat6/$dir" ]
  then
    mkdir "/var/cache/tomcat6/$dir"
    chmod 775 "/var/cache/tomcat6/$dir"
    chown tomcat:root "/var/cache/tomcat6/$dir"
  fi
done

IV. Create a New Instance with Ephemeral Storage

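Launch a new instance from the temporary AMI, passing tomcat.init as user data and mapping the two ephemeral drives to /dev/sdb and /dev/sdc:

ec2-run-instances ami-xxxxxxx --user-data-file tomcat.init -b /dev/sdb=ephemeral0 -b /dev/sdc=ephemeral1 -t m1.large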

V. Create Beanstalk AMI



Log in to http://aws-portal.amazon.com and create a new AMI based on the instance we just launched. This is the AMI you can use in your Beanstalk environment.

Why Google Maps doesn't work well in Taiwan

This is part of a post I made on telnet://ptt.cc

The inaccuracy of Google Street View has more to do with Taiwan's outdated house numbering system.

Let me start with how house numbers work in the US, using New York City as an example: http://tinyurl.com/44arddr

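The vertical roads are Avenues (Ave) and the horizontal ones are Streets (St). House numbers restart at a new multiple of 100 at every intersection. So number 3712 lies between the 37th and 38th intersections on that road, and its position within the block is found by interpolation. If one building occupies the entire block, numbering at the next intersection still starts from xx00.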

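In Taiwan, by contrast, numbers are assigned by counting buildings from the intersection. If one building occupies an entire block, the next building's number is still N+1.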

http://tinyurl.com/4yhjwz7
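So on the stretch from the World Trade Center (市貿) to NY Bagels, where the whole row consists of large buildings, there are only a handful of house numbers. Google does very badly in this area. Once I was heading to a shop at 鴻喜花園, and the location Google Maps gave me was 300 meters away, at 牡丹園.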

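The US numbering scheme means Google Maps and similar services need to store less data, and when an address is located by interpolation the estimate will not be far off. With Taiwan's numbering scheme, precise positioning is impossible unless the data is stored point by point.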
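The interpolation for per-block numbering is simple enough to sketch. The function name, the planar coordinates, and the assumption that numbers are spread evenly along the block are illustrative only; this is not how Google Maps is known to work.

```python
def locate_block_numbered(number, intersections):
    """Estimate a position for a house number under per-block numbering.

    intersections maps an intersection count (37, 38, ...) to an (x, y)
    position. The hundreds part of the number picks the block; the last
    two digits give the fraction of the way along it.
    """
    block = number // 100   # 3712 -> 37
    offset = number % 100   # 3712 -> 12
    x1, y1 = intersections[block]
    x2, y2 = intersections[block + 1]
    return (x1 + offset * (x2 - x1) / 100.0,
            y1 + offset * (y2 - y1) / 100.0)


# Hypothetical block: the 37th and 38th intersections are 100 units apart.
INTERSECTIONS = {37: (0.0, 0.0), 38: (0.0, 100.0)}
```

Only the intersection positions need to be stored, and number 3712 lands 12 units past the 37th intersection. Under sequential numbering, a number says nothing about position within a block, so each address has to be stored individually.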

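PS: House numbers in Taiwan get even more fun after a building is rebuilt on the same site. When one building becomes three, they become -1, -2, -3. Last week I came across a -47 Orz..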

Monthly Review: March 2011.

It has been quite a busy three months for me, and things have changed a lot during that time.

First, we have a new hire, Ace Chen, a full-time SDET. He will be responsible for load testing and performance estimation on our platform product.

On the frontend side, Even and Michael are working hard to bring the frontend UI to market. We are approaching the final stage of design and implementation. You could expect to see the beta release in the coming two months.

A pricing and billing system is under way too. We surveyed a few billing platforms, including Paypal WebPro, Recurly, and Braintree, and chose Chargify as our solution.

On the platform side, a lot of things have changed. First, we switched our cloud provider from Rackspace to Amazon Beanstalk. The decision was based on our experience with the operational support and reliability of Rackspace.

Also, the platform has been rewritten to take advantage of Amazon's offerings. The Cassandra cluster has been replaced by S3 and SimpleDB. Asynchronous event processing has been added to our platform with the help of Akka, Camel, and Amazon SQS.

During the load test, we already got some good numbers from the platform. We will start to build demo client applications soon.

Monthly Review: December 2010.

December was quite an interesting month for bluetangstudio.

We have hired our first (part-time) employee as an SDET. We may also have a new hire who will join bluetangstudio early next year.

Also, we have finalized the software architecture design and the customer-facing API for our platform product. In the meantime, we also finalized the web design. Things have gone quite well this year.

Being an entrepreneur is always exciting, in both the good and the bad sides of life. You just cannot predict what will happen in the next second. No matter what kind of difficulty is standing in front of you, you have to laugh at it and solve the situation.

Early this month, we lost a contract web designer because of his tight schedule. Within just a few days, I met another excellent web designer, Even Wu. Even has designed a few major websites in Taiwan, and I can't wait to see what kind of product we will create together.