Posted 4 days ago
I got all of the pieces worked out on my app running on Apache TomEE.
Before you invest too much time reading: this is really dry material. But if you are planning on connecting to a MySQL database from an app running on TomEE, it might help you get it working.
You're still here! Well...
It took a bit longer than I had expected it to (of course I didn't get to work on it full time). But it works!
I've got my JSP sending data to my servlet using AJAX. The servlet is properly parsing the XML and uses JPA to access my MySQL database.
Perhaps strangely enough, the biggest headache that I ran into was getting the data source properly configured.
Here is what it took...
In the WEB-INF directory is my resources.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<Resource id="pInit" type="DataSource">
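The resources.xml shown above is truncated in this post. A complete version for MySQL on TomEE generally looks like the sketch below; TomEE reads the properties from the element body as plain key/value lines. The driver class is MySQL's standard JDBC driver, but the URL, database name, and credentials here are placeholders, not my app's actual values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<resources>
  <Resource id="pInit" type="DataSource">
    JdbcDriver com.mysql.jdbc.Driver
    JdbcUrl jdbc:mysql://localhost:3306/mydb
    UserName myuser
    Password mypassword
    JtaManaged true
  </Resource>
</resources>
```

With `JtaManaged true`, the data source participates in JTA transactions, which matches the `transaction-type="JTA"` persistence unit below.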
In the META-INF directory is the persistence.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" version="1.0">
  <persistence-unit transaction-type="JTA" name="pilCommon">
    <jta-data-source>pInit</jta-data-source>
    <class>com.pubint.projInit.entity.Project</class>
    <properties>
      <property name="openjpa.jdbc.DBDictionary" value="mysql" />
      <property name="openjpa.AutoDetach" value="close" />
      <property name="openjpa.DetachState" value="fetch-groups(AccessUnloaded=true)" />
      <property name="openjpa.Multithreaded" value="true" />
      <property name="openjpa.TransactionMode" value="managed" />
      <property name="openjpa.NontransactionalRead" value="true" />
      <property name="openjpa.RestoreState" value="all" />
      <property name="openjpa.jdbc.SynchronizeMappings" value="false" />
      <property name="openjpa.InverseManager" value="true" />
    </properties>
  </persistence-unit>
</persistence>
I might have gone overboard on the properties I set in the persistence file, but the minimal version I modeled on the example from the TomEE site did not work: TomEE kept trying to use HSQL to connect to the database rather than MySQL.
But that is okay - it works and I can finally move on to building the real pages and implementing the needed functionality.
Posted 4 days ago
According to Wikipedia, Naver, Korea's largest web portal, averages 860,000,000 page views per day. If we take the average web log line to be 150 bytes (that's with the referrer URL included), then:
Per day: 196 MB
Per year: 69 GB
Per 10 years: 698 GB
Naver has been running for about 14 years since its founding in 1999, so even a rough calculation puts its access log at only about 700 GB.
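The yearly and ten-year figures follow directly from the per-day estimate; a quick sanity check, using the post's 196 MB/day figure:

```python
# Scale the post's estimated daily access-log volume to a year and a decade.
daily_bytes = 196 * 1024**2              # ~196 MB of log lines per day (the post's figure)
yearly_gb = daily_bytes * 365 / 1024**3  # bytes per year, expressed in GiB
decade_gb = yearly_gb * 10

print(round(yearly_gb), round(decade_gb))  # roughly 70 and 699
```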
So where is the terabyte- or petabyte-scale data?
It would be the web documents and web mail being generated at a frightening pace. And apart from PageRank for improving the search engine, spam filtering, and document clustering, there isn't much else to do with it (it is exactly this need that the MapReduce model came out of). The focus there should probably be on storage rather than analysis.
That kind of data makes up more than 90% of all data, while the input data actually needed for what today's "big data" talk calls "user behavior analysis" or "recommendation engines" can realistically only be GB-sized, even at Naver's scale.
So how much electronic data do companies that are not at Naver's web scale really have? :-)
When I look at "big data", management tools for thousands of machines, big-data SQL solutions, and the Korean project RFPs floating around, I honestly find it pathetic at times. Frankly, it is the same abroad: extract the user-interaction structure from a few terabytes of tweet data for some analysis, and you end up with no more than a few GB.
People need to understand that in advanced analytics, the problem is not data size but computational complexity.
Posted 4 days ago
A cautionary tale about building large-scale polyglot systems
‘a fucking nightmare’:
Cascading requires a compilation step, yet since you're writing Ruby code, you get none of the benefits of static type checking. It was standard to discover a type issue only after kicking off a job on, oh, 10 EC2 machines, only to have it fail because of a type mismatch. And user code embedded in strings would regularly fail to compile – which you again wouldn't discover until after your job was running. Each of these was bad individually; together, they were a fucking nightmare. The interaction between the code in strings and the type system was the worst of all possible worlds. No type checking, yet incredibly brittle, finicky and incomprehensible type errors at run time. I will never forget when one of my friends at Etsy was learning Cascading.JRuby and he couldn't get a type cast to work. I happened to know what would work: a triple cast. You had to cast the value to the type you wanted, not once, not twice, but THREE times.
(tags: etsy scalding cascading adtuitive war-stories languages polyglot ruby java strong-typing jruby types hadoop)
It’s So Easy
Attempting to cash out of Bitcoins turns out to be absurdly difficult:
Trying to sell the coins in person, and basically saying he either wants cash or a cashier's check (since it can be handed over right then and there), has apparently been a hilarious clusterfuck. Today he met some guy in front of his bank, and apparently as soon as he mentioned that he needs to get the cash checked to make sure it is not counterfeit, the guy freaked out and basically walked away. Stuff like this has been happening all week, and he has apparently so far sold only a single coin out of several hundred.
(tags: bitcoin fail funny mtgox fraud cash fiat-currency via:rsynnott buttcoin)
Florida cops used IMSI catchers over 200 times without a warrant
Harris is the leading maker of [IMSI catchers aka "stingrays"] in the U.S., and the ACLU has long suspected that the company has been loaning the devices to police departments throughout the state for product testing and promotional purposes. As the court document notes in the 2008 case, “the Tallahassee Police Department is not the owner of the equipment.” The ACLU now suspects these police departments may have all signed non-disclosure agreements with the vendor and used the agreement to avoid disclosing their use of the equipment to courts. “The police seem to have interpreted the agreement to bar them even from revealing their use of Stingrays to judges, who we usually rely on to provide oversight of police investigations,” the ACLU writes.
(tags: aclu police stingrays imsi-catchers privacy cellphones mobile-phones security wired) [Less]
Posted 5 days ago
The following guest post appears on the SourceForge blog today. I'm personally very pleased to welcome SourceForge back to ApacheCon for another year.
The Apache Software Foundation is pleased to announce ApacheCon US 2014, which we're presenting in conjunction with the Linux Foundation. The conference will be held in Denver, Colorado, and features three days of content across ten tracks, covering more than 70 of the Apache Software Foundation's Open Source projects, including Apache OpenOffice, Apache Hadoop, Apache Lucene, and many others.
We’re especially pleased to welcome SourceForge as a media partner for this event.
See http://na.apachecon.com/ for the full schedule, as well as the evening events, BOFs, Lightning Talks, and project summits.
Co-located with the event is the CloudStack Collaboration Conference - http://events.linuxfoundation.org/events/cloudstack-collaboration-conference-north-america - the best place to learn about Apache CloudStack.
Apache OpenOffice - http://openoffice.apache.org/ - has an entire day of content, including both technical and community talks.
Hadoop, and its ecosystem of Big Data projects, has more than five full days of content (two tracks on two days, one track on the other).
Other projects, such as Cordova, Tomcat, and the Apache HTTP Server, have a full day, or two, of content.
If you want to learn more about Apache Allura (Incubating), an Open Source software forge (and also the code that runs SourceForge) we’ll have two presentations about Allura, by two of the engineers who work on that code: Dave Brondsema and Wayne Witzel. Learn how to use Allura to develop your own projects, and join the community to make the platform even better.
This is the place to come if you rely on any of the projects of the Apache Software Foundation, and if you want to hang out with the men and women who develop them. We’ve been doing this event since 1998, and this promises to be the best one yet, with more content than we’ve ever presented before.
Posted 5 days ago
You may remember my first blog post describing how the Lucene developers eat our own dog food by using a Lucene search application to find our Jira issues.
That application has become a powerful showcase of a number of modern Lucene features, such as drill sideways and dynamic range faceting, a new suggester based on infix matches, the postings highlighter, block-join queries so you can jump to a specific issue comment that matched your search, near-real-time indexing and searching, etc. Whenever new users ask me about Lucene's capabilities, I point them to this application so they can see for themselves.
Recently, I've made some further progress so I want to give an update.
The source code for the simple Netty-based Lucene server is now available on this Subversion branch (see LUCENE-5376 for details). I've been gradually adding coverage for additional Lucene modules, including facets, suggesters, analysis, query parsers, highlighting, grouping, joins and expressions. And of course normal indexing and searching! Much remains to be done (there are plenty of nocommits), and the goal here is not to build a feature-rich search server but rather to demonstrate how to use Lucene's current modules in a server context with minimal "thin server" additional source code.
Separately, to test this new Lucene based server, and to complete the "dog food," I built a simple Jira search application plugin, to help us find Jira issues, here. This application has various Python tools to extract and index Jira issues using Jira's REST API and a user-interface layer running as a Python WSGI app, to send requests to the server and render responses back to the user. The goal of this Jira search application is to make it simple to point it at any Jira instance / project and enable full searching over all issues.
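As a rough illustration of the extraction side, the sketch below builds a query against Jira's documented REST search endpoint. The endpoint path is the real one; the base URL and JQL in the example are placeholders, and the actual tools also handle authentication, paging and retries:

```python
import json
import urllib.parse
import urllib.request


def search_url(base_url, jql, start_at=0, max_results=50):
    """Build a URL for one page of issues from Jira's REST search endpoint."""
    qs = urllib.parse.urlencode(
        {"jql": jql, "startAt": start_at, "maxResults": max_results})
    return f"{base_url}/rest/api/2/search?{qs}"


def fetch_issues(base_url, jql, **kw):
    """Fetch and decode one page of matching issues."""
    with urllib.request.urlopen(search_url(base_url, jql, **kw)) as resp:
        return json.load(resp)


# e.g. search_url("https://issues.apache.org/jira", "project = LUCENE")
```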
I just pushed some further changes to the production site: I upgraded the Jira search application to the current server branch (previously it was running on my private fork).
I switched all analysis components to Lucene's analysis factories; these factories use Java's SPI (Service Provider Interface) so that the server has access to any char filters, tokenizers and token filters in the classpath. This is very helpful when building a server because it means you don't need any special code to handle the great many analysis components that Lucene provides these days. Everything simply passes through the factories (which know how to parse their own arguments).
I've added the Tika project, so you can now find Tika issues as well. This was very simple to add, and it seems to be working!
I inserted WordDelimiterFilter so that CamelCaseTokens are split. For example, try searching on infix and note the highlights. As Robert Muir reminded me, WordDelimiterFilter corrupts offsets, which will mess up highlighting in some cases, so I'm going to try to set up ICUTokenizer, which I'm already using, to do this splitting instead.
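The effect of that split can be sketched in a few lines of Python; this is only an illustration of the tokenization behavior, not the actual Lucene filter:

```python
import re


def split_camel(token):
    """Split a camel-case token the way a word-delimiter filter would,
    e.g. 'AnalyzingInfixSuggester' -> ['Analyzing', 'Infix', 'Suggester'].
    Runs of capitals (acronyms) stay together, as in 'HTTPServer'."""
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)


print(split_camel("AnalyzingInfixSuggester"))  # ['Analyzing', 'Infix', 'Suggester']
```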
I switched to Lucene's new expressions module to do a blended relevance + recency sort by default when you do a text search, which is helpful because most of the time we are looking for recently touched issues. Previously I used a custom FieldComparator to achieve the same functionality, but the expression is more compact and powerful and lets me remove that custom FieldComparator.
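The idea behind the blended sort can be sketched as follows. This is a toy formula with made-up weights and decay; the actual expression used on the server differs:

```python
import time


def blended_score(relevance, updated_epoch, now=None,
                  weight=0.3, scale_days=30.0):
    """Toy blend of text relevance and recency: a recently updated issue
    gets a boost that decays with age. All constants are illustrative."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - updated_epoch) / 86400.0)
    recency = 1.0 / (1.0 + age_days / scale_days)  # 1.0 when brand new, -> 0 with age
    return relevance + weight * recency
```

With equal text relevance, an issue touched today outranks one last touched two months ago, which matches the "recently touched issues first" behavior described above.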
I switched to near-real-time building of the suggestions, using AnalyzingInfixSuggester. Previously I fully rebuilt the suggester every five minutes, so this saves a lot of CPU: now I just add new Jira issues as they come in and refresh the suggester. It also means a much shorter delay between when an issue is added and when it can be suggested. See LUCENE-5477 for details.
I now commit once per day. Previously I never committed and simply relied on near-real-time searching. That worked just fine, except that when I needed to bring the server down (e.g. to push new changes out), it required full reindexing, which was very fast but a poor experience for users who happened to do a search while it was happening. Now, when I bounce the server, it comes back to the last commit, and the near-real-time indexing quickly catches up on any issues changed since that commit.
I've also fixed various small issues, such as proper handling when a Jira issue is renamed (the Jira REST API does not make this easy to discover!), better production push automation, and an upgrade to a newer version of the Bootstrap UI library.
There are still plenty of improvements to make to this Jira search application. For fields with many possible drill-down values, I'd like to have a simple suggester so the user can quickly drill down. I'd like to fix the suggester to filter suggestions according to the project. For example, if you've drilled down into Tika issues, then when you type a new search you should see only Tika issues suggested. For that we need to make AnalyzingInfixSuggester context-aware. I'd also like a more compact UI for all of the facet fields; maybe I need to hide the less commonly used facet fields under a "More"...
Please send me any feedback / problems when you're searching for issues!
Posted 5 days ago by edwardyoon
The Hama team is pleased to announce the Hama 0.6.4 release.
Apache Hama is a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.
This release improves memory usage threefold compared to the previous release (without significant performance degradation) and adds runtime message compression.
The artifacts are published and ready for you to download either from the Apache mirrors or from the Maven repository. We welcome your help, feedback, and suggestions. For more information on how to report problems, and to get involved, visit the project website and wiki.
Wiki: http://wiki.apache.org/hama/
Posted 5 days ago
I'm not sure what exactly to make of Dear Esther.
Is it an adventure game?
Many would say yes, and I think I'd agree. You are placed on an island, and your controls allow you to move around the island, to explore different areas and visit different locations. You can look at items, zoom in on them, view them from different angles.
However, you can't actually interact with anything.
The closest you get to interacting is that, from time to time, the game issues a short speech, as if from an off-stage narrator.
The speeches, as you progress through the game, form a story; at least, they form the fragments of a story. It's never really clear whether the narrator of these speeches is you, or somebody who came before you. And it's never really clear whether the character being referred to in the 3rd person in these speeches is you, or whether the narrator is telling somebody else's story to you.
The story clearly takes place on the island, though.
So in that respect many would call Dear Esther a work of art, an interactive story, a computer-driven epic poem. And I think I'd agree with that, too. It's clearly a performance, and, as art, it certainly is well-crafted.
The story is a compelling one, too. It's a tragic story, one told many times before, but told quite well, both in words and in more subtle ways (e.g., via the placement of candles on the path, or the paintings on the walls of the cave). More than once I found myself overcome with emotion as the game delivered yet another glimpse into its private world.
Dear Esther also compels on a purely aesthetic level. The artwork is beautiful; the music is gorgeous. The voyage through the caves in the middle of the island is just stunning. It reminds me of the first time I played Myst, 20 years ago; during those days I was often content just to walk around the island and gaze at the scenery.
It's clear, I believe, that Dear Esther is trying to do more than "just" tell a story. The visuals and the monologues and the wandering around all work together, full of allegory and metaphor. There are biblical references: quotations from Acts are painted on objects in the game, as well as referenced in the speeches, and stories such as Lot's wife, or the pilgrimage to Damascus, are touched upon, as well as less literal, but still clear, biblical references:
I have run out of places to climb. I will abandon this body and take to the air.
And there are other literary references, such as this fairly clear nod to John Donne's No Man Is An Island:
he would have realised he was his own shoreline, as am I. Just as I am becoming this island ...
I played through Dear Esther rather quickly, thirstily, eager to push the story forward. Others, I am told, savor this journey more slowly, even re-play it to gain a different perspective. The verses delivered by the narrator are randomized, so you may find your second time through the game to be a quite different experience.
So what, then, is Dear Esther? Is it a meditation on the perils of addiction? Is it a transcendent expression of pure love? Is it a commentary on the impermanence and mortality of man as he strives to create meaning that will outlive him? Is it "simply" a work of art, simultaneously none and all of the above?
I'll close with this, my favorite speech, hoping it reveals both nothing and everything:
I’ve begun my voyage in a paper boat without a bottom; I will fly to the moon in it. I have been folded along a crease in time, a weakness in the sheet of life. Now, you’ve settled on the opposite side of the paper to me; I can see your traces in the ink that soaks through the fibre, the pulped vegetation. When we become waterlogged, and the cage disintegrates, we will intermingle. When this paper aeroplane leaves the cliff edge, and carves parallel vapour trails in the dark, we will come together.
Posted 5 days ago
The Netflix Dynamic Scripting Platform
At the core of the redesign is a Dynamic Scripting Platform which provides us the ability to inject code into a running Java application at any time. This means we can alter the behavior of the application without a full scale deployment. As you can imagine, this powerful capability is useful in many scenarios. The API Server is one use case, and in this post, we describe how we use this platform to support a distributed development model at Netflix. Holy crap.
(tags: scripting dynamic-languages groovy java server-side architecture netflix)
ZooKeeper Resilience at Pinterest
essentially decoupling the client services from ZK using a local daemon on each client host; very similar to Airbnb’s Smartstack. This is a bit of an indictment of ZK’s usability though
(tags: ops architecture clustering network partitions cap reliability smartstack airbnb pinterest zookeeper)
Posted 5 days ago
I was at CloudExpo Europe in London last week for the Open Cloud Forum, to give a tutorial on CloudStack tools. A decent crowd showed up, all carrying phones. Kind of problematic for a tutorial where I wanted the audience to install Python packages and actually work :) Luckily I made it self-paced, so you can follow along at home. Giles from ShapeBlue was there too, and he was part of a panel on Open Cloud. He was told once again: "But Apache CloudStack is a Citrix project!" This in itself is a paradox, and as @jzb told me on Twitter yesterday, "Citrix donated CloudStack to Apache, the end". Apache projects do not have any company affiliation.
I don't blame folks; with all the vendors seemingly supporting OpenStack, it does seem that CloudStack is a one-supporter project. The commit stats are also pretty clear, with 39% of commits coming from Citrix. The real number is probably higher, since those stats report the gmail and apache domains as contributing 20% and 15% respectively, so let's say 60% is from Citrix. But nonetheless, this ignores and misunderstands what Apache is, and it looks at the glass half empty.
When Citrix donated CloudStack to the Apache Software Foundation (ASF), it relinquished control of the software and the brand. This actually put Citrix in a bind, as it was no longer able to easily promote the CloudStack project. Indeed, CloudStack is now a trademark of the ASF, and Citrix had to rename its own product CloudPlatform (powered by Apache CloudStack). Citrix cannot promote CloudStack directly; it needs approval to donate sponsorship and must follow the ASF trademark guidelines. Every committer, and especially the PMC members of Apache CloudStack, is now supposed to work to protect the CloudStack brand as part of the ASF and make sure that any confusion is cleared up. That is what I am doing here.
Of course, when the software was donated, an initial set of committers was defined, all from Citrix and mostly from the former cloud.com startup. Part of the incubating process at the ASF is to make sure that the project can add committers from other organizations and attract a community. "Community over Code" is the bread and butter of the ASF, and so this is what we have all been working on: expanding the community outside Citrix, welcoming anyone who thinks CloudStack is interesting enough to contribute a little bit of time and effort. Looking at the glass half empty is saying that CloudStack is a Citrix project: "Hey, look, 60% of their commits are from Citrix." Looking at it half full, like I do, is saying: "Wow, in the year since graduation they have diversified the committer base; 40% are not from Citrix." Is 40% enough? Of course not. I wish it were the other way around; I wish Citrix were only a minority in the development of CloudStack.
A couple of other numbers: out of the 26 members of the project management committee (PMC), only seven are from Citrix, and looking at mailing list participation since the beginning of the year, 20% of the folks on the users list and 25% on the developers list are from Citrix. We have diversified the community a great deal, but the "hand-over", that moment when new community members are actually writing more code than the folks who started it, has not happened yet. A community is not just about writing code, but I will grant you that it is not good for a single company to "control" 60% of the development; this is not where we/I want to be.
This whole discussion actually goes against Apache's modus operandi, since one of the foundation's biggest tenets is non-affiliation. When I participate on the list I am Sebastien, not a Citrix employee. Certainly this can put some folks in conflicting situations at times, but the bottom line is that we do not, and should not, take company affiliation into account when working and making decisions for the project. But if you really want some company name-dropping, let's commit an ASF faux-pas and look at a few features:
The Nicira/NSX and OpenDaylight SDN integrations were done by Schuberg Philis, the OpenContrail plugin was done by Juniper, and Midokura created its own plugin for MidoNet, as did Stratosphere, giving us great SDN coverage. The LXC integration was done by Gilt; Klarna is contributing to the ecosystem with the Vagrant and Packer plugins; CloudOps has been doing terrific work with Chef recipes, the Palo Alto Networks integration, and NetScaler support; a Google Summer of Code intern wrote a brand new LDAP plugin, and another GSoC student did the GRE support for KVM. Red Hat contributed the Gluster plugin and PCextreme contributed the Ceph interface, while Basho of course contributed the S3 plugin for secondary storage as well as major design decisions on the storage refactor. The SolidFire plugin was done by, well, SolidFire, and NetApp has developed a plugin as well for its virtual storage console. NTT contributed the Cloud Foundry interface via BOSH. On the user side, ShapeBlue is the leading support company. So no, it's not just Citrix.
Are all these companies members of the CloudStack project? No. There is no such thing as a company being a member of an ASF project. There is no company affiliation and no lock-in, just a bunch of folks trying to make good software and build a community. And yes, I work for Citrix, and my job here will be done when Citrix contributes only 49% of the commits. Citrix is paying me to make sure it loses control of the software, that a healthy ecosystem develops, and that CloudStack keeps on becoming a strong and vibrant Apache project. I hope one day folks will understand what CloudStack has become: an ASF project, like HTTP, Hadoop, Mesos, Ant, Maven, Lucene, Solr and 150 other projects. Come to Denver for #apachecon and you will see! The end.
Posted 6 days ago
FOI is better than tea and biscuits
Good post on the ‘FOI costs too much’ talking point.
I realise if you’re a councillor, tea and biscuits sounds much more appealing than transparency and being held accountable and actually ... [More] having to answer to voters, but those things are what you signed up to when you stood for election.
(tags: foi open-data politics government funding)