This is an automatically generated IRC chat log made by the Perl IRC logger bot from the Semantic Web Interest Group IRC chat at server irc.freenode.net channel #swig. Provided by Planet RDF.
See also the Semantic Web Interest Group IRC Scratchpad for the collaboratively written weblog and ESW wiki.
Semantic Web Interest Group Logs > 2012 > 2012-02 > 2012-02-11
00:04:15 * kasei missed what it is that's streaming in this supposedly impossible situation...
00:05:37 <drobilla> Well, e.g. for triple patterns you can just iterate over (a range of) the store, pumping out a result in steps. You don't have to build the whole result before returning anything at all.
00:05:42 <drobilla> I am not sure this is possible for graph patterns.
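A minimal Python sketch of the single-triple-pattern case drobilla describes, assuming the store keeps its triples in a list sorted in (subject, predicate, object) order as a stand-in for one of a store's indices (`match_pattern` is a hypothetical name, not sord's or serd's API). The generator yields each match as soon as it reaches it, so a caller can consume results or stop early without the whole result ever being built:

```python
from bisect import bisect_left
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


def match_pattern(index: List[Triple], s: Optional[str] = None,
                  p: Optional[str] = None,
                  o: Optional[str] = None) -> Iterator[Triple]:
    """Stream the triples matching (s, p, o); None acts as a wildcard.

    `index` must be sorted in (s, p, o) order. A bound subject lets us
    binary-search to the start of its range and stop at the end of it.
    """
    start = bisect_left(index, (s,)) if s is not None else 0
    for i in range(start, len(index)):
        ts, tp, to = index[i]
        if s is not None and ts != s:
            break  # sorted order: we have left the subject's range
        if (p is None or tp == p) and (o is None or to == o):
            yield index[i]


store = sorted([
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
])
for triple in match_pattern(store, s="ex:alice", p="ex:knows"):
    print(triple)  # prints ('ex:alice', 'ex:knows', 'ex:bob')
```

With only the predicate bound this degrades to a full scan; a store with several orderings (for example a predicate-first index) would pick whichever index puts the bound terms first.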
00:05:49 * rszeno something like processing 2T triples without keeping something in memory, i suppose
00:06:41 * drobilla cringes at /me abuse
00:10:29 <rszeno> i don't think one or another solution solve any problem, streaming is a solution for some problems
00:11:02 <drobilla> ? streaming is an attribute of a solution
00:12:05 <AndyS> Most patterns are possible - it's more to do with var scoping. call-cc is your friend; or less lispy, iterators.
00:12:06 <rszeno> yes, :)
00:19:44 <drobilla> That's more to do with interface than implementation
00:20:14 <drobilla> I will probably just have to see what rasqal and others do. Mostly I was just trolling around to see if someone happened to have implemented it in a couple hundred lines of python or something :)
00:20:41 <rszeno> domain specific languages?
00:22:34 * dajobe looks up
00:22:56 <AndyS> Original RDQL BGP was <100 lines but that was long, long ago.
00:23:29 <dajobe> rasqal query engine could be optimized to stream simple 1 triple pattern BGPs (no joins) but that's not terribly exciting. just use grep
00:24:06 <drobilla> dajobe: well, yeah, store pretty much already does that anyway
00:24:35 <drobilla> AndyS: links/keywords?
00:25:39 <AndyS> currently code is maybe 200 lines in ARQ. Inc blank lines and {}
00:26:31 <AndyS> so about 10 in scala :-)
00:27:24 <AndyS> In ARQ : QueryIterBlockTriples + QueryIterTriplePattern for the basic stuff that works on any storage. Various others specific to storage but they are larger.
00:28:26 <AndyS> (wow - some of that code is old)
00:39:41 <drobilla> "Chain" of triple patterns. I guess in less Objectey terms this is essentially equivalent to a pile of nested foreach (pat), one for each triple pattern. Each iteration (possibly) fixes some vars.
00:40:11 <kasei> you can always trade memory for streaming. e.g. double pipelined hash join.
00:40:12 <drobilla> Pretty simple, I guess I was looking for a "join" concept in the implementation that isn't necessary.
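The "nested foreach" reading can be sketched in Python as a recursive generator: one loop level per triple pattern, with each level extending the variable bindings fixed by the levels above it. Assumptions for this sketch: variables are strings starting with `?`, the names `unify` and `match_bgp` are hypothetical, and every level scans the whole store for simplicity, where a real store would use the already-bound terms to choose an index range. No explicit join operator appears, and solutions still stream out one at a time:

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Convention assumed for this sketch: variables look like "?name".
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # setdefault binds a fresh variable, or returns its existing value
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_bgp(store: List[Triple], patterns: List[Triple],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield each solution of a basic graph pattern as soon as it is complete."""
    if binding is None:
        binding = {}
    if not patterns:
        yield binding
        return
    for triple in store:  # the foreach for patterns[0]
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match_bgp(store, patterns[1:], extended)


store = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
]
query = [("?a", "ex:knows", "?b"), ("?b", "ex:knows", "?c")]
for solution in match_bgp(store, query):
    print(solution)  # prints {'?a': 'ex:alice', '?b': 'ex:bob', '?c': 'ex:carol'}
```

Pattern order matters for cost, not correctness: putting the most selective pattern first shrinks every inner loop.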
00:41:28 <kasei> but I'm not sure that makes sense in scenarios that don't involve continuous queries or variable latency/throughput.
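For reference, a minimal Python sketch of a double-pipelined (symmetric) hash join over two streams of bindings, assuming bindings are dicts and the join is on a single shared variable (`symmetric_hash_join` and the one-key restriction are assumptions of this sketch). Each arriving row goes into its own side's hash table and is immediately probed against the other side's, so results flow before either input ends; the price is that both tables keep growing for as long as the inputs run:

```python
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List

Binding = Dict[str, str]
_DONE = object()  # marks an exhausted input in zip_longest


def symmetric_hash_join(left: Iterable[Binding], right: Iterable[Binding],
                        key: str) -> Iterator[Binding]:
    """Double-pipelined hash join on one shared variable `key`."""
    left_table: Dict[str, List[Binding]] = {}
    right_table: Dict[str, List[Binding]] = {}
    # Read the two inputs alternately, one row from each per step.
    for l_row, r_row in zip_longest(left, right, fillvalue=_DONE):
        if l_row is not _DONE:
            left_table.setdefault(l_row[key], []).append(l_row)
            for match in right_table.get(l_row[key], []):
                yield {**l_row, **match}
        if r_row is not _DONE:
            right_table.setdefault(r_row[key], []).append(r_row)
            for match in left_table.get(r_row[key], []):
                yield {**match, **r_row}


left = [{"?a": "x", "?b": "1"}, {"?a": "y", "?b": "2"}]
right = [{"?b": "2", "?c": "p"}, {"?b": "1", "?c": "q"}, {"?b": "3", "?c": "r"}]
for row in symmetric_hash_join(left, right, "?b"):
    print(row)
```

Each matching pair is emitted exactly once: whichever row of the pair arrives second finds the first already in the other side's table.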
00:48:44 <drobilla> kasei: well, the main reason I seek a streaming solution if at all possible is to scale
00:49:05 <drobilla> kasei: not streaming = entire result set in memory at once = large and slow + can not deal with more matches than fit in memory
00:54:53 <kasei> yeah, understood
00:55:04 <kasei> the double pipeline hash join doesn't really solve that.
00:55:34 <kasei> single pipeline sort of does, in that you only have to keep one side of the join in memory
00:56:15 <kasei> but the problem still exists unless your BGP has all but one of the patterns being highly selective
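A minimal Python sketch of the single-pipeline variant, assuming bindings are dicts and the join is on one shared variable (`hash_join` is a hypothetical name). Only the build side is held in memory; the probe side streams through row by row, which is why it helps most when the build side is the small, highly selective pattern:

```python
from typing import Dict, Iterable, Iterator, List

Binding = Dict[str, str]


def hash_join(build: Iterable[Binding], probe: Iterable[Binding],
              key: str) -> Iterator[Binding]:
    """Single-pipeline hash join: materialise `build`, stream `probe`."""
    table: Dict[str, List[Binding]] = {}
    for row in build:  # whole build side kept in memory
        table.setdefault(row[key], []).append(row)
    for row in probe:  # probe side consumed one row at a time
        for match in table.get(row[key], []):
            yield {**match, **row}
```

Swapping the arguments gives the same result set with different memory use, so a planner would normally build on whichever side it estimates to be smaller.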
01:00:36 <kasei> the only real alternative is to materialize subqueries to disk
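A minimal Python sketch of materializing a subquery to disk, assuming solutions are dicts that can be written as one JSON object per line to a temporary file (`materialize` and `disk_nested_loop_join` are hypothetical names). The join keeps only one row from each side in memory and pays for that by rescanning the file once per outer row; a production engine would more likely sort or hash-partition the spilled data than rescan it:

```python
import json
import tempfile
from typing import IO, Dict, Iterable, Iterator

Binding = Dict[str, str]


def materialize(rows: Iterable[Binding]) -> IO[str]:
    """Spill a subquery's solutions to a temporary file, one JSON object per line."""
    spill = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    for row in rows:
        spill.write(json.dumps(row) + "\n")
    spill.flush()
    return spill


def disk_nested_loop_join(outer: Iterable[Binding], inner: IO[str],
                          key: str) -> Iterator[Binding]:
    """Join streamed `outer` rows against the spilled `inner` rows on `key`."""
    for o_row in outer:
        inner.seek(0)  # rescan the spilled subquery for every outer row
        for line in inner:
            i_row = json.loads(line)
            if i_row[key] == o_row[key]:
                yield {**o_row, **i_row}
```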
01:02:24 <drobilla> I am trying to discern if the framing of the problem in terms of joins is inherent, or an artifact of relational database thinking
01:02:50 <kasei> relational thinking, I think
01:03:42 <drobilla> Yeah, I think so. I certainly do not share it, but I haven't actually implemented *graph* pattern matching, so who knows.
01:04:20 <kasei> i think a lot of that might stem from the data in most systems being stored in the same way as a relational system (trees, tables, etc.)
01:04:49 <kasei> to properly benefit from some graph algorithm approaches, I think you might need physical storage that aids in graph traversal.
01:05:05 <kasei> that is, thinking in terms of triples reinforces the relational view
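One possible illustration of storage shaped for traversal, assuming it means an adjacency index keyed by node (a common graph-database layout; the log does not specify one). `build_adjacency` and `reachable` are hypothetical names; following an edge is a dictionary lookup on the current node rather than a search through a separate triple index:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Adjacency = Dict[str, List[Tuple[str, str]]]


def build_adjacency(triples: Iterable[Triple]) -> Adjacency:
    """Index outgoing edges by subject: node -> [(predicate, object), ...]."""
    adj: Adjacency = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def reachable(adj: Adjacency, start: str, predicate: str) -> Set[str]:
    """Breadth-first traversal along one predicate from `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for p, o in adj.get(node, []):  # direct lookup of node's edges
            if p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen
```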
01:05:29 <drobilla> I do store triples in trees, and think in terms of them
01:07:07 <drobilla> I don't know. This is mostly idle musing of me in pursuit of the most small and KISS solution possible
01:09:00 <drobilla> (if I can't do it in a few KLOC of dependency-free C, I probably won't do it at all. if you want a big fully featured implementation, use redland)
01:10:01 <kasei> what's the context of this conversation? serd-related? (I missed the beginning...)
01:15:23 <drobilla> yeah. well, my store is sord, but I now plan to merge all these things into one project
01:15:56 <drobilla> originally sord was separate because it had dependencies, but now it doesn't (actually serd is larger)
01:16:36 <drobilla> sort of a turtle/n3 based rdf sqlite. the spread of sql because of the niceties of that implementation doesn't sit right with me :)
01:18:29 <drobilla> (but considerably more 'lite'; people working on using my stuff for phones and such, a 600k library does not fly there)
01:20:14 <drobilla> also, great need for a validator in that domain, which I would much rather implement with n3/cwm esque rules (which requires BGP matching) than banging out a bunch of special purpose C code
01:24:26 <rszeno> validate?
01:37:45 <drobilla> rszeno: use of nonexistent properties, missing required properties, etc
01:38:25 <rszeno> aha,
01:39:43 <rszeno> i have problem doing this with n3 and cwm, :)
01:40:17 <drobilla> Sort of ties in with the quads++ provenance discussion earlier actually, to be really useful such a tool would need file, line (and perhaps column) information for each statement
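A minimal Python sketch of the two checks drobilla names (use of nonexistent properties, missing required properties), with each statement carrying the file and line it was parsed from so that errors point back at the source. Everything here is assumed for illustration: the `Statement` record, `validate`, the `rdf:type` prefixed name, and the schema shape (a set of known properties plus a map from class to required properties). It is plain procedural code, the kind of special-purpose checker drobilla would rather express as N3 rules, shown only to make the checks concrete:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

RDF_TYPE = "rdf:type"  # prefixed name used for brevity in this sketch


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    file: str  # provenance: source document
    line: int  # provenance: line the statement was parsed from


def validate(statements: Iterable[Statement], known_properties: Set[str],
             required: Dict[str, List[str]]) -> List[str]:
    """Report unknown properties and missing required properties.

    `required` maps a class to the properties every instance must have.
    """
    errors: List[str] = []
    props: Dict[str, Set[str]] = {}
    typed: List[Tuple[str, Statement]] = []
    for st in statements:
        if st.predicate not in known_properties:
            errors.append(f"{st.file}:{st.line}: unknown property {st.predicate}")
        props.setdefault(st.subject, set()).add(st.predicate)
        if st.predicate == RDF_TYPE:
            typed.append((st.obj, st))
    # Missing-property errors point at the rdf:type statement that implied them.
    for cls, st in typed:
        for prop in required.get(cls, []):
            if prop not in props[st.subject]:
                errors.append(f"{st.file}:{st.line}: {st.subject} is a {cls} "
                              f"but has no {prop}")
    return errors
```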
01:40:43 <drobilla> rszeno: yeah, I never really did get it to work well for that either
01:44:06 <drobilla> unfortunately, mountains of more urgent things to do first. this is my favourite sphere of things to work on
01:44:21 <drobilla> aaaaaaalllll of which must be done before next September Ph.D time :)
02:11:50 <rszeno> drobilla, take a look to essays.n3 from http://www.w3.org/wiki/CwmTips
02:12:51 <rszeno> i didn't try with log:includes, log:notIncludes in my case
02:14:39 <rszeno> in fact i solved my problem but i don't like the solution, :)
02:40:27 <supermoose_> Hi, I'm binding c++ code to lua using swig, and it's really amazing. Can someone point to me how to pass an existing object to lua?
02:42:40 <rszeno> supermoose_ see topic, :)
02:44:52 <supermoose_> hehe wrong place?
02:45:19 <rszeno> yes, :)
02:46:57 <supermoose_> hehe sorry about that. Good night all.
02:47:42 <rszeno> np, good night
04:48:01 <Rich_Morin> I'm getting timeouts for a query on dbpedia.org/snorql - help? http://pastie.org/3358955
07:25:42 <lheuer1> lheuer1 is now known as lheuer
07:58:36 <lheuer1> lheuer1 is now known as lheuer
08:12:26 <libby_> libby_ is now known as libby
08:35:36 <danbri_> danbri_ is now known as danbri
10:36:11 <mhausenblas> moin moin
10:36:14 <mhausenblas> http://data-economy.com/a-baseline-in-early-2012
10:36:15 <dc_swig_> A: http://data-economy.com/a-baseline-in-early-2012 from mhausenblas
10:36:41 <mhausenblas> A:| A baseline for the data economy in early 2012
10:36:42 <dc_swig_> Titled item A.
The IRC chat here was automatically logged without editing and contains content written by the chat participants identified by their IRC nick. No other identity is recorded.
Provided by Dave Beckett as part of Planet RDF