Semantic Web Interest Group IRC Chat Logs for 2012-02-11

This is an automatically generated IRC chat log made by the perl IRC logger bot from the Semantic Web Interest Group IRC chat at server irc.freenode.net channel #swig. Provided by Planet RDF.

See also the Semantic Web Interest Group IRC Scratchpad for the collaboratively written weblog and ESW wiki.



00:04:15 * kasei missed what it is that's streaming in this supposed impossible situation...

00:05:37 <drobilla> Well, e.g. for triple patterns you can just iterate over (a range of) the store, pumping out a result in steps. You don't have to build the whole result before returning anything at all.

00:05:42 <drobilla> I am not sure this is possible for graph patterns.
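
The streaming behaviour described for single triple patterns can be sketched with a Python generator. This is only an illustration under assumptions: the store is taken to be any iterable of (subject, predicate, object) tuples, None stands for a wildcard, and the names match_pattern and STORE are hypothetical, not taken from sord, rasqal, or ARQ.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
# None in a pattern position means "match anything" (an assumption of this sketch).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def match_pattern(store: Iterable[Triple], pattern: Pattern) -> Iterator[Triple]:
    """Yield matching triples one at a time instead of building a full result list."""
    for triple in store:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


# Hypothetical example data.
STORE = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "name", "Alice"),
]

for result in match_pattern(STORE, (None, "knows", None)):
    print(result)
```

Because the function is a generator, the caller receives the first match before the rest of the store has been scanned, and nothing beyond the current triple has to be held in memory.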

00:05:49 * rszeno something like processing 2T triples without keeping something in memory, i suppose

00:06:41 * drobilla cringes at /me abuse

00:10:29 <rszeno> i don't think one or another solution solve any problem, streaming is a solution for some problems

00:11:02 <drobilla> ? streaming is an attribute of a solution

00:12:05 <AndyS> Most patterns are possible - it's more to do with var scoping. call-cc is your friend; or less lispy, iterators.

00:12:06 <rszeno> yes, :)

00:19:44 <drobilla> That's more to do with interface than implementation

00:20:14 <drobilla> I will probably just have to see what rasqal and others do. Mostly I was just trolling around to see if someone happened to have implemented it in a couple hundred lines of python or something :)

00:20:41 <rszeno> domain specific languages?

00:22:34 * dajobe looks up

00:22:56 <AndyS> Original RDQL BGP was <100 lines but that was long, long ago.

00:23:29 <dajobe> rasqal query engine could be optimized to stream simple 1 triple pattern BGPs (no joins) but that's not terribly exciting. just use grep

00:24:06 <drobilla> dajobe: well, yeah, store pretty much already does that anyway

00:24:35 <drobilla> AndyS: links/keywords?

00:25:39 <AndyS> currently code is maybe 200 lines in ARQ. Inc blank lines and {}

00:26:31 <AndyS> so about 10 in scala :-)

00:27:24 <AndyS> In ARQ : QueryIterBlockTriples + QueryIterTriplePattern for the basic stuff that works on any storage. Various others specific to storage but they are larger.

00:28:26 <AndyS> (wow - some of that code is old)

00:39:41 <drobilla> "Chain" of triple patterns. I guess in less Objectey terms this is essentially equivalent to a pile of nested foreach (pat), one for each triple pattern. Each iteration (possibly) fixes some vars.

00:40:11 <kasei> you can always trade memory for streaming. e.g. double pipelined hash join.

00:40:12 <drobilla> Pretty simple, I guess I was looking for a "join" concept in the implementation that isn't necessary.
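
The "pile of nested foreach" idea might look roughly like the following in Python. It is a sketch under assumptions, not the ARQ or rasqal code: variables are strings starting with "?", the store is a re-iterable list of tuples, and the helper names match_triple and solve are hypothetical.

```python
from typing import Dict, Iterator, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match_triple(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Extend bindings so that pattern matches triple, or return None on conflict."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # An unbound variable gets fixed here; a bound one must agree.
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def solve(store: Sequence[Triple], patterns: Sequence[Triple],
          bindings: Optional[Bindings] = None) -> Iterator[Bindings]:
    """One nested loop per triple pattern, written recursively; yields solutions lazily."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            yield from solve(store, rest, extended)


store = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
query = [("?a", "knows", "?b"), ("?b", "knows", "?c")]
for solution in solve(store, query):
    print(solution)
```

Each level of recursion corresponds to one foreach, and only the current chain of bindings is kept in memory. The cost is a full scan of the store per pattern per partial solution, which an indexed store would replace with a range lookup.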

00:41:28 <kasei> but I'm not sure that makes sense in scenarios that don't involve continuous queries or variable latency/throughput.

00:48:44 <drobilla> kasei: well, the main reason I seek a streaming solution if at all possible is to scale

00:49:05 <drobilla> kasei: not streaming = entire result set in memory at once = large and slow + can not deal with more matches than fit in memory

00:54:53 <kasei> yeah, understood

00:55:04 <kasei> the double pipeline hash join doesn't really solve that.

00:55:34 <kasei> single pipeline sort of does, in that you only have to keep one side of the join in memory

00:56:15 <kasei> but the problem still exists unless your BGP has all but one of the patterns being highly selective

01:00:36 <kasei> the only real alternative is to materialize subqueries to disk
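
A single pipelined hash join of the kind mentioned above can be sketched as follows. The assumptions are that each side is an iterable of binding dictionaries, that every row carries the join variable, and that the name hash_join and the example rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def hash_join(build_side: Iterable[Row], probe_side: Iterable[Row], key: str) -> Iterator[Row]:
    """Hold only build_side in memory; stream probe_side through it."""
    table = defaultdict(list)
    for row in build_side:
        table[row[key]].append(row)
    for row in probe_side:
        # Emit one merged row per build-side row sharing the join value.
        for match in table.get(row[key], ()):
            yield {**match, **row}


# Hypothetical example: the selective side is built, the large side is streamed.
selective = [{"?a": "alice", "?b": "bob"}]
large = iter([{"?b": "bob", "?c": "carol"}, {"?b": "dave", "?c": "erin"}])
for joined in hash_join(selective, large, "?b"):
    print(joined)
```

This matches the trade-off in the discussion: memory use is bounded by the build side, so it only helps when that side is small.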

01:02:24 <drobilla> I am trying to discern if the framing of the problem in terms of joins is inherent, or an artifact of relational database thinking

01:02:50 <kasei> relational thinking, I think

01:03:42 <drobilla> Yeah, I think so. I certainly do not share it, but I haven't actually implemented *graph* pattern matching, so who knows.

01:04:20 <kasei> i think a lot of that might stem from the data in most systems being stored in the same way as a relational system (trees, tables, etc.)

01:04:49 <kasei> to properly benefit from some graph algorithm approaches, I think you might need physical storage that aids in graph traversal.

01:05:05 <kasei> that is, thinking in terms of triples reinforces the relational view

01:05:29 <drobilla> I do store triples in trees, and think in terms of them

01:07:07 <drobilla> I don't know. This is mostly idle musing of me in pursuit of the most small and KISS solution possible

01:09:00 <drobilla> (if I can't do it in a few KLOC of dependency-free C, I probably won't do it at all. if you want a big fully featured implementation, use redland)

01:10:01 <kasei> what's the context of this conversation? serd-related? (I missed the beginning...)

01:15:23 <drobilla> yeah. well, my store is sord, but I now plan to merge all these things into one project

01:15:56 <drobilla> originally sord was separate because it had dependencies, but now it doesn't (actually serd is larger)

01:16:36 <drobilla> sort of a turtle/n3 based rdf sqlite. the spread of sql because of the niceties of that implementation doesn't sit right with me :)

01:18:29 <drobilla> (but considerably more 'lite'; people working on using my stuff for phones and such, a 600k library does not fly there)

01:20:14 <drobilla> also, great need for a validator in that domain, which I would much rather implement with n3/cwm esque rules (which requires BGP matching) than banging out a bunch of special purpose C code

01:24:26 <rszeno> validate?

01:37:45 <drobilla> rszeno: use of nonexistent properties, missing required properties, etc
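
The two checks named here, unknown properties and missing required properties, could be sketched in Python as below. This is the special-purpose style rather than the rule-based n3/cwm approach preferred in the discussion, and everything in it is an assumption for illustration: the function validate, the prefixed names such as ex:Plugin, and the shape of the schema arguments.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # prefixed form used for brevity in this sketch


def validate(triples, known_properties, required_by_type):
    """Return a list of human-readable problems found in the triples."""
    problems = []
    props_by_subject = defaultdict(set)
    types_by_subject = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_by_subject[s].add(o)
            continue
        if p not in known_properties:
            problems.append(f"{s}: unknown property {p}")
        props_by_subject[s].add(p)
    # Second pass: every typed subject must carry the properties its types require.
    for s in sorted(types_by_subject):
        for t in sorted(types_by_subject[s]):
            for req in sorted(required_by_type.get(t, ())):
                if req not in props_by_subject[s]:
                    problems.append(f"{s}: missing required property {req} for type {t}")
    return problems


# Hypothetical data with a misspelled property name.
data = [("plug", RDF_TYPE, "ex:Plugin"), ("plug", "ex:nmae", "Amp")]
for problem in validate(data, {"ex:name"}, {"ex:Plugin": ["ex:name"]}):
    print(problem)
```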

01:38:25 <rszeno> aha,

01:39:43 <rszeno> i have problem doing this with n3 and cwm, :)

01:40:17 <drobilla> Sort of ties in with the quads++ provenance discussion earlier actually, to be really useful such a tool would need file, line (and perhaps column) information for each statement

01:40:43 <drobilla> rszeno: yeah, I never really did get it to work well for that either

01:44:06 <drobilla> unfortunately, mountains of more urgent things to do first. this is my favourite sphere of things to work on

01:44:21 <drobilla> aaaaaaalllll of which must be done before next September Ph.D time :)

02:11:50 <rszeno> drobilla, take a look to essays.n3 from http://www.w3.org/wiki/CwmTips

02:12:51 <rszeno> i didn't try with log:includes, log:notIncludes in my case

02:14:39 <rszeno> in fact i solved my problem but i don't like the solution, :)

02:40:27 <supermoose_> Hi, I'm binding c++ code to lua using swig, and it's really amazing. Can someone point to me how to pass an existing object to lua?

02:42:40 <rszeno> supermoose_ see topic, :)

02:44:52 <supermoose_> hehe wrong place?

02:45:19 <rszeno> yes, :)

02:46:57 <supermoose_> hehe sorry about that. Good night all.

02:47:42 <rszeno> np, good night

04:48:01 <Rich_Morin> I'm getting timeouts for a query on dbpedia.org/snorql - help? http://pastie.org/3358955

07:25:42 <lheuer1> lheuer1 is now known as lheuer

07:58:36 <lheuer1> lheuer1 is now known as lheuer

08:12:26 <libby_> libby_ is now known as libby

08:35:36 <danbri_> danbri_ is now known as danbri

10:36:11 <mhausenblas> moin moin

10:36:14 <mhausenblas> http://data-economy.com/a-baseline-in-early-2012

10:36:15 <dc_swig_> A: http://data-economy.com/a-baseline-in-early-2012 from mhausenblas

10:36:41 <mhausenblas> A:| A baseline for the data economy in early 2012

10:36:42 <dc_swig_> Titled item A.


The IRC chat here was automatically logged without editing and contains content written by the chat participants identified by their IRC nick. No other identity is recorded.


Provided by Dave Beckett as part of Planet RDF