-
Notifications
You must be signed in to change notification settings - Fork 109
/
TODO
97 lines (71 loc) · 4.84 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
-- if run out file descriptors in mogilefsd, mogilefsd shouldn't crash. it might
now. needs a test.
-- change debug level at runtime from mgmt port, propagate to children.
-- MogileFS::Device ->make_directory on lighttpd isn't exactly right, as WebDAV
spec says MKCOL on an existing directory should return 405 (Method Not Allowed),
which I think is the same as a server without WebDAV enabled... we need to distinguish
between those two cases. (perhaps in Monitor job?)
405 (Method Not Allowed) - MKCOL can only be executed on a deleted/non-existent resource.
-- mogdbsetup should use /etc/mogilefs/mogilefsd.conf for upgrade dsn info
-- fix 5/second in mogadm fsck status. hard-coded to 5. whoops.
-- in MogileFS::Device,
+ # FIXME: don't use local machine's time() for this. time sync
+ # issues! instead, the monitor process should track this,
+ # noting the difference in relative time between the server's
+ # time (in Date: response header) and time in the usage.txt
+ # file.
-- make mogilefsd trackers speak FUSE, so we could mount all of mogilefs using, say:
http://noedler.de/projekte/wdfs/index.html
things like paths could be exposed as extended attributes, or as pseudo files:
cat /mnt/mogile/<domain>/paths/<key>
cat /mnt/mogile/<domain>/contents/<key>
-- if create_open before monitor runs yet, block and wait for a round? better than 'no_devices'
might need some marker for 'end of monitoring'? or just wait until we have 1 or 3?
or if they asked for 3, only return 1 in that rare case, preferring latency to redundancy.
plus, we'd know that 1 is writable in last few seconds!
-- update the 'repl' command for new file_to_replicate table
-- replication policy error storm when a device is known to be observably down:
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197214
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197237
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197219
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197256
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197226
-- telnet to 7001 and send "!"<enter> will crash.
-- fsck job/command. need 'last fsck' table per fid? or column?
-- fix haphazard CapitalStyle vs capital_style in ProcManager for class methods
-- 'every' func should select on psock, to process parent-sent commands
during worker's breaks
-- create close could wake a replicate process.
-- optional 'wait_until_replicated=1' flag to create close, so client doesn't
get success until file is everywhere.
-- redo/reevaluate the 'unreachable_fids' logic: unreachable should only mean
host/device are up, but file is 404.
-- test database failures
-- identify idempotent commands and replay them 'n' times if query worker dies
during processing.
-- have queries workers be able to broadcast back up to parent "can't parse this"
at which point parent parses it (e.g. "help" command), so admins don't
need to remember the "!" prefix. of course, "!" prefix can always be used to
reach parent faster.
-- mb_asof handling in find_deviceid seems broken. less than max age? wrong units.
-- make generic script to write out usage files for people not using mogstored
-- or, let mogstored be run in 'usage' file writing only mode
-- wake up deleter process? totally overkill, but why not?
* 404 storms during replicating: (1.5 year old email, might be fixed, verify)
:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693821
:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693819
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693844
:: [replicate(12646)] Copier failed replicating 15693846
:: [replicate(12646)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693821
:: [replicate(12646)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693844
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev3/0/015/693/0015693848.fid failed: HTTP 404
:: [replicate(12650)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693848
......
-- fsck for case where row from file_to_replicate(fid,fromdevid) does not exist
in file_on(fid,devid). This is a byproduct of a failed inject.